
Append to Pandas dataframe while looping over lines?

I'm still new to pandas. Is it possible to initialize a DataFrame and append to it while looping over lines? My attempt is below, but it creates a DataFrame with 1 column instead of 6. Would it be easier to save the modified input to a CSV file and then read that file with pandas? I'm probably going to do that for now. Thanks!

import requests
import pandas as pd

url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame([s.strip() for s in l])
    ## Lines with 6 columns.
    elif len(l) == 6:
        df = df.append(pd.DataFrame([s.strip() for s in l]))
    ## Lines with 7 columns.
    elif len(l) == 7:
        df = df.append(pd.DataFrame([l[i].strip() for i in (0, 2, 3, 4, 5, 6)]))

You can load the whole file into a DataFrame as a CSV stream, without handling each line yourself.

import requests
import pandas as pd
import csv

url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
df = pd.DataFrame(list(csv.reader(r.text.splitlines(), delimiter='\t')))
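The one-liner above keeps the header line as row 0 and labels the columns 0, 1, 2, and so on. A minimal sketch of using the first row as column names instead, assuming every row has the same number of fields (rows with an extra field would need trimming first). The `sample` text and its column names are made up for illustration and stand in for `r.text`:

```python
import csv
import pandas as pd

# Hypothetical tab-separated text standing in for r.text.
sample = "name\thaplogroup\tposition\nM1\tA\t100\nM2\tB\t200\n"

rows = list(csv.reader(sample.splitlines(), delimiter='\t'))

# The first row supplies the column names; the remaining rows are data.
df = pd.DataFrame(rows[1:], columns=rows[0])
```

All values stay as strings here, since `csv.reader` does no type conversion.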

Update:

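This should work now. The original attempt passed a flat list to pd.DataFrame, which produces one column with six rows. Passing the header as columns and wrapping each row in its own list gives one row with six columns.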

for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame(columns = [s.strip() for s in l])
    ## Lines with 6 columns.
    elif len(l) == 6:
        df = df.append(pd.DataFrame(columns=df.columns,data=[[s.strip() for s in l]]))
    ## Lines with 7 columns.
    elif len(l) == 7:
        df = df.append(pd.DataFrame(columns=df.columns, data=[[l[i].strip() for i in (0, 2, 3, 4, 5, 6)]]))
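`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on current versions the loop needs `pd.concat` instead. A hedged sketch of the same logic, collecting one-row frames in a list and concatenating once at the end. The `text` sample and the column names `c1` to `c6` are invented for illustration:

```python
import pandas as pd

# Hypothetical input: a 6-field header, a 6-field row,
# a 7-field row, and a malformed row that gets skipped.
text = (
    "c1\tc2\tc3\tc4\tc5\tc6\n"
    "a\tb\tc\td\te\tf\n"
    "g\tX\th\ti\tj\tk\tl\n"
    "short\trow\n"
)

lines = text.splitlines()
columns = [s.strip() for s in lines[0].split('\t')]

frames = []
for line in lines[1:]:
    fields = line.strip().split('\t')
    if len(fields) == 6:
        row = [s.strip() for s in fields]
    elif len(fields) == 7:
        # Drop the second field, as in the loop above.
        row = [fields[j].strip() for j in (0, 2, 3, 4, 5, 6)]
    else:
        continue
    frames.append(pd.DataFrame([row], columns=columns))

# One concat at the end avoids copying the growing frame on every row.
df = pd.concat(frames, ignore_index=True)
```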

Inspired by this answer I opted for this solution:

import requests
import pandas as pd

url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
table = []
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The first line is the header.
    if i == 0:
        table.append([s.strip() for s in l])
    ## Rows with 6 columns.
    elif len(l) == 6:
        table.append([s.strip() for s in l])
    ## Rows with 7 columns.
    elif len(l) == 7:
        table.append([l[i].strip() for i in (0, 2, 3, 4, 5, 6)])
    ## Skip rows with neither 6 nor 7 columns.
    else:
        pass
df = pd.DataFrame(table)
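On the question of writing a cleaned CSV file first: a file on disk is not required, because `pd.read_csv` accepts any file-like object. A small sketch, assuming the text has already been normalized so that every line has the same number of fields. The `cleaned` string and its column names are made up:

```python
import io
import pandas as pd

# Hypothetical cleaned, tab-separated text with a header line.
cleaned = "c1\tc2\tc3\n1\tx\t2.5\n2\ty\t3.5\n"

# StringIO wraps the string so read_csv can treat it like an open file.
df = pd.read_csv(io.StringIO(cleaned), sep='\t')
```

Unlike building the frame from lists of strings, `read_csv` uses the first line as the header and infers numeric column types.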

