I'm still new to Pandas. Is it possible to initialize a Pandas DataFrame and append to it while looping over lines? My attempt is below, but it creates a DataFrame with 1 column instead of 6. Would it be easier to save the modified input to a CSV file and then read that file with Pandas? I'm probably going to do that for now. Thanks!
import requests
import pandas as pd
url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame([s.strip() for s in l])
    ## Lines with 6 columns.
    elif len(l) == 6:
        df = df.append(pd.DataFrame([s.strip() for s in l]))
    ## Lines with 7 columns.
    elif len(l) == 7:
        df = df.append(pd.DataFrame([l[i].strip() for i in (0, 2, 3, 4, 5, 6)]))
You can load the whole file into a DataFrame as a CSV stream, without looping through each line yourself.
import requests
import pandas as pd
import csv
url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
df = pd.DataFrame(list(csv.reader(r.text.splitlines(), delimiter='\t')))
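One thing to be aware of with this approach: rows of different lengths all land in the same frame, and the shorter ones are padded with missing values. Here is a minimal sketch of that behavior. The `sample_text` string is a made-up stand-in for `r.text` (not the real file contents), so it runs without a network request.

```python
import csv
import pandas as pd

# Hypothetical stand-in for r.text: a 3-field header, a 3-field row,
# and a 4-field row. The real file has 6 and 7 fields per line.
sample_text = "name\tgroup\tmutation\nA1\tA\tM31\nB2\textra\tB\tM60\n"

rows = list(csv.reader(sample_text.splitlines(), delimiter="\t"))
df = pd.DataFrame(rows)

# The frame is as wide as the longest row; shorter rows get a missing value.
print(df.shape)
print(df)
```

Note that the header ends up as the first data row here, and the extra field in the longer row shifts its values one column to the right, so some cleanup is probably still needed afterwards.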
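Update:
This should work now. The header fields are passed as columns, and each line is wrapped in an outer list so that it becomes one row with 6 columns.
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame(columns=[s.strip() for s in l])
    ## Lines with 6 columns.
    elif len(l) == 6:
        row = pd.DataFrame(columns=df.columns, data=[[s.strip() for s in l]])
        ## DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead.
        df = pd.concat([df, row], ignore_index=True)
    ## Lines with 7 columns.
    elif len(l) == 7:
        row = pd.DataFrame(columns=df.columns, data=[[l[j].strip() for j in (0, 2, 3, 4, 5, 6)]])
        df = pd.concat([df, row], ignore_index=True)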
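The reason the original attempt produced a single column is the shape of the data passed to the constructor: a flat list of strings is read as one value per row, while a list containing one list is read as a single row. A small sketch with made-up field values (`fields` is just an illustration, not data from the file):

```python
import pandas as pd

fields = ["a", "b", "c"]  # hypothetical fields from one split line

# Flat list: each string becomes its own row in a single column.
one_column = pd.DataFrame(fields)

# Nested list: the whole list becomes one row with three columns.
one_row = pd.DataFrame([fields])

print(one_column.shape)  # (3, 1)
print(one_row.shape)     # (1, 3)
```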
Inspired by this answer I opted for this solution:
import requests
import pandas as pd
url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
table = []
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The first line is the header.
    if i == 0:
        table.append([s.strip() for s in l])
    ## Rows with 6 columns.
    elif len(l) == 6:
        table.append([s.strip() for s in l])
    ## Rows with 7 columns.
    elif len(l) == 7:
        table.append([l[i].strip() for i in (0, 2, 3, 4, 5, 6)])
    ## Skip rows with neither 6 nor 7 columns.
    else:
        pass
df = pd.DataFrame(table)
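With the code above, the header line ends up as the first data row and the columns are labeled 0 to 5. If the header should become the column names instead, a possible variation is to pass the first list as columns and the rest as data. This is only a sketch: `sample_text` and the names `c1` to `c6` are invented so the example runs without downloading the file.

```python
import pandas as pd

# Hypothetical stand-in for r.text: header, a 6-field row, a 7-field row,
# and a malformed 2-field row that should be skipped.
sample_text = "\n".join([
    "c1\tc2\tc3\tc4\tc5\tc6",
    "a1\ta2\ta3\ta4\ta5\ta6",
    "b1\tdrop\tb2\tb3\tb4\tb5\tb6",
    "x\ty",
])

table = []
for line in sample_text.splitlines():
    fields = [s.strip() for s in line.strip().split("\t")]
    if len(fields) == 6:
        table.append(fields)
    elif len(fields) == 7:
        # Drop the second field, as in the loop above.
        table.append([fields[i] for i in (0, 2, 3, 4, 5, 6)])
    # Rows with any other field count are skipped.

# First list is the header, the remaining lists are the data rows.
df = pd.DataFrame(table[1:], columns=table[0])
print(df)
```

Collecting plain lists and building the DataFrame once at the end also avoids re-creating the frame on every iteration, which tends to get slow for long files.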