
Python read in multiple .txt files and row bind using pandas

I'm coming from R (and SAS) and am having trouble reading in a large set of .txt files (all stored in the same directory) and creating one large DataFrame in pandas. So far I have attempted an amalgamation of code, all of which fails. I assume this is a simple task but lack experience in Python...

If it helps, this is the data I would like to combine into one large dataframe: http://www.ssa.gov/oact/babynames/limits.html - the state-specific sets (50 in total, each file named with its state abbreviation plus .txt)

Please help!

import pandas as pd

import glob

filelist = glob.glob(r"C:\Users\Dell\Downloads\Names\*.txt")  # raw string: plain "\U..." is an invalid escape

names = ['state', 'gender', 'year', 'name', 'count']

Then, I was thinking of using pd.concat, but am not sure - essentially I want to read in each dataset and then row-bind the sets together (given they all have the same columns)...

concat is nice since "join" is set to "outer" (i.e. union of the columns) by default. You could just as easily use df.join(), but would have to specify how="outer" explicitly. Either way, you can build the dataframe quite simply:

import pandas as pd
from glob import glob as gg

data = pd.DataFrame()
names = ['state', 'gender', 'year', 'name', 'count']

for f in gg('*.txt'):
    # the files have no header row, so pass the column names via `names=`
    # (read_csv has no `columns` keyword)
    tmp = pd.read_csv(f, header=None, names=names)
    data = pd.concat([data, tmp], axis=0, ignore_index=True)
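As a side note, concatenating inside the loop copies all previously accumulated rows on every iteration; collecting the frames in a list and calling pd.concat once is the usual idiom. A minimal, self-contained sketch (the temp directory and the two tiny sample files in the SSA comma-separated layout are made up here purely for illustration):

```python
import glob
import os
import tempfile

import pandas as pd

names = ['state', 'gender', 'year', 'name', 'count']

# Fabricate two small files in the no-header, comma-separated layout
# of the SSA state files, just so the example runs end to end.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'AK.txt'), 'w') as fh:
    fh.write('AK,F,1910,Mary,14\nAK,F,1910,Annie,12\n')
with open(os.path.join(tmpdir, 'AL.txt'), 'w') as fh:
    fh.write('AL,F,1910,Mary,875\n')

# Read each file once, then concatenate a single time.
frames = [pd.read_csv(f, header=None, names=names)
          for f in sorted(glob.glob(os.path.join(tmpdir, '*.txt')))]
data = pd.concat(frames, ignore_index=True)

print(len(data))          # one row per line across all files
print(list(data.columns))
```

With the real download you would point glob at the Names directory instead of the temp folder; the rest is unchanged.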
