
Python read in multiple .txt files and row bind using pandas

I'm coming from R (and SAS) and am having trouble reading in a large set of .txt files (all stored in the same directory) and creating one large DataFrame in pandas. So far I have attempted an amalgamation of code, all of which fails. I assume this is a simple task but lack experience in Python...

If it helps, this is the data I would like to combine into one large dataframe: http://www.ssa.gov/oact/babynames/limits.html - the state-specific sets (50 in total, each file named with its state abbreviation plus .txt)

Please help!

import pandas as pd

import glob

filelist = glob.glob(r"C:\Users\Dell\Downloads\Names\*.txt")  # raw string: plain "\U..." is an invalid escape

names = ['state', 'gender', 'year', 'name', 'count']

Then, I was thinking of using pd.concat, but am not sure - essentially I want to read in each dataset and then row-bind the sets together (given they all have the same columns)...

concat is nice since "join" is set to "outer" (i.e. union of the columns) by default. You could just as easily use df.join(), but would have to specify how="outer" explicitly. Either way, you can build the dataframe quite simply:

import pandas as pd
from glob import glob as gg

data = pd.DataFrame()
names = ['state', 'gender', 'year', 'name', 'count']

for f in gg('*.txt'):
    # the files have no header row, so pass the column names via `names=`
    # (read_csv has no `columns` keyword)
    tmp = pd.read_csv(f, header=None, names=names)
    data = pd.concat([data, tmp], axis=0, ignore_index=True)
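As a side note, concatenating inside the loop copies all previously accumulated rows on every iteration; collecting the frames in a list and calling pd.concat once is the usual idiom. A minimal, self-contained sketch (the temp directory and the two tiny sample files in the SSA comma-separated layout are made up here purely for illustration):

```python
import glob
import os
import tempfile

import pandas as pd

names = ['state', 'gender', 'year', 'name', 'count']

# Fabricate two small files in the no-header, comma-separated layout
# of the SSA state files, just so the example runs end to end.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'AK.txt'), 'w') as fh:
    fh.write('AK,F,1910,Mary,14\nAK,F,1910,Annie,12\n')
with open(os.path.join(tmpdir, 'AL.txt'), 'w') as fh:
    fh.write('AL,F,1910,Mary,875\n')

# Read each file once, then concatenate a single time.
frames = [pd.read_csv(f, header=None, names=names)
          for f in sorted(glob.glob(os.path.join(tmpdir, '*.txt')))]
data = pd.concat(frames, ignore_index=True)

print(len(data))          # one row per line across all files
print(list(data.columns))
```

With the real download you would point glob at the Names directory instead of the temp folder; the rest is unchanged.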
