Python read in multiple .txt files and row bind using pandas

I'm coming from R (and SAS) and am having trouble reading in a large set of .txt files (all stored in the same directory) and combining them into one large DataFrame in pandas. So far I have tried an amalgamation of code, all of which fails miserably. I assume this is a simple task, but I lack experience in Python.

If it helps, this is the data I would like to build one large DataFrame from: http://www.ssa.gov/oact/babynames/limits.html, specifically the state-specific sets (50 in total, each named for its state abbreviation, with a .txt extension).

Please help!

import glob
import pandas as pd

# Raw string (r"...") so the backslashes in the Windows path are not read as escape sequences
filelist = glob.glob(r"C:\Users\Dell\Downloads\Names\*.txt")
names = ['state', 'gender', 'year', 'name', 'count']

Then I was thinking of using pd.concat, but I am not sure. Essentially, I want to read in each dataset and then row-bind the sets together (given that they all have the same columns).

concat is nice since its "join" parameter defaults to "outer" (i.e. the union of the labels on the other axis, which is the columns when you stack rows). df.join() also accepts how="outer" (its default is "left"), but it aligns frames side by side on the index rather than appending rows, so concat is the natural fit for row binding. With concat, you can build the DataFrame quite simply:

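import pandas as pd
from glob import glob

names = ['state', 'gender', 'year', 'name', 'count']

# read_csv has no "columns" argument; passing names= supplies the column
# labels and tells pandas the files have no header row
frames = [pd.read_csv(f, names=names) for f in glob('*.txt')]

# Stack all frames row-wise once and renumber the index from 0
data = pd.concat(frames, axis=0, ignore_index=True)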
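
To try this without downloading anything, here is a minimal self-contained sketch. It assumes the files are comma-separated with no header row; the file names (AA.txt, BB.txt), the sample rows, and the extra "partial" frame are made up for illustration and are not real SSA data. It also shows what the default join="outer" does when one frame is missing columns.

```python
import tempfile
from pathlib import Path

import pandas as pd

names = ['state', 'gender', 'year', 'name', 'count']

# Made-up sample rows standing in for two state files (not real SSA data)
samples = {
    'AA.txt': "AA,F,1910,Mary,14\nAA,M,1910,John,8\n",
    'BB.txt': "BB,F,1911,Anna,5\n",
}

with tempfile.TemporaryDirectory() as tmpdir:
    for filename, text in samples.items():
        Path(tmpdir, filename).write_text(text)

    # sorted() makes the row order independent of the filesystem's listing order
    files = sorted(Path(tmpdir).glob('*.txt'))
    frames = [pd.read_csv(f, header=None, names=names) for f in files]

# Row-bind every frame and renumber the index from 0
data = pd.concat(frames, ignore_index=True)
print(data.shape)  # (3, 5)

# join="outer" (the default) keeps the union of columns and fills gaps with NaN;
# join="inner" keeps only the columns that every frame shares.
partial = pd.DataFrame({'state': ['CC'], 'name': ['Lee']})
outer = pd.concat([data, partial], ignore_index=True)
inner = pd.concat([data, partial], ignore_index=True, join='inner')
print(list(outer.columns))  # ['state', 'gender', 'year', 'name', 'count']
print(list(inner.columns))  # ['state', 'name']
```

Collecting the frames in a list and calling pd.concat once avoids copying the growing DataFrame on every iteration, which matters more as the number of files increases.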
