How to Combine CSV Files with Pandas (And Add Identifying Column)
How do I combine multiple CSV files and add an extra column to indicate which file each row came from?
So far I have:
import os
import glob
import pandas as pd

os.chdir('C:\...')  # path to folder where all CSVs are stored
dfs = []  # collects one DataFrame per file
for f, i in zip(glob.glob('*.csv'), short_list):
    df = pd.read_csv(f, header=None)
    df.index = i * len(df)  # this is the line that raises the TypeError
    dfs.append(df)
all_data = pd.concat(dfs, ignore_index=True)
It all works well, except for the identifying column. The values of i come from a list of strings that I want to put in column A of all_data, one string for every row of each file. Instead it returns a lot of numbers, and gives a
TypeError: Index(....) must be called with a collection of some kind
Expected output:
str1 file1entry1
str1 file1entry2
str1 file1entry3
str2 file2entry1
str2 file2entry2
str2 file2entry3
Where short_list = ['str1', 'str2', 'str3'], and file1entry1, file2entry2, etc. come from the CSV files I already have.
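For anyone wondering where the TypeError comes from, here is a minimal sketch, assuming i is a single string such as 'str1'. Multiplying a string by an integer repeats its characters and gives back one long string, not one label per row, so pandas refuses to use it as an index. The DataFrame contents and the names label and id below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one parsed CSV file with three rows.
df = pd.DataFrame({0: ["file1entry1", "file1entry2", "file1entry3"]})
label = "str1"  # hypothetical label for this file

# str * int repeats the characters and yields ONE string, not a list.
repeated_string = label * len(df)

try:
    df.index = repeated_string  # a scalar string is not a valid index
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the label in a list gives one entry per row instead.
repeated_list = [label] * len(df)
df.insert(0, "id", repeated_list)  # put the label in the first column
```

Using a real column rather than the index also means the labels survive pd.concat(..., ignore_index=True), which otherwise throws the index away.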
Solution: I wasn't able to get it all in one line as the answer suggested, but it pointed me in the right direction.
dfs = []
for f in glob.glob('*.csv'):
    df = pd.read_csv(f, header=None)
    # Simpler than pulling from the list: adds the file name to each row.
    df = df.assign(id=os.path.basename(f))
    dfs.append(df)
all_data = pd.concat(dfs)
You can use the .assign(id=i) method, which adds an id column to each parsed CSV and populates it with the value of i:
df = pd.concat([pd.read_csv(f, header=None).assign(id=i)
                for f, i in zip(glob.glob('*.csv'), short_list)],
               ignore_index=True)
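To try this end to end without touching real data, the following sketch writes two small CSV files to a temporary folder and combines them. The helper name combine_with_labels, the file names, and the sample rows are all hypothetical. One assumption worth calling out: glob does not promise any particular file order, so the paths are sorted here to make the pairing with the labels predictable.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_with_labels(folder, labels):
    """Concatenate every CSV in `folder`, tagging each file's rows with a label."""
    # Sort the paths so each file is paired with the intended label.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if len(paths) != len(labels):
        # zip() would silently drop the extras, so fail loudly instead.
        raise ValueError("need exactly one label per CSV file")
    frames = [pd.read_csv(path, header=None).assign(id=label)
              for path, label in zip(paths, labels)]
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data mirroring the expected output in the question.
samples = {
    "file1.csv": ["file1entry1", "file1entry2", "file1entry3"],
    "file2.csv": ["file2entry1", "file2entry2", "file2entry3"],
}

with tempfile.TemporaryDirectory() as folder:
    for name, rows in samples.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("\n".join(rows) + "\n")
    all_data = combine_with_labels(folder, ["str1", "str2"])

# assign() appends the new column at the end; reorder so id comes first.
all_data = all_data[["id", 0]]
```

The length check is there because zip stops at the shorter input, so a missing label would otherwise drop a whole file without any warning.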