How to combine CSV files with Pandas (and add an identifying column)

Question

How do I combine multiple CSV files and add an extra column indicating which file each row came from?

So far I have:

import os
import pandas as pd
import glob

os.chdir(r'C:\...')  # path to folder where all CSVs are stored
dfs = []  # collects one DataFrame per file
for f, i in zip(glob.glob('*.csv'), short_list):
    df = pd.read_csv(f, header=None)
    df.index = i * len(df)  # the failing line: i is a str, so this is one long string
    dfs.append(df)

all_data = pd.concat(dfs, ignore_index=True)

It all works well, except for the identifying column. short_list is a list of strings that I want to put in column A of all_data, one string for every row of the corresponding file. Instead it returns a lot of numbers and gives a TypeError: Index(....) must be called with a collection of some kind.
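The TypeError most likely comes from Python's string repetition: multiplying a string by an integer gives one long string rather than one label per row, and pandas rejects a bare string as an index. A minimal sketch, assuming a made-up three-row frame (the data and the label 'str1' are illustrative only):

```python
import pandas as pd

# Illustrative stand-in for one parsed CSV
df = pd.DataFrame({0: ['file1entry1', 'file1entry2', 'file1entry3']})
label = 'str1'

# str * int repeats the characters and yields a single scalar string
repeated_string = label * len(df)

# [str] * int yields a list with one label per row
repeated_list = [label] * len(df)

try:
    df.index = repeated_string  # a scalar is not a valid index
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

df.index = repeated_list  # a list of the right length is accepted
```

Even with the list form, pd.concat(..., ignore_index=True) replaces the index with 0..n-1, so the labels would be lost again. Storing them in a regular column avoids that.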

Expected output:

str1 file1entry1
str1 file1entry2
str1 file1entry3
str2 file2entry1
str2 file2entry2
str2 file2entry3

Where short_list = ['str1', 'str2', 'str3'], and file1entry1, file2entry2, etc. come from the CSV files I already have.

Solution: I wasn't able to get it all in one line as the answer suggested, but it pointed me in the right direction.

dfs = []
for f in glob.glob('*.csv'):
    df = pd.read_csv(f, header=None)
    df = df.assign(id=os.path.basename(f))  # simpler than pulling from the list; adds the file name to every row
    dfs.append(df)

all_data = pd.concat(dfs)
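For anyone who wants the identifier as the first column (column A, as in the expected output) rather than the last, here is a sketch of the same loop using DataFrame.insert. The temporary folder and the two sample files are hypothetical and exist only to make it runnable; using the file stem as the id is my own assumption, not part of the original.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample files, created only so the sketch runs on its own
folder = Path(tempfile.mkdtemp())
(folder / 'file1.csv').write_text('file1entry1\nfile1entry2\n')
(folder / 'file2.csv').write_text('file2entry1\nfile2entry2\n')

dfs = []
for path in sorted(folder.glob('*.csv')):  # sorted for a predictable order
    df = pd.read_csv(path, header=None)
    # insert at position 0 so the id becomes the first column;
    # path.stem is the file name without folder or .csv suffix
    df.insert(0, 'id', path.stem)
    dfs.append(df)

all_data = pd.concat(dfs, ignore_index=True)
```

Unlike assign, insert modifies the frame in place and lets you pick the column position.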

Answer 1

You can use the .assign(id=i) method, which adds an id column to each parsed CSV and fills it with the value of i:

df = pd.concat([pd.read_csv(f, header=None).assign(id=i)
                for f, i in zip(glob.glob('*.csv'), short_list)],
               ignore_index=True)
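One thing to watch with this approach: glob.glob returns files in arbitrary order, and zip stops at the shorter of its inputs, so which label lands on which file depends on the pairing. A self-contained sketch that sorts the file list first (the temporary folder, the file names a.csv and b.csv, and their contents are made up for illustration):

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample data, written to a temporary folder
folder = tempfile.mkdtemp()
for name, rows in [('a.csv', 'file1entry1\nfile1entry2\n'),
                   ('b.csv', 'file2entry1\nfile2entry2\n')]:
    with open(os.path.join(folder, name), 'w') as fh:
        fh.write(rows)

short_list = ['str1', 'str2', 'str3']  # one more label than there are files

# Sorting makes the file-to-label pairing deterministic
files = sorted(glob.glob(os.path.join(folder, '*.csv')))

combined = pd.concat(
    [pd.read_csv(f, header=None).assign(id=label)
     for f, label in zip(files, short_list)],
    ignore_index=True,
)
```

Here 'str3' is silently dropped because there are only two files. If the two lists are supposed to match one to one, it may be worth checking len(files) == len(short_list) before concatenating.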

Answer 1: accepted, score 3, 2016-09-20 21:40:03
