How to combine CSV files with Pandas (and add an identifying column)

Question

How can I append multiple CSV files together, with an extra column that indicates which file each row came from?

So far I have:

import os
import pandas as pd
import glob

os.chdir(r'C:\...')  # path to folder where all CSVs are stored
dfs = []  # collects one DataFrame per file
for f, i in zip(glob.glob('*.csv'), short_list):
    df = pd.read_csv(f, header=None)
    df.index = i * len(df)  # problem line: meant to label every row with i
    dfs.append(df)

all_data = pd.concat(dfs, ignore_index=True)

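Everything works except the identifying column. i is a string taken from short_list, and I want it in column A of all_data, repeated on every row that came from the corresponding file. Instead I get a lot of numbers, and a TypeError: Index(....) must be called with a collection of some kind.

Expected output: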
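
The error can be reproduced without any files. The following is a minimal sketch, assuming a hypothetical three-row frame: multiplying a string by an integer produces one long string rather than one label per row, and pandas rejects a bare string as an index. Repeating a one-element list gives a label for each row instead.

```python
import pandas as pd

# Hypothetical stand-in for one parsed CSV file.
df = pd.DataFrame({0: ['file1entry1', 'file1entry2', 'file1entry3']})
label = 'str1'

repeated_string = label * len(df)   # a single string: 'str1str1str1'
repeated_list = [label] * len(df)   # one label per row: ['str1', 'str1', 'str1']

error_message = None
try:
    df.index = repeated_string      # a bare string is not a collection of labels
except TypeError as exc:
    error_message = str(exc)

df.index = repeated_list            # accepted: the length matches the row count
```

Even with the list form, the labels would end up in the index rather than in a column, and ignore_index=True in pd.concat would discard them again.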

str1 file1entry1
str1 file1entry2
str1 file1entry3
str2 file2entry1
str2 file2entry2
str2 file2entry3

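where short_list = ['str1', 'str2', 'str3'] and file1entry1, file2entry2, etc. come from the CSV files I already have.

Solution: I couldn't get everything onto one line as the answer suggested, but it pointed me in the right direction.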

dfs = []
for f in glob.glob('*.csv'):
    df = pd.read_csv(f, header=None)
    # Simpler than pulling from the list: adds the file name to each row.
    df = df.assign(id=os.path.basename(f))
    dfs.append(df)

all_data = pd.concat(dfs)
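
The loop above can be tried end to end without the original folder. This is only a sketch under stated assumptions: the temporary directory and the file names a.csv and b.csv are hypothetical, and sorted() and ignore_index=True are additions that make the row order and index predictable.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Write two small hypothetical CSV files to read back.
    samples = [('a.csv', 'file1entry1\nfile1entry2\n'), ('b.csv', 'file2entry1\n')]
    for name, rows in samples:
        with open(os.path.join(folder, name), 'w') as fh:
            fh.write(rows)

    dfs = []
    for f in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        df = pd.read_csv(f, header=None)
        # Tag every row with the name of the file it came from.
        dfs.append(df.assign(id=os.path.basename(f)))

all_data = pd.concat(dfs, ignore_index=True)
```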

Answer 1

You can use the .assign(id=i) method, which adds an id column to each parsed CSV and fills it with the value of i:

df = pd.concat([pd.read_csv(f, header=None).assign(id=i)
                for f, i in zip(glob.glob('*.csv'), short_list)],
               ignore_index=True)
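
For a version that runs without files on disk, here is a small sketch of the same idea. The io.StringIO objects are hypothetical stand-ins for the CSV files, and the final column reordering is an extra step, assumed here only to match the layout in the question's expected output.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files.
csv_files = [
    io.StringIO('file1entry1\nfile1entry2\nfile1entry3\n'),
    io.StringIO('file2entry1\nfile2entry2\nfile2entry3\n'),
]
short_list = ['str1', 'str2']

all_data = pd.concat(
    [pd.read_csv(f, header=None).assign(id=i)
     for f, i in zip(csv_files, short_list)],
    ignore_index=True,
)

# With header=None the data column is named 0; move id in front of it.
all_data = all_data[['id', 0]]
```

Note that glob.glob does not guarantee any particular order, so when pairing real files with short_list it may be safer to wrap the call in sorted() so that each file lines up with the intended string.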

Answer 1
3 Accepted 2016-09-20 21:40:03