pandas recursive read_csv while adding a column to each DataFrame

I'm recursively reading many CSV files from multiple directories, and each time I read one in I want to add a column called num, which is just the index of that CSV in the list of files.

path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)

Once I have the filenames, I want to read each file in and add the column, but keep the result as a generator so that I can simply concat afterwards. Is it possible to enumerate a generator?

df_from_each_file = (pd.read_csv(f) for f in all_files)
df_from_each_file = (df.insert(0,'num',i,allow_duplicates=True) for i, df in enumerate(df_from_each_file))
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)

This just produces a bunch of None objects instead of DataFrames.

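DataFrame.insert modifies the DataFrame in place and returns None, which is why the second generator yields None. Use enumerate over the file list and DataFrame.assign, which returns a new DataFrame with the added column, within the generator: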

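import glob
import os
import pandas as pd

path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)

# assign returns a new DataFrame, so each item in the generator is a DataFrame
df_from_each_file = (pd.read_csv(f).assign(num=i) for i, f in enumerate(all_files))
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)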
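
Note that assign appends num as the last column, whereas the question's insert(0, ...) call put it first. If the column position matters, one option is a small helper that calls insert and then returns the DataFrame explicitly. The following is a minimal sketch rather than a definitive implementation: the helper name read_with_num, the temporary directory layout, and the sample data are all made up for illustration, and sorted() is an added assumption to make the file order (and therefore num) reproducible.

```python
import glob
import os
import tempfile

import pandas as pd


def read_with_num(path, num):
    """Read one CSV and put a constant 'num' column in position 0.

    Hypothetical helper, not part of pandas.
    """
    df = pd.read_csv(path)
    # insert() mutates df in place and returns None, so return df explicitly.
    df.insert(0, "num", num)
    return df


# Build a throwaway directory tree with two small CSV files (made-up data).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a"))
    os.makedirs(os.path.join(root, "b"))
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(root, "a", "one.csv"), index=False)
    pd.DataFrame({"x": [3]}).to_csv(os.path.join(root, "b", "two.csv"), index=False)

    # sorted() is an assumption added here so that num is deterministic.
    all_files = sorted(glob.glob(os.path.join(root, "**", "*.csv"), recursive=True))

    # Still a lazy generator: each file is read only when concat consumes it.
    frames = (read_with_num(f, i) for i, f in enumerate(all_files))
    concatenated_df = pd.concat(frames, ignore_index=True)

print(concatenated_df)
```

Unlike the original insert call, this sketch does not pass allow_duplicates=True, so it would raise if a CSV already contained a column named num.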
