pandas: recursive read_csv while adding a column to each file
I'm recursively reading many CSVs in multiple directories, and each time I read one in I want to add a column called num, which is just the index of that CSV in the list.
import glob
import os
path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)
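A minimal sketch of what the glob pattern above matches, assuming a throwaway directory built with tempfile (the file names are hypothetical). With recursive=True, "**" matches zero or more directory levels, so CSVs sitting directly under the root are found as well as nested ones. glob does not promise any particular order, so sorting the list (an addition, not part of the question) keeps each file's num stable between runs.

```python
import glob
import os
import tempfile

# Hypothetical directory tree for illustration: CSVs at several depths plus a non-CSV file.
root = tempfile.mkdtemp()
for rel in ["top.csv", os.path.join("a", "one.csv"), os.path.join("a", "b", "two.csv"), "notes.txt"]:
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as fh:
        fh.write("x\n1\n")

# "**" with recursive=True matches zero or more directories, so top.csv is included too.
# Sorting is an assumption added here to make the enumeration order reproducible.
all_files = sorted(glob.glob(os.path.join(root, "**/*.csv"), recursive=True))
rel_paths = [os.path.relpath(f, root) for f in all_files]
print(rel_paths)
```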
Once I have the filenames, I want to read each one in and add the column, but keep it as a generator so I can simply concat afterwards. Is it possible to enumerate a generator?
import pandas as pd
df_from_each_file = (pd.read_csv(f) for f in all_files)
# DataFrame.insert modifies df in place and returns None, so this generator yields None
df_from_each_file = (df.insert(0,'num',i,allow_duplicates=True) for i, df in enumerate(df_from_each_file))
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
This just gives me a bunch of None values instead of DataFrames.
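A short sketch of why the question's version produces None, assuming only pandas. enumerate accepts any iterable, generators included, so the enumeration itself is fine; the problem is that DataFrame.insert works in place and returns None, and that return value is what the generator expression yields. The helper with_num below is hypothetical, shown only to illustrate that the insert-based approach works once the frame itself is returned.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]})

# insert mutates df in place; its return value is None.
result = df.insert(0, "num", 0)
print(result)             # None
print(list(df.columns))   # ['num', 'x']

# A generator of frames standing in for the read_csv generator.
frames = (pd.DataFrame({"x": [i]}) for i in range(3))

def with_num(frame, i):
    """Hypothetical helper: add the column in place, then return the frame itself."""
    frame.insert(0, "num", i)
    return frame

# enumerate works directly on the generator; each item is now a DataFrame, not None.
numbered = (with_num(frame, i) for i, frame in enumerate(frames))
combined = pd.concat(numbered, ignore_index=True)
print(combined)
```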
Use enumerate and DataFrame.assign within the generator, like:
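import glob
import os
import pandas as pd
path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)
# assign returns a new DataFrame with the num column, so the generator yields DataFrames
df_from_each_file = (pd.read_csv(f).assign(num=i) for i, f in enumerate(all_files))
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)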
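An end-to-end sketch of the answer's approach, assuming a temporary directory stands in for data/ (the sample files and values are hypothetical). Because assign returns a new DataFrame rather than None, pd.concat receives real frames, and the scalar i is broadcast to every row read from that file. The sorted() call is an addition so that num is reproducible.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample tree: two CSVs in different subdirectories.
root = tempfile.mkdtemp()
samples = {
    os.path.join("2020", "jan.csv"): "value\n10\n11\n",
    os.path.join("2021", "feb.csv"): "value\n20\n",
}
for rel, text in samples.items():
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as fh:
        fh.write(text)

# Sorting is an assumption added here; glob itself guarantees no order.
all_files = sorted(glob.glob(os.path.join(root, "**/*.csv"), recursive=True))

# Each file's rows get that file's position in all_files as num.
df_from_each_file = (pd.read_csv(f).assign(num=i) for i, f in enumerate(all_files))
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
print(concatenated_df)
```

Note that assign appends num as the last column, whereas the question's insert(0, ...) would have put it first; if order matters, selecting the columns explicitly (for example concatenated_df[["num", "value"]] here) reorders them.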