Adding a column to multiple .csv files with the file name as you combine those .csv files into a single dataframe
I have 50 .csv files with over 188k rows combined, and I need to add the file name as a column so that I can track which file each row came from. I have included the code I am using below, which combines the files into a single df.
df = pd.DataFrame()
for file in files:
    if file.endswith('.csv'):
        df = df.append(pd.read_csv(file), ignore_index=True)
df.head()
You're almost there. Instead of appending the result of read_csv() directly, store it first and add a new column with the file name:
for file in files:
    if file.endswith('.csv'):
        df_new = pd.read_csv(file)
        df_new['from_file'] = file
        df = df.append(df_new, ignore_index=True)
Also, if your file variable is actually the whole path to the file, you can use os.path.basename(file), which returns only the name of the file, without the path.
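One caveat for newer setups: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loops above raise an AttributeError there. Below is a minimal sketch of the same idea using pd.concat together with os.path.basename. It assumes files is a list of path strings; combine_csvs is a hypothetical helper name used only for illustration, not something from pandas or from the question.

```python
import os

import pandas as pd


def combine_csvs(files):
    """Read each .csv path in `files` and stack the rows into one DataFrame,
    tagging every row with the name of the file it came from.

    `combine_csvs` is a hypothetical helper name used for illustration.
    """
    frames = []
    for file in files:
        if file.endswith('.csv'):
            df_new = pd.read_csv(file)
            # Keep only the file name, dropping any leading directories
            df_new['from_file'] = os.path.basename(file)
            frames.append(df_new)
    if not frames:
        # No .csv paths were given: return an empty frame instead of failing
        return pd.DataFrame()
    # Concatenate once at the end rather than growing df inside the loop
    return pd.concat(frames, ignore_index=True)
```

Calling df = combine_csvs(files) and then df.head() should give the same combined frame as before, with the extra from_file column. Collecting the pieces in a list and concatenating once also avoids copying the growing dataframe on every iteration.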