How to append multiple CSV files and add an additional column indicating file name in Python?
I have over 20 CSV files in a single folder. All files have the same structure; they just represent different days.
Example:
Day01.csv
Day02.csv
Day03.csv
Day04.csv (and so on...)
The files contain just two numeric columns: x and y. I would like to append all of these CSV files into one large file and add a column for the file name (day). I adapted the code below from similar examples, but it puts each file's y values in a separate column (Y1, Y2, Y3, Y4... and so on). I would simply like the appended file to have three columns: x, y, file name. How can I modify the code to do the proper append?
I have tried the code from this example: "Read multiple csv files and Add filename as new column in pandas"
import pandas as pd
import os
os.chdir('C:....path to my folder')
files = os.listdir()
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp)) for fp in files])
However, this code does not append all Y values under one column (all other aspects seem to work). Can someone help with the code so that all Y values are under a single column?
The following should work by creating the filename column before appending the dataframe to your list.
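import os
import pandas as pd

file_list = []
for file in os.listdir():
    # Only read CSV files; skip anything else in the folder
    if file.endswith('.csv'):
        # sep=";" assumes semicolon-delimited files; drop it for comma-separated data
        df = pd.read_csv(file, sep=";")
        # Tag every row with the file it came from
        df['filename'] = file
        file_list.append(df)

# Stack all frames vertically and renumber the rows
all_days = pd.concat(file_list, ignore_index=True)
# index=False keeps the row index out of the output, leaving only x, y and filename
all_days.to_csv("all.txt", index=False)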
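Since pd.concat aligns frames by column name, one possible reason for the y values landing in separate columns is that the header differs from file to file (for example Y1, Y2, ...). That is only a guess about the asker's data. The sketch below assumes every file has a single header row and forces the names x and y on read. The function name combine_csvs and the sample files are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_csvs(folder, pattern="*.csv"):
    """Stack every CSV matching `pattern` in `folder`, tagging rows with the file name."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        # header=0 together with names= replaces whatever header the file has,
        # so every frame ends up with the same columns (assumes a header row exists).
        df = pd.read_csv(path, header=0, names=["x", "y"])
        df["filename"] = path.name
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Demo on throwaway files; Day01.csv and Day02.csv here are made-up sample data
# whose second column is deliberately named differently in each file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Day01.csv").write_text("x,Y1\n1,10\n2,20\n")
    Path(tmp, "Day02.csv").write_text("x,Y2\n3,30\n")
    combined = combine_csvs(tmp)

print(combined)
```

If the real files have no header row at all, header=None with the same names= argument would be the analogous setting.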
Python is great at these simple tasks, almost too good to be true…
# Generates three fake tab-separated rows that stand in for one file's contents
fake_files = lambda n: '\n'.join(('%d\t%d'%(i, i+1) for i in range(n, n+3)))
file_name = 'fake_me%s.csv'
with open('my_new.csv', 'wt') as new:
    for number in range(3): # os.listdir()
        # With real files, read the rows like this instead:
        # with open(number) as to_add:
        #     rows = to_add.readlines()
        rows_fake = fake_files(number*2).split('\n')
        # Prefix every row with the name of the file it came from
        adjusted_rows = [file_name%number + '\t' + row for row in rows_fake]
        new.write('\n'.join(adjusted_rows) + '\n')
With adjustments for your specific I/O and file naming, this is all you need. You can copy the code, run it, and study how it works.
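As a rough sketch of what those adjustments might look like with real files, the version below uses only the standard library's csv module. It assumes comma-separated files named Day*.csv, each with one header row. The function name append_with_filename, the output name combined.csv, and the sample data are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def append_with_filename(folder, out_path):
    """Write x, y, filename rows from every Day*.csv in `folder` to `out_path`."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["x", "y", "filename"])
        for path in sorted(Path(folder).glob("Day*.csv")):
            with open(path, newline="") as src:
                reader = csv.reader(src)
                next(reader, None)  # skip the header row (assumed present)
                for row in reader:
                    if row:  # ignore blank lines
                        writer.writerow(row + [path.name])


# Demo on throwaway files with made-up values
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Day01.csv").write_text("x,y\n1,2\n3,4\n")
    Path(tmp, "Day02.csv").write_text("x,y\n5,6\n")
    out_file = Path(tmp, "combined.csv")
    append_with_filename(tmp, out_file)
    result = out_file.read_text().splitlines()

print(result)
```

Because the output name does not match the Day*.csv pattern, rerunning the function in the same folder does not pull the combined file back in as input.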