Python for merging multiple files from a directory into one single file
I need a single file with one column per file in a directory. Every file has the same set of unique IDs, so I need to merge the files on that id.
For example, file_1 looks like this:
id pool1
ABL1 1352
ABL12 1236
ABL13 1022
ABL14 815
ABL15 1591
ABL16 2703
The other files in the directory have the same layout: the first column (id) is identical in every file, and only the second column differs.
I am looking for an output which looks something like this:
id /pool1 /pool2 /pool3 /pool4 /pool5
ABL1 1352 1353 1354 1355 1356
ABL12 1236 1237 1238 1239 1240
ABL13 1022 1023 1024 1025 1026
ABL14 815 816 817 818 819
ABL15 1591 1592 1593 1594 1595
ABL16 2703 2704 2705 2706 2707
ABL17 1449 1450 1451 1452 1453
ABL18 619 620 621 622 623
ABL19 1074 1075 1076 1077 1078
So far I have been trying to achieve this in Python with the following script:
path = '/Pool1'
files = os.listdir(path)
files_txt = [i for i in files if i.endswith('.txt_samplecount')]
files_merge= i for i in files_txt if i.merge(i,on="id")
But it throws this error:
AttributeError: 'str' object has no attribute 'merge'
Any help or suggestions are welcome.
Thank you
I found a solution:
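import os
import pandas as pd

path = '/Pool1'
files = os.listdir(path)
files_txt = [os.path.join(path, i) for i in files if i.endswith('.txt_samplecount')]
## Read each file into a DataFrame, using the first column (id) as the index.
## DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it.
dfs = [pd.read_csv(x, sep='\t', index_col=0) for x in files_txt]
## Concatenate column-wise; rows are aligned on the id index
merged = pd.concat(dfs, axis=1)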
This gives an output in which the column from each file is concatenated into a single table. Thanks, all, for the suggestions.
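For anyone who would rather merge explicitly on the id column, as the question first attempted, here is a minimal sketch. It assumes every file is tab-separated and has a header row containing an id column. The function name merge_on_id, the output name merged.tsv, and the two small demo files are made up for illustration and are not from the original post.

```python
import os
import tempfile
from functools import reduce

import pandas as pd


def merge_on_id(directory, suffix=".txt_samplecount", sep="\t"):
    """Merge every file in `directory` ending with `suffix` on the 'id' column."""
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    )
    # os.listdir returns file names (str), so each file must be read into
    # a DataFrame before anything can be merged.
    frames = [pd.read_csv(p, sep=sep) for p in paths]
    # An outer join keeps ids that are missing from some files (filled with NaN).
    return reduce(
        lambda left, right: pd.merge(left, right, on="id", how="outer"),
        frames,
    )


# Demo: two small hypothetical files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "pool1": "id\tpool1\nABL1\t1352\nABL12\t1236\n",
        "pool2": "id\tpool2\nABL1\t1353\nABL12\t1237\n",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name + ".txt_samplecount"), "w") as fh:
            fh.write(text)

    merged = merge_on_id(tmp)

    # Write the merged table to a single tab-separated file.
    out_path = os.path.join(tmp, "merged.tsv")
    merged.to_csv(out_path, sep="\t", index=False)
    with open(out_path) as fh:
        written = fh.read()

print(merged)
```

When the ids really are identical in every file, this should give the same table as the pd.concat(dfs, axis=1) approach above. The explicit merge is mainly useful if some files might be missing ids. Note that the sketch raises an error if no file in the directory matches the suffix.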