
Concatenating multiple CSV files based on column values when the files share the same header but differ in row order

I'm cleaning some data. I have data from multiple subjects over multiple trials.

SubNo Trails Score 
1       1      4
1       2      4
1       3      8
7       1      9
7       2      8
7       3      8
19
:
:

For the same subjects, I have another dataset in which the SubNo values appear in a different order.

SubNo Trails Height 
19      1      100
19      2      400
19      3      810
7       1      911
7       2      811
7       3      811
20      1      222
20      2      222
20      3      789
1
1
:
:

I want to join these two on SubNo, so that in the end I have one CSV per subject containing both Score and Height.

SubNo Trails Score Height 
1        1     4     198
1        2     4     209
1        3     8     289
2        1     :      :
2        2
2        3

Here I have joined the data based on SubNo, so all the rows for subject 1 are together, all the rows for subject 2 are together, and so on. In my two CSV files the order of the subjects is not the same, so I don't want to join based on the header but on the specific subject number; in my case that is 1, 2, 17, and so on. How should I go about it? (I tried pandas merge, but it works based on the header, which won't do what I want.)

Your question is a bit unclear, but from what I understand you are trying to get a single CSV file that contains columns for SubNo, Trails, Score and Height (with the SubNo column being the key).

In that case you should do the following:

new_dataframe = left_dataframe.join(right_dataframe.set_index('SubNo'), on='SubNo', how='left')

Or alternatively:

new_dataframe = pd.merge(left_dataframe, right_dataframe, on='SubNo', how='left')
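One caveat with joining on SubNo alone: each subject has several rows in both frames, so every Score row is paired with every Height row for that subject. The sketch below uses made-up values, and the frame names scores and heights are illustrative rather than taken from the question. It compares the row counts with and without Trails as a second key:

```python
import pandas as pd

# Toy frames shaped like the question's data; values are illustrative only.
scores = pd.DataFrame({
    "SubNo": [1, 1, 7, 7],
    "Trails": [1, 2, 1, 2],
    "Score": [4, 4, 9, 8],
})
heights = pd.DataFrame({
    "SubNo": [7, 7, 1, 1],   # same subjects, different order
    "Trails": [1, 2, 1, 2],
    "Height": [911, 811, 100, 200],
})

# Key on SubNo only: each of a subject's 2 score rows matches both of its
# height rows, and Trails is duplicated as Trails_x / Trails_y.
by_subject = pd.merge(scores, heights, on="SubNo", how="left")

# Key on SubNo and Trails: one output row per (subject, trial) pair.
by_subject_and_trial = pd.merge(scores, heights, on=["SubNo", "Trails"], how="left")

print(len(by_subject))            # 8
print(len(by_subject_and_trial))  # 4
```

Row order in the two inputs does not matter in either case, because merge matches on key values rather than on position.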

Please check out the pandas merge function. With no on argument, merge joins on the columns the two frames have in common, so you will effectively be merging on SubNo and Trails. A small code snippet would be:

df1.merge(df2, how='inner')

After this step, you can split the result by subject with the groupby() function (see the pandas documentation): group the rows by SubNo and then save each group as a separate CSV.
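The merge-then-split idea can be sketched as follows. The toy values, the temporary output directory and the subject_<SubNo>.csv file naming are assumptions made for illustration; none of them come from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frames shaped like the question's data; values are illustrative only.
scores = pd.DataFrame({
    "SubNo": [1, 1, 7, 7],
    "Trails": [1, 2, 1, 2],
    "Score": [4, 4, 9, 8],
})
heights = pd.DataFrame({
    "SubNo": [7, 7, 1, 1],
    "Trails": [1, 2, 1, 2],
    "Height": [911, 811, 100, 200],
})

# Match rows on both keys so each trial gets its own Score and Height.
merged = scores.merge(heights, on=["SubNo", "Trails"], how="inner")

# Hypothetical output location; replace with a real directory as needed.
out_dir = Path(tempfile.mkdtemp())

written = []
for sub_no, group in merged.groupby("SubNo"):
    # One file per subject, e.g. subject_1.csv (naming is an assumption).
    path = out_dir / f"subject_{sub_no}.csv"
    group.to_csv(path, index=False)
    written.append(path.name)

print(written)
```

groupby sorts the group keys by default, so the files are written in ascending SubNo order regardless of the order in the input files.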

Use merge:

print(pd.merge(df1, df2, on=['SubNo', 'Trails'], how='left'))

SubNo  Trails  Score  Height
1       1      4      100
1       2      4      200
7       1      9      300
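With how='left', a subject that appears only in the second file is dropped, and one that appears only in the first file gets NaN for Height. In the question's sample, subject 20 shows up only in the height data, so it may be worth checking for such cases. A possible sketch, using made-up values and the indicator option of merge:

```python
import pandas as pd

# Toy frames; subject 1 has no heights and subject 20 has no scores.
scores = pd.DataFrame({
    "SubNo": [1, 1, 7, 7],
    "Trails": [1, 2, 1, 2],
    "Score": [4, 4, 9, 8],
})
heights = pd.DataFrame({
    "SubNo": [20, 20, 7, 7],
    "Trails": [1, 2, 1, 2],
    "Height": [222, 222, 911, 811],
})

# An outer merge keeps every row from both sides; indicator=True adds a
# _merge column saying where each row came from.
checked = pd.merge(scores, heights, on=["SubNo", "Trails"],
                   how="outer", indicator=True)

left_only = checked.loc[checked["_merge"] == "left_only", "SubNo"]
right_only = checked.loc[checked["_merge"] == "right_only", "SubNo"]

missing_height = sorted(int(s) for s in left_only.unique())
missing_score = sorted(int(s) for s in right_only.unique())

print("subjects without Height:", missing_height)
print("subjects without Score:", missing_score)
```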

Okay, so the solution I found was to sort each CSV file on the subject number and concatenate.

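# sort_values returns a new DataFrame, so keep the result; reset the index
# because concat(axis=1) aligns rows on index labels, not on position
df1 = df1.sort_values(by=['Subnum', 'Trials']).reset_index(drop=True)
df2 = df2.sort_values(by=['Subnum', 'Trials']).reset_index(drop=True)
pd.concat([df1, df2], axis=1)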
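Sorting and concatenating side by side only lines the rows up when both frames contain exactly the same (subject, trial) pairs. A minimal sketch of a guarded version follows; it assumes the column names Subnum and Trials used in the snippet above, and the values are made up.

```python
import pandas as pd

# Toy frames; the same subjects appear in a different order in each.
df1 = pd.DataFrame({
    "Subnum": [1, 1, 7, 7],
    "Trials": [1, 2, 1, 2],
    "Score": [4, 4, 9, 8],
})
df2 = pd.DataFrame({
    "Subnum": [7, 7, 1, 1],
    "Trials": [1, 2, 1, 2],
    "Height": [911, 811, 100, 200],
})

keys = ["Subnum", "Trials"]

# Sort both frames on the keys and renumber the rows from 0, so that
# concat(axis=1), which aligns on index labels, pairs rows by position.
left = df1.sort_values(by=keys).reset_index(drop=True)
right = df2.sort_values(by=keys).reset_index(drop=True)

# Positional pairing is only safe if the key columns agree row for row.
if not left[keys].equals(right[keys]):
    raise ValueError("key columns differ; use a merge on the keys instead")

# Drop the duplicate key columns from the right-hand frame before joining.
combined = pd.concat([left, right.drop(columns=keys)], axis=1)
print(combined)
```

If the check fails, for example because one file has a subject the other lacks, a merge on the two key columns is the safer route.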

