Concatenating multiple CSV files based on column values when the files have the same header but differ in row order
I'm cleaning some data. I have data from multiple subjects over multiple trials.
SubNo Trails Score
1 1 4
1 2 4
1 3 8
7 1 9
7 2 8
7 3 8
19
:
:
For the same subjects, I have another dataset in which SubNo appears in a different order:
SubNo Trails Height
19 1 100
19 2 400
19 3 810
7 1 911
7 2 811
7 3 811
20 1 222
20 2 222
20 3 789
1
1
:
:
I want to join these two on SubNo, so that in the end I have one CSV per subject containing both Score and Height.
SubNo Trails Score Height
1 1 4 198
1 2 4 209
1 3 8 289
2 1 : :
2 2
2 3
Here I have joined the data based on SubNo, so all the rows for subject 1 are together, all the rows for subject 2 are together, and so on. In my two CSV files the order of the subjects is not the same, so I don't want to join based on the header but on the specific subject number. In my case that is 1, 2, 17, ... and so on. How should I go about it? (I tried pandas merge, but it works based on the header.) That won't do what I want.
Your question is a bit unclear, but from what I understand you are trying to get a single CSV file that contains the columns SubNo, Trails, Score and Height, with SubNo as the key. Because both files also share the Trails column, include it in the key so that each trial is matched exactly once and the join does not fail on the overlapping column name. In that case you should do the following:
# right_dataframe is indexed by the same two key columns that are named in `on`
new_dataframe = left_dataframe.join(right_dataframe.set_index(['SubNo', 'Trails']), on=['SubNo', 'Trails'], how='left')
or alternatively:
new_dataframe = pd.merge(left_dataframe, right_dataframe, on=['SubNo', 'Trails'], how='left')
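To see that the row order of the two files does not matter for a key-based merge, here is a minimal sketch built from a few of the sample rows in the question. The frame names `scores` and `heights` are made up for illustration, the Height values for subject 1 are taken from the desired-output table, and merging on both SubNo and Trails is an assumption about what the asker wants.

```python
import pandas as pd

# Sample rows from the question; the subjects appear in a different order in each frame.
scores = pd.DataFrame({
    "SubNo":  [1, 1, 1, 7, 7, 7],
    "Trails": [1, 2, 3, 1, 2, 3],
    "Score":  [4, 4, 8, 9, 8, 8],
})
heights = pd.DataFrame({
    "SubNo":  [7, 7, 7, 1, 1, 1],
    "Trails": [1, 2, 3, 1, 2, 3],
    "Height": [911, 811, 811, 198, 209, 289],
})

# Rows are matched on the key values, not on their position in the file.
merged = pd.merge(scores, heights, on=["SubNo", "Trails"], how="left")
print(merged)
```

A left merge keeps the row order of the left frame, so the result lists subject 1 first and subject 7 second, each with its own Height values.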
Please check out the pandas merge function. You will effectively be merging on SubNo and Trials. A small code snippet would be:
df1.merge(df2, how='inner')
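When `on` is omitted, `merge` joins on every column name the two frames have in common, which for the question's files would be the subject and trial columns. An inner join silently drops rows that have no partner in the other file, so as a hedged sketch (toy data, not the asker's real files), an outer merge with `indicator=True` can be run first to see which rows are unmatched:

```python
import pandas as pd

# Toy data: subject 1 only has a score, subject 20 only has a height.
df1 = pd.DataFrame({"SubNo": [1, 7], "Trails": [1, 1], "Score": [4, 9]})
df2 = pd.DataFrame({"SubNo": [7, 20], "Trails": [1, 1], "Height": [911, 222]})

# No `on` given: merge uses the shared column names (SubNo, Trails) as keys.
inner = df1.merge(df2, how="inner")

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
check = df1.merge(df2, how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["SubNo", "Trails", "_merge"]])
```

If `unmatched` is empty, the inner merge loses nothing; otherwise it shows which subject and trial combinations exist in only one of the files.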
After this step, you can split the result by subject using the groupby() function (see the pandas documentation): group the rows by SubNo and then save each group as a separate CSV.
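The grouping step could look roughly like the sketch below. The merged frame is a small stand-in built from the question's sample values, and the output folder and the `subject_<SubNo>.csv` file naming are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the merged result (values taken from the question's sample rows).
merged = pd.DataFrame({
    "SubNo":  [1, 1, 1, 7, 7, 7],
    "Trails": [1, 2, 3, 1, 2, 3],
    "Score":  [4, 4, 8, 9, 8, 8],
    "Height": [198, 209, 289, 911, 811, 811],
})

# Temporary folder for the demo; point this at a real folder in practice.
out_dir = Path(tempfile.mkdtemp(prefix="per_subject_"))

# groupby yields one (key, sub-frame) pair per distinct SubNo value.
for sub_no, group in merged.groupby("SubNo"):
    group.to_csv(out_dir / f"subject_{sub_no}.csv", index=False)

print(sorted(p.name for p in out_dir.iterdir()))
```

Passing `index=False` keeps the pandas row index out of the written files, so each CSV contains only the four data columns.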
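Okay, so the solution I found was to sort each CSV file on the subject number and then concatenate.
# sort_values returns a new DataFrame, so assign the result; reset the index
# because pd.concat(axis=1) aligns rows on index labels, not on position
df1 = df1.sort_values(by=['Subnum','Trials'], ascending=True).reset_index(drop=True)
df2 = df2.sort_values(by=['Subnum','Trials'], ascending=True).reset_index(drop=True)
combined = pd.concat([df1,df2],axis=1)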
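One caveat with this approach: `pd.concat(..., axis=1)` lines rows up by index label rather than by key value, so it only gives the right pairing when both files contain exactly the same subject and trial combinations. The sketch below adds a guard for that; the helper name `concat_sorted` and the toy data are hypothetical, and the column names follow the snippet above.

```python
import pandas as pd

def concat_sorted(df1, df2, keys):
    """Sort both frames on `keys` and place them side by side.

    Raises ValueError if the key columns do not match row for row.
    """
    a = df1.sort_values(by=keys).reset_index(drop=True)
    b = df2.sort_values(by=keys).reset_index(drop=True)
    if not a[keys].equals(b[keys]):
        raise ValueError("key columns differ; use a merge instead")
    # Drop the duplicate key columns from the second frame before concatenating.
    return pd.concat([a, b.drop(columns=keys)], axis=1)

# Toy data with the subjects in a different order in each frame.
df1 = pd.DataFrame({"Subnum": [7, 1], "Trials": [1, 1], "Score": [9, 4]})
df2 = pd.DataFrame({"Subnum": [1, 7], "Trials": [1, 1], "Height": [198, 911]})

result = concat_sorted(df1, df2, ["Subnum", "Trials"])
print(result)
```

If one file has a subject the other lacks, the guard raises instead of silently pairing rows from different subjects, which is the case where a key-based merge is the safer choice.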