简体   繁体   中英

Concatenating multiple CSV files based on column values,but the multiple csv files have the same header but vary in order

I'm cleaning some data.I have data from multiple subjects for multiple subjects over multiple trails.

SubNo Trails Score 
1       1      4
1       2      4
1       3      8
7       1      9
7       2      8
7       3      8
19
:
:

For the same subject, I have another dataset for indifferent order for SubNo

SubNo Trails Height 
19      1      100
19      2      400
19      3      810
7       1      911
7       2      811
7       3      811
20      1      222
20      2      222
20      3      789
1
1
:
:

I want to join these two on SubNo, such that in the end I have one CSV per subject for both score and height.

SubNo Trails Score Height 
1        1     4     198
1        2     4     209
1        3     8     289
2        1     :      :
2        2
2        3

Here, I have joined the same data based on subNo: So,all the values of 1 together,all values of subject 2 together and so on.In my two csv files the order of subject is not the same. So,I don't want to join based on header,but based on specific subject number.In my case,that is 1,2,17,...like that. How should I go about it? (I tried pandas merge,it works based on header).That's won't do what I want.

Your question is a bit unclear, but from what I undestand you are trying to get a single csv file which contains columns for SubNo, Trails, Score and Height (with SubNo column being the key)

in that case you should do following:

new_dataframe = left_dataframe.join(right_dataframe.set_index('SubNo'), on='SubNo', how='left')

or alternatively:

new_dataframe = pd.merge(left_dataframe, right_dataframe, on='SubNo', how='left')

Please check out pandas merge function. You will effectively be merging on SubNo and Trials . A small code snippet would be:

df1.merge(df2, how='inner')

After this step, you can perhaps slice by Subjects by using groupby() function (can be found in pandas documentation), by grouping the rows based on your SubNO and then save each group as a separate CSV

Use merge

print (pd.merge(df1, df2, on=['SubNo','Trails'],  how='left'))

SubNo  Trails  Score  Height
1       1      4      100
1       2      4      200
7       1      9      300

Okey,so the solution I found was to sort each csv file on subnum and concatenate.

df1.sort_values(by=['Subnum','Trials'], ascending=True)
df2..sort_values(by=['Subnum','Trials'], ascending=True)
pd.concat([df1,df2],axis=1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM