
Sort a DataFrame by the order of a column in another DataFrame - pandas

Let's say I have two DataFrames, as shown below:

import pandas as pd

df = pd.DataFrame({'a': [1, 4, 3, 2], 'b': [1, 2, 3, 4]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, 4], 'c': [34, 56, 7, 55]})

I would like to sort df by the order of the 'a' column in df2, so that df.a follows the order of df2.a and the rest of each row in df moves along with it.

Desired output:

   a  b
0  1  1
1  2  4
2  3  3
3  4  2

(I made this manually, so if there's any mistake in it, please tell me :D)

My own attempt:

df = df.set_index('a')
df = df.reindex(index=df2['a'])
df = df.reset_index()
print(df)

This works as expected.

But when I have longer DataFrames, like:

df=pd.DataFrame({'a':[1,4,3,2,3,4,5,3,5,6],'b':[1,2,3,4,5,5,5,6,6,7]})
df2=pd.DataFrame({'a':[1,2,3,4,3,4,5,6,4,5],'b':[1,2,4,3,4,5,6,7,4,3]})

It doesn't work as expected.

Note: I don't only want an explanation of why; I also need a solution that works for big DataFrames.

One possible solution is to create a helper column in both DataFrames, because the 'a' column contains duplicated values:

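# Number each occurrence of a value in 'a' (0 for the first, 1 for the second, ...)
df['g'] = df.groupby('a').cumcount()
df2['g'] = df2.groupby('a').cumcount()

# The (a, g) pairs are unique, so reindexing by df2's pairs is possible;
# the result goes into df1 so that df keeps 'a' and 'g' as ordinary columns
df1 = df.set_index(['a','g']).reindex(index=df2.set_index(['a','g']).index)
print(df1)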
       b
a g     
1 0  1.0
2 0  4.0
3 0  3.0
4 0  2.0
3 1  5.0
4 1  5.0
5 0  5.0
6 0  7.0
4 2  NaN
5 1  6.0
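
As for the why: with repeated values in 'a', set_index('a') produces an index with duplicate labels, and reindex cannot tell which of the duplicated rows belongs where, so pandas raises a ValueError rather than guessing. The sketch below rebuilds the two longer DataFrames from the question so that it runs on its own; the names indexed, index_is_unique, reindex_failed and occurrence are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 3, 2, 3, 4, 5, 3, 5, 6],
                   'b': [1, 2, 3, 4, 5, 5, 5, 6, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 5, 6, 4, 5],
                    'b': [1, 2, 4, 3, 4, 5, 6, 7, 4, 3]})

# With repeated values in 'a', the index built from it is not unique.
indexed = df.set_index('a')
index_is_unique = indexed.index.is_unique

# reindex cannot choose between rows that share a label, so it raises.
try:
    indexed.reindex(index=df2['a'])
    reindex_failed = False
except ValueError:
    reindex_failed = True

# cumcount numbers the occurrences of each value in order of appearance,
# which turns every (a, g) pair into a unique key.
occurrence = df.groupby('a').cumcount().tolist()
print(index_is_unique, reindex_failed, occurrence)
```

The NaN for (4, 2) in the output above appears because df2 contains a third 4 while df only has two, and that missing value is also why column b is shown as float. If a flat frame is wanted afterwards, calling reset_index() on the reindexed result and dropping 'g' should give it back.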

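Or maybe you need a merge, which keeps only the rows whose (a, g) pair exists in both DataFrames and returns them in the order of df: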

df3 = df.merge(df2[['a','g']], on=['a','g'])
print(df3)
   a  b  g
0  1  1  0
1  4  2  0
2  3  3  0
3  2  4  0
4  3  5  1
5  4  5  1
6  5  5  0
7  5  6  1
8  6  7  0
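
Note that the merge above returns the rows in the order of df, not df2. If the goal is the order of df2, one option (a sketch, not the only way) is to put the keys of df2 on the left and use how='left', which preserves the order of the left keys. The frames are rebuilt here so the snippet runs on its own, and ordered is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 3, 2, 3, 4, 5, 3, 5, 6],
                   'b': [1, 2, 3, 4, 5, 5, 5, 6, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 5, 6, 4, 5],
                    'b': [1, 2, 4, 3, 4, 5, 6, 7, 4, 3]})

# Helper column: position of each row among the rows with the same 'a'.
df['g'] = df.groupby('a').cumcount()
df2['g'] = df2.groupby('a').cumcount()

# df2's keys go on the left, so the result follows df2's row order.
# Keys that have no partner in df get NaN in 'b'.
ordered = (df2[['a', 'g']]
           .merge(df, on=['a', 'g'], how='left')
           .drop(columns='g'))
print(ordered)
```

With this variant the row of df whose key has no match in df2 (the third 3) is dropped, and the third 4 from df2 shows up with NaN in b, so it behaves like the reindex approach but returns a flat frame directly.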


 