Sort dataframe by another on one column - pandas

Question

Let's say I have two data-frames, as shown below:

import pandas as pd

df=pd.DataFrame({'a':[1,4,3,2],'b':[1,2,3,4]})
df2=pd.DataFrame({'a':[1,2,3,4],'b':[1,2,3,4],'c':[34,56,7,55]})

I would like to sort df by the order of the 'a' column in df2, so that df.a follows the order of df2.a and the rest of each row moves along with it.

Desired output:

(made it manually, and if there's any mistake with it, please tell me :D)

My own attempt:

df = df.set_index('a')
df = df.reindex(index=df2['a'])
df = df.reset_index()
print(df)

Works as expected!

But when I have longer data-frames, like:

df=pd.DataFrame({'a':[1,4,3,2,3,4,5,3,5,6],'b':[1,2,3,4,5,5,5,6,6,7]})
df2=pd.DataFrame({'a':[1,2,3,4,3,4,5,6,4,5],'b':[1,2,4,3,4,5,6,7,4,3]})

It doesn't work as expected.

Note: I don't only want an explanation of why; I also need a solution that works for big data-frames.

Answer 1

One possible solution is to create a helper column in both DataFrames, because column 'a' contains duplicated values:

df['g'] = df.groupby('a').cumcount()
df2['g'] = df2.groupby('a').cumcount()

df = df.set_index(['a','g']).reindex(index=df2.set_index(['a','g']).index)
print(df)
       b
a g     
1 0  1.0
2 0  4.0
3 0  3.0
4 0  2.0
3 1  5.0
4 1  5.0
5 0  5.0
6 0  7.0
4 2  NaN
5 1  6.0
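
As for why the plain set_index('a') / reindex approach stops working on the longer frames: reindex needs a uniquely valued index to look rows up, and 'a' now repeats. The sketch below only checks uniqueness before and after adding the helper column; the variable names a_is_unique and pair_is_unique are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 3, 2, 3, 4, 5, 3, 5, 6],
                   'b': [1, 2, 3, 4, 5, 5, 5, 6, 6, 7]})

# 'a' alone repeats, so it cannot act as a unique lookup key for reindex().
a_is_unique = df.set_index('a').index.is_unique

# cumcount() numbers each repeat of a value 0, 1, 2, ... in order of appearance,
# so the pair (a, g) identifies exactly one row.
df['g'] = df.groupby('a').cumcount()
pair_is_unique = df.set_index(['a', 'g']).index.is_unique

print(a_is_unique, pair_is_unique)
print(df['g'].tolist())
```

With (a, g) as the key, the n-th occurrence of a value in df is matched to the n-th occurrence of the same value in df2; a pair that exists only in df2, such as (4, 2), has nothing to match and comes back as NaN.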

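Or maybe you need merge (run it on the original df that still has 'a' and 'g' as ordinary columns, i.e. before the set_index/reindex step):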

df3 = df.merge(df2[['a','g']], on=['a','g'])
print(df3)
   a  b  g
0  1  1  0
1  4  2  0
2  3  3  0
3  2  4  0
4  3  5  1
5  4  5  1
6  5  5  0
7  5  6  1
8  6  7  0
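
Note that the inner merge above keeps the row order of df, not of df2, and drops pairs that have no partner. If the goal is df2's order, one possible variation (a sketch, not part of the original answer) is to put df2's keys on the left and use how='left'; the names ordered and ordered_present are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 3, 2, 3, 4, 5, 3, 5, 6],
                   'b': [1, 2, 3, 4, 5, 5, 5, 6, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 5, 6, 4, 5],
                    'b': [1, 2, 4, 3, 4, 5, 6, 7, 4, 3]})

# Helper column: occurrence number of each value of 'a' within its frame.
df['g'] = df.groupby('a').cumcount()
df2['g'] = df2.groupby('a').cumcount()

# df2's keys go on the left, so the result follows df2's row order.
# Keys that df does not contain (here a=4, g=2) get NaN in 'b'.
ordered = df2[['a', 'g']].merge(df, on=['a', 'g'], how='left')

# Optionally drop the unmatched keys and the helper column.
ordered_present = ordered.dropna(subset=['b']).drop(columns='g')

print(ordered)
```

Because the (a, g) pairs are unique in df, each row of df2 matches at most one row of df, so the result has exactly as many rows as df2.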

solution1
2 ACCEPTED 2018-12-05 09:11:42
