
Sort dataframe by another on one column - pandas

Let's say I have two DataFrames, as shown below:

import pandas as pd
df=pd.DataFrame({'a':[1,4,3,2],'b':[1,2,3,4]})
df2=pd.DataFrame({'a':[1,2,3,4],'b':[1,2,3,4],'c':[34,56,7,55]})

I would like to sort df by the order of df2 on the 'a' column: df.a should follow the order of df2.a, and the rest of each row should move along with it.

Desired output:

   a  b
0  1  1
1  2  4
2  3  3
3  4  2

(I made it manually, so if there's any mistake in it, please tell me :D)

My own attempt:

df = df.set_index('a')
df = df.reindex(index=df2['a'])
df = df.reset_index()
print(df)

Works as expected!

But when I have longer DataFrames, like:

df=pd.DataFrame({'a':[1,4,3,2,3,4,5,3,5,6],'b':[1,2,3,4,5,5,5,6,6,7]})
df2=pd.DataFrame({'a':[1,2,3,4,3,4,5,6,4,5],'b':[1,2,4,3,4,5,6,7,4,3]})

It doesn't work as expected.

Note: I don't only want an explanation of why, I also need a solution that works for big DataFrames.
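
On the "why": as far as I know, reindex needs unique labels on the index it is reindexing. After set_index('a') on the longer frame, labels such as 3, 4 and 5 occur more than once, so pandas cannot tell which row a requested label refers to and raises a ValueError. A minimal sketch of that, reusing the frames from the question (the names indexed and failed are only for illustration, and the exact error message differs between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 3, 2, 3, 4, 5, 3, 5, 6],
                   'b': [1, 2, 3, 4, 5, 5, 5, 6, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 5, 6, 4, 5],
                    'b': [1, 2, 4, 3, 4, 5, 6, 7, 4, 3]})

# 'a' holds repeated values, so it does not make a unique index
indexed = df.set_index('a')
print(indexed.index.is_unique)  # False

failed = False
try:
    indexed.reindex(index=df2['a'])
except ValueError as exc:
    # pandas refuses to reindex when the existing labels are duplicated
    failed = True
    print('reindex failed:', exc)
```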

One possible solution is to create helper columns in both DataFrames, because column a contains duplicated values:

df['g'] = df.groupby('a').cumcount()
df2['g'] = df2.groupby('a').cumcount()

# keep the result under a new name so that df still has its 'a' and 'g' columns
df1 = df.set_index(['a','g']).reindex(index=df2.set_index(['a','g']).index)
print(df1)
       b
a g     
1 0  1.0
2 0  4.0
3 0  3.0
4 0  2.0
3 1  5.0
4 1  5.0
5 0  5.0
6 0  7.0
4 2  NaN
5 1  6.0
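
To see why the helper column makes this work and where the NaN comes from, here is a small sketch under the same data. cumcount numbers each repeat of a value within its group (0 for the first occurrence, 1 for the second, and so on), so the pair (a, g) is unique in each frame. A pair that exists in only one frame has no partner in the other; the names g1, g2, keys1 and keys2 are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 3, 2, 3, 4, 5, 3, 5, 6],
                   'b': [1, 2, 3, 4, 5, 5, 5, 6, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 5, 6, 4, 5],
                    'b': [1, 2, 4, 3, 4, 5, 6, 7, 4, 3]})

# occurrence number of each value of 'a' within its own frame
g1 = df.groupby('a').cumcount()
g2 = df2.groupby('a').cumcount()
print(g1.tolist())  # [0, 0, 0, 0, 1, 1, 0, 2, 1, 0]
print(g2.tolist())  # [0, 0, 0, 0, 1, 1, 0, 0, 2, 1]

# (a, g) pairs present in each frame, as plain Python ints
keys1 = set(zip(df['a'].tolist(), g1.tolist()))
keys2 = set(zip(df2['a'].tolist(), g2.tolist()))

print(keys2 - keys1)  # {(4, 2)}: df2 has a third 4, df does not -> NaN row
print(keys1 - keys2)  # {(3, 2)}: df has a third 3 that df2 never asks for
```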

Or perhaps you need merge instead (an inner join on the helper columns, which keeps only the rows that have a match in df2):

df3 = df.merge(df2[['a','g']], on=['a','g'])
print(df3)
   a  b  g
0  1  1  0
1  4  2  0
2  3  3  0
3  2  4  0
4  3  5  1
5  4  5  1
6  5  5  0
7  5  6  1
8  6  7  0
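
Note that the merge output above follows df's row order, not df2's. If df2's order is what is wanted, one option is to put df2 on the left of a left merge, since how='left' preserves the order of the left keys. The following is only a sketch: sort_like and the helper column name _occ are hypothetical names introduced here, and it assumes neither frame already has a column called _occ.

```python
import pandas as pd

def sort_like(df, ref, key):
    """Return the rows of df arranged in the order of ref[key] (sketch)."""
    # number repeated keys so that (key, _occ) is unique in each frame
    left = ref[[key]].assign(_occ=ref.groupby(key).cumcount())
    right = df.assign(_occ=df.groupby(key).cumcount())
    # a left merge keeps the row order of `left`, i.e. of ref
    out = left.merge(right, on=[key, '_occ'], how='left')
    return out.drop(columns='_occ')

df = pd.DataFrame({'a': [1, 4, 3, 2, 3, 4, 5, 3, 5, 6],
                   'b': [1, 2, 3, 4, 5, 5, 5, 6, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 5, 6, 4, 5],
                    'b': [1, 2, 4, 3, 4, 5, 6, 7, 4, 3]})

result = sort_like(df, df2, 'a')
print(result)
```

Rows of df2 without a partner in df (here the third 4) come back with NaN in b; chaining .dropna() would remove them if they are not wanted.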

