Sort a pandas DataFrame by a column in another dataframe - pandas
Let's say I have a Pandas DataFrame with two columns, like:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
print(df)
   a    b
0  1  100
1  2  200
2  3  300
3  4  400
And let's say I also have a Pandas Series, like:
s = pd.Series([1, 3, 2, 4])
print(s)
0    1
1    3
2    2
3    4
dtype: int64
How can I reorder the rows so that the a column follows the same order as the s series, with the corresponding values in each row moving together?
My desired output would be:
   a    b
0  1  100
1  3  300
2  2  200
3  4  400
Is there any way to achieve this?
Please see my self-answer below.
What about:
(
df.assign(s=s)
.sort_values(by='s')
.drop('s', axis=1)
)
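One caveat worth checking: assign aligns s to df by index, and sort_values then orders the rows by the values of s. In the question's example this reproduces the desired output, apparently because swapping two elements is its own inverse. The sketch below uses a hypothetical helper, sort_by_assigned, and a hypothetical rotated order, s_rotated (neither is from the original answer), and suggests that other orders come out as the inverse permutation rather than in the order of s:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})

def sort_by_assigned(frame, order):
    # Hypothetical wrapper around the approach above:
    # attach the Series as a helper column, sort by it, then drop it.
    return frame.assign(s=order).sort_values(by='s').drop('s', axis=1)

# The question's order swaps two elements, so it is its own inverse.
s_question = pd.Series([1, 3, 2, 4])
result_question = sort_by_assigned(df, s_question)

# A three-element rotation (hypothetical input) is not its own inverse.
s_rotated = pd.Series([2, 3, 1, 4])
result_rotated = sort_by_assigned(df, s_rotated)

print(result_question['a'].tolist())
print(result_rotated['a'].tolist())
```

With s_rotated the a column comes out as [3, 1, 2, 4] instead of [2, 3, 1, 4], so this approach looks safe only when the order happens to be its own inverse. Note also that the original row labels are kept, so a reset_index(drop=True) would be needed to match the desired output exactly.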
I have run into this issue quite often, so I thought I would share my solutions in Pandas.
Solution 1:
Use set_index to convert the a column to the index, then reindex to change the order, then rename_axis to change the index name back to a, then reset_index to convert a from an index back to a column:
print(df.set_index('a').reindex(s).rename_axis('a').reset_index('a'))
Solution 2:
Use set_index to convert the a column to the index, then loc to change the order, then reset_index to convert a from an index back to a column:
print(df.set_index('a').loc[s].reset_index())
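Solutions 1 and 2 behave the same when every value of s occurs in column a, but they seem to differ when it does not. A minimal sketch, assuming a recent pandas version (1.0 or later, as far as I know) and a hypothetical Series s_missing containing a value that is absent from df:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s_missing = pd.Series([1, 3, 5])  # hypothetical order; 5 does not occur in df['a']

# reindex keeps the unknown label and fills its row with NaN.
via_reindex = df.set_index('a').reindex(s_missing).rename_axis('a').reset_index()

# loc rejects unknown labels and raises KeyError instead.
try:
    df.set_index('a').loc[s_missing]
    loc_raised = False
except KeyError:
    loc_raised = True

print(via_reindex)
print('loc raised KeyError:', loc_raised)
```

Which behavior is preferable depends on whether a missing value should be an error or a placeholder row. Both solutions also rely on the values in column a being usable as an index, so duplicates in a would need separate handling.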
Solution 3:
Use map with the index method of the a column (converted to a list) to find the row position of each value in the s series, then use iloc to select the rows in that order:
print(df.iloc[list(map(df['a'].tolist().index, s))])
Solution 4:
Use sorted with a key argument to sort the rows of the DataFrame by the position of each row's a value in the s series, then use pd.DataFrame to create a new DataFrame object from the sorted rows:
print(pd.DataFrame(sorted(df.values.tolist(), key=lambda x: s.tolist().index(x[0])), columns=df.columns))
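The key function in Solution 4 rebuilds s.tolist() and scans it for every row. A possible variant, which is not one of the original solutions, precomputes the positions once in a dictionary and lets NumPy's argsort produce the row order. The names position and row_order are illustrative, and the sketch assumes the values in s are unique:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s = pd.Series([1, 3, 2, 4])

# Map each value of s to its position in s (assumes the values in s are unique).
position = {value: i for i, value in enumerate(s)}

# Look up the target position of every row, then compute the row order
# that sorts those positions.
row_order = df['a'].map(position).to_numpy().argsort()

# Select the rows in that order and renumber the index from 0.
result = df.iloc[row_order].reset_index(drop=True)
print(result)
```

The dictionary lookup avoids the repeated list scans, which may matter on larger frames. The reset_index(drop=True) call is only there to match the 0..3 index in the desired output.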
Timing with the code below:
import pandas as pd
from timeit import timeit
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s = pd.Series([1, 3, 2, 4])
def u10_1():
    return df.set_index('a').reindex(s).rename_axis('a').reset_index('a')
def u10_2():
    return df.set_index('a').loc[s].reset_index()
def u10_3():
    return df.iloc[list(map(df['a'].tolist().index, s))]
def u10_4():
    return pd.DataFrame(sorted(df.values.tolist(), key=lambda x: s.tolist().index(x[0])), columns=df.columns)
print('u10_1:', timeit(u10_1, number=1000))
print('u10_2:', timeit(u10_2, number=1000))
print('u10_3:', timeit(u10_3, number=1000))
print('u10_4:', timeit(u10_4, number=1000))
Output (seconds for 1000 calls each):
u10_1: 3.012849470495621
u10_2: 3.072132612502147
u10_3: 0.7498072134665241
u10_4: 0.8109911930595484
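These timings come from a 4-row DataFrame. Solutions 3 and 4 rely on list.index, which scans the list linearly, so the ranking may not carry over to larger inputs; I have not measured that. Before re-timing on bigger data, it may be worth confirming that all four solutions still agree. A sketch, where n, df_big, s_big and the solution_* names are hypothetical:

```python
import random
import pandas as pd

n = 200  # hypothetical size; raise it to explore how each solution scales
random.seed(0)
values = list(range(n))
df_big = pd.DataFrame({'a': values, 'b': [v * 100 for v in values]})
s_big = pd.Series(random.sample(values, n))  # a shuffled copy of column a

def solution_1():
    return df_big.set_index('a').reindex(s_big).rename_axis('a').reset_index('a')

def solution_2():
    return df_big.set_index('a').loc[s_big].reset_index()

def solution_3():
    return df_big.iloc[list(map(df_big['a'].tolist().index, s_big))]

def solution_4():
    return pd.DataFrame(
        sorted(df_big.values.tolist(), key=lambda x: s_big.tolist().index(x[0])),
        columns=df_big.columns,
    )

# Compare plain row values so that index labels and dtypes do not matter.
results = [f().to_numpy().tolist() for f in (solution_1, solution_2, solution_3, solution_4)]
all_equal = all(rows == results[0] for rows in results)
print('all solutions agree:', all_equal)
```

Once the results agree, the same functions can be passed to timeit as in the code above to see how the numbers change with n.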
@Allen has a pretty good answer too.