
Sorting Pandas dataframe with variable columns

I have an arbitrary number of data frames (3 in this case). I am trying to pick out the trip with the highest speed between the starting point (column A) and the final destination (the last filled column, which varies between data frames). These trips need to be stored in a new dataframe.

import pandas as pd

d = {'A': ['London', 'London', 'London', 'London', 'Budapest'],
     'B': ['Beijing', 'Sydney', 'Warsaw', 'Budapest', 'Warsaw'],
     'Speed': [1000, 2000, 500, 499, 500]}
df = pd.DataFrame(data=d)

d1 = {'A': ['London', 'London', 'London', 'Budapest'],
      'B': ['Rio', 'Rio', 'Rio', 'Rio'],
      'C': ['Beijing', 'Sydney', 'Budapest', 'Warsaw'],
      'Speed': [2000, 1000, 500, 500]}
df1 = pd.DataFrame(data=d1)

d2 = {'A': ['London', 'London', 'London', 'London'],
      'B': ['Florence', 'Florence', 'Florence', 'Florence'],
      'C': ['Rio', 'Rio', 'Rio', 'Rio'],
      'D': ['Beijing', 'Sydney', 'Oslo', 'Warsaw'],
      'Speed': [500, 500, 500, 1000]}
df2 = pd.DataFrame(data=d2)

The desired output for this particular case would look like this:

A         B         C         D       Speed
London    Rio       Beijing   NaN     2000
London    Sydney    NaN       NaN     2000
London    Florence  Rio       Warsaw  1000
London    Florence  Rio       Oslo    500
London    Rio       Budapest  NaN     500
Budapest  Warsaw    NaN       NaN     500

I started by appending the dataframes with:

 df.append(df1).append(df2)

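First join all DataFrames together and sort by the Speed column in descending order. Then filter with a boolean mask: ffill(axis=1) forward-fills the missing values along each row, so column D always holds the final destination, and duplicated on columns A and D marks every row after the first (fastest) one for each pair of start and destination: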

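# sort=True orders the columns A, B, C, D, Speed, so the row-wise ffill
# copies the last stop into D and never copies Speed into C or D
df = pd.concat([df, df1, df2], sort=True).sort_values('Speed', ascending=False)

# keep only the first (fastest) row for each (A, final destination) pair
df = df[~df.ffill(axis=1).duplicated(['A','D'])].reset_index(drop=True)
print(df)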
          A         B         C       D  Speed
0    London    Sydney       NaN     NaN   2000
1    London       Rio   Beijing     NaN   2000
2    London  Florence       Rio  Warsaw   1000
3  Budapest    Warsaw       NaN     NaN    500
4    London       Rio  Budapest     NaN    500
5    London  Florence       Rio    Oslo    500
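
The same idea can be sketched so that it does not depend on the column order produced by concat, by computing the final destination explicitly before dropping duplicates. The helper name fastest_trips and the temporary column _dest are hypothetical. The sketch assumes that every column other than Speed is a stop and that the stop columns sort alphabetically in travel order (A, B, C, ...).

```python
import pandas as pd


def fastest_trips(frames, speed_col="Speed"):
    """Keep the fastest trip for each (start, final destination) pair."""
    combined = pd.concat(frames, ignore_index=True, sort=False)
    # Assumption: all columns except the speed column are stops, and their
    # names sort alphabetically in travel order.
    stop_cols = sorted(c for c in combined.columns if c != speed_col)
    # The final destination is the last non-missing stop in each row.
    destination = combined[stop_cols].ffill(axis=1).iloc[:, -1]
    ranked = (
        combined.assign(_dest=destination)
        # mergesort is stable, so tied speeds keep their input order
        .sort_values(speed_col, ascending=False, kind="mergesort")
    )
    # After sorting, the first row of each pair is the fastest one.
    best = ranked.drop_duplicates([stop_cols[0], "_dest"])
    return best[stop_cols + [speed_col]].reset_index(drop=True)


df = pd.DataFrame({
    "A": ["London", "London", "London", "London", "Budapest"],
    "B": ["Beijing", "Sydney", "Warsaw", "Budapest", "Warsaw"],
    "Speed": [1000, 2000, 500, 499, 500],
})
df1 = pd.DataFrame({
    "A": ["London", "London", "London", "Budapest"],
    "B": ["Rio", "Rio", "Rio", "Rio"],
    "C": ["Beijing", "Sydney", "Budapest", "Warsaw"],
    "Speed": [2000, 1000, 500, 500],
})
df2 = pd.DataFrame({
    "A": ["London", "London", "London", "London"],
    "B": ["Florence", "Florence", "Florence", "Florence"],
    "C": ["Rio", "Rio", "Rio", "Rio"],
    "D": ["Beijing", "Sydney", "Oslo", "Warsaw"],
    "Speed": [500, 500, 500, 1000],
})

result = fastest_trips([df, df1, df2])
print(result)
```

When two trips between the same pair of cities have equal speed, the one that appears first in the input is kept. This is a choice made in the sketch, not something the question specifies.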

You can sort a data frame by its values or by its index. For example, to sort by a single column such as B:

`df.sort_values(by=['B'])`

To sort by multiple columns:

df.sort_values(by=['col1', 'col2'])

You can also sort by the index values with sort_index.
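
As a small illustration of the three sorting options mentioned above, here is a sketch on a made-up frame. The data and the variable names are assumptions chosen only for the example.

```python
import pandas as pd

# Made-up example data with a deliberately shuffled index
trips = pd.DataFrame(
    {
        "A": ["London", "Budapest", "London"],
        "B": ["Sydney", "Warsaw", "Beijing"],
        "Speed": [2000, 500, 1000],
    },
    index=[2, 0, 1],
)

# Sort by a single column
by_b = trips.sort_values(by=["B"])

# Sort by several columns: A ascending, then Speed descending within each A
by_a_then_speed = trips.sort_values(by=["A", "Speed"], ascending=[True, False])

# Sort by the index labels
by_index = trips.sort_index()

print(by_b)
print(by_a_then_speed)
print(by_index)
```

Each call returns a new data frame and leaves trips unchanged.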
