Sorting Pandas dataframe with variable columns
I have an arbitrary number of data frames (3 in this case). For each pair of starting point (column A) and final destination (the last populated column, which varies from one data frame to the next), I am trying to pick out the trip with the highest speed. These trips need to be stored in a new dataframe.
import pandas as pd

d = {'A': ['London', 'London', 'London', 'London', 'Budapest'],
     'B': ['Beijing', 'Sydney', 'Warsaw', 'Budapest', 'Warsaw'],
     'Speed': [1000, 2000, 500, 499, 500]}
df = pd.DataFrame(data=d)

d1 = {'A': ['London', 'London', 'London', 'Budapest'],
      'B': ['Rio', 'Rio', 'Rio', 'Rio'],
      'C': ['Beijing', 'Sydney', 'Budapest', 'Warsaw'],
      'Speed': [2000, 1000, 500, 500]}
df1 = pd.DataFrame(data=d1)

d2 = {'A': ['London', 'London', 'London', 'London'],
      'B': ['Florence', 'Florence', 'Florence', 'Florence'],
      'C': ['Rio', 'Rio', 'Rio', 'Rio'],
      'D': ['Beijing', 'Sydney', 'Oslo', 'Warsaw'],
      'Speed': [500, 500, 500, 1000]}
df2 = pd.DataFrame(data=d2)
The desired output for this particular case would look like this:
A         B         C         D       Speed
London    Rio       Beijing   NaN     2000
London    Sydney    NaN       NaN     2000
London    Florence  Rio       Warsaw  1000
London    Florence  Rio       Oslo    500
London    Rio       Budapest  NaN     500
Budapest  Warsaw    NaN       NaN     500
I started by appending the dataframes with:
df.append(df1).append(df2)
First join all DataFrames together and sort by column Speed. Then filter with a boolean mask: ffill forward fills missing values along each row, so column D ends up holding the final destination, and duplicated flags every repeat of an (A, D) pair after the first:
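# sort=True orders the columns A, B, C, D, Speed; recent pandas versions otherwise
# keep first-seen order (A, B, Speed, C, D) and ffill would copy Speed into C and D
df = pd.concat([df, df1, df2], sort=True).sort_values('Speed', ascending=False)
# ffill(axis=1) copies the last destination into D; keep the first (fastest) row per (A, D)
df = df[~df.ffill(axis=1).duplicated(['A','D'])].reset_index(drop=True)
print(df)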
          A         B         C       D  Speed
0    London    Sydney       NaN     NaN   2000
1    London       Rio   Beijing     NaN   2000
2    London  Florence       Rio  Warsaw   1000
3  Budapest    Warsaw       NaN     NaN    500
4    London       Rio  Budapest     NaN    500
5    London  Florence       Rio    Oslo    500
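The approach above relies on the stop columns sitting to the left of Speed after the concat. A variant that does not depend on that ordering might look like the sketch below. It assumes every column other than Speed is a stop on the route and that the stop column names sort in route order (A, B, C, ...). The helper name fastest_trips and the temporary _final column are illustrative and not part of the original answer. It also uses a stable sort, so trips with equal speed keep their original relative order.

```python
import pandas as pd

def fastest_trips(frames, speed_col="Speed", start_col="A"):
    """Keep the fastest trip per (start, final destination) pair.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    trips = pd.concat(frames, ignore_index=True)
    # Assumption: every non-speed column is a stop, and names sort in route order.
    stops = sorted(c for c in trips.columns if c != speed_col)
    # Forward fill along each row; the last stop column then holds the final destination.
    final = trips[stops].ffill(axis=1).iloc[:, -1]
    # Stable sort so rows with equal speed keep their original relative order.
    ordered = trips.assign(_final=final).sort_values(
        speed_col, ascending=False, kind="mergesort"
    )
    # The first row seen for each (start, final destination) pair is the fastest one.
    best = ordered.drop_duplicates([start_col, "_final"])
    return best[stops + [speed_col]].reset_index(drop=True)

df = pd.DataFrame({'A': ['London', 'London', 'London', 'London', 'Budapest'],
                   'B': ['Beijing', 'Sydney', 'Warsaw', 'Budapest', 'Warsaw'],
                   'Speed': [1000, 2000, 500, 499, 500]})
df1 = pd.DataFrame({'A': ['London', 'London', 'London', 'Budapest'],
                    'B': ['Rio', 'Rio', 'Rio', 'Rio'],
                    'C': ['Beijing', 'Sydney', 'Budapest', 'Warsaw'],
                    'Speed': [2000, 1000, 500, 500]})
df2 = pd.DataFrame({'A': ['London', 'London', 'London', 'London'],
                    'B': ['Florence', 'Florence', 'Florence', 'Florence'],
                    'C': ['Rio', 'Rio', 'Rio', 'Rio'],
                    'D': ['Beijing', 'Sydney', 'Oslo', 'Warsaw'],
                    'Speed': [500, 500, 500, 1000]})

result = fastest_trips([df, df1, df2])
print(result)
```

Because the helper takes a list, the same call works for any number of data frames.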
You can sort data frames by values or by index. For example, to sort by column B, you can write the code below. For a single column:
df.sort_values(by=['B'])
To sort by multiple columns:
df.sort_values(by=['col1', 'col2'])
You can also sort by the index values.
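As a small illustration of both approaches, the sketch below sorts a made-up frame by one column, by two columns, and by its index. The data and the variable names are invented for the example.

```python
import pandas as pd

# Made-up data for illustration only.
trips = pd.DataFrame(
    {"B": ["Warsaw", "Beijing", "Sydney"], "Speed": [1000, 500, 500]},
    index=[1, 2, 0],
)

# Sort alphabetically by a single column.
by_b = trips.sort_values(by=["B"])

# Sort by Speed (highest first), breaking ties alphabetically by B.
by_speed_then_b = trips.sort_values(by=["Speed", "B"], ascending=[False, True])

# Sort by the index labels instead of by column values.
by_index = trips.sort_index()

print(by_b, by_speed_then_b, by_index, sep="\n\n")
```

Each call returns a new DataFrame and leaves trips unchanged.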