
Sorting Pandas dataframe with variable columns

I have an arbitrary number of data frames (three in this case). For each pair of starting point (column A) and final destination (the last destination column, which varies between data frames: B, C or D here), I am trying to pick out the trip with the highest speed. These trips need to be stored in a new data frame.

import pandas as pd

# direct trips: A -> B
d = {'A': ['London', 'London', 'London', 'London', 'Budapest'],
     'B': ['Beijing', 'Sydney', 'Warsaw', 'Budapest', 'Warsaw'],
     'Speed': [1000, 2000, 500, 499, 500]}
df = pd.DataFrame(data=d)

# trips with one stopover: A -> B -> C
d1 = {'A': ['London', 'London', 'London', 'Budapest'],
      'B': ['Rio', 'Rio', 'Rio', 'Rio'],
      'C': ['Beijing', 'Sydney', 'Budapest', 'Warsaw'],
      'Speed': [2000, 1000, 500, 500]}
df1 = pd.DataFrame(data=d1)

# trips with two stopovers: A -> B -> C -> D
d2 = {'A': ['London', 'London', 'London', 'London'],
      'B': ['Florence', 'Florence', 'Florence', 'Florence'],
      'C': ['Rio', 'Rio', 'Rio', 'Rio'],
      'D': ['Beijing', 'Sydney', 'Oslo', 'Warsaw'],
      'Speed': [500, 500, 500, 1000]}
df2 = pd.DataFrame(data=d2)

The desired output for this particular case would look like this:

A         B         C         D       Speed
London    Rio       Beijing   NaN     2000
London    Sydney    NaN       NaN     2000
London    Florence  Rio       Warsaw  1000
London    Florence  Rio       Oslo    500
London    Rio       Budapest  NaN     500
Budapest  Warsaw    NaN       NaN     500

I started by appending the dataframes with:

 df.append(df1).append(df2)

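First join all DataFrames together with concat and sort by the Speed column in descending order. Then forward-fill missing values along each row with ffill(axis=1), so that column D always holds the final destination, and use duplicated on columns A and D to build a boolean mask that keeps only the first (fastest) row for each pair of start and final destination: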

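# sort=True orders the columns alphabetically (A, B, C, D, Speed); without it,
# recent pandas versions keep A, B, Speed, C, D and ffill would copy Speed into C and D.
# kind='mergesort' is a stable sort, so rows with equal speed keep their input order.
df = pd.concat([df, df1, df2], sort=True).sort_values('Speed', ascending=False, kind='mergesort')

# after the row-wise ffill, column D holds the final destination of every trip
df = df[~df.ffill(axis=1).duplicated(['A','D'])].reset_index(drop=True)
print(df)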
          A         B         C       D  Speed
0    London    Sydney       NaN     NaN   2000
1    London       Rio   Beijing     NaN   2000
2    London  Florence       Rio  Warsaw   1000
3  Budapest    Warsaw       NaN     NaN    500
4    London       Rio  Budapest     NaN    500
5    London  Florence       Rio    Oslo    500
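
Since the question mentions an arbitrary number of data frames, the same idea can be wrapped in a helper. The following is only a minimal sketch under stated assumptions: `fastest_trips`, `speed_col` and the temporary `_final` column are hypothetical names introduced here, and the stop columns are assumed to have names that sort in route order (A, B, C, D, ...). It computes the final destination from the stop columns only, so it does not depend on where Speed ends up after concatenation.

```python
import pandas as pd


def fastest_trips(frames, speed_col="Speed"):
    """Keep the fastest trip for each (start, final destination) pair.

    Hypothetical helper: assumes every column except `speed_col` is a stop
    and that the stop column names sort in route order (A, B, C, D, ...).
    """
    # Stack all frames; stops missing from a shorter trip become NaN.
    trips = pd.concat(frames, ignore_index=True, sort=False)

    # Put the stop columns first, in route order, and the speed last.
    stops = sorted(c for c in trips.columns if c != speed_col)
    trips = trips[stops + [speed_col]]

    # Forward-fill along each row: the last stop column then holds
    # the final destination of every trip.
    trips["_final"] = trips[stops].ffill(axis=1)[stops[-1]]

    # Fastest first; mergesort is stable, so ties keep their input order.
    trips = trips.sort_values(speed_col, ascending=False, kind="mergesort")

    # drop_duplicates keeps the first, i.e. fastest, row per pair.
    trips = trips.drop_duplicates([stops[0], "_final"])
    return trips.drop(columns="_final").reset_index(drop=True)


# Sample data from the question.
df = pd.DataFrame({"A": ["London", "London", "London", "London", "Budapest"],
                   "B": ["Beijing", "Sydney", "Warsaw", "Budapest", "Warsaw"],
                   "Speed": [1000, 2000, 500, 499, 500]})
df1 = pd.DataFrame({"A": ["London", "London", "London", "Budapest"],
                    "B": ["Rio", "Rio", "Rio", "Rio"],
                    "C": ["Beijing", "Sydney", "Budapest", "Warsaw"],
                    "Speed": [2000, 1000, 500, 500]})
df2 = pd.DataFrame({"A": ["London", "London", "London", "London"],
                    "B": ["Florence", "Florence", "Florence", "Florence"],
                    "C": ["Rio", "Rio", "Rio", "Rio"],
                    "D": ["Beijing", "Sydney", "Oslo", "Warsaw"],
                    "Speed": [500, 500, 500, 1000]})

result = fastest_trips([df, df1, df2])
print(result)
```

For the sample data this returns six rows, one per pair of start and final destination. When two trips for the same pair have equal speed, the one from the earlier frame in the list is kept.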

You can sort a data frame by its values or by its index. For example, to sort by a single column such as B:

df.sort_values(by=['B'])

To sort by multiple columns:

df.sort_values(by=['col1', 'col2'])

You can also sort by the index values with sort_index().
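
As a small illustration of these calls, the sketch below reuses the first data frame from the question. The variable names (`trips`, `by_b`, `by_a_then_speed`, `restored`) are made up for this example; the `ascending` list is optional and is shown only to mix sort directions per column.

```python
import pandas as pd

# First data frame from the question.
trips = pd.DataFrame({
    "A": ["London", "London", "London", "London", "Budapest"],
    "B": ["Beijing", "Sydney", "Warsaw", "Budapest", "Warsaw"],
    "Speed": [1000, 2000, 500, 499, 500],
})

# Sort by a single column.
by_b = trips.sort_values(by="B")

# Sort by several columns: A ascending, then Speed descending within each A.
by_a_then_speed = trips.sort_values(by=["A", "Speed"], ascending=[True, False])

# Sort by the index labels, which restores the original row order here.
restored = by_a_then_speed.sort_index()

print(by_a_then_speed)
```

Note that sort_values and sort_index return new data frames by default and leave `trips` unchanged.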
