Sorting Pandas dataframe with variable columns

Question

I have an arbitrary number of data frames (3 in this case). For each pair of starting point (column A) and final destination (the last filled stop column, which is B, C or D depending on the frame), I am trying to pick out the trip with the highest speed. These trips need to be stored in a new dataframe.

import pandas as pd

d = {'A': ['London', 'London', 'London', 'London', 'Budapest'],
     'B': ['Beijing', 'Sydney', 'Warsaw', 'Budapest', 'Warsaw'],
     'Speed': [1000, 2000, 500, 499, 500]}
df = pd.DataFrame(data=d)

d1 = {'A': ['London', 'London', 'London', 'Budapest'],
      'B': ['Rio', 'Rio', 'Rio', 'Rio'],
      'C': ['Beijing', 'Sydney', 'Budapest', 'Warsaw'],
      'Speed': [2000, 1000, 500, 500]}
df1 = pd.DataFrame(data=d1)

d2 = {'A': ['London', 'London', 'London', 'London'],
      'B': ['Florence', 'Florence', 'Florence', 'Florence'],
      'C': ['Rio', 'Rio', 'Rio', 'Rio'],
      'D': ['Beijing', 'Sydney', 'Oslo', 'Warsaw'],
      'Speed': [500, 500, 500, 1000]}
df2 = pd.DataFrame(data=d2)

The desired output for this particular case would look like this:

   A        B          C        D     Speed
London     Rio       Beijing   NaN     2000
London     Sydney    NaN       NaN     2000
London     Florence  Rio       Warsaw  1000
London     Florence  Rio       Oslo     500
London     Rio       Budapest  NaN      500
Budapest   Warsaw    NaN       NaN      500

I started by appending the dataframes with:

 df.append(df1).append(df2)
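Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the line above fails. A minimal sketch of the same stacking with pd.concat, which takes a list of any length (the two small frames and the name `frames` are only for illustration):

```python
import pandas as pd

# two small frames with different column sets, shaped like the question's data
df = pd.DataFrame({'A': ['London'], 'B': ['Sydney'], 'Speed': [2000]})
df1 = pd.DataFrame({'A': ['London'], 'B': ['Rio'], 'C': ['Beijing'], 'Speed': [2000]})

frames = [df, df1]  # any number of frames can go in this list
# columns a frame lacks are filled with NaN; ignore_index renumbers rows 0..n-1
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Unlike chained append calls, a single concat builds the result once, which matters when the number of frames grows.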

Answer 1

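First join all DataFrames together and sort by column Speed, fastest first. Then filter with a boolean mask: forward-fill missing values along each row with ffill(axis=1), so that column D holds each trip's final destination, and use duplicated on columns A and D to flag every repeat of an (origin, destination) pair. Because the rows are already sorted by speed, the first occurrence kept for each pair is the fastest trip: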

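# sort=True orders the columns A, B, C, D, Speed, so forward-filling a row ends at D;
# a stable sort keeps the earlier frame's row first when speeds tie
df = (pd.concat([df, df1, df2], sort=True, ignore_index=True)
        .sort_values('Speed', ascending=False, kind='stable'))

# after ffill, D holds each trip's final destination; drop slower repeats of (A, D)
df = df[~df.ffill(axis=1).duplicated(['A','D'])].reset_index(drop=True)
print(df)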
          A         B         C       D  Speed
0    London    Sydney       NaN     NaN   2000
1    London       Rio   Beijing     NaN   2000
2    London  Florence       Rio  Warsaw   1000
3  Budapest    Warsaw       NaN     NaN    500
4    London       Rio  Budapest     NaN    500
5    London  Florence       Rio    Oslo    500
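Since the question asks for an arbitrary number of frames, the same idea can be wrapped in a function. This is a minimal sketch, not part of the original answer: `fastest_trips` and the temporary `_dest` column are hypothetical names, and it assumes every column except Speed is a stop column whose names sort alphabetically into travel order (A, B, C, D, ...). It stores the final destination in its own column and uses drop_duplicates instead of a boolean mask.

```python
import pandas as pd


def fastest_trips(frames, origin='A', speed='Speed'):
    """Keep the fastest trip per (origin, final destination) pair.

    Hypothetical helper: assumes every column except `speed` is a stop
    column and that sorting those names alphabetically gives travel order.
    """
    trips = pd.concat(frames, ignore_index=True)
    stops = sorted(c for c in trips.columns if c != speed)  # e.g. A, B, C, D
    trips = trips[stops + [speed]]
    # forward-fill each row: the last stop column then holds the final destination
    final = trips[stops].ffill(axis=1)[stops[-1]]
    trips = trips.assign(_dest=final)
    # fastest first; 'stable' keeps the earlier frame's row when speeds tie
    trips = trips.sort_values(speed, ascending=False, kind='stable')
    # keep only the first (fastest) row for each (origin, destination) pair
    trips = trips.drop_duplicates(subset=[origin, '_dest'])
    return trips.drop(columns='_dest').reset_index(drop=True)


# the question's data
df = pd.DataFrame({'A': ['London', 'London', 'London', 'London', 'Budapest'],
                   'B': ['Beijing', 'Sydney', 'Warsaw', 'Budapest', 'Warsaw'],
                   'Speed': [1000, 2000, 500, 499, 500]})
df1 = pd.DataFrame({'A': ['London', 'London', 'London', 'Budapest'],
                    'B': ['Rio', 'Rio', 'Rio', 'Rio'],
                    'C': ['Beijing', 'Sydney', 'Budapest', 'Warsaw'],
                    'Speed': [2000, 1000, 500, 500]})
df2 = pd.DataFrame({'A': ['London', 'London', 'London', 'London'],
                    'B': ['Florence', 'Florence', 'Florence', 'Florence'],
                    'C': ['Rio', 'Rio', 'Rio', 'Rio'],
                    'D': ['Beijing', 'Sydney', 'Oslo', 'Warsaw'],
                    'Speed': [500, 500, 500, 1000]})

result = fastest_trips([df, df1, df2])
print(result)
```

Adding a fourth frame with an E column only means appending it to the list; the stop columns are discovered from the concatenated frame rather than hard-coded. If the stop names do not sort into travel order, pass an explicit list instead of relying on sorted.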

Answer 2

You can sort a data frame by its column values or by its index. For example, to sort by a single column such as B:

`df.sort_values(by=['B'])`

To sort by multiple columns:

`df.sort_values(by=['col1', 'col2'])`

You can also sort by the index values with `df.sort_index()`.
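A small sketch of the three kinds of sort on a toy frame (the data and variable names are made up for illustration). It also shows the `ascending` list, which sets the direction per column:

```python
import pandas as pd

# toy frame with a shuffled index, made up for illustration
df = pd.DataFrame({'B': ['Warsaw', 'Beijing', 'Sydney', 'Beijing'],
                   'Speed': [500, 1000, 2000, 500]},
                  index=[3, 1, 0, 2])

# single column, ascending by default
by_b = df.sort_values(by=['B'])

# several columns: B ascending, then Speed descending within equal B values
by_b_speed = df.sort_values(by=['B', 'Speed'], ascending=[True, False])

# by the row labels instead of the values
by_index = df.sort_index()

print(by_b_speed)
```

Each call returns a new DataFrame and leaves `df` unchanged unless `inplace=True` is passed.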

2 answers

solution1: score 3, accepted, 2019-01-24 10:31:13

solution2: score 0, 2019-01-24 10:22:20
