pandas.DataFrame.join() is changing the order of my values when joining

I have 2 data frames with the same number of rows. Their rows were originally in different orders, so I sorted one to make the orders match. I am now trying to join some of the columns of each into a new DataFrame.

The column from my sorted table I want to join to the other is just a column of ints like this:

1
2
0
4

But when I do df2 = df2.join(df['Wins']) or df2 = df2.join(df['Wins'], sort=False), the column is added to df2, but its values go back to the order they were in before df was sorted, so they no longer match the rows of df2.

If you sorted with df = df.sort_values(col_name), the rows are reordered but each row keeps its original index label. Because join() aligns on the index by default when you don't specify a column to join on, the values are matched back up by those original labels, which is why they appear in the pre-sort order. To get the order you want, you can

  1. call reset_index() after sorting, which adds an extra index column to the DataFrame unless you pass drop=True, or
  2. add a merge column after sorting ( df['merge_col'] = list(range(df.shape[0])) ) and join on it so that the rows align.

See example below:

## Build data
import pandas as pd

df = pd.DataFrame({'col1':[4,3,2,1],'col2':[1,2,3,4]})

df2=pd.DataFrame({'col3':[5,6,7,8],'col4':[8,7,6,5]})
df2_sorted_on_col4 = df2.sort_values(by='col4')
df2_sorted_on_col4 # see the indices are reversed
#   col3  col4
#3     8     5
#2     7     6
#1     6     7
#0     5     8

regular_join = df.join(df2_sorted_on_col4)
regular_join # values in col3 and col4 went back to original
#   col1  col2  col3  col4
#0     4     1     5     8
#1     3     2     6     7
#2     2     3     7     6
#3     1     4     8     5

join_with_reset_index = df.join(df2_sorted_on_col4.reset_index())
join_with_reset_index # maintains the col3 and col4 ordering
#   col1  col2  index  col3  col4
#0     4     1      3     8     5
#1     3     2      2     7     6
#2     2     3      1     6     7
#3     1     4      0     5     8


df2_sorted_with_merge_col = df2_sorted_on_col4.copy()
df2_sorted_with_merge_col['merge_col'] = list(range(df2_sorted_with_merge_col.shape[0]))

joined_on_merge_col = df2_sorted_with_merge_col.join(df,on='merge_col')
joined_on_merge_col # keeps col3 and col4 rows in right order but reorders columns
#   col3  col4  merge_col  col1  col2
#3     8     5          0     4     1
#2     7     6          1     3     2
#1     6     7          2     2     3
#0     5     8          3     1     4
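
If the extra index column from reset_index() is unwanted, a minimal variant of the example above passes drop=True, which discards the old labels instead of keeping them as a column. This is a sketch using the same toy data; the names df2_sorted and joined are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [4, 3, 2, 1], 'col2': [1, 2, 3, 4]})
df2 = pd.DataFrame({'col3': [5, 6, 7, 8], 'col4': [8, 7, 6, 5]})

# Sort, then throw away the old index labels so the rows are relabelled 0..n-1
df2_sorted = df2.sort_values(by='col4').reset_index(drop=True)

# join() still aligns on index labels, but the labels now follow the sorted order
joined = df.join(df2_sorted)
print(joined)
```

Here col3 and col4 keep their sorted order and no index column is added, so the result has only col1, col2, col3 and col4.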

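See the pandas documentation for DataFrame.join, which describes this behaviour: rows are matched on the index (or on the column given by on), and the sort argument only controls whether the result is ordered by the join key. It does not make the join positional, as the examples in the other answer demonstrate.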
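
To make the difference between label alignment and positional order concrete for the Wins column in the question, here is a small sketch. The frames and the Team and Losses columns are made up for illustration, since the question does not show its data. It assumes both frames really do have the same number of rows in the intended order, and it attaches the column by position using to_numpy().

```python
import pandas as pd

# Hypothetical data standing in for the question's two frames
df = pd.DataFrame({'Team': ['C', 'A', 'D', 'B'], 'Wins': [0, 1, 4, 2]})
df2 = pd.DataFrame({'Team': ['A', 'B', 'C', 'D'], 'Losses': [3, 2, 4, 0]})

# Sorting reorders the rows, but the index labels travel with them (1, 3, 0, 2)
df = df.sort_values('Team')

# join() matches on index labels, so Wins comes back in the pre-sort order
by_label = df2.join(df['Wins'])

# Assigning the underlying array ignores labels and attaches values by position
df2['Wins'] = df['Wins'].to_numpy()

print(by_label['Wins'].tolist())
print(df2['Wins'].tolist())
```

Positional assignment is only safe when the row order of both frames is known to match; if the frames share a key such as the hypothetical Team column, merging on that key is the more robust choice.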
