简体   繁体   中英

Divide dataframe column by column from another dataframe

I have two dataframes that look similar and I want to divide one column from df1 by a column from df2 .

Some sample data is below:

dict1 = {'category': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0},
 'Id': {0: 24108, 1: 24307, 2: 24307, 3: 24411, 4: 24411},
 'count': {0: 3, 1: 2, 2: 33, 3: 98, 4: 33}}

df1 = pd.DataFrame(dict1)

dict2 = {'Id': {0: 24108, 1: 24307, 2: 24411},
 'count': {0: 3, 1: 35, 2: 131}}

df2 = pd.DataFrame(dict2)

I am trying to create a new column in the first dataframe ( df1 ) called weights by dividing df1['count'] by df2['count'] . Except for the column category and count in both dfs, the values are identical in the other columns.

I have the following piece of code, but I cannot seem to understand where the error is:

df1['weights'] = (df1['count']
                      .div(df1.merge(df2, on = 'Id', how = 'left')
                           ['count'].to_numpy())
                      )

I get the following error when I run the code:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'count'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_354/1318629977.py in <module>
      1 complete['weights'] = (complete['count']
----> 2                       .div(complete.merge(totals, on = 'companyId', how = 'left')['count'].to_numpy())
      3                       )

Any ideas why this is happening?

Since you end up with count_x and count_y after your merge, you need to specify which one you want:

df1['weights'] = (df1['count'].div(df1.merge(df2, on = 'Id', how = 'left')['count_y'].to_numpy()))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM