Divide columns in df by another df value based on condition

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                   'month': ['1', '1', '3', '3', '5'],
                   'pmonth': ['1', '1', '2', '5', '5'],
                   'duration': [30, 15, 20, 15, 30],
                   'pduration': ['10', '20', '30', '40', '50']})

I have to divide duration and pduration by the value column of a second dataframe wherever the date and month of the two dataframes match. The second df is:

# named df2 here so it does not overwrite the first dataframe
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

The second df is grouped by date and month, so duplicate combinations of date and month won't be present in the second df.

First, check that the date and month columns have the same dtypes in both DataFrames, and that the columns used in the division are numeric (below, df1 is the first DataFrame and df2 is the second):

#convert to numeric
df1['pduration'] = df1['pduration'].astype(int)
df2['value'] = df2['value'].astype(int)

print (df1.dtypes)
date         object
month        object
pmonth       object
duration      int64
pduration     int32
dtype: object

print (df2.dtypes)
date     object
month    object
value     int32
dtype: object
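If some entries might not be clean integer strings, pd.to_numeric is one alternative to astype(int). This is only a sketch on the question's sample data (pmonth is left out for brevity); the errors='coerce' setting and the keys_match name are my own assumptions, not something the question asks for:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

# errors='coerce' (an assumption) turns unparseable entries into NaN instead of raising
df1['pduration'] = pd.to_numeric(df1['pduration'], errors='coerce')
df2['value'] = pd.to_numeric(df2['value'], errors='coerce')

# the merge keys should have the same dtype in both frames
keys_match = all(df1[k].dtype == df2[k].dtype for k in ['date', 'month'])
print(keys_match)
```

With coercion, a bad entry shows up later as NaN in the result rather than stopping the script, so it may be worth checking for NaN after converting.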

Then merge with a left join and divide with DataFrame.div:

df = df1.merge(df2, on=['date', 'month'], how='left')

df[['duration_new','pduration_new']] = df[['duration','pduration']].div(df['value'], axis=0)
print (df)
         date month pmonth  duration  pduration  value  duration_new  \
0  2013-04-01     1      1        30         10    1.0          30.0   
1  2013-04-01     1      1        15         20    1.0          15.0   
2  2013-04-01     3      2        20         30    NaN           NaN   
3  2013-04-02     3      5        15         40    NaN           NaN   
4  2013-04-02     5      5        30         50    NaN           NaN   

   pduration_new  
0           10.0  
1           20.0  
2            NaN  
3            NaN  
4            NaN  

To remove the value column, use pop:

df[['duration_new','pduration_new']] = (df[['duration','pduration']]
                                             .div(df.pop('value'), axis=0))
print (df)
         date month pmonth  duration  pduration  duration_new  pduration_new
0  2013-04-01     1      1        30         10          30.0           10.0
1  2013-04-01     1      1        15         20          15.0           20.0
2  2013-04-01     3      2        20         30           NaN            NaN
3  2013-04-02     3      5        15         40           NaN            NaN
4  2013-04-02     5      5        30         50           NaN            NaN
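Since the question says the second df has no duplicate date/month pairs, the merge can be asked to verify that, and to report which rows found a match. A rough sketch, assuming the sample data with the numbers already written as ints (pmonth omitted for brevity); the names merged and unmatched are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': [1, 1, 2, 5, 5]})

# validate='many_to_one' raises MergeError if df2 has duplicate (date, month) keys;
# indicator=True adds a _merge column showing where each row came from
merged = df1.merge(df2, on=['date', 'month'], how='left',
                   validate='many_to_one', indicator=True)

# rows of df1 with no partner in df2
unmatched = int(merged['_merge'].eq('left_only').sum())

# divide both columns by value row-wise and attach the results with a _new suffix
cols = ['duration', 'pduration']
merged = merged.join(merged[cols].div(merged['value'], axis=0).add_suffix('_new'))
merged = merged.drop(columns=['value', '_merge'])
print(unmatched)
print(merged)
```

The unmatched count tells you how many NaN rows to expect in the new columns before you decide how to handle them.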

You can merge the second df into the first df and then divide. Filling the missing values with 1 leaves unmatched rows unchanged by the division.

Consider the first df as df1 and the second df as df2:

df1 = df1.merge(df2, on=['date', 'month'], how='left').fillna(1)
df1
         date month pmonth  duration pduration value
0  2013-04-01     1      1        30        10     1
1  2013-04-01     1      1        15        20     1
2  2013-04-01     3      2        20        30     1
3  2013-04-02     3      5        15        40     1
4  2013-04-02     5      5        30        50     1

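# value and pduration hold strings in the sample data, so cast to float before dividing
df1['duration'] = df1['duration'] / df1['value'].astype(float)
df1['pduration'] = df1['pduration'].astype(float) / df1['value'].astype(float)
df1.drop('value', axis=1, inplace=True)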
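Note that fillna(1) on the whole merged frame fills NaN in every column, not just value. If other columns could contain missing data, one option is to fill only the divisor. A minimal sketch on the question's sample data (pmonth omitted for brevity); the name merged is just for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

merged = df1.merge(df2, on=['date', 'month'], how='left')

# convert the string columns, then fill only the divisor with 1
merged['value'] = pd.to_numeric(merged['value']).fillna(1)
merged['pduration'] = pd.to_numeric(merged['pduration'])

# unmatched rows are divided by 1, so they keep their original values
cols = ['duration', 'pduration']
merged[cols] = merged[cols].div(merged['value'], axis=0)
merged = merged.drop(columns='value')
print(merged)
```

Whether dividing unmatched rows by 1 is the right behaviour depends on what the asker wants; leaving them as NaN, as in the other answers, makes the missing matches visible.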

You can merge the two dataframes; where the date and month match, the value column is added to the first dataframe. If there is no match, it is represented by NaN. You can then do the division. See the code below.

Assuming your second dataframe is df2, then:

df3 = df2.merge(df, how = 'right')
for col in ['duration','pduration']:
    df3['new_'+col] = df3[col].astype(float)/df3['value'].astype(float)
df3

results in:

         date month value pmonth  duration pduration  new_duration  new_pduration
0  2013-04-01     1     1      1        30        10          30.0           10.0
1  2013-04-01     1     1      1        15        20          15.0           20.0
2  2013-04-01     3   NaN      2        20        30           NaN            NaN
3  2013-04-02     3   NaN      5        15        40           NaN            NaN
4  2013-04-02     5   NaN      5        30        50           NaN            NaN
