Divide columns in df by another df value based on condition
I have a dataframe:
df = pd.DataFrame({'date': ['2013-04-01','2013-04-01','2013-04-01','2013-04-02', '2013-04-02'],
'month': ['1','1','3','3','5'],
'pmonth': ['1', '1', '2', '5', '5'],
'duration': [30, 15, 20, 15, 30],
'pduration': ['10', '20', '30', '40', '50']})
I have to divide duration and pduration by the value column of a second dataframe, wherever the date and month of the two dataframes match. The second df is:
df2 = pd.DataFrame({'date': ['2013-04-01','2013-04-02','2013-04-03','2013-04-04', '2013-04-05'],
'month': ['1','1','3','3','5'],
'value': ['1', '1', '2', '5', '5'],
})
The second df is grouped by date and month, so duplicate combinations of date and month won't be present in it.
First, check that the date and month columns have the same dtypes in both DataFrames, and that the columns used in the division are numeric:
#convert to numeric
df1['pduration'] = df1['pduration'].astype(int)
df2['value'] = df2['value'].astype(int)
print (df1.dtypes)
date object
month object
pmonth object
duration int64
pduration int32
print (df2.dtypes)
date object
month object
value int32
dtype: object
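As a hedged alternative to astype(int), the conversion and the key check above can be sketched with pd.to_numeric. The sample data is copied from the question; the names df1, df2, keys and same_key_dtypes are only illustrative, and errors='coerce' is an assumption about how unparseable strings should be handled (they become NaN instead of raising).

```python
import pandas as pd

# Sample data from the question, named df1/df2 as in this answer.
df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'pmonth': ['1', '1', '2', '5', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

# Convert the string columns used in the division to numbers.
# errors='coerce' (an assumption here) turns unparseable strings into NaN.
df1['pduration'] = pd.to_numeric(df1['pduration'], errors='coerce')
df2['value'] = pd.to_numeric(df2['value'], errors='coerce')

# The join keys should have the same dtype in both frames.
keys = ['date', 'month']
same_key_dtypes = all(df1[k].dtype == df2[k].dtype for k in keys)
print(same_key_dtypes)
```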
Then merge with a left join and divide with DataFrame.div:
df = df1.merge(df2, on=['date', 'month'], how='left')
df[['duration_new','pduration_new']] = df[['duration','pduration']].div(df['value'], axis=0)
print (df)
date month pmonth duration pduration value duration_new \
0 2013-04-01 1 1 30 10 1.0 30.0
1 2013-04-01 1 1 15 20 1.0 15.0
2 2013-04-01 3 2 20 30 NaN NaN
3 2013-04-02 3 5 15 40 NaN NaN
4 2013-04-02 5 5 30 50 NaN NaN
pduration_new
0 10.0
1 20.0
2 NaN
3 NaN
4 NaN
To remove the value column, use pop:
df[['duration_new','pduration_new']] = (df[['duration','pduration']]
.div(df.pop('value'), axis=0))
print (df)
date month pmonth duration pduration duration_new pduration_new
0 2013-04-01 1 1 30 10 30.0 10.0
1 2013-04-01 1 1 15 20 15.0 20.0
2 2013-04-01 3 2 20 30 NaN NaN
3 2013-04-02 3 5 15 40 NaN NaN
4 2013-04-02 5 5 30 50 NaN NaN
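The question states that df2 has no duplicate (date, month) pairs. A minimal sketch of how that assumption could be checked during the merge is shown here, using the validate and indicator parameters of DataFrame.merge. The numeric columns are assumed to be already converted, and the names merged, matched and cols are illustrative only.

```python
import pandas as pd

# Sample data from the question, with the numeric columns already converted.
df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'pmonth': ['1', '1', '2', '5', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': [1, 1, 2, 5, 5]})

# validate='many_to_one' makes merge raise if a (date, month) pair is duplicated in df2.
# indicator=True adds a '_merge' column telling which rows found a match.
merged = df1.merge(df2, on=['date', 'month'], how='left',
                   validate='many_to_one', indicator=True)
matched = merged['_merge'].eq('both')
print(matched.sum(), 'of', len(merged), 'rows found a divisor')

# Divide row-wise by the looked-up value; unmatched rows stay NaN.
cols = ['duration', 'pduration']
merged[['duration_new', 'pduration_new']] = merged[cols].div(merged['value'], axis=0)
merged = merged.drop(columns=['value', '_merge'])
print(merged)
```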
You can merge the second df into the first df and then divide. Consider the first df as df1 and the second df as df2. Missing values after the left join are filled with 1, so rows without a match are left unchanged by the division:
df1 = df1.merge(df2, on=['date', 'month'], how='left').fillna(1)
df1
date month pmonth duration pduration value
0 2013-04-01 1 1 30 10 1
1 2013-04-01 1 1 15 20 1
2 2013-04-01 3 2 20 30 1
3 2013-04-02 3 5 15 40 1
4 2013-04-02 5 5 30 50 1
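# duration, pduration and value must be numeric here (convert string columns with astype(int) first)
df1['duration'] = df1['duration'] / df1['value']
df1['pduration'] = df1['pduration'] / df1['value']
df1.drop('value', axis=1, inplace=True)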
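Note that fillna(1) on the whole merged frame would also fill NaN in any other column. A hedged variant that fills only the divisor might look like the following; the numeric columns are assumed to be already converted, and out and cols are illustrative names.

```python
import pandas as pd

# Sample data from the question, with the numeric columns already converted.
df1 = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                    'month': ['1', '1', '3', '3', '5'],
                    'pmonth': ['1', '1', '2', '5', '5'],
                    'duration': [30, 15, 20, 15, 30],
                    'pduration': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': [1, 1, 2, 5, 5]})

out = df1.merge(df2, on=['date', 'month'], how='left')

# Fill only the divisor column, so NaN elsewhere (if any) is left alone.
out['value'] = out['value'].fillna(1)

# Divide both columns row-wise, then drop the helper column.
cols = ['duration', 'pduration']
out[cols] = out[cols].div(out['value'], axis=0)
out = out.drop(columns='value')
print(out)
```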
You can merge the two dataframes; where the date and month match, the value column will be added to the first data frame. If there is no match, it will be represented by NaN. You can then do the division. See the code below.
Assuming your second dataframe is df2, then
df3 = df2.merge(df, how='right')
for col in ['duration', 'pduration']:
    df3['new_' + col] = df3[col].astype(float) / df3['value'].astype(float)
df3
results in
date month value pmonth duration pduration new_duration new_pduration
0 2013-04-01 1 1 1 30 10 30.0 10.0
1 2013-04-01 1 1 1 15 20 15.0 20.0
2 2013-04-01 3 NaN 2 20 30 NaN NaN
3 2013-04-02 3 NaN 5 15 40 NaN NaN
4 2013-04-02 5 NaN 5 30 50 NaN NaN
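The right merge used here should yield the same rows as a left merge started from df, with only the column order differing. A small sketch of that comparison, with the join keys passed explicitly through on (without it, merge joins on all shared column names); keys, right_join and left_join are illustrative names:

```python
import pandas as pd

# Sample data from the question; the first frame is df, the second df2.
df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                   'month': ['1', '1', '3', '3', '5'],
                   'pmonth': ['1', '1', '2', '5', '5'],
                   'duration': [30, 15, 20, 15, 30],
                   'pduration': ['10', '20', '30', '40', '50']})
df2 = pd.DataFrame({'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
                    'month': ['1', '1', '3', '3', '5'],
                    'value': ['1', '1', '2', '5', '5']})

keys = ['date', 'month']
right_join = df2.merge(df, on=keys, how='right')  # columns of df2 come first
left_join = df.merge(df2, on=keys, how='left')    # columns of df come first

print(list(right_join.columns))
print(list(left_join.columns))

# Same division as above, applied to the left-join result.
for col in ['duration', 'pduration']:
    left_join['new_' + col] = left_join[col].astype(float) / left_join['value'].astype(float)
print(left_join)
```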