I have extracted two DataFrames, as shown below:
DF1:
I want to apply a factor to different parts of the pricedata1 and pricedata2 columns of DF1, based on a conditional match against another DataFrame, DF2.
For instance, for row 0 in DF1, I want to multiply the pricedata1 value 100.5 by 2.5. That factor comes from DF2, from the cell where the DF2 column name equals the DF1 year value and the DF2 pricename value equals the DF1 column name. For year 2007, pricedata1 should be multiplied by a different factor, 5, instead.
I know how to use df.apply on an entire column, but I'm lost on how to apply it to only part of a column based on different conditions.
Thanks in advance.
A concise solution is to reindex() your df2 on df1. First reshape df2 to match df1 (years as rows, price names as columns), then reindex() and multiply the scaling factors element-wise.
Note: This relies on both indexes having the same dtype, so convert with year.astype(...) as needed.
df2 = df2.set_index('pricename').T.reindex(df1.year)
df1.pricedata1 = df1.pricedata1 * df2.pricedata1.values
df1.pricedata2 = df1.pricedata2 * df2.pricedata2.values
# date year pricedata1 pricedata2
# 2006-01-02 2006 251.25 169.5
# 2006-01-03 2006 251.25 169.5
# 2006-01-04 2006 251.25 169.5
# 2006-01-05 2006 251.25 169.5
# 2006-01-06 2006 251.25 169.5
# 2006-01-07 2006 251.25 169.5
# 2006-01-08 2006 251.25 169.5
# 2006-01-09 2006 251.25 169.5
# 2006-01-10 2006 251.25 169.5
# 2006-01-11 2006 251.25 169.5
# 2006-01-12 2006 251.25 169.5
# 2006-01-13 2006 251.25 169.5
# 2006-01-14 2006 251.25 169.5
# 2006-01-15 2006 251.25 169.5
# 2007-01-02 2007 502.50 339.0
# 2007-01-03 2007 502.50 339.0
# 2007-01-04 2007 502.50 339.0
# 2007-01-05 2007 502.50 339.0
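The dtype note above can be made concrete with a minimal sketch. It assumes df2's year columns arrive as strings (as they would from a CSV header) while df1's year column holds integers; the sample values and the names factors, aligned, and scaled are illustrative, not part of the original answer. If the labels are left mismatched, reindex() finds no matches and fills the rows with NaN.

```python
import pandas as pd

# Illustrative sample data (assumed, modelled on the question).
df1 = pd.DataFrame({
    "year": [2006, 2006, 2006, 2007, 2007],
    "pricedata1": [100.5] * 5,
    "pricedata2": [56.5] * 5,
})
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],   # year labels are strings here
    "2007": [5.0, 6.0],
})

# Reshape df2 so years are rows and price names are columns.
factors = df2.set_index("pricename").T
# Align the index dtype with df1["year"] (strings -> integers).
factors.index = factors.index.astype(int)

# One row of factors per row of df1, in df1's row order.
aligned = factors.reindex(df1["year"])

price_cols = ["pricedata1", "pricedata2"]
scaled = df1.copy()
# Multiply positionally, since aligned is indexed by year rather than by df1's index.
scaled[price_cols] = df1[price_cols].to_numpy() * aligned[price_cols].to_numpy()
print(scaled)
```

Working on a copy (scaled) keeps df1 intact, so the snippet can be rerun without compounding the factors.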
You can do this by applying a function to df1 row by row:
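def multiply(row):
    # row.name is the row's index label in df1; use it to look up that row's year
    year = df1['year'].loc[row.name]
    for pricedata in row.index:
        # df2 is indexed by pricename, with the years as string column labels
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]
    return row
# apply returns a new DataFrame; assign it back to df1 to keep the result
df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)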
A minimal working example (MWE):
import pandas as pd
from io import StringIO
TESTDATA = StringIO("""year pricedata1 pricedata2
2016 100.5 56.5
2017 100.5 56.5
""")
# sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas
df1 = pd.read_csv(TESTDATA, sep=r"\s+")
TESTDATA = StringIO("""pricename 2016 2017
pricedata1 2.5 5
pricedata2 3.0 6
""")
df2 = pd.read_csv(TESTDATA, sep=r"\s+")
# index df2 by price name so factors can be looked up as df2[year].loc[pricename]
df2 = df2.set_index('pricename')
def multiply(row):
    year = df1['year'].loc[row.name]
    for pricedata in row.index:
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]
    return row
df1[['pricedata1', 'pricedata2']] = df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)
# print(df1)
year pricedata1 pricedata2
0 2016 251.25 169.5
1 2017 502.50 339.0
The logic behind this code is as follows:
Iterate over the rows of df1 and update the i-th row using df2.iloc[0].iloc[coln.index(j)], where
coln = list(df2.columns) is the list of df2's columns, used for matching during the iteration, and
coln.index(j) gives the position of j in that list, where j is a year.
The useful code is just this section; the rest builds the DataFrames from scratch:
coln = list(df2.columns)
for i, j in zip(df1.index, df1['year']):
    # use .loc rather than chained indexing, which may fail to update df1
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]
print(df1)
import pandas as pd
days_2006 = pd.Series(
    pd.date_range("2006-01-02", periods=14, freq="D")
)
days_2007 = pd.Series(
    pd.date_range("2007-01-02", periods=4, freq="D")
)
days_total = pd.concat([days_2006, days_2007], ignore_index=True)
df1 = pd.DataFrame(
    data={
        'date': days_total,
        'year': days_total.dt.year,
        'pricedata1': [100.5] * 18,
        'pricedata2': [56.5] * 18,
    },
)
df2 = pd.DataFrame(
    data={
        'pricename': ['pricedata1', 'pricedata2'],
        2006: [2.5, 3.0],
        2007: [5.0, 6.0],
    }
)
coln = list(df2.columns)
for i, j in zip(df1.index, df1['year']):
    # use .loc rather than chained indexing, which may fail to update df1
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]
print(df1)
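For comparison, the per-row lookup that the loop performs can probably also be written without an explicit loop over rows, using Series.map. This is a different technique from the one in the answer above, shown only as a sketch; it assumes df2's year columns are integers, as in the construction above, and the names factors and result are illustrative.

```python
import pandas as pd

# Small illustrative sample (assumed), with integer year labels in df2.
df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    2006: [2.5, 3.0],
    2007: [5.0, 6.0],
})

# Index by price name so each row of factors is a year -> factor lookup.
factors = df2.set_index("pricename")

result = df1.copy()
for col in ["pricedata1", "pricedata2"]:
    # factors.loc[col] is a Series indexed by year; map() looks up
    # the factor for each row's year, giving one factor per row of df1.
    result[col] = df1[col] * df1["year"].map(factors.loc[col])
print(result)
```

The loop here runs once per price column rather than once per row, so it should scale better on long frames.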
Another concise way is to use df.apply() on df1 and df.set_index() on df2:
df1['pricedata1'] = df1.apply(lambda x: x['pricedata1'] * df2.set_index('pricename').loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x: x['pricedata2'] * df2.set_index('pricename').loc['pricedata2', str(x['year'])], axis=1)
# Original data, df1:
date year pricedata1 pricedata2
0 2006-01-02 2006 100.5 56.5
1 2006-01-03 2006 100.5 56.5
2 2006-01-04 2006 100.5 56.5
3 2006-01-05 2006 100.5 56.5
4 2006-01-06 2006 100.5 56.5
5 2006-01-07 2006 100.5 56.5
6 2006-01-08 2006 100.5 56.5
7 2006-01-09 2006 100.5 56.5
8 2006-01-10 2006 100.5 56.5
9 2006-01-11 2006 100.5 56.5
10 2006-01-12 2006 100.5 56.5
11 2006-01-13 2006 100.5 56.5
12 2006-01-14 2006 100.5 56.5
13 2006-01-15 2006 100.5 56.5
14 2007-01-02 2007 100.5 56.5
15 2007-01-03 2007 100.5 56.5
16 2007-01-04 2007 100.5 56.5
17 2007-01-05 2007 100.5 56.5
# Original data, df2:
pricename 2006 2007
0 pricedata1 2.5 5
1 pricedata2 3.0 6
# Applying the new code:
df1['pricedata1'] = df1.apply(lambda x: x['pricedata1'] * df2.set_index('pricename').loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x: x['pricedata2'] * df2.set_index('pricename').loc['pricedata2', str(x['year'])], axis=1)
# Output df1:
date year pricedata1 pricedata2
0 2006-01-02 2006 251.25 169.5
1 2006-01-03 2006 251.25 169.5
2 2006-01-04 2006 251.25 169.5
3 2006-01-05 2006 251.25 169.5
4 2006-01-06 2006 251.25 169.5
5 2006-01-07 2006 251.25 169.5
6 2006-01-08 2006 251.25 169.5
7 2006-01-09 2006 251.25 169.5
8 2006-01-10 2006 251.25 169.5
9 2006-01-11 2006 251.25 169.5
10 2006-01-12 2006 251.25 169.5
11 2006-01-13 2006 251.25 169.5
12 2006-01-14 2006 251.25 169.5
13 2006-01-15 2006 251.25 169.5
14 2007-01-02 2007 502.50 339.0
15 2007-01-03 2007 502.50 339.0
16 2007-01-04 2007 502.50 339.0
17 2007-01-05 2007 502.50 339.0
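A possible refinement of the lines above, sketched under the same assumption that df2's year columns are strings: build the indexed lookup table once instead of calling set_index() for every row. The names factors and result and the sample values are illustrative. Note that when every column of df1 is numeric, apply(axis=1) hands each row over as floats, so the year is converted with int() before it is turned into a string.

```python
import pandas as pd

# Illustrative sample data (assumed); df2 uses string year labels.
df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],
    "2007": [5.0, 6.0],
})

# Build the lookup table once, outside the lambda.
factors = df2.set_index("pricename")

result = df1.copy()
for name in ["pricedata1", "pricedata2"]:
    result[name] = df1.apply(
        # int() guards against the year arriving as 2006.0 in an all-numeric row,
        # which would otherwise produce the label "2006.0".
        lambda row, name=name: row[name] * factors.loc[name, str(int(row["year"]))],
        axis=1,
    )
print(result)
```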