How can I scale a pandas dataframe based on row/column scaling factors defined in another dataframe?
So, I have extracted 2 dataframes as shown below:
DF1:
I wish to apply a factor to different parts of the columns pricedata1 and pricedata2 in DF1, based on conditional matching against another dataframe.
For instance, for row 0 in DF1, I want to multiply the pricedata1 value 100.5 by 2.5, which comes from DF2 where the DF1 year value matches a DF2 column name and the DF1 column name matches the DF2 pricename value. Then, for year 2007, pricedata1 should be multiplied by a different factor, 5.
I know how to use df.apply on an entire column, but I'm lost on how to apply it to only part of a column based on different if conditions.
Thanks in advance
A concise solution is to reindex() your df2 on df1. First reshape df2 to match df1 (years as rows, price names as columns), then reindex() and multiply the scaling factors element-wise.
Note: This relies on both indexes having the same dtype, so convert with year.astype(...) as needed.
df2 = df2.set_index('pricename').T.reindex(df1.year)
df1.pricedata1 = df1.pricedata1 * df2.pricedata1.values
df1.pricedata2 = df1.pricedata2 * df2.pricedata2.values
# date year pricedata1 pricedata2
# 2006-01-02 2006 251.25 169.5
# 2006-01-03 2006 251.25 169.5
# 2006-01-04 2006 251.25 169.5
# 2006-01-05 2006 251.25 169.5
# 2006-01-06 2006 251.25 169.5
# 2006-01-07 2006 251.25 169.5
# 2006-01-08 2006 251.25 169.5
# 2006-01-09 2006 251.25 169.5
# 2006-01-10 2006 251.25 169.5
# 2006-01-11 2006 251.25 169.5
# 2006-01-12 2006 251.25 169.5
# 2006-01-13 2006 251.25 169.5
# 2006-01-14 2006 251.25 169.5
# 2006-01-15 2006 251.25 169.5
# 2007-01-02 2007 502.50 339.0
# 2007-01-03 2007 502.50 339.0
# 2007-01-04 2007 502.50 339.0
# 2007-01-05 2007 502.50 339.0
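For anyone who wants to run the reindex approach end to end, here is a minimal self-contained sketch. The sample frames are a shortened version of the data shown in this thread, and it assumes df2's year columns are strings (as they would be after read_csv), so the index is converted to integers before reindexing. The names factors, aligned and scaled are only illustrative.

```python
import pandas as pd

# Shortened sample data mirroring the thread (three rows instead of eighteen).
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2006-01-02", "2006-01-03", "2007-01-02"]),
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
# Assumption: the year columns of df2 are strings.
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],
    "2007": [5.0, 6.0],
})

# Reshape: years become the index, price names become the columns.
factors = df2.set_index("pricename").T
# Convert the string year labels to integers so they match df1["year"].
factors.index = factors.index.map(int)

# One row of factors per row of df1, in df1's row order.
aligned = factors.reindex(df1["year"])

cols = ["pricedata1", "pricedata2"]
scaled = df1.copy()
# Multiply position by position; to_numpy() sidesteps index alignment.
scaled[cols] = df1[cols].to_numpy() * aligned[cols].to_numpy()
print(scaled)
```

If a year in df1 has no matching column in df2, reindex() fills that row with NaN, so the scaled values for that row become NaN rather than raising an error.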
You can do this by applying a function to df1 row by row:
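def multiply(row):
    # row.name is this row's index label in df1; use it to look up the year
    year = df1['year'].loc[row.name]
    for pricedata in row.index:
        # df2 is indexed by pricename, with the years (as strings) as columns
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]
    return row

# Returns the scaled columns; assign the result back to df1 to keep it
df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)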
A MWE:
import pandas as pd
from io import StringIO

TESTDATA = StringIO("""year pricedata1 pricedata2
2016 100.5 56.5
2017 100.5 56.5
""")
# sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
df1 = pd.read_csv(TESTDATA, sep=r'\s+')

TESTDATA = StringIO("""pricename 2016 2017
pricedata1 2.5 5
pricedata2 3.0 6
""")
df2 = pd.read_csv(TESTDATA, sep=r'\s+')
df2 = df2.set_index('pricename')

def multiply(row):
    # row.name is this row's index label in df1; use it to look up the year
    year = df1['year'].loc[row.name]
    for pricedata in row.index:
        # the year columns of df2 are strings after read_csv, hence str(year)
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]
    return row

df1[['pricedata1', 'pricedata2']] = df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)
# print(df1)
year pricedata1 pricedata2
0 2016 251.25 169.5
1 2017 502.50 339.0
The logic behind this code is:
Iterate over the rows of df1 and update the i-th row using df2.iloc[0].iloc[coln.index(j)],
where
coln = list(df2.columns) holds the columns of df2, which we use for matching in each iteration.
coln.index(j) gives the position of j in that list, where j is a year.
The useful code is just this section. The rest I used to build the dataframes from scratch:
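coln = list(df2.columns)
for i, j in zip(df1.index, df1['year']):
    # coln.index(j) is the position of year j among df2's columns;
    # .loc avoids chained assignment, which may not write back to df1
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]
print(df1)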
import pandas as pd
days_2006 = pd.Series(
pd.date_range("2006-01-02", periods=14, freq="D")
)
days_2007 = pd.Series(
pd.date_range("2007-01-02", periods=4, freq="D")
)
days_total = pd.concat([days_2006, days_2007], ignore_index=True)
df1 = pd.DataFrame(
data= {
'date': days_total,
'year':days_total.dt.year,
'pricedata1': [100.5]*18,
'pricedata2': [56.5]*18
},
)
df2 = pd.DataFrame(
data={
'pricename':['pricedata1', 'pricedata2'],
2006:[2.5, 3.0],
2007:[5.0, 6.0]
}
)
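coln = list(df2.columns)
for i, j in zip(df1.index, df1['year']):
    # coln.index(j) is the position of year j among df2's columns;
    # .loc avoids chained assignment, which may not write back to df1
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]
print(df1)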
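To make the lookup concrete, here is a small sketch of what coln.index(j) resolves to. It uses the same df2 layout as this answer (integer year column names); the variable pos is only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    2006: [2.5, 3.0],
    2007: [5.0, 6.0],
})

coln = list(df2.columns)
print(coln)  # ['pricename', 2006, 2007]

# Position of the year 2007 within df2's columns.
pos = coln.index(2007)

# Row 0 of df2 holds the pricedata1 factors; pick the one at that position.
factor = df2.iloc[0].iloc[pos]
print(pos, factor)  # 2 5.0
```

Because the lookup is positional, it depends on the pricedata1 row being first in df2 and pricedata2 second.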
Another concise way is to use df.apply() on df1 and df.set_index() on df2:
df1['pricedata1'] = df1.apply(lambda x: x['pricedata1'] * df2.set_index('pricename').loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x: x['pricedata2'] * df2.set_index('pricename').loc['pricedata2', str(x['year'])], axis=1)
# Original data, df1:
date year pricedata1 pricedata2
0 2006-01-02 2006 100.5 56.5
1 2006-01-03 2006 100.5 56.5
2 2006-01-04 2006 100.5 56.5
3 2006-01-05 2006 100.5 56.5
4 2006-01-06 2006 100.5 56.5
5 2006-01-07 2006 100.5 56.5
6 2006-01-08 2006 100.5 56.5
7 2006-01-09 2006 100.5 56.5
8 2006-01-10 2006 100.5 56.5
9 2006-01-11 2006 100.5 56.5
10 2006-01-12 2006 100.5 56.5
11 2006-01-13 2006 100.5 56.5
12 2006-01-14 2006 100.5 56.5
13 2006-01-15 2006 100.5 56.5
14 2007-01-02 2007 100.5 56.5
15 2007-01-03 2007 100.5 56.5
16 2007-01-04 2007 100.5 56.5
17 2007-01-05 2007 100.5 56.5
# Original data, df2:
pricename 2006 2007
0 pricedata1 2.5 5
1 pricedata2 3.0 6
# Applying the new code:
df1['pricedata1'] = df1.apply(lambda x: x['pricedata1'] * df2.set_index('pricename').loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x: x['pricedata2'] * df2.set_index('pricename').loc['pricedata2', str(x['year'])], axis=1)
# Output df1:
date year pricedata1 pricedata2
0 2006-01-02 2006 251.25 169.5
1 2006-01-03 2006 251.25 169.5
2 2006-01-04 2006 251.25 169.5
3 2006-01-05 2006 251.25 169.5
4 2006-01-06 2006 251.25 169.5
5 2006-01-07 2006 251.25 169.5
6 2006-01-08 2006 251.25 169.5
7 2006-01-09 2006 251.25 169.5
8 2006-01-10 2006 251.25 169.5
9 2006-01-11 2006 251.25 169.5
10 2006-01-12 2006 251.25 169.5
11 2006-01-13 2006 251.25 169.5
12 2006-01-14 2006 251.25 169.5
13 2006-01-15 2006 251.25 169.5
14 2007-01-02 2007 502.50 339.0
15 2007-01-03 2007 502.50 339.0
16 2007-01-04 2007 502.50 339.0
17 2007-01-05 2007 502.50 339.0
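The lambdas above call df2.set_index('pricename') once per row. As a possible variation (not part of the answer above), the indexed frame can be built once and the factors looked up with Series.map instead of a row-wise apply. This is only a sketch on shortened sample data, assuming string year columns in df2; factors, year_labels and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
# Assumption: the year columns of df2 are strings.
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],
    "2007": [5.0, 6.0],
})

# Build the lookup table once instead of once per row.
factors = df2.set_index("pricename")

# Year labels as strings so they match df2's column names.
year_labels = df1["year"].astype(str)

result = df1.copy()
for col in ["pricedata1", "pricedata2"]:
    # factors.loc[col] is a Series keyed by year; map picks each row's factor.
    result[col] = df1[col] * year_labels.map(factors.loc[col])
print(result)
```

Years missing from df2 map to NaN here, whereas the .loc lookup in the lambdas would raise a KeyError.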