How to scale a pandas dataframe by row/column scaling factors defined in another dataframe?

Question

I have extracted two dataframes, shown below:
DF1:

DF2:

I want to apply a factor to different parts of the pricedata1 and pricedata2 columns of DF1, based on conditional matching against another dataframe (DF2).

For instance, for row 0 of DF1, I want to scale the pricedata1 value 100.5 by multiplying it by 2.5. That factor comes from DF2, selected where DF1's year value equals a DF2 column name and DF1's column name equals the value in DF2's pricename column. For pricedata1 in year 2007, a different factor, 5, should be applied instead.

I know how to use df.apply on an entire column, but I'm lost on how to apply it to only part of a column based on different if conditions.

Desired output:

Thanks in advance.
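The partial, condition-based update described in the question can be sketched with boolean masks and .loc, which scale only the rows of one column that meet a condition. This is a minimal sketch on hypothetical sample data; the factors are typed in by hand here rather than read from DF2, which the answers below take care of.

```python
import pandas as pd

# Hypothetical sample data shaped like the DF1 described in the question.
df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})

# Hypothetical factors typed in by hand; in the question they come from DF2.
factors = {
    ("pricedata1", 2006): 2.5,
    ("pricedata1", 2007): 5.0,
    ("pricedata2", 2006): 3.0,
    ("pricedata2", 2007): 6.0,
}

for (col, year), factor in factors.items():
    mask = df1["year"] == year    # boolean Series selecting the rows of this year
    df1.loc[mask, col] *= factor  # scale only those rows, only in this column

print(df1)
```

Each mask touches a disjoint set of rows per column, so no value is scaled twice.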

Answer 1

A concise solution is to reindex() df2 on df1. First reshape df2 to match df1 (years as rows, price names as columns), then reindex() it on df1.year and multiply by the scaling factors element-wise.

Note: this relies on both indexes having the same dtype, so convert with year.astype(...) as needed.

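# Years become the index and price names the columns; reindex gives one row of factors per df1 row
df2 = df2.set_index('pricename').T.reindex(df1.year)

# .values drops df2's year index, so the multiplication is positional
df1.pricedata1 = df1.pricedata1 * df2.pricedata1.values
df1.pricedata2 = df1.pricedata2 * df2.pricedata2.values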

#       date  year  pricedata1  pricedata2
# 2006-01-02  2006      251.25       169.5
# 2006-01-03  2006      251.25       169.5
# 2006-01-04  2006      251.25       169.5
# 2006-01-05  2006      251.25       169.5
# 2006-01-06  2006      251.25       169.5
# 2006-01-07  2006      251.25       169.5
# 2006-01-08  2006      251.25       169.5
# 2006-01-09  2006      251.25       169.5
# 2006-01-10  2006      251.25       169.5
# 2006-01-11  2006      251.25       169.5
# 2006-01-12  2006      251.25       169.5
# 2006-01-13  2006      251.25       169.5
# 2006-01-14  2006      251.25       169.5
# 2006-01-15  2006      251.25       169.5
# 2007-01-02  2007      502.50       339.0
# 2007-01-03  2007      502.50       339.0
# 2007-01-04  2007      502.50       339.0
# 2007-01-05  2007      502.50       339.0
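A self-contained sketch of the reindex approach follows. It assumes, hypothetically, that df2's year labels are strings (as read_csv would produce) while df1.year holds integers, which is exactly the dtype mismatch the note warns about.

```python
import pandas as pd

# Hypothetical sample data: df1.year is int64, df2's year labels are strings.
df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],
    "2007": [5.0, 6.0],
})

# Reshape: years become the row index, price names become the columns.
factors = df2.set_index("pricename").T
# Align the index dtype with df1.year; without this, reindex matches nothing (all NaN).
factors.index = factors.index.astype("int64")

# One row of factors per row of df1, in df1's row order.
aligned = factors.reindex(df1["year"])

cols = ["pricedata1", "pricedata2"]
# .to_numpy() drops the year index on both sides, so the product is purely positional.
df1[cols] = df1[cols].to_numpy() * aligned[cols].to_numpy()
print(df1)
```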

Answer 2

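You can do this by applying a function to df1 row by row:

def multiply(row):
    # row.name is the row's index label; look up that row's year in df1
    year = df1['year'].loc[row.name]

    # Scale each price column by its DF2 factor (df2 must be indexed by 'pricename', as in the MWE below)
    for pricedata in row.index:
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]

    return row

df1[['pricedata1', 'pricedata2']] = df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)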

A minimal working example (MWE):

import pandas as pd
from io import StringIO

TESTDATA = StringIO("""year pricedata1 pricedata2
2016 100.5 56.5
2017 100.5 56.5
    """)

df1 = pd.read_csv(TESTDATA, sep=r"\s+")  # delim_whitespace=True is deprecated in recent pandas

TESTDATA = StringIO("""pricename 2016 2017
pricedata1 2.5 5
pricedata2 3.0 6
    """)

df2 = pd.read_csv(TESTDATA, sep=r"\s+")

df2 = df2.set_index('pricename')

def multiply(row):
    year = df1['year'].loc[row.name]

    for pricedata in row.index:
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]

    return row

df1[['pricedata1', 'pricedata2']] = df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)

# print(df1)

   year  pricedata1  pricedata2
0  2016      251.25       169.5
1  2017      502.50       339.0
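As an alternative to the row-wise apply, the same result can likely be obtained with a different technique that this answer does not use: reshape both frames to long format with melt, attach the factors with merge, and pivot back. A sketch on the MWE's data, assuming df2's year labels are strings as read_csv produces them:

```python
import pandas as pd

# Same values as the MWE above.
df1 = pd.DataFrame({
    "year": [2016, 2017],
    "pricedata1": [100.5, 100.5],
    "pricedata2": [56.5, 56.5],
})
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2016": [2.5, 3.0],
    "2017": [5.0, 6.0],
})
cols = ["pricedata1", "pricedata2"]

# df2 in long format: one (pricename, year, factor) row per scaling factor.
factors = df2.melt(id_vars="pricename", var_name="year", value_name="factor")
factors["year"] = factors["year"].astype("int64")  # match the dtype of df1.year

# df1 in long format, keeping each original row label in a "row" column.
long = (
    df1.rename_axis("row")
    .reset_index()
    .melt(id_vars=["row", "year"], value_vars=cols,
          var_name="pricename", value_name="price")
)

# Attach the matching factor to every (year, pricename) pair, then scale.
merged = long.merge(factors, on=["year", "pricename"], how="left")
merged["price"] = merged["price"] * merged["factor"]

# Back to wide format; reindex restores df1's row order before a positional assignment.
scaled = merged.pivot(index="row", columns="pricename", values="price")
df1[cols] = scaled.reindex(df1.index)[cols].to_numpy()
print(df1)
```

Any (year, pricename) pair missing from df2 would show up as NaN after the left merge, which makes gaps in the factor table easy to spot.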

Answer 3

The logic behind this code: iterate over the rows of df1 and update the i-th row using df2.iloc[0].iloc[coln.index(j)], where coln = list(df2.columns) is the list of df2's column labels, used for matching during the iteration, and coln.index(j) gives the position of year j in that list.

Only this section is the useful code; the rest of the complete code below just builds the dataframes from scratch:

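coln = list(df2.columns)

# Assumes df1 has the default RangeIndex, so position i is also the row label.
# df1.loc[i, col] writes into df1 directly; chained df1[col][i] = ... may not (Copy-on-Write).
for i, j in enumerate(df1['year']):
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]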

print(df1)

Complete code:

import pandas as pd

days_2006 = pd.Series(
    pd.date_range("2006-01-02", periods=14, freq="D")
)

days_2007 = pd.Series(
    pd.date_range("2007-01-02", periods=4, freq="D")
)

days_total = pd.concat([days_2006, days_2007], ignore_index=True)

df1 = pd.DataFrame(
    data= {
        'date': days_total,
        'year':days_total.dt.year,
        'pricedata1': [100.5]*18,
        'pricedata2': [56.5]*18
    },
)

df2 = pd.DataFrame(
    data={
        'pricename':['pricedata1', 'pricedata2'],
        2006:[2.5, 3.0],
        2007:[5.0, 6.0]
    }
)

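coln = list(df2.columns)

# df1 uses the default RangeIndex here, so position i is also the row label.
# df1.loc[i, col] writes into df1 directly; chained df1[col][i] = ... may not (Copy-on-Write).
for i, j in enumerate(df1['year']):
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]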

print(df1)
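The loop above looks up one factor per row. A vectorized sketch of the same lookup, assuming df2 has integer year column labels as in the complete code, passes the whole year column to .loc at once:

```python
import pandas as pd

# Hypothetical, shortened version of the complete code's data.
df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
# Integer year labels, as in the complete code above.
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    2006: [2.5, 3.0],
    2007: [5.0, 6.0],
})

factors = df2.set_index("pricename")
years = df1["year"].tolist()

for col in ["pricedata1", "pricedata2"]:
    # A list of year labels returns one factor per df1 row (labels may repeat).
    per_row = factors.loc[col, years].to_numpy()
    df1[col] = df1[col].to_numpy() * per_row

print(df1)
```

Unlike coln.index(j), a label lookup with .loc raises a KeyError for a year missing from df2 instead of silently picking the wrong position.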

Answer 4

Another concise way, using df.apply() on df1 and df.set_index() on df2:

factors = df2.set_index('pricename')  # build the lookup table once instead of once per row
df1['pricedata1'] = df1.apply(lambda x: x['pricedata1'] * factors.loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x: x['pricedata2'] * factors.loc['pricedata2', str(x['year'])], axis=1)

Test run

# Original data, df1:

          date  year  pricedata1  pricedata2
0   2006-01-02  2006       100.5        56.5
1   2006-01-03  2006       100.5        56.5
2   2006-01-04  2006       100.5        56.5
3   2006-01-05  2006       100.5        56.5
4   2006-01-06  2006       100.5        56.5
5   2006-01-07  2006       100.5        56.5
6   2006-01-08  2006       100.5        56.5
7   2006-01-09  2006       100.5        56.5
8   2006-01-10  2006       100.5        56.5
9   2006-01-11  2006       100.5        56.5
10  2006-01-12  2006       100.5        56.5
11  2006-01-13  2006       100.5        56.5
12  2006-01-14  2006       100.5        56.5
13  2006-01-15  2006       100.5        56.5
14  2007-01-02  2007       100.5        56.5
15  2007-01-03  2007       100.5        56.5
16  2007-01-04  2007       100.5        56.5
17  2007-01-05  2007       100.5        56.5

# Original data, df2:

    pricename  2006  2007
0  pricedata1   2.5     5
1  pricedata2   3.0     6

# Applying the new code:
df1['pricedata1'] = df1.apply(lambda x:  x['pricedata1'] * df2.set_index('pricename').loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x:  x['pricedata2'] * df2.set_index('pricename').loc['pricedata2', str(x['year'])], axis=1)

# Output df1:

          date  year  pricedata1  pricedata2
0   2006-01-02  2006      251.25       169.5
1   2006-01-03  2006      251.25       169.5
2   2006-01-04  2006      251.25       169.5
3   2006-01-05  2006      251.25       169.5
4   2006-01-06  2006      251.25       169.5
5   2006-01-07  2006      251.25       169.5
6   2006-01-08  2006      251.25       169.5
7   2006-01-09  2006      251.25       169.5
8   2006-01-10  2006      251.25       169.5
9   2006-01-11  2006      251.25       169.5
10  2006-01-12  2006      251.25       169.5
11  2006-01-13  2006      251.25       169.5
12  2006-01-14  2006      251.25       169.5
13  2006-01-15  2006      251.25       169.5
14  2007-01-02  2007      502.50       339.0
15  2007-01-03  2007      502.50       339.0
16  2007-01-04  2007      502.50       339.0
17  2007-01-05  2007      502.50       339.0
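Because this answer calls apply row by row, a vectorized variant may be preferable on large frames. A sketch that swaps in Series.map for apply, assuming df2's year labels are strings as in the test run:

```python
import pandas as pd

# Hypothetical, shortened version of the test-run data.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2006-01-02", "2006-01-03", "2007-01-02"]),
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],
    "2007": [5.0, 6.0],
})

factors = df2.set_index("pricename")  # build the lookup table once
year_keys = df1["year"].astype(str)   # '2006', '2007' to match df2's column labels

for col in ["pricedata1", "pricedata2"]:
    # factors.loc[col] is a Series indexed by year label; map looks up each row's year.
    df1[col] = df1[col] * year_keys.map(factors.loc[col])

print(df1)
```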


Question

4 answers

Answer 1: score 1, accepted, 2021-03-28 04:38:37

Answer 2: score 1, 2021-03-28 04:38:45

Answer 3: score 1, 2021-03-28 04:52:17 (Complete code)

Answer 4: score 0, 2021-03-28 18:13:14 (Test run)
