How can I scale a pandas dataframe based on row/column scaling factors defined in another dataframe?

I have extracted two dataframes, shown below:
DF1: (image not shown)

DF2: (image not shown)

I want to apply a factor to different parts of the columns pricedata1 and pricedata2 in DF1, based on conditional matching against another dataframe.

For instance, for row 0 in DF1, I want to scale the pricedata1 value 100.5 by multiplying it by 2.5, which is taken from DF2 under the condition that the DF1 year value equals the DF2 column name and the DF1 column name equals the DF2 pricename value. For pricedata1 in year 2007, a different factor of 5 applies instead.

I know how to use df.apply on an entire column, but I am lost on how to apply it to only part of a column based on different if conditions.

Desired output: (image not shown)

Thanks in advance.

A concise solution is to reindex() your df2 on df1. First reshape df2 to match df1 (years as rows, price names as columns), then reindex() and multiply by the scaling factors element-wise.

Note: This relies on both indexes having the same dtype, so convert with year.astype(...) as needed.

df2 = df2.set_index('pricename').T.reindex(df1.year)

df1.pricedata1 = df1.pricedata1 * df2.pricedata1.values
df1.pricedata2 = df1.pricedata2 * df2.pricedata2.values

#       date  year  pricedata1  pricedata2
# 2006-01-02  2006      251.25       169.5
# 2006-01-03  2006      251.25       169.5
# 2006-01-04  2006      251.25       169.5
# 2006-01-05  2006      251.25       169.5
# 2006-01-06  2006      251.25       169.5
# 2006-01-07  2006      251.25       169.5
# 2006-01-08  2006      251.25       169.5
# 2006-01-09  2006      251.25       169.5
# 2006-01-10  2006      251.25       169.5
# 2006-01-11  2006      251.25       169.5
# 2006-01-12  2006      251.25       169.5
# 2006-01-13  2006      251.25       169.5
# 2006-01-14  2006      251.25       169.5
# 2006-01-15  2006      251.25       169.5
# 2007-01-02  2007      502.50       339.0
# 2007-01-03  2007      502.50       339.0
# 2007-01-04  2007      502.50       339.0
# 2007-01-05  2007      502.50       339.0
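The dtype note above is easy to miss, so here is a minimal self-contained sketch of the same reindex idea. The sample data is hypothetical (three rows instead of eighteen), and it assumes df2's year labels are strings, as they would be after reading a CSV header. The names factors, cols and scaled are only for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout.
df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
# Assumption: year labels are strings here, as after reading a CSV header.
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],
    "2007": [5.0, 6.0],
})

# Reshape: years as rows, price names as columns.
factors = df2.set_index("pricename").T
# Align dtypes: convert the string year labels to integers to match df1.year.
factors.index = factors.index.astype(int)
# Produce one row of factors per row of df1.
factors = factors.reindex(df1["year"])

cols = ["pricedata1", "pricedata2"]
scaled = df1.copy()
# Multiply positionally, since the two frames no longer share an index.
scaled[cols] = df1[cols].to_numpy() * factors[cols].to_numpy()
print(scaled)
```

Without the astype(int) step, the reindex would find no matching labels and the factors would all come out as NaN.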

You can do this by applying a function to df1 row by row:

def multiply(row):
    year = df1['year'].loc[row.name]

    for pricedata in row.index:
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]

    return row

df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)

An MWE:

import pandas as pd
from io import StringIO

TESTDATA = StringIO("""year pricedata1 pricedata2
2016 100.5 56.5
2017 100.5 56.5
    """)

df1 = pd.read_csv(TESTDATA, sep=r"\s+")  # delim_whitespace is deprecated in recent pandas

TESTDATA = StringIO("""pricename 2016 2017
pricedata1 2.5 5
pricedata2 3.0 6
    """)

df2 = pd.read_csv(TESTDATA, sep=r"\s+")  # delim_whitespace is deprecated in recent pandas

df2 = df2.set_index('pricename')

def multiply(row):
    year = df1['year'].loc[row.name]

    for pricedata in row.index:
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]

    return row

df1[['pricedata1', 'pricedata2']] = df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)
# print(df1)

   year  pricedata1  pricedata2
0  2016      251.25       169.5
1  2017      502.50       339.0
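The str(year) call in multiply() matters because the year labels in df2 come from a CSV header. A small check, assuming the same test data as the MWE (the names labels, has_str, has_int and factor are only for illustration):

```python
from io import StringIO

import pandas as pd

df2 = pd.read_csv(
    StringIO("pricename 2016 2017\npricedata1 2.5 5\npricedata2 3.0 6\n"),
    sep=r"\s+",
).set_index("pricename")

labels = list(df2.columns)        # header labels are parsed as strings
has_str = "2016" in df2.columns   # the string label is present
has_int = 2016 in df2.columns     # the integer label is not
factor = df2["2016"].loc["pricedata1"]
print(labels, has_str, has_int, factor)
```

If df2 were built directly with integer column labels, the str() conversion would have to be dropped instead.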

The logic behind this code is:

Iterate over the rows of df1 and update the i-th row using df2.iloc[0].iloc[coln.index(j)],
where
coln = list(df2.columns) is the list of df2's columns, used for matching on each iteration, and
coln.index(j) gives the position of j in that list, where j is a year.
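To make the lookup concrete, here is a small sketch of what coln.index(j) and the double iloc return for a single year. It assumes df2 is built with integer year labels, as in the complete code further down. pos and factor are names used only here.

```python
import pandas as pd

df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    2006: [2.5, 3.0],
    2007: [5.0, 6.0],
})

coln = list(df2.columns)        # ['pricename', 2006, 2007]
pos = coln.index(2007)          # position of the 2007 column in that list
factor = df2.iloc[0].iloc[pos]  # row 0 is pricedata1, so this is its 2007 factor
print(pos, factor)
```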

The useful code is just this section; the rest only builds the dataframes from scratch:

coln = list(df2.columns)

for i, j in zip(df1.index, df1['year']):
    # .loc avoids chained assignment, which may not write back to df1
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]

print(df1)

Complete code:

import pandas as pd

days_2006 = pd.Series(
    pd.date_range("2006-01-02", periods=14, freq="D")
)

days_2007 = pd.Series(
    pd.date_range("2007-01-02", periods=4, freq="D")
)

days_total = pd.concat([days_2006, days_2007], ignore_index=True)

df1 = pd.DataFrame(
    data= {
        'date': days_total,
        'year':days_total.dt.year,
        'pricedata1': [100.5]*18,
        'pricedata2': [56.5]*18
    },
)

df2 = pd.DataFrame(
    data={
        'pricename':['pricedata1', 'pricedata2'],
        2006:[2.5, 3.0],
        2007:[5.0, 6.0]
    }
)

coln = list(df2.columns)

for i, j in zip(df1.index, df1['year']):
    # .loc avoids chained assignment, which may not write back to df1
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]

print(df1)

Another concise way, using df.apply() on df1 and df.set_index() on df2:

df1['pricedata1'] = df1.apply(lambda x:  x['pricedata1'] * df2.set_index('pricename').loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x:  x['pricedata2'] * df2.set_index('pricename').loc['pricedata2', str(x['year'])], axis=1)

Test run:

# Original data,   df1:

          date  year  pricedata1  pricedata2
0   2006-01-02  2006       100.5        56.5
1   2006-01-03  2006       100.5        56.5
2   2006-01-04  2006       100.5        56.5
3   2006-01-05  2006       100.5        56.5
4   2006-01-06  2006       100.5        56.5
5   2006-01-07  2006       100.5        56.5
6   2006-01-08  2006       100.5        56.5
7   2006-01-09  2006       100.5        56.5
8   2006-01-10  2006       100.5        56.5
9   2006-01-11  2006       100.5        56.5
10  2006-01-12  2006       100.5        56.5
11  2006-01-13  2006       100.5        56.5
12  2006-01-14  2006       100.5        56.5
13  2006-01-15  2006       100.5        56.5
14  2007-01-02  2007       100.5        56.5
15  2007-01-03  2007       100.5        56.5
16  2007-01-04  2007       100.5        56.5
17  2007-01-05  2007       100.5        56.5

# Original data,   df2:

    pricename  2006  2007
0  pricedata1   2.5     5
1  pricedata2   3.0     6

# Applying the new code:
df1['pricedata1'] = df1.apply(lambda x:  x['pricedata1'] * df2.set_index('pricename').loc['pricedata1', str(x['year'])], axis=1)
df1['pricedata2'] = df1.apply(lambda x:  x['pricedata2'] * df2.set_index('pricename').loc['pricedata2', str(x['year'])], axis=1)

# Output df1:

          date  year  pricedata1  pricedata2
0   2006-01-02  2006      251.25       169.5
1   2006-01-03  2006      251.25       169.5
2   2006-01-04  2006      251.25       169.5
3   2006-01-05  2006      251.25       169.5
4   2006-01-06  2006      251.25       169.5
5   2006-01-07  2006      251.25       169.5
6   2006-01-08  2006      251.25       169.5
7   2006-01-09  2006      251.25       169.5
8   2006-01-10  2006      251.25       169.5
9   2006-01-11  2006      251.25       169.5
10  2006-01-12  2006      251.25       169.5
11  2006-01-13  2006      251.25       169.5
12  2006-01-14  2006      251.25       169.5
13  2006-01-15  2006      251.25       169.5
14  2007-01-02  2007      502.50       339.0
15  2007-01-03  2007      502.50       339.0
16  2007-01-04  2007      502.50       339.0
17  2007-01-05  2007      502.50       339.0
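The lambdas above call df2.set_index('pricename') once per row. A possible variation, not part of the answer above, is to build the lookup once and use Series.map to translate each year into its factor. This sketch uses hypothetical three-row data and assumes string year labels in df2. The names lookup, year_labels and result are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5] * 3,
    "pricedata2": [56.5] * 3,
})
# Assumption: year labels in df2 are strings, matching str(x['year']) above.
df2 = pd.DataFrame({
    "pricename": ["pricedata1", "pricedata2"],
    "2006": [2.5, 3.0],
    "2007": [5.0, 6.0],
})

# Build the lookup table once instead of inside a per-row lambda.
lookup = df2.set_index("pricename")
year_labels = df1["year"].astype(str)

result = df1.copy()
for col in ["pricedata1", "pricedata2"]:
    # lookup.loc[col] is a Series of factors indexed by year label;
    # map() translates each row's year label into its factor.
    result[col] = df1[col] * year_labels.map(lookup.loc[col])
print(result)
```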
