How can I scale a pandas dataframe based on row/column scaling factors defined in another dataframe?
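A concise solution is to reindex() your df2 on df1. First reshape df2 to match df1 (years as rows, price names as columns). Then reindex() it by df1's years and multiply element-wise by the scaling factors.
Note: this relies on both indexes having the same dtype, so convert with year.astype(...) as needed.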
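# Reshape df2 so years are rows and price names are columns, with one row per row of df1
factors = df2.set_index('pricename').T.reindex(df1['year'])
# .values drops the year index, so the product is positional rather than label-aligned
df1['pricedata1'] = df1['pricedata1'] * factors['pricedata1'].values
df1['pricedata2'] = df1['pricedata2'] * factors['pricedata2'].values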
# date year pricedata1 pricedata2
# 2006-01-02 2006 251.25 169.5
# 2006-01-03 2006 251.25 169.5
# 2006-01-04 2006 251.25 169.5
# 2006-01-05 2006 251.25 169.5
# 2006-01-06 2006 251.25 169.5
# 2006-01-07 2006 251.25 169.5
# 2006-01-08 2006 251.25 169.5
# 2006-01-09 2006 251.25 169.5
# 2006-01-10 2006 251.25 169.5
# 2006-01-11 2006 251.25 169.5
# 2006-01-12 2006 251.25 169.5
# 2006-01-13 2006 251.25 169.5
# 2006-01-14 2006 251.25 169.5
# 2006-01-15 2006 251.25 169.5
# 2007-01-02 2007 502.50 339.0
# 2007-01-03 2007 502.50 339.0
# 2007-01-04 2007 502.50 339.0
# 2007-01-05 2007 502.50 339.0
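Here is a self-contained sketch of the reindex approach above. It assumes df2 was read from text, so its year labels come back as the strings '2006' and '2007' while df1['year'] is an integer column. The `astype(int)` call is the dtype conversion the note refers to. The three-row df1 is illustrative and reuses the question's values.

```python
import pandas as pd
from io import StringIO

# df1: one row per date, with the year the scaling factor is keyed on
df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})

# df2 read from text: the year headers come back as the strings '2006', '2007'
df2 = pd.read_csv(StringIO(
    "pricename 2006 2007\n"
    "pricedata1 2.5 5\n"
    "pricedata2 3.0 6\n"
), sep=r"\s+")

# Years as rows, price names as columns
factors = df2.set_index("pricename").T
# Convert the string year labels to int so they match df1['year']
factors.index = factors.index.astype(int)

# One row of factors per row of df1, in df1's order
aligned = factors.reindex(df1["year"])

cols = ["pricedata1", "pricedata2"]
# .to_numpy() drops the year index, so the product is positional, not label-aligned
df1[cols] = df1[cols] * aligned[cols].to_numpy()
print(df1)
```

Without the `astype(int)` line, `reindex` finds no matching labels and every factor becomes NaN.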
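You can do this by applying a function to df1 row by row:
def multiply(row):
    row = row.copy()  # avoid mutating the Series that apply() passes in
    year = df1['year'].loc[row.name]  # row.name is the row's index label in df1
    for pricedata in row.index:
        # df2 is indexed by pricename, and its year columns are strings
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]
    return row

df1[['pricedata1', 'pricedata2']] = df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)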
A minimal working example (MWE):
import pandas as pd
from io import StringIO

TESTDATA = StringIO("""year pricedata1 pricedata2
2016 100.5 56.5
2017 100.5 56.5
""")
df1 = pd.read_csv(TESTDATA, sep=r'\s+')  # delim_whitespace=True is deprecated

TESTDATA = StringIO("""pricename 2016 2017
pricedata1 2.5 5
pricedata2 3.0 6
""")
df2 = pd.read_csv(TESTDATA, sep=r'\s+')
df2 = df2.set_index('pricename')  # year columns are the strings '2016', '2017'

def multiply(row):
    row = row.copy()  # avoid mutating the Series that apply() passes in
    year = df1['year'].loc[row.name]
    for pricedata in row.index:
        row[pricedata] = df2[str(year)].loc[pricedata] * row[pricedata]
    return row

df1[['pricedata1', 'pricedata2']] = df1[['pricedata1', 'pricedata2']].apply(multiply, axis=1)
print(df1)
#    year  pricedata1  pricedata2
# 0  2016      251.25       169.5
# 1  2017      502.50       339.0
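The logic of this code is to loop over the rows of df1 and update the i-th row using df2.iloc[0].iloc[coln.index(j)] (df2.iloc[0] is the pricedata1 row and df2.iloc[1] is the pricedata2 row). Here:

coln = list(df2.columns) is the list of df2's columns, which each iteration matches against;
coln.index(j) gives the position of j in coln, where j is the year.

The useful code is just this part: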
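coln = list(df2.columns)
for i, j in zip(df1.index, df1['year']):
    # df1.loc[i, col] writes into df1 directly; chained df1[col][i] = ... has no effect under Copy-on-Write
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]
print(df1)

The rest is what I used to build the dataframes from scratch. Here is the full script: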
import pandas as pd
days_2006 = pd.Series(
pd.date_range("2006-01-02", periods=14, freq="D")
)
days_2007 = pd.Series(
pd.date_range("2007-01-02", periods=4, freq="D")
)
days_total = pd.concat([days_2006, days_2007], ignore_index=True)
df1 = pd.DataFrame(
data= {
'date': days_total,
'year':days_total.dt.year,
'pricedata1': [100.5]*18,
'pricedata2': [56.5]*18
},
)
df2 = pd.DataFrame(
data={
'pricename':['pricedata1', 'pricedata2'],
2006:[2.5, 3.0],
2007:[5.0, 6.0]
}
)
coln = list(df2.columns)
for i, j in zip(df1.index, df1['year']):
    # Write through .loc; chained df1[col][i] = ... assignment is unreliable
    df1.loc[i, 'pricedata1'] = df1.loc[i, 'pricedata1'] * df2.iloc[0].iloc[coln.index(j)]
    df1.loc[i, 'pricedata2'] = df1.loc[i, 'pricedata2'] * df2.iloc[1].iloc[coln.index(j)]
print(df1)
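The loop above uses `coln.index(j)` to turn each year into a column position. A vectorized alternative, swapped in here and not part of the original answer, is `Series.map`. It looks up each year directly in a year-to-factor Series. This is a minimal sketch that assumes df2 is indexed by `pricename` and has integer year columns. The three-row df1 is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "pricedata1": [100.5, 100.5, 100.5],
    "pricedata2": [56.5, 56.5, 56.5],
})
# Factors indexed by price name, with integer year columns
df2 = pd.DataFrame(
    {2006: [2.5, 3.0], 2007: [5.0, 6.0]},
    index=pd.Index(["pricedata1", "pricedata2"], name="pricename"),
)

for name in ["pricedata1", "pricedata2"]:
    # df2.loc[name] is a Series mapping year -> factor for this price column;
    # map() looks up each row's year in it, the job coln.index(j) does in the loop
    df1[name] = df1[name] * df1["year"].map(df2.loc[name])

print(df1)
```

One behavioural difference is that a year missing from df2 maps to NaN here. In the loop, `coln.index(j)` raises ValueError instead.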
Another concise approach uses df.apply() on df1 and df.set_index() on df2:
factors = df2.set_index('pricename')  # build the lookup table once, not once per row
# df2's year columns are strings here; int() guards against the year arriving as a float
df1['pricedata1'] = df1.apply(lambda x: x['pricedata1'] * factors.loc['pricedata1', str(int(x['year']))], axis=1)
df1['pricedata2'] = df1.apply(lambda x: x['pricedata2'] * factors.loc['pricedata2', str(int(x['year']))], axis=1)
# Original data, df1:
date year pricedata1 pricedata2
0 2006-01-02 2006 100.5 56.5
1 2006-01-03 2006 100.5 56.5
2 2006-01-04 2006 100.5 56.5
3 2006-01-05 2006 100.5 56.5
4 2006-01-06 2006 100.5 56.5
5 2006-01-07 2006 100.5 56.5
6 2006-01-08 2006 100.5 56.5
7 2006-01-09 2006 100.5 56.5
8 2006-01-10 2006 100.5 56.5
9 2006-01-11 2006 100.5 56.5
10 2006-01-12 2006 100.5 56.5
11 2006-01-13 2006 100.5 56.5
12 2006-01-14 2006 100.5 56.5
13 2006-01-15 2006 100.5 56.5
14 2007-01-02 2007 100.5 56.5
15 2007-01-03 2007 100.5 56.5
16 2007-01-04 2007 100.5 56.5
17 2007-01-05 2007 100.5 56.5
# Original data, df2:
pricename 2006 2007
0 pricedata1 2.5 5
1 pricedata2 3.0 6
# Output df1:
date year pricedata1 pricedata2
0 2006-01-02 2006 251.25 169.5
1 2006-01-03 2006 251.25 169.5
2 2006-01-04 2006 251.25 169.5
3 2006-01-05 2006 251.25 169.5
4 2006-01-06 2006 251.25 169.5
5 2006-01-07 2006 251.25 169.5
6 2006-01-08 2006 251.25 169.5
7 2006-01-09 2006 251.25 169.5
8 2006-01-10 2006 251.25 169.5
9 2006-01-11 2006 251.25 169.5
10 2006-01-12 2006 251.25 169.5
11 2006-01-13 2006 251.25 169.5
12 2006-01-14 2006 251.25 169.5
13 2006-01-15 2006 251.25 169.5
14 2007-01-02 2007 502.50 339.0
15 2007-01-03 2007 502.50 339.0
16 2007-01-04 2007 502.50 339.0
17 2007-01-05 2007 502.50 339.0
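The `str(x['year'])` lookup has one caveat. `apply(axis=1)` passes each row in as a single Series. When every column of df1 is numeric, the integer year is upcast to float and `str()` yields `'2006.0'`, which is not a column of df2. The df1 above has a datetime `date` column, so its rows stay object dtype and the lookup works. The sketch below uses a hypothetical numeric-only frame to show why converting with `int()` first is safer.

```python
import pandas as pd

# Numeric-only frame: int64 'year' plus a float64 price column
df = pd.DataFrame({"year": [2006, 2007], "pricedata1": [100.5, 100.5]})

# Each row is passed as one float64 Series, so the year arrives as 2006.0
raw_keys = df.apply(lambda x: str(x["year"]), axis=1)
# Converting to int before str() restores the '2006' label
safe_keys = df.apply(lambda x: str(int(x["year"])), axis=1)

print(raw_keys.tolist())   # ['2006.0', '2007.0']
print(safe_keys.tolist())  # ['2006', '2007']
```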