
Extract values from 2 separate dataframes with the same index and column names

I have 2 dataframes. df1:

SKU USER 1  USER 2  USER 3  USER 4  USER 5  USER 6  USER 7
1001    5   2   0   0   2   2   1
1002    4   2   2   1   0   1.5 2
1003    1   1   0   0   0   3   3
1004    0   3   0   2   1   0   7
1005    1   1   0   4   4   3.5 0
1006    1   3   4   5   1   3   3
1007    0   1   1   3   0   0   5
1008    2   3   1   0   0   2.333333    0
1009    0   0   0   3   3   0   0
1010    5   6   3   0   2   4   6

df2:

SKU USER 1  USER 2  USER 3  USER 4  USER 5  USER 6  USER 7
1001    7.398414    4.398414    2.398414    2.398414    4.398414    4.398414    3.398414
1002    6.321304    4.321304    4.321304    3.321304    2.321304    3.821304    4.321304
1003    3.535435    3.535435    2.535435    2.535435    2.535435    5.535435    5.535435
1004    2.865097    5.865097    2.865097    4.865097    3.865097    2.865097    9.865097
1005    3.152332    3.152332    2.152332    6.152332    6.152332    5.652332    2.152332
1006    2.816583    4.816583    5.816583    6.816583    2.816583    4.816583    4.816583
1007    2.378649    3.378649    3.378649    5.378649    2.378649    2.378649    7.378649
1008    4.431189    5.431189    3.431189    2.431189    2.431189    4.764522    2.431189
1009    2.196257    2.196257    2.196257    5.196257    5.196257    2.196257    2.196257
1010    7.148196    8.148196    5.148196    2.148196    4.148196    6.148196    8.148196

For each USER-SKU combination, I want to print the actual value (from df1) and the predicted value (from df2), like this:

For USER1 SKU 1001: ACTUAL = 5, PREDICTED = 7.398414

How can I extract these values?

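Your data appears to be pivoted. In that case, it is easier to first unpivot (melt) each frame into a table of (sku, user, value) rows, and then merge the two tables to form (sku, user, actual, predicted) rows.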

import pandas as pd

# Move the SKU index into a regular column so melt can use it as an id column.
# If you need the original DataFrames unchanged later, drop inplace=True and
# assign the returned copies to new names instead.
df1.reset_index(level=0, inplace=True)
df2.reset_index(level=0, inplace=True)

# unpivot dataframes
df1_melt = pd.melt(df1, id_vars=['SKU'], var_name='USER', value_name='ACTUAL')    
df2_melt = pd.melt(df2, id_vars=['SKU'], var_name='USER', value_name='PREDICTED')

# merge dataframes on SKU, USER
df_merged = df1_melt.merge(df2_melt, on=['SKU', 'USER'])

for row in df_merged.itertuples(index=False):
    sku, user, actual, predicted = row
    print('{user} SKU {sku}: ACTUAL = {actual}, PREDICTED = {predicted}'.format(
        user=user, sku=sku, actual=actual, predicted=predicted
    ))
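An alternative to melt + merge, sketched under the assumption that SKU is the index of both frames (the sample below is a two-SKU excerpt of the question's data): `DataFrame.stack` turns each wide frame into a Series indexed by (SKU, user column), and `pd.concat` with a dict of Series places the two side by side, aligned on that index, without resetting anything.

```python
import pandas as pd

# Excerpt of the question's data; SKU is assumed to be the index of both frames.
index = pd.Index([1001, 1002], name="SKU")
df1 = pd.DataFrame({"USER 1": [5, 4], "USER 2": [2, 2]}, index=index)
df2 = pd.DataFrame({"USER 1": [7.398414, 6.321304], "USER 2": [4.398414, 4.321304]}, index=index)

# stack() reshapes each wide frame into a Series keyed by (SKU, user column);
# concat then aligns the two Series on that key, playing the role of melt + merge.
combined = pd.concat({"ACTUAL": df1.stack(), "PREDICTED": df2.stack()}, axis=1)
combined.index.names = ["SKU", "USER"]

# itertuples yields values column by column, so integer actuals print as 5, not 5.0.
lines = [
    f"{user} SKU {sku}: ACTUAL = {actual}, PREDICTED = {predicted}"
    for (sku, user), actual, predicted in combined.itertuples(name=None)
]
print("\n".join(lines))
```

Because `concat` defaults to an outer join on the index, a (SKU, user) pair present in only one frame would appear with NaN in the other column instead of being dropped, which differs from the default inner join of `merge` used in the answer.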

If you don't want to rename the columns, I believe you can use loops and simple indexing, like this:

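# Assumes SKU is the index of both frames, so positional column c is "USER c+1"
# and row r is SKU 1001 + r (true for the sample data, whose SKUs are consecutive).
cols = range(7)
for c in cols:
    column = "USER " + str(c + 1)
    rows = range(10)
    for r in rows:
        actual = df1.iloc[r, c]
        predict = df2.iloc[r, c]
        print(column + " SKU " + str(r + 1001) + ": ACTUAL = " + str(actual) + ", PREDICTED = " + str(predict))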

Hope this helps :)
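A label-based variant of the positional loop, as a sketch: it assumes SKU is the index of both frames and uses a three-SKU excerpt of the question's data. `DataFrame.at` looks up a single cell by row and column label, so the printed SKU comes from the index rather than being rebuilt as `1001 + r`, and df2's rows and columns do not need to be in the same order as df1's.

```python
import pandas as pd

# Excerpt of the question's data; SKU is assumed to be the index of both frames.
index = pd.Index([1001, 1002, 1003], name="SKU")
df1 = pd.DataFrame({"USER 1": [5, 4, 1], "USER 2": [2, 2, 1]}, index=index)
df2 = pd.DataFrame(
    {"USER 1": [7.398414, 6.321304, 3.535435], "USER 2": [4.398414, 4.321304, 3.535435]},
    index=index,
)

# Iterate over labels instead of positions; .at fetches one cell by (row label, column label).
lines = []
for user in df1.columns:
    for sku in df1.index:
        actual = df1.at[sku, user]
        predicted = df2.at[sku, user]
        lines.append(f"{user} SKU {sku}: ACTUAL = {actual}, PREDICTED = {predicted}")

print("\n".join(lines))
```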

I think it would be easier to rename df2's columns, then merge, and then define a lookup function:

In [175]:    
# d_cols maps each 'USER n' column name to 'PRED USER n'
df2.rename(columns=d_cols, inplace=True)
df2.columns

Out[175]:
Index(['SKU', 'PRED USER 1', 'PRED USER 2', 'PRED USER 3', 'PRED USER 4',
       'PRED USER 5', 'PRED USER 6', 'PRED USER 7'],
      dtype='object')

In [184]:
df3 = df1.merge(df2)
def lookup(sku):
    return 'USER1 SKU {:d}: ACTUAL = {:f}, PREDICTED = {:f}'.format(sku, df3.loc[df3['SKU'] == sku, 'USER 1'].values[0], df3.loc[df3['SKU']==sku,'PRED USER 1'].values[0])
df3['SKU'].apply(lookup).iloc[0]

Out[184]:
'USER1 SKU 1001: ACTUAL = 5.000000, PREDICTED = 7.398414'
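A self-contained version of the rename, merge and lookup steps, as a sketch using a two-SKU excerpt of the question's data with SKU as a regular column (as in the session). The session does not show how `d_cols` is built; the dict comprehension here is an assumption reconstructed from the printed column index, and the `user` parameter of `lookup` is a hypothetical addition so the function is not tied to USER 1. Setting SKU as the index after the merge lets `.at` fetch each cell directly instead of filtering with a boolean mask.

```python
import pandas as pd

# Excerpt of the question's data, with SKU as a regular column.
df1 = pd.DataFrame({"SKU": [1001, 1002], "USER 1": [5, 4], "USER 2": [2, 2]})
df2 = pd.DataFrame(
    {"SKU": [1001, 1002], "USER 1": [7.398414, 6.321304], "USER 2": [4.398414, 4.321304]}
)

# Assumed construction of d_cols: prefix every USER column of df2 with "PRED ".
d_cols = {c: "PRED " + c for c in df2.columns if c.startswith("USER")}

# After renaming, SKU is the only shared column, so it is the merge key.
df3 = df1.merge(df2.rename(columns=d_cols), on="SKU").set_index("SKU")


def lookup(sku, user="USER 1"):
    """Format the actual/predicted pair for one SKU; `user` is a hypothetical extra parameter."""
    actual = df3.at[sku, user]
    predicted = df3.at[sku, "PRED " + user]
    return f"{user} SKU {sku}: ACTUAL = {actual}, PREDICTED = {predicted}"


print(lookup(1001))
print(lookup(1002, "USER 2"))
```

Using the return value of `rename` instead of `inplace=True` leaves df2 itself unchanged.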


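Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.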

 