[英]pandas how to divide to get ratio for different two dataframe
df1 是參考,df2 是目標。 df2 的 TYPE 列值應該被忽略,只是保持值不變.. 因為 df2 中的 TYPE 列,我不能直接划分。 如何比較兩個數據幀並從參考數據幀中獲得比率。 我必須保留 df2 數據框,並只獲取有關“總和”的比率值更新..
python
import numpy as np
import pandas as pd
df_data = {}
df_data['ID'] = [100001,100002,100003,100004]
df_data['ID2'] = ['A','B','C','D']
df_data['sum'] = [7,8,4,5]
df = pd.DataFrame(df_data)
print(df)
df_data2= {}
df_data2['ID'] = [100001,100002,100002,100003,100003,100001,100002]
df_data2['ID2'] = ['G','H','Q','J','H','A','B']
df_data2['TYPE'] = ['A','A','B','A','B','C','E']
df_data2['sum'] = [14,4,4,2,8,100,10]
df2 = pd.DataFrame(df_data2)
print(df2)
# my trying. I can get value but df2's dataframe is broken. I can't find value for TYPE column..
df.set_index(['ID','ID2'])['sum'] / df.set_index(['ID','ID2'])['sum']
#printout df
ID ID2 sum
0 100001 A 7
1 100002 B 8
2 100003 C 4
3 100004 D 5
#print df2
ID ID2 TYPE sum
0 100001 G A 14
1 100002 H A 4
2 100002 Q B 4
3 100003 J A 2
4 100003 H B 8
5 100001 A C 100
6 100002 B E 10
# my goal
ID ID2 TYPE sum
0 100001 G A N/A # There is no value ( ID:100001 ID2:G)
1 100002 H A N/A # There is no value ( ID:100002 ID2:H)
2 100002 Q B N/A # There is no value ( ID:100002 ID2:Q)
3 100003 J A N/A
4 100003 H B N/A
5 100001 A C 25.0 # There is value ( ID:100001 ID2:A)
6 100002 B E 2.0 # There is value ( ID:100002 ID2:B)
#my trying
ID ID2
100001 A 14.285714
G NaN
100002 B 1.250000
H NaN
Q NaN
100003 C NaN
H NaN
J NaN
100004 D NaN
這可以merge
:
df2['sum'] = (df2.merge(df, on=['ID','ID2'],
how='left')
.assign(sum=lambda x: x.sum_x/x.sum_y)
['sum']
)
輸出:
ID ID2 TYPE sum
0 100001 G A NaN
1 100002 H A NaN
2 100002 Q B NaN
3 100003 J A NaN
4 100003 H B NaN
5 100001 A C 14.285714
6 100002 B E 1.250000
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.