How to multiply selected columns from different pandas dataframes

I have 3 pandas dataframes (similar to the ones below). I also have 2 lists: list ID_1 = ['sdf', 'sdfsdf', ...] and list ID_2 = ['kjdf', 'kldfjs', ...]

Table1:
    ID_1    ID_2    Value
0   PUFPaY9 NdYWqAJ 0.002
1   Iu6AxdB qANhGcw 0.01
2   auESFwW jUEUNdw 0.2345
3   LWbYpca G3uZ_Rg 0.0835
4   8fApIAM mVHrayg 0.0295

Table2:
     ID_1    weight1 weight2 .....weightN
0   PUFPaY9     
1   Iu6AxdB     
2   auESFwW 
3   LWbYpca     

Table3:
    ID_2    weight1 weight2 .....weightN
0   PUFPaY9     
1   Iu6AxdB     
2   auESFwW     
3   LWbYpca     

I want to build a dataframe that is computed like this:

for each x ID_1 in list1:
    for each y ID_2 in list2:
        if x-y exist in Table1:
            temp_row = ( x[weights[i]].* y[weights[i]])
            # here I want element-wise multiplication: x[weight1]*y[weight1], x[weight2]*y[weight2], ...
            temp_row.append(value[x-y] in Table1)
            new_dataframe.append(temp_row)

return new_dataframe

The desired new_dataframe should look like Table4:

Table4:
        weight1 weight2 weight3 .....weightN value
    0           
    1           
    2       
    3       

What I can do right now is:

new_df = df[(df.ID_1.isin(list1)) & (df.ID_2.isin(list2))]

With this I get all valid ID_1/ID_2 combinations and their values. But I don't know how to get the product of the weights from the two dataframes (without looping over each weight[i]).

The task is now easier: I can iterate over new_df and, for each row in new_df, find weight[1 to N] for ID_1 from Table2 and weight[1 to N] for ID_2 from Table3. Then I can append their one-to-one multiplication, together with "Value" from Table1, to FINAL_DF. But I don't want to do this in a loop. Is there a smarter way to solve this?

Is this what you want?

import io

import numpy as np
import pandas as pd

data = """\
ID_1
PUFPaY9     
aaaaaaa
Iu6AxdB     
auESFwW 
LWbYpca
"""
id1 = pd.read_csv(io.StringIO(data), sep=r'\s+')

data = """\
ID_2   
PUFPaY9
Iu6AxdB
xxxxxxx
auESFwW
LWbYpca
"""
id2 = pd.read_csv(io.StringIO(data), sep=r'\s+')

cols = ['weight{}'.format(i) for i in range(1,5)]
for c in cols:
    id1[c] = np.random.randint(1, 10, len(id1))
    id2[c] = np.random.randint(1, 10, len(id2))

id1.set_index('ID_1', inplace=True)
id2.set_index('ID_2', inplace=True)

df_mul = id1 * id2
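
The last line relies on index alignment: pandas matches rows by index label rather than by position, so a label that appears in only one frame produces a row of NaN. Here is a minimal deterministic sketch of that behaviour. The frames a and b and all their values are made up for illustration; they are not the asker's data.

```python
import pandas as pd

# Hypothetical toy frames; labels and values are made up for illustration.
a = pd.DataFrame({"weight1": [2, 3], "weight2": [4, 5]}, index=["x", "y"])
b = pd.DataFrame({"weight1": [10, 20], "weight2": [30, 40]}, index=["y", "z"])

# Rows are matched by index label, not by position.
prod = a * b

# Only "y" exists in both frames; "x" and "z" come out as all-NaN rows,
# which can be dropped if they are not wanted.
matched = prod.dropna(how="all")
print(matched)
```

Whether to drop the unmatched rows or keep them as NaN depends on what the result is used for; dropna is only one option.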

Step by step:

In [215]: id1
Out[215]:
         weight1  weight2  weight3  weight4
ID_1
PUFPaY9        8        9        1        1
aaaaaaa        6        1        9        2
Iu6AxdB        8        4        8        5
auESFwW        9        3        4        2
LWbYpca        7        7        1        8

In [216]: id2
Out[216]:
         weight1  weight2  weight3  weight4
ID_2
PUFPaY9        6        5        5        1
Iu6AxdB        1        5        4        5
xxxxxxx        1        2        6        4
auESFwW        3        9        5        5
LWbYpca        3        3        6        7

In [217]: id1 * id2
Out[217]:
         weight1  weight2  weight3  weight4
Iu6AxdB      8.0     20.0     32.0     25.0
LWbYpca     21.0     21.0      6.0     56.0
PUFPaY9     48.0     45.0      5.0      1.0
aaaaaaa      NaN      NaN      NaN      NaN
auESFwW     27.0     27.0     20.0     10.0
xxxxxxx      NaN      NaN      NaN      NaN
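
The product above matches rows that share the same label in both frames. The question instead pairs an ID_1 with a different ID_2 through Table1, so here is one possible loop-free sketch of that variant. It assumes each ID has exactly one weight row and that both weight tables have the same weight columns. The names table1, table2, table3, list1, list2 and all values are hypothetical stand-ins, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-ins for Table1, Table2 and Table3; all values are made up.
table1 = pd.DataFrame({
    "ID_1": ["a1", "a2", "a3"],
    "ID_2": ["b1", "b2", "b9"],  # "b9" has no weights in table3
    "Value": [0.5, 0.25, 0.1],
})
table2 = pd.DataFrame(
    {"weight1": [1, 2, 3], "weight2": [4, 5, 6]},
    index=pd.Index(["a1", "a2", "a3"], name="ID_1"),
)
table3 = pd.DataFrame(
    {"weight1": [10, 20], "weight2": [30, 40]},
    index=pd.Index(["b1", "b2"], name="ID_2"),
)
list1 = ["a1", "a2", "a3"]
list2 = ["b1", "b2", "b9"]

# The filter from the question: keep pairs whose IDs are in both lists ...
pairs = table1[table1["ID_1"].isin(list1) & table1["ID_2"].isin(list2)]
# ... and, as an extra assumption, only pairs that have weights in both tables.
pairs = pairs[pairs["ID_1"].isin(table2.index) & pairs["ID_2"].isin(table3.index)]

# Look up one weight row per pair, then multiply position by position.
# Converting to NumPy arrays sidesteps label alignment between ID_1 and ID_2.
left = table2.loc[pairs["ID_1"]].to_numpy()
right = table3.loc[pairs["ID_2"], table2.columns].to_numpy()

result = pd.DataFrame(left * right, columns=table2.columns, index=pairs.index)
result["Value"] = pairs["Value"]
print(result)
```

The result has one row per surviving Table1 pair, with the weight products followed by Value, which is the layout sketched in Table4.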
