How do I select the top k rows based on another DataFrame in Python?

Question

I have the data below. The users are 1001 to 1004 (the real data has 1 million users). Each user has a probability for each of the variables AT1 to AT6.

user   AT1     AT2     AT3     AT4    AT5    AT6 
 1001  0.004   0.003   0.03    0.01   0.5    0.453
 1002  0.2     0.1     0.3     0.1    0.1    0.2    
 1003  0.07    0.13    0.22    0.3    0.08   0.2 
 1004  0.01    0.23    0.43    0.15   0.04   0.14

I want to select the top 3 users for each choice, based on the following data.

client   choice_1 choice_2
997       AT2    AT3
223       AT6    AT5
444       AT1    AT4
121       AT1    AT5

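In the output, top1 to top3 are the top 3 users by probability for choice_1, and top4 to top6 are the top 3 users by probability for choice_2. The client id is not computed; it is given. N is likewise not computed; it is fixed at the top 3 for each choice. The output should look like this: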

client top1   top2    top3  top4    top5    top6   
997    1004   1003    1002   1004   1002     1003     
223    1001   1002    1003   1001   1002     1003
444    1002   1003    1004   1003   1004     1002
121    1002   1003    1004   1001   1002     1003

How can I construct this last DataFrame in Python?

Answer 1

I don't know how well this will scale to a million rows, but try this dictionary comprehension:

import numpy as np
import pandas as pd

# Set up the test DataFrames and index the users by their id.
df_user = pd.DataFrame({
    "user":[1001,1002,1003,1004],
    "AT2" :[0.003, 0.1, 0.13, 0.23],
    "AT3" :[0.03, 0.3, 0.22, 0.43],
    "AT5" :[0.5, 0.1, 0.08, 0.04],
    "AT6" :[0.453,0.2,0.2,0.14]
})
df_user.set_index("user", inplace=True)
df_client = pd.DataFrame({
    "client":[997, 223],
    "choice_1":["AT2","AT6"],
    "choice_2":["AT3", "AT5"]
})

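# Dictionary comprehension: one entry per client, holding the top-3 users
# for choice_1 followed by the top-3 users for choice_2.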
pd.DataFrame({row["client"]:np.append(df_user[row["choice_1"]].nlargest(3).index.values,
                                      df_user[row["choice_2"]].nlargest(3).index.values)
              for (i, row) in df_client.iterrows()}).T

The output has one row per client and six positional columns (0 to 5), so you still have to rename the columns.
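The renaming step could look like the sketch below. It rebuilds the same two-client test frames, stores the transposed frame in a variable called result (a name introduced here for illustration, not part of the original answer), and then relabels the positional columns as top1 to top6.

```python
import numpy as np
import pandas as pd

# Same small test frames as in the answer (only the columns these two clients need).
df_user = pd.DataFrame({
    "user": [1001, 1002, 1003, 1004],
    "AT2": [0.003, 0.1, 0.13, 0.23],
    "AT3": [0.03, 0.3, 0.22, 0.43],
    "AT5": [0.5, 0.1, 0.08, 0.04],
    "AT6": [0.453, 0.2, 0.2, 0.14],
}).set_index("user")
df_client = pd.DataFrame({
    "client": [997, 223],
    "choice_1": ["AT2", "AT6"],
    "choice_2": ["AT3", "AT5"],
})

# The dictionary comprehension from the answer, kept in a variable.
result = pd.DataFrame({
    row["client"]: np.append(
        df_user[row["choice_1"]].nlargest(3).index.values,
        df_user[row["choice_2"]].nlargest(3).index.values,
    )
    for _, row in df_client.iterrows()
}).T

# Relabel the positional columns 0..5 as top1..top6 and name the index.
result.columns = [f"top{i}" for i in range(1, result.shape[1] + 1)]
result.index.name = "client"
print(result)
```

Building the labels from result.shape[1] keeps the renaming correct if the number of users kept per choice changes.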

Brief explanation: run the following code to see that each item yielded by df_client.iterrows() is a tuple of (a) the DataFrame index and (b) the row.

for it in df_client.iterrows():
    print(it)

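After running the last snippet, it contains the last row of df_client, so set row = it[1] to experiment with the pieces of information you can extract from it. In particular, row["choice_1"] gives you something like "AT1", which you can use to pull the corresponding column out of df_user, and on that column you can call the pandas nlargest function. Once you put all the pieces together, the dictionary comprehension becomes straightforward.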
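On the open question of scaling to a million users, one possible variation (a sketch, not part of the original answer) is to rank each AT column only once and then look up the precomputed result for every client. Since there are only six AT columns, nlargest runs six times regardless of how many clients there are. The names K, top_users, rows and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

K = 3  # number of top users to keep per choice

# Full sample data from the question.
df_user = pd.DataFrame({
    "user": [1001, 1002, 1003, 1004],
    "AT1": [0.004, 0.2, 0.07, 0.01],
    "AT2": [0.003, 0.1, 0.13, 0.23],
    "AT3": [0.03, 0.3, 0.22, 0.43],
    "AT4": [0.01, 0.1, 0.3, 0.15],
    "AT5": [0.5, 0.1, 0.08, 0.04],
    "AT6": [0.453, 0.2, 0.2, 0.14],
}).set_index("user")
df_client = pd.DataFrame({
    "client": [997, 223, 444, 121],
    "choice_1": ["AT2", "AT6", "AT1", "AT1"],
    "choice_2": ["AT3", "AT5", "AT4", "AT5"],
})

# Rank every AT column once and keep the ids of its K most probable users.
top_users = {
    col: df_user[col].nlargest(K).index.to_numpy()
    for col in df_user.columns
}

# For each client, concatenate the precomputed ids of its two choices.
rows = [
    np.concatenate([top_users[c1], top_users[c2]])
    for c1, c2 in zip(df_client["choice_1"], df_client["choice_2"])
]
result = pd.DataFrame(
    rows,
    index=df_client["client"],
    columns=[f"top{i}" for i in range(1, 2 * K + 1)],
)
print(result)
```

The per-client work is reduced to concatenating two small arrays, so the expensive ranking over all users does not repeat for clients that share a choice. Ties are resolved by the default keep="first" behaviour of nlargest.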

Solution 1
0 2021-11-16 10:26:11
