Efficiently selecting rows in a Pandas DataFrame based on multiple conditions across columns

Question

I am trying to create a new Pandas DataFrame based on conditions. This is the original DataFrame:

        topic1 topic2 
name1    1      4
name2    4      4
name3    4      3
name4    4      4
name5    2      4
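
To make the example reproducible, the frame above could be built roughly like this. Treating name1 through name5 as the index, and calling the variable data (as the loop further down does), are assumptions drawn from the layout of the table rather than something the question states.

```python
import pandas as pd

# Assumption: name1..name5 are index labels, topic1/topic2 are the columns.
data = pd.DataFrame(
    {"topic1": [1, 4, 4, 4, 2], "topic2": [4, 4, 3, 4, 4]},
    index=["name1", "name2", "name3", "name4", "name5"],
)

# Count how often the value 4 occurs in each column of the full frame.
available = (data == 4).sum()
print(data)
print(available)
```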

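I want to select arbitrary rows so that topic1 == 4 occurs 2 times and topic2 == 4 occurs 3 times in the new DataFrame. Once that is done, I want the code to stop.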

bucket1_topic1 = 2
bucket1_topic2 = 3

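I wrote this rather convoluted starter code, which "almost" works... but I am having trouble handling rows that satisfy both the topic1 and the topic2 condition. What is a more efficient and correct way to do this?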

rows_list = []

counter1 = 0
counter2 = 0

for index,row in data.iterrows():
    if counter1 < bucket1_topic1:
        if row.topic1 == 4:
            counter1 +=1
            rows_list.append([row[1], row.topic1, row.topic2])

    if counter2 < bucket1_topic2:
        if row.topic2 == 4 and row.topic1 !=4:
            counter2 +=1
            if [row[1], row.topic1, row.topic2] not in rows_list:
                rows_list.append([row[1], row.topic1, row.topic2])

Desired result, in which topic1 == 4 appears twice and topic2 == 4 appears 3 times:

        topic1 topic2 
name1    1      4
name2    4      4
name3    4      3
name5    2      4

Answer 1

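Avoid the loop and consider using DataFrame.sample to shuffle the rows arbitrarily (frac=1 returns 100% of the DataFrame), then use groupby().cumcount() to compute a running count within each group. Finally, filter with a logical subset: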

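# Shuffle the rows, then number each row within its topic1 / topic2 group
# (cumcount is a 0-based running count per group).
df = (df.sample(frac=1)
        .assign(t1_grp = lambda x: x.groupby(["topic1"]).cumcount(),
                t2_grp = lambda x: x.groupby(["topic2"]).cumcount())
     )

# Keep rows holding a 1, 2 or 3 in either column, plus the first 2 shuffled
# rows with topic1 == 4 and the first 3 shuffled rows with topic2 == 4.
final_df = df[(df["topic1"].isin([1,2,3])) |
              (df["topic2"].isin([1,2,3])) |
              ((df["topic1"] == 4) & (df["t1_grp"] < 2)) |
              ((df["topic2"] == 4) & (df["t2_grp"] < 3))]

# Drop the helper columns.
final_df = final_df.drop(columns=["t1_grp", "t2_grp"])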
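
One thing to watch: the mask above combines its clauses with OR, so a row such as (4, 4) can be kept through the topic2 clause even after two topic1 == 4 rows have already been counted. If both counts must be hit exactly, a greedy pass over the shuffled rows is one possible alternative. The sketch below is not part of the original answer. The helper name select_with_caps and its parameters are hypothetical, it assumes unique index labels, and a greedy scan is not guaranteed to reach the targets for every dataset.

```python
import pandas as pd


def select_with_caps(df, caps, target=4, random_state=None):
    """Greedily pick shuffled rows until every column in `caps` holds
    `target` exactly caps[column] times (hypothetical helper)."""
    counts = {col: 0 for col in caps}
    keep = []
    shuffled = df.sample(frac=1, random_state=random_state)
    for row in shuffled.itertuples():
        # Columns in which this row carries the target value.
        hits = [col for col in caps if getattr(row, col) == target]
        # Skip rows that add nothing or would push a column past its cap.
        if not hits or any(counts[col] >= caps[col] for col in hits):
            continue
        for col in hits:
            counts[col] += 1
        keep.append(row.Index)
        if counts == caps:  # every target reached: stop scanning
            break
    return df.loc[keep]


data = pd.DataFrame(
    {"topic1": [1, 4, 4, 4, 2], "topic2": [4, 4, 3, 4, 4]},
    index=["name1", "name2", "name3", "name4", "name5"],
)
result = select_with_caps(data, {"topic1": 2, "topic2": 3}, random_state=0)
n_topic1 = int((result["topic1"] == 4).sum())
n_topic2 = int((result["topic2"] == 4).sum())
print(result)
print(n_topic1, n_topic2)
```

Because the scan stops as soon as both targets are met, the result may contain three or four rows depending on the shuffle. Rows are returned in shuffled order; sorting the result by index afterwards would restore the original ordering if that matters.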

Solution 1
1 2020-02-02 14:46:52
