Pandas: compare items in a list in one column with a single value in another column

Question

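Consider this two-column df. I want to create an apply function that compares each item in the list in the 'other_yrs' column with the single integer in the 'cur' column, and keeps a count of the items in the 'other_yrs' list that are less than or equal to the value in the 'cur' column. I can't figure out how to do this with pandas apply. I use apply functions for other purposes and they work well. Any ideas would be appreciated.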

    cur other_yrs
1   11  [11, 11]
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4   16  [15, 85]
5   17  [17, 17, 16]
6   13  [8, 8]

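Below is the function I used to extract the values into the 'other_yrs' column. I think I could insert something into this function that compares each successive value in the list with the 'cur' column value and keeps a count. I really only need to store the count of list items that are <= the value in the 'cur' column.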

def col_check(col_string):
    cs_yr_lst = []
    count = 0  # not used yet
    if len(col_string) < 1:  # avoids col values of 0, meaning no other cases
        pass
    else:
        case_lst = col_string.split(", ")  # splits the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  # gets the case year from each individual case number
            cs_yr_lst.append(cs_yr)  # stores those integers in a list, then into a new column via apply
    return cs_yr_lst

The expected output would look like this:

   cur                                   other_yrs  count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
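
For reference, the row-wise apply the question asks about can be sketched as follows. The frame is rebuilt from the sample data above; count_le is a hypothetical helper name, not part of the question's code. It counts the list items that are <= cur, which is what the expected output shows.

```python
import pandas as pd

# Sample data from the question (index labels 1, 2, 4, 5, 6 as shown above).
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)


def count_le(row):
    """Hypothetical helper: count items in row['other_yrs'] that are <= row['cur']."""
    return sum(1 for yr in row["other_yrs"] if yr <= row["cur"])


# axis=1 passes each row (as a Series) to count_le.
df["count"] = df.apply(count_le, axis=1)
print(df)
```

Row-wise apply calls the Python function once per row, which is simple but tends to be slow on large frames.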

Answer 1

Use zip over the cur and other_yrs columns in a list comprehension, and apply np.sum to the boolean mask:

import numpy as np
df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]
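
A self-contained sketch of the comprehension on the question's sample data: converting each list to a NumPy array lets <= compare every item against the scalar cur at once, and np.sum counts the True entries in the resulting mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"cur": [11, 12, 16, 17, 13],
     "other_yrs": [[11, 11], [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
                   [15, 85], [17, 17, 16], [8, 8]]},
    index=[1, 2, 4, 5, 6],
)

# For one row: the array is compared element-wise with the scalar,
# giving a boolean mask; np.sum counts the True values.
mask = np.array([15, 85]) <= 16
print(mask)          # [ True False]
print(np.sum(mask))  # 1

# The same idea applied to every (cur, other_yrs) pair via zip.
df["count"] = [np.sum(np.array(b) <= a) for a, b in zip(df["cur"], df["other_yrs"])]
print(df["count"].tolist())  # [2, 11, 1, 3, 2]
```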

Another idea:

df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)

Result:

   cur                                   other_yrs  count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
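
The second idea relies on how pd.DataFrame handles lists of different lengths: each list becomes a row, and shorter rows are padded with NaN. A sketch using the question's sample data, assuming only standard pandas behavior:

```python
import pandas as pd

df = pd.DataFrame(
    {"cur": [11, 12, 16, 17, 13],
     "other_yrs": [[11, 11], [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
                   [15, 85], [17, 17, 16], [8, 8]]},
    index=[1, 2, 4, 5, 6],
)

# One column per list position; the longest list has 13 items,
# so shorter rows are padded with NaN up to 13 columns.
wide = pd.DataFrame(df["other_yrs"].tolist(), index=df.index)
print(wide.shape)  # (5, 13)

# .le(df["cur"], axis=0) compares every cell in a row with that row's "cur".
# NaN <= x evaluates to False, so the padding never adds to the count.
mask = wide.le(df["cur"], axis=0)
df["count"] = mask.sum(axis=1)
print(df["count"].tolist())  # [2, 11, 1, 3, 2]
```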

Answer 2

You could explode the lists and compare, then group by the index (level=0) and sum:

u = df.explode('other_yrs')
df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)

print(df)
    cur                                   other_yrs  Count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
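
A sketch of the explode approach on the question's sample data. It uses groupby(level=0).sum() because the level argument of Series.sum was deprecated in pandas 1.3 and removed in 2.0; printing the length of u shows why grouping by the index works.

```python
import pandas as pd

df = pd.DataFrame(
    {"cur": [11, 12, 16, 17, 13],
     "other_yrs": [[11, 11], [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
                   [15, 85], [17, 17, 16], [8, 8]]},
    index=[1, 2, 4, 5, 6],
)

# explode repeats each row once per list item and keeps the original
# index label, so label 2 appears 13 times.
u = df.explode("other_yrs")
print(len(u))  # 22

# Compare row by row, then group the repeated index labels (level=0)
# and add up the True values for each original row.
df["Count"] = u["cur"].ge(u["other_yrs"]).groupby(level=0).sum().astype(int)
print(df["Count"].tolist())  # [2, 11, 1, 3, 2]
```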

Answer 3

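If the columns in both dataframes contain millions of records, and every element of the first column must be compared against all elements of the second column, the code below may help.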

for element in Dataframe1.Column1:
    # rows of Dataframe2 whose Column2 equals the current element
    rows = Dataframe2[Dataframe2.Column2.isin([element])]

The snippet above selects, one element at a time, the rows of Dataframe2 whose Column2 value matches the current element of Dataframe1.Column1.
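
With millions of rows, the loop above filters Dataframe2 once per element. If what you need is every row of Dataframe2 whose Column2 value appears anywhere in Dataframe1.Column1, a single isin call does it in one vectorized step. A minimal sketch with small made-up frames (the data and the other column are hypothetical):

```python
import pandas as pd

# Hypothetical frames standing in for Dataframe1 / Dataframe2 from the answer.
Dataframe1 = pd.DataFrame({"Column1": [3, 7, 9]})
Dataframe2 = pd.DataFrame({"Column2": [1, 3, 3, 5, 9], "other": list("abcde")})

# Per-element loop (the answer's approach): one boolean filter per element.
for element in Dataframe1.Column1:
    rows = Dataframe2[Dataframe2.Column2.isin([element])]
    print(element, rows.index.tolist())  # 3 [1, 2] / 7 [] / 9 [4]

# Single vectorized call: all rows of Dataframe2 whose Column2 value
# occurs anywhere in Dataframe1.Column1.
matches = Dataframe2[Dataframe2.Column2.isin(Dataframe1.Column1)]
print(matches.index.tolist())  # [1, 2, 4]
```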

3 solutions

Solution 1
Score 3, accepted, 2020-07-23 15:32:08

Solution 2
Score 2, 2020-07-23 15:29:59

Solution 3
Score 0, 2021-10-20 06:52:14
