Pandas: comparing items in a list in one column with a single value in another column

Question

Consider this two-column DataFrame. I would like to write an apply function that compares each item of the list in the "other_yrs" column with the single integer in the "cur" column and counts how many items in the list are less than or equal to the value in "cur". I cannot figure out how to get pandas to do this with apply. I am using apply functions for other purposes and they are working well. Any ideas would be much appreciated.

    cur other_yrs
1   11  [11, 11]
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4   16  [15, 85]
5   17  [17, 17, 16]
6   13  [8, 8]

Below is the function I used to extract the values into the "other_yrs" column. I am thinking I can insert into this function some way of comparing each successive value in the list with the "cur" column value and keeping count. I really only need to store the count of how many of the list items are <= the value in the "cur" column.

def col_check(col_string):
    cs_yr_lst = []
    count = 0
    if len(col_string) < 1:  # avoids col values of 0, meaning no other cases
        pass
    else:
        case_lst = col_string.split(", ")  # splits the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  # gets the case year from each individual case number
            cs_yr_lst.append(cs_yr)  # stores those integers in a list and then into a new column using apply
    return cs_yr_lst

The expected output would be this:

   cur                                   other_yrs  count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2

Answer 1

Use zip inside a list comprehension to pair the columns cur and other_yrs, and use np.sum on the boolean mask:

df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]

Another idea:

df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)

Result:

   cur                                   other_yrs  count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
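
For anyone who wants to run this end to end, here is a minimal self-contained sketch on the sample data from the question. The DataFrame construction and the variable names count_zip, padded, count_padded and count_apply are illustrative, not part of the original answer. The third variant uses a row-wise apply, since that is what the question asked about; it is usually slower than the other two on large frames.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (the index skips 3, as in the original).
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)

# 1) zip the two columns and sum a boolean mask per row.
count_zip = [int(np.sum(np.array(b) <= a)) for a, b in zip(df["cur"], df["other_yrs"])]

# 2) Pad the lists into a rectangular frame; the NaN padding compares as False,
#    so it never contributes to the count.
padded = pd.DataFrame(df["other_yrs"].tolist(), index=df.index)
count_padded = padded.le(df["cur"], axis=0).sum(axis=1)

# 3) Row-wise apply: each row exposes both "cur" and the list.
count_apply = df.apply(
    lambda row: sum(y <= row["cur"] for y in row["other_yrs"]), axis=1
)

df["count"] = count_zip
print(df)
```

All three variants should agree on this data; the choice between them is mostly about readability versus speed.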

Answer 2

You can explode the lists, compare, then group on level=0 and sum:

u = df.explode('other_yrs')
df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)

print(df)
    cur                                   other_yrs  Count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
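
A self-contained sketch of the explode approach on the question's data follows. It groups with groupby(level=0).sum(), because the level argument of Series.sum was deprecated in pandas 1.3 and removed in 2.0. The DataFrame construction, the name mask, and the astype(int) cast on the exploded column are my additions; the cast assumes no row holds an empty list, since explode would turn that into NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)

# One row per list item; the original index label is repeated for each item.
u = df.explode("other_yrs")

# explode leaves "other_yrs" as object dtype, so cast before comparing
# (assumption: no empty lists, which would become NaN).
mask = u["cur"].ge(u["other_yrs"].astype(int))

# Sum the True values per original row label and align back onto df.
df["Count"] = mask.groupby(level=0).sum().astype(int)
print(df)
```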

Answer 3

If the columns in both dataframes contain millions of records and each element of the first column has to be compared with all the elements of the second column, then the following code might be helpful.

for element in Dataframe1.Column1:
    # rows of Dataframe2 whose Column2 equals the current element
    matching_rows = Dataframe2[Dataframe2.Column2.isin([element])]

The code snippet above returns, one element at a time, the rows of Dataframe2 where the element from Dataframe1 is found in Dataframe2.Column2.
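
As a rough illustration of that loop, here is a sketch on two small made-up frames. The data, the extra label column, and the names per_element and all_matches are hypothetical. It also shows a single isin call over the whole column, which may be preferable when only the combined set of matching rows is needed.

```python
import pandas as pd

# Hypothetical toy frames; the original answer does not show any data.
Dataframe1 = pd.DataFrame({"Column1": [1, 3, 5]})
Dataframe2 = pd.DataFrame(
    {"Column2": [5, 1, 2, 1], "label": ["a", "b", "c", "d"]}
)

# Loop version: collect the matching rows of Dataframe2 for each element.
per_element = {
    element: Dataframe2[Dataframe2.Column2.isin([element])]
    for element in Dataframe1.Column1
}

# Single-call version: every row of Dataframe2 whose Column2 value
# appears anywhere in Dataframe1.Column1.
all_matches = Dataframe2[Dataframe2.Column2.isin(Dataframe1.Column1)]

print(per_element[1])
print(all_matches)
```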

Answer 1: score 3, accepted, 2020-07-23 15:32:08
Answer 2: score 2, 2020-07-23 15:29:59
Answer 3: score 0, 2021-10-20 06:52:14