
Pandas compare items in list in one column with single value in another column

Consider this two-column df. I would like to create an apply function that compares each item in the list in the "other_yrs" column with the single integer in the "cur" column, and keeps a count of the list items that are less than or equal to the value in the "cur" column. I cannot figure out how to get pandas to do this with apply. I am using apply functions for other purposes and they are working well. Any ideas would be much appreciated.

    cur other_yrs
1   11  [11, 11]
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4   16  [15, 85]
5   17  [17, 17, 16]
6   13  [8, 8]

Below is the function I used to extract the values into the "other_yrs" column. I am thinking I can insert into this function some way of comparing each successive value in the list with the "cur" column value and keeping count. I really only need to store the count of how many of the list items are <= the value in the "cur" column.

def col_check(col_string):
    cs_yr_lst = []
    count = 0
    if len(col_string) < 1:  # empty string means there are no other cases
        pass
    else:
        case_lst = col_string.split(", ")  # split the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  # get the case year from each individual case number
            cs_yr_lst.append(cs_yr)  # collect the years; apply stores the list in a new column
    return cs_yr_lst

The expected output would be this:

   cur                                   other_yrs  count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2

Use zip inside a list comprehension to pair the columns cur and other_yrs, and use np.sum on the boolean mask:

df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]

Another idea:

df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)

Result:

   cur                                   other_yrs  count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
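To try both one-liners end to end, here is a minimal sketch that rebuilds the sample frame from the question and checks that the two approaches agree. The names count_zip, wide and count_wide are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (index labels 1, 2, 4, 5, 6).
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)

# Approach 1: pair each scalar with its list and count the entries <= the scalar.
count_zip = [int(np.sum(np.array(b) <= a)) for a, b in zip(df["cur"], df["other_yrs"])]

# Approach 2: pad the lists into a rectangular frame (NaN where a list is shorter),
# compare each row against its own cur value, and sum the True cells per row.
wide = pd.DataFrame(df["other_yrs"].tolist(), index=df.index)
count_wide = wide.le(df["cur"], axis=0).sum(axis=1)

df["count"] = count_zip
print(df)
```

The padded cells in the second approach are NaN, and NaN compares as False, so they do not contribute to the count.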

You can also consider explode: compare, then group on level=0 and sum:

u = df.explode('other_yrs')
df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)

print(df)
    cur                                   other_yrs  Count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
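Since the question asked specifically about apply, a row-wise version is also possible. The following is only a sketch under that assumption: count_le_cur is a hypothetical helper name, and this form is likely to be slower than the approaches above on large frames because it calls a Python function once per row.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)


def count_le_cur(row):
    """Count the entries of row['other_yrs'] that are <= row['cur']."""
    return sum(yr <= row["cur"] for yr in row["other_yrs"])


# axis=1 passes each row to the helper as a Series.
df["count"] = df.apply(count_le_cur, axis=1)
print(df)
```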

If the columns in both dataframes contain millions of records, and each element of the first column has to be compared with all the elements of the second column, then the following code might be helpful.

for element in Dataframe1.Column1:
    # Rows of Dataframe2 whose Column2 equals the current element.
    matches = Dataframe2[Dataframe2.Column2.isin([element])]

The above snippet selects, one element at a time, the rows of Dataframe2 in which the element from Dataframe1 is found in Dataframe2.Column2.
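If the per-element breakdown is not needed, the loop can probably be replaced by a single isin call over the whole column. The sketch below uses small made-up frames; the payload column and all the data values are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical data standing in for the two large frames.
Dataframe1 = pd.DataFrame({"Column1": [3, 7, 9]})
Dataframe2 = pd.DataFrame({"Column2": [1, 3, 3, 5, 9], "payload": list("abcde")})

# One vectorized membership test instead of one isin call per element:
# keep the rows of Dataframe2 whose Column2 value occurs anywhere in Column1.
matches = Dataframe2[Dataframe2["Column2"].isin(Dataframe1["Column1"])]
print(matches)
```

This returns all matching rows at once, so it no longer shows which element of Dataframe1 produced each match.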
