pandas DataFrame: iterate over cell values that are lists and compare each element with another cell

Question

I have a DataFrame with two columns, a tuple and a list:

df = t        l
    (1,2) [1,2,3,4,5,6]
    (0,5) [1,4,9]
    (0,4) [9,11]

I want to add a new column counting how many elements from l are in the range of t. For example, here it will be:

df =counter  t       l
      2    (1,2) [1,2,3,4,5,6]
      2    (0,5) [1,4,9]
      0    (0,4) [9,11]

What is the best way to do so?

Answer 1

Use a list comprehension with a generator expression and sum:

df['counter'] = [sum(a <= i <= b for i in y) for (a, b), y in df[['t','l']].to_numpy()]
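For readers who want to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question and treats t as an inclusive (low, high) range. The helper name count_in_range is hypothetical, and the sketch pairs the two columns with zip instead of to_numpy(), which should give the same result here.

```python
import pandas as pd

# Sample data from the question: t is an inclusive (low, high) range,
# l is the list of values to test against that range.
df = pd.DataFrame({
    "t": [(1, 2), (0, 5), (0, 4)],
    "l": [[1, 2, 3, 4, 5, 6], [1, 4, 9], [9, 11]],
})


def count_in_range(bounds, values):
    """Count how many items of `values` fall inside the inclusive `bounds`."""
    low, high = bounds
    # Each comparison yields True/False; sum() adds them up as 1/0.
    return sum(low <= v <= high for v in values)


# Walk both columns in parallel and build the new column row by row.
df["counter"] = [count_in_range(t, l) for t, l in zip(df["t"], df["l"])]
print(df)
```

The chained comparison low <= v <= high is what makes both ends of the range inclusive, matching the expected output in the question.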

A slightly faster solution uses set.intersection:

df['counter'] = [len(set(range(a, b+1)).intersection(y)) 
                 for (a, b), y in df[['t','l']].to_numpy()]

print(df)
        t                   l  counter
0  (1, 2)  [1, 2, 3, 4, 5, 6]        2
1  (0, 5)           [1, 4, 9]        2
2  (0, 4)             [9, 11]        0
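The two approaches agree on the sample data, but they are not interchangeable in general. The sketch below, in plain Python with hypothetical helper names, illustrates two cases where they seem to diverge: the set-based version counts distinct values only, and because it is built from range it can only match integer values.

```python
def count_with_sum(low, high, values):
    # Counts every element in the inclusive range, duplicates included.
    return sum(low <= v <= high for v in values)


def count_with_set(low, high, values):
    # Counts distinct values that equal some integer in range(low, high + 1).
    return len(set(range(low, high + 1)).intersection(values))


# Same answer on data like the question's (unique integers).
print(count_with_sum(0, 5, [1, 4, 9]), count_with_set(0, 5, [1, 4, 9]))  # 2 2

# Duplicates: sum counts both 1s, the set collapses them.
print(count_with_sum(0, 5, [1, 1, 4]), count_with_set(0, 5, [1, 1, 4]))  # 3 2

# Non-integers: 1.5 is inside the range but is not a member of range(0, 6).
print(count_with_sum(0, 5, [1.5, 4]), count_with_set(0, 5, [1.5, 4]))    # 2 1
```

If l can contain repeated or non-integer values, the sum-based version is the one that matches the question's wording.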

Performance on test data:

#30k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [67]: %timeit [sum(a <= i <= b for i in y) for (a, b), y in df[['t','l']].to_numpy()]
65.3 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [68]: %timeit [len(set(range(a, b+1)).intersection(y)) for (a, b), y in df[['t','l']].to_numpy()]
60.7 ms ± 520 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
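%timeit is an IPython magic, so the timings above cannot be reproduced in a plain script as written. A rough sketch with the standard-library timeit module is shown here. The function names with_sum and with_set are hypothetical, the repeat counts are arbitrary choices, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Rebuild the sample frame and repeat it to 30k rows, as in the test above.
df = pd.DataFrame({
    "t": [(1, 2), (0, 5), (0, 4)],
    "l": [[1, 2, 3, 4, 5, 6], [1, 4, 9], [9, 11]],
})
df = pd.concat([df] * 10000, ignore_index=True)


def with_sum():
    return [sum(a <= i <= b for i in y)
            for (a, b), y in df[["t", "l"]].to_numpy()]


def with_set():
    return [len(set(range(a, b + 1)).intersection(y))
            for (a, b), y in df[["t", "l"]].to_numpy()]


# Sanity check: both approaches agree on this data before timing them.
assert with_sum() == with_set()

for fn in (with_sum, with_set):
    # Best of 3 repeats, 5 calls per repeat, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per loop")
```

Given how close the two timings in the answer are, the difference may fall within noise on other machines.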

Answer 1
-1 accepted 2021-11-23 08:07:18