Occurrence frequency of a list against each row in a Pandas dataframe

Question

Let's say I have a list of 6 integers named 'base' and a dataframe of 100,000 rows with 6 columns of integers as well.

I need to create an additional column that shows the frequency of occurrences of the list 'base' against each row in the dataframe.

The order of the integers, both in the list 'base' and in the dataframe, is to be ignored in this case.

The occurrence frequency can have a value ranging from 0 to 6.
0 means that none of the 6 integers in the list 'base' matches any of the 6 columns in a row of the dataframe.

Can anyone shed some light on this, please?

Answer 1

You can try this:

import pandas as pd

# create frame with six columns of ints
df = pd.DataFrame({'a':[1,2,3,4,10],
                   'b':[8,5,3,2,11],
                   'c':[3,7,1,8,8],
                   'd':[3,7,1,8,8],
                   'e':[3,1,1,8,8],
                   'f':[7,7,1,8,8]})

# list of ints
base = [1, 2, 3, 4, 5, 6]

# define function to count membership of list
def base_count(y):
    return sum(True for x in y if x in base)

# apply the function row-wise using the axis=1 parameter
df.apply(base_count, axis=1)

Outputs:

0    4
1    3
2    6
3    2
4    0
dtype: int64

Then assign it to a new column:

df['g'] = df.apply(base_count, axis=1)
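
With 100,000 rows, a row-wise apply can be slow, so a vectorized variant may be worth trying. The sketch below uses DataFrame.isin followed by a row sum, which should give the same per-cell count as base_count. It also shows a second reading of the question: counting how many distinct values from 'base' appear in a row, so that repeated values in a row are only counted once. The names cols, cell_count and distinct_count are made up for this illustration, and which of the two counts is wanted is an assumption the asker would need to confirm.

```python
import pandas as pd

# same sample frame as in the answer above
df = pd.DataFrame({'a': [1, 2, 3, 4, 10],
                   'b': [8, 5, 3, 2, 11],
                   'c': [3, 7, 1, 8, 8],
                   'd': [3, 7, 1, 8, 8],
                   'e': [3, 1, 1, 8, 8],
                   'f': [7, 7, 1, 8, 8]})
base = [1, 2, 3, 4, 5, 6]

# fix the column list up front so that result columns added later are not counted
cols = ['a', 'b', 'c', 'd', 'e', 'f']

# per-cell count: how many of the 6 cells in each row hold a value found in base
cell_count = df[cols].isin(base).sum(axis=1)

# distinct count: how many different values from base occur somewhere in the row
base_set = set(base)
distinct_count = df[cols].apply(lambda row: len(base_set.intersection(row)), axis=1)

df['g'] = cell_count
df['h'] = distinct_count
print(df)
```

On the sample data the two counts differ for rows 0 and 2, where the same matching value shows up in several columns.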

Answer 1
0 · Accepted · 2015-11-05 05:43:51
