
Occurrence frequency from a list against each row in Pandas dataframe

Let's say I have a list of 6 integers named 'base' and a dataframe of 100,000 rows with 6 columns of integers as well.

I need to create an additional column that shows the frequency of occurrence of the list 'base' against each row in the dataframe.

The order of the integers, both in the list 'base' and in the dataframe rows, is to be ignored in this case.

The occurrence frequency can have a value ranging from 0 to 6.
0 means that none of the 6 integers in the list 'base' matches any of the 6 columns of a row in the dataframe.

Can anyone shed some light on this, please?

You can try this:

import pandas as pd

# create frame with six columns of ints
df = pd.DataFrame({'a':[1,2,3,4,10],
                   'b':[8,5,3,2,11],
                   'c':[3,7,1,8,8],
                   'd':[3,7,1,8,8],
                   'e':[3,1,1,8,8],
                   'f':[7,7,1,8,8]})

# list of ints to match against
base = [1, 2, 3, 4, 5, 6]

# count how many values in a row are members of base
def base_count(y):
    return sum(True for x in y if x in base)

# apply the function row-wise using the axis=1 parameter
df.apply(base_count, axis=1)

Output:

0    4
1    3
2    6
3    2
4    0
dtype: int64

Then assign the result to a new column:

df['g'] = df.apply(base_count, axis=1)
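With 100,000 rows, a row-wise apply calls a Python function once per row, which can be slow. As a possible alternative (a sketch, not part of the original answer), the same count can be computed with pandas' vectorized DataFrame.isin followed by a row-wise sum. The cols list below is an assumption that names the six integer columns explicitly, so that a column added later is not counted.

```python
import pandas as pd

# same sample frame and list as in the answer above
df = pd.DataFrame({'a': [1, 2, 3, 4, 10],
                   'b': [8, 5, 3, 2, 11],
                   'c': [3, 7, 1, 8, 8],
                   'd': [3, 7, 1, 8, 8],
                   'e': [3, 1, 1, 8, 8],
                   'f': [7, 7, 1, 8, 8]})
base = [1, 2, 3, 4, 5, 6]

# assumption: these are the six integer columns to compare against base
cols = ['a', 'b', 'c', 'd', 'e', 'f']

# isin() gives a boolean frame (True where the cell value is in base);
# summing across axis=1 counts the True cells in each row
df['g'] = df[cols].isin(base).sum(axis=1)

print(df['g'].tolist())  # [4, 3, 6, 2, 0]
```

Like the apply version, this counts every matching cell, so a row that repeats the same matching value (such as the third row, which holds only 3s and 1s) still scores 6.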
