
Count number of occurrences of text over row python pandas

I hope this is not too trivial, but I am kind of stuck. I am trying to count how many times the word "dog" appears in each row of a data frame. I then want to add that number in a new column.

This is what the dataframe looks like at the moment:

df_start = pd.DataFrame({'col1': ['House Home Dog', 0, 'Dog Flower Cat'], 'col2': ['Flower', 0, 0], 'col3': ['House Dog', 0, 'Dog Cat']})

I want to count how many times the word "dog" occurs in each row over multiple columns (in the final dataset I have more than 100 columns). The final result should look something like this:

df_final = pd.DataFrame({'col1': ['House Home Dog', 0, 'Dog Flower Cat'], 'col2': ['Flower', 0, 0], 'col3': ['House Dog', 0, 'Dog Cat'], 'col4':[2, 0, 2]})

So far I am able to count the number of non-null cells for each row, or count how many times the word occurs in each column, but neither is the desired outcome. Thank you in advance for your help.

IIUC, this is what OP is looking for:

df_start['dog_count'] = df_start.apply(lambda x: sum([i.lower().count('dog') for i in x if isinstance(i, str)]), axis=1)


[Out]:
             col1    col2       col3  dog_count
0  House Home Dog  Flower  House Dog          2
1               0       0          0          0
2  Dog Flower Cat       0    Dog Cat          2

This custom function counts the word Dog regardless of:

  • The capitalization. Dog, DoG, dog, ... are all counted.

  • The number of times the word Dog appears in a specific cell.
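The two properties above can be checked on a single row without pandas. This is only a minimal sketch; the list `cells` is a made-up stand-in for one dataframe row, not data from the question.

```python
# Hypothetical row: mixed capitalization, a repeated word, and a non-string cell
cells = ["Dog Home DoG", 0, "dog"]

# Lowercase each string cell, count the substring "dog", and skip non-strings
total = sum(c.lower().count("dog") for c in cells if isinstance(c, str))
print(total)  # 3
```

The integer cell is skipped by the `isinstance` check, which is what keeps the lambda from failing on the 0 values in the dataframe.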

If the dataframe looks like the following:

df_start = pd.DataFrame({'col1': ['Dog Home Dog', 0, 'dog Flower Cat'], 'col2': ['Flower', 0, 0], 'col3': ['House Dog', 0, 'Dog Cat']})

[Out]:
             col1    col2       col3
0    Dog Home Dog  Flower  House Dog
1               0       0          0
2  dog Flower Cat       0    Dog Cat

After running the lambda function, one will get the following:

             col1    col2       col3  dog_count
0    Dog Home Dog  Flower  House Dog          3
1               0       0          0          0
2  dog Flower Cat       0    Dog Cat          2

Notes:

  • There are many ways to solve OP's question, since various nuances can come up. To provide an ideal solution, one would need access to the full dataframe in order to explore the various use cases.

  • Although this approach might be ideal for OP's use case, it has limitations: because it matches substrings, a string such as Dogma will also be counted.
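If substrings such as Dogma should not be counted, one possible variation is to match on word boundaries with a regular expression. The helper below is a sketch and not part of the answer above; the name `count_whole_word` and its `word` parameter are made up for illustration.

```python
import re

def count_whole_word(row, word="dog"):
    """Count case-insensitive whole-word matches of `word` in the string cells of one row."""
    # \b marks a word boundary, so "Dogma" and "Dogdog" do not match
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags=re.IGNORECASE)
    return sum(len(pattern.findall(cell)) for cell in row if isinstance(cell, str))

print(count_whole_word(["Dogma House", 0, "dog Dog"]))                       # 2
print(count_whole_word(["House Home Doggy", "Dogdog Flower", "House dog"]))  # 1
```

Since the function only iterates over the cells of a row, it should be usable as `df_start.apply(count_whole_word, axis=1)`.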

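You can do it in one line. Note that str.contains only tests whether each cell contains 'Dog' (case-sensitive), so this counts matching cells per row rather than occurrences within a cell: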

df['col4'] = df.apply(lambda x: x.str.contains('Dog')).sum(axis=1)

Here's a pandas-esque way of doing it:

df_start["col4"] = (
    df_start.apply(lambda col: col.str.lower()        # make it lowercase
                                  .str.count("dog"))  # count 'dog's
            .sum(axis=1)    # take sum per row
            .astype(int)    # turn float to int
)

Note that this will also count "dog" if it's just a substring of a word, as in "dogmatic".
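One way to avoid the substring matches is to pass a regular expression with word boundaries to `str.count`. The following is a sketch under the assumption that casting every cell to a string with `astype(str)` is acceptable; the sample data is the modified input that appears in a later answer.

```python
import re
import pandas as pd

df_start = pd.DataFrame({
    'col1': ['House Home Doggy', 0, 'Dog Flower Cat'],
    'col2': ['Dogdog Flower', 0, 0],
    'col3': ['House dog', 0, 'Dog Cat'],
})

counts = (
    df_start.astype(str)  # cast so .str works on every column
    .apply(lambda col: col.str.count(r"\bdog\b", flags=re.IGNORECASE))  # whole words only
    .sum(axis=1)          # sum per row
)
df_start["col4"] = counts
print(df_start["col4"].tolist())  # [1, 0, 2]
```

Because every cell is already a string, the per-cell counts are integers and no final `astype(int)` is needed.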

Here is one way to do it: a one-liner, split across lines to add comments.

df['col4']=df.apply(
    lambda x: (' '.join(map(str, list(x)) )) # create a string by combining all columns for each row
    .lower() # turn it to lower case
    .count('dog')  # count the word 'dog'
    , axis=1
)
df
              col1  col2    col3       col4
0   Dog Home Dog    Flower  House Dog   3
1              0    0       0           0
2   Dog Flower Cat  0       Dog Cat     2

For the following modified input, the other solutions will give

{'col1': ['House Home Doggy', 0, 'Dog Flower Cat'], 'col2': ['Dogdog Flower', 0, 0], 'col3': ['House dog', 0, 'Dog Cat']}

not what you would expect:

0  House Home Doggy  Dogdog Flower  House dog          4
1                 0              0          0          0
2    Dog Flower Cat              0    Dog Cat          2

The solution below does not count substrings as shown above; it counts only occurrences of the stand-alone word 'dog':

import pandas as pd
df_start = pd.DataFrame({'col1': ['House Home Doggy', 0, 'Dog Flower Cat'], 'col2': ['Dogdog Flower', 0, 0], 'col3': ['House dog', 0, 'Dog Cat']})

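def count_dogs(df_row):
    """Count stand-alone, case-insensitive occurrences of 'dog' in one row."""
    dogcount = 0
    for item in df_row:
        # Split each cell on whitespace so only whole tokens are compared
        for word in str(item).split():
            dogcount += 1 if word.lower() == "dog" else 0
    return dogcount

df_start['col4'] = df_start.apply(count_dogs, axis=1)
print(df_start)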

gives:

               col1           col2       col3  col4
0  House Home Doggy  Dogdog Flower  House dog     1
1                 0              0          0     0
2    Dog Flower Cat              0    Dog Cat     2

and with:

def count_dogs(df_row, casesensitive=True):
    """Count stand-alone 'Dog' tokens, or 'dog' in any case if casesensitive=False."""
    dogcount = 0
    for item in df_row:
        # Split each cell on whitespace so only whole tokens are compared
        for word in str(item).split():
            if casesensitive:
                dogcount += 1 if word == "Dog" else 0
            else:
                dogcount += 1 if word.lower() == "dog" else 0
    return dogcount

df_start['col4'] = df_start.apply(count_dogs, axis=1)
print(df_start)

you will get:

0  House Home Doggy  Dogdog Flower  House dog     0
1                 0              0          0     0
2    Dog Flower Cat              0    Dog Cat     2
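The call above relies on the default casesensitive=True. As a sketch of how the flag might be switched off, extra keyword arguments given to `DataFrame.apply` are forwarded to the function. The function body here is a condensed restatement of the logic above so that the snippet is self-contained.

```python
import pandas as pd

def count_dogs(df_row, casesensitive=True):
    """Count stand-alone 'Dog' tokens, or 'dog' in any case if casesensitive=False."""
    dogcount = 0
    for cell in df_row:
        for token in str(cell).split():
            candidate = token if casesensitive else token.lower()
            target = "Dog" if casesensitive else "dog"
            dogcount += 1 if candidate == target else 0
    return dogcount

df_start = pd.DataFrame({
    'col1': ['House Home Doggy', 0, 'Dog Flower Cat'],
    'col2': ['Dogdog Flower', 0, 0],
    'col3': ['House dog', 0, 'Dog Cat'],
})

# Keyword arguments after axis are passed through to count_dogs
df_start['col4'] = df_start.apply(count_dogs, axis=1, casesensitive=False)
print(df_start['col4'].tolist())  # [1, 0, 2]
```

With casesensitive=False the lowercase 'dog' in the first row is counted again, while 'Doggy' and 'Dogdog' are still ignored.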

Try:

df_start['col4'] = df_start.astype(str).sum(axis=1).str.lower().str.count("dog")

    col1            col2    col3        col4
0   House Home Dog  Flower  House Dog   2
1   0               0       0           0
2   Dog Flower Cat  0       Dog Cat     2
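One thing to watch with row-wise concatenation is that text from neighbouring cells is glued together, so "Condo" next to "Garden" becomes "CondoGarden", which contains "dog". A possible variation, sketched here with made-up sample data, is to join the cells with a space before counting.

```python
import pandas as pd

# Hypothetical data: the first row has no "dog", but "Condo" + "Garden" would form one
df = pd.DataFrame({'col1': ['Condo', 'Dog Cat'], 'col2': ['Garden', 0]})

# Join the cells of each row with a space so matches cannot span two cells
joined = df.astype(str).apply(" ".join, axis=1)
df['col3'] = joined.str.lower().str.count("dog")
print(df['col3'].tolist())  # [0, 1]
```

This still counts substrings inside a single cell (for example "dogmatic"); it only prevents matches that straddle two cells.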

