
How do I get the count of string occurrence in python?

I created a function to detect words I specify and display which line each one is on. However, I also want to know how many times each of those words occurs in the data, i.e. its count.

def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
 
    # Return list of tuples containing matched string, line numbers and lines where string is found
    return list_of_results

# search for given strings in the file 'hello.csv'

matched_lines = search_multiple_strings_in_file('hello.csv', ['pre existing ', 'exclusions','limitations','fourteen','authorize','frequency','automatic','renewal','provision','annual limit','fraud notice'])
 
print('Total Matched lines : ', len(matched_lines))
for elem in matched_lines:
    print('Word = ', elem[0], ' :: Line Number = ', elem[1], ' :: Line = ', elem[2])
  

Is there a way I can upload a sample CSV on SO? I'm new here and haven't seen how to add attachments. But this code would work with any dummy CSV.

I just want my final output to also display the words and their counts, for example:

Words       Count
exclusions  10
renewal     22

A simple way to include a count in your current code is to use collections.defaultdict() and simply += the count of each matched string.

We can then pass the dict to pd.DataFrame.from_dict() to generate our output df.


import pandas as pd
from collections import defaultdict

def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    count = defaultdict(int)
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    count[string_to_search] += line.count(string_to_search)
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
 
    # Return the list of (matched string, line number, line) tuples and the per-string counts
    return list_of_results, dict(count)


matched_lines, count = search_multiple_strings_in_file('hello.csv', ['pre existing ', 'exclusions','limitations','fourteen','authorize','frequency','automatic','renewal','provision','annual limit','fraud notice'])


df = pd.DataFrame.from_dict(count, orient='index').reset_index()
df.columns = ['Word', 'Count']

print(df)

Output

             Word  Count
0   pre existing       6
1        fourteen      5
2       authorize      5
3       frequency      5
4       automatic      5
5         renewal      5
6       provision      5
7    annual limit      6
8    fraud notice      6
9      exclusions      5
10    limitations      4
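
If pandas is not otherwise needed, the same two-column table can probably be printed straight from the dictionary. This is only a minimal sketch: it assumes `count` is the plain dict returned above, and the `format_counts` helper and the sample values are hypothetical, made up for illustration.

```python
def format_counts(count):
    """Return a 'Words / Count' table as a string, one row per search term."""
    # Pad the first column to the longest term so the counts line up.
    width = max([len("Words")] + [len(word) for word in count])
    rows = [f"{'Words':<{width}}  Count"]
    for word, n in count.items():
        rows.append(f"{word:<{width}}  {n}")
    return "\n".join(rows)


# Hypothetical counts, standing in for the dict returned by the function above.
sample = {"exclusions": 10, "renewal": 22}
print(format_counts(sample))
```

Rows come out in the dict's insertion order, which is the order in which each term was first found in the file.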

You can do it a number of ways; I expect you'll be happy with something like the below. You can use a dictionary in place of a Counter easily enough, but the Counter is more convenient:

from collections import Counter

def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    freq_count = Counter()
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
                    freq_count[string_to_search] += line.count(string_to_search)
 
    # Return list of tuples containing matched string, line numbers and lines where string is found
    return (list_of_results, freq_count)

Edit: I realised that I was returning the number of lines that contained the strings instead of the total number of occurrences. I fixed this by adding line.count(string_to_search) instead of just adding 1 per line when a string was contained in the line. It would be a bit more efficient to calculate the count once and test whether it is 0 instead of using the if ... in construct, but that is unlikely to be meaningful for your task.
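
The "count once and test for 0" variant mentioned in the edit could look roughly like this. It is a sketch under stated assumptions, not the code from the answer: `count_occurrences` is a hypothetical helper that takes any iterable of lines (an open file would also work), and `sample_lines` is made-up data standing in for the CSV.

```python
from collections import Counter


def count_occurrences(lines, list_of_strings):
    """Count total occurrences of each search string across an iterable of lines."""
    freq_count = Counter()
    for line in lines:
        for string_to_search in list_of_strings:
            # Count once; a result of 0 means the string is not in the line.
            n = line.count(string_to_search)
            if n:
                freq_count[string_to_search] += n
    return freq_count


# Made-up sample lines standing in for the CSV file.
sample_lines = [
    "renewal is automatic, renewal terms apply\n",
    "see exclusions and limitations\n",
    "no exclusions here\n",
]
counts = count_occurrences(sample_lines, ["renewal", "exclusions", "fourteen"])

# most_common() lists the terms from highest to lowest count.
for word, n in counts.most_common():
    print(word, n)
```

Note that str.count() counts non-overlapping occurrences, and that terms which never occur are simply absent from the Counter (looking them up still returns 0).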

You want your function to return a dictionary here. Iterating word by word is more efficient.

def search_multiple_strings_in_file(file_name, list_of_strings):
    mapping = {s: [] for s in list_of_strings}

    with open(file_name, 'r') as read_obj:
        for i, line in enumerate(read_obj):
            for word in line.split(' '):
                if word in mapping:
                    mapping[word].append(i)

    return mapping

Demo

with open('fakefile.txt', 'w') as f:
    f.write("Lorem ipsum dolor sit amet\n")
    f.write("consectetur adipiscing elit\n")
    f.write("sed do eiusmod tempor incididunt ut labore et dolore magna aliqua\n")
    f.write("Lorem ipsum dolor sit amet\n")


search_multiple_strings_in_file('fakefile.txt', ['ipsum', 'adipiscing', 'dolor'])
# {'ipsum': [0, 3], 'adipiscing': [1], 'dolor': [0, 3]}
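
Because a line number is appended once per occurrence, the count asked for in the question should fall out as the length of each list. The following is a rough sketch rather than the answer's own code: it assumes an in-memory list of lines instead of a file, `find_word_lines` and `demo_lines` are illustrative names, and `split()` with no argument is a swapped-in variation so that the last word on a line does not keep its trailing newline.

```python
def find_word_lines(lines, list_of_strings):
    """Map each search word to the 0-based line number of every occurrence."""
    mapping = {s: [] for s in list_of_strings}
    for i, line in enumerate(lines):
        # split() with no argument splits on any whitespace and drops the newline.
        for word in line.split():
            if word in mapping:
                mapping[word].append(i)
    return mapping


demo_lines = [
    "Lorem ipsum dolor sit amet\n",
    "consectetur adipiscing elit\n",
    "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua\n",
    "Lorem ipsum dolor sit amet\n",
]
mapping = find_word_lines(demo_lines, ["ipsum", "adipiscing", "dolor", "amet"])

# One list entry per occurrence, so the list length is the count.
counts = {word: len(line_numbers) for word, line_numbers in mapping.items()}
print(counts)
# {'ipsum': 2, 'adipiscing': 1, 'dolor': 2, 'amet': 2}
```

This whole-word approach only matches single words exactly, so a multi-word term such as 'annual limit' from the question would never be found this way.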

