How do I get the count of string occurrences in Python?

Question

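I created a function that detects the words I specify and shows which line they appear on. However, I also want to know how many times these specific words are repeated in the data, i.e. their counts.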

def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
 
    # Return list of tuples containing matched string, line numbers and lines where string is found
    return list_of_results

# search for given strings in the file 'hello.csv'

matched_lines = search_multiple_strings_in_file('hello.csv', ['pre existing ', 'exclusions','limitations','fourteen','authorize','frequency','automatic','renewal','provision','annual limit','fraud notice'])
 
print('Total Matched lines : ', len(matched_lines))
for elem in matched_lines:
    print('Word = ', elem[0], ' :: Line Number = ', elem[1], ' :: Line = ', elem[2])

Is there a way to upload a sample CSV on SO? I'm new here and haven't seen how to add attachments. In any case, this code should work with any dummy CSV.

For example, I'd like my final output to also show each word and its count:

Words       Count
exclusions  10
renewal     22

Answer 1

A simple way to include counts in your current code is to use collections.defaultdict() and a simple += on the count for each matched string.

We can then pass the dict to DataFrame.from_dict() to build our output df.

import pandas as pd
from collections import defaultdict

def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    count = defaultdict(int)  # missing keys start at 0
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    count[string_to_search] += line.count(string_to_search)
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
 
    # Return list of tuples containing matched string, line numbers and lines where string is found
    return list_of_results, dict(count)


matched_lines, count = search_multiple_strings_in_file('hello.csv', ['pre existing ', 'exclusions','limitations','fourteen','authorize','frequency','automatic','renewal','provision','annual limit','fraud notice'])


df = pd.DataFrame.from_dict(count, orient='index').reset_index()
df.columns = ['Word', 'Count']

print(df)

Output

             Word  Count
0   pre existing       6
1        fourteen      5
2       authorize      5
3       frequency      5
4       automatic      5
5         renewal      5
6       provision      5
7    annual limit      6
8    fraud notice      6
9      exclusions      5
10    limitations      4
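If pandas isn't needed, the same counts can be printed straight from the dictionary. The sketch below is a hedged, stdlib-only variant of Answer 1, not its exact code: it reads lines from any iterable (an io.StringIO with made-up text stands in for Hello.csv), pre-seeds every search string with 0 via dict.fromkeys so words that never match still get a row, and prints a Words/Count table like the one in the question. The helper name count_strings is hypothetical.

```python
import io


def count_strings(lines, list_of_strings):
    """Return {search string: total occurrences} over an iterable of lines."""
    # Seed every search string with 0 so words that never match still appear.
    count = dict.fromkeys(list_of_strings, 0)
    for line in lines:
        for string_to_search in list_of_strings:
            # str.count returns the number of non-overlapping occurrences in this line.
            count[string_to_search] += line.count(string_to_search)
    return count


# Made-up sample data standing in for Hello.csv.
sample = io.StringIO(
    "renewal,automatic renewal\n"
    "exclusions apply\n"
    "no match here\n"
)
counts = count_strings(sample, ["exclusions", "renewal", "fraud notice"])

# Left-align the word column so the counts line up.
print(f"{'Words':<14}Count")
for word, n in counts.items():
    print(f"{word:<14}{n}")
```

With a real file, an open file object can be passed as lines, since iterating over it yields one line at a time.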

Answer 2

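There are several ways to do this; I hope the following works for you. You could just as easily use a plain dictionary instead of a Counter, but Counter is more convenient: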

from collections import Counter

def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    freq_count = Counter()
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
                    freq_count[string_to_search] += line.count(string_to_search)
 
    # Return list of tuples containing matched string, line numbers and lines where string is found
    return (list_of_results, freq_count)
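Because Counter is a dict subclass, the returned freq_count can be reported directly: looking up a word that never matched gives 0 instead of raising KeyError, and most_common() sorts the entries by count, highest first. A small sketch with made-up counts in place of real file data:

```python
from collections import Counter

# Made-up counts standing in for the freq_count returned by the function above.
freq_count = Counter()
freq_count["renewal"] += 3
freq_count["exclusions"] += 5

# A missing key reads as 0 and is not inserted into the Counter.
missing = freq_count["fraud notice"]

# most_common() with no argument returns every (word, count) pair, largest count first.
print("Words       Count")
for word, n in freq_count.most_common():
    print(f"{word:<12}{n}")
```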

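Edit: I realized I was returning the number of lines containing each string, not the total number of occurrences. I fixed this by adding line.count(string_to_search) instead of adding 1 per line whenever the line contains the string. Computing the count once and testing whether it is 0, rather than using the if ... in construct, would be more efficient, but that is unlikely to matter for your task.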
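The more efficient variant mentioned in the edit calls line.count() once per search string and skips the line when the result is 0, instead of scanning the line twice (once for in, once for count). A minimal sketch, assuming the input is an iterable of text lines; the function name search_and_count and the sample lines are hypothetical:

```python
from collections import Counter


def search_and_count(lines, list_of_strings):
    """Return (matches, freq_count) for an iterable of text lines."""
    list_of_results = []
    freq_count = Counter()
    # Line numbers start at 1, as in the original function.
    for line_number, line in enumerate(lines, start=1):
        for string_to_search in list_of_strings:
            # One scan per search string: the count doubles as the "found" test.
            n = line.count(string_to_search)
            if n == 0:
                continue
            list_of_results.append((string_to_search, line_number, line.rstrip()))
            freq_count[string_to_search] += n
    return list_of_results, freq_count


# Made-up sample lines standing in for the file contents.
lines = ["renewal and renewal\n", "exclusions\n"]
matches, freq = search_and_count(lines, ["renewal", "exclusions"])
```

To use it on a real file, pass the open file object: with open('hello.csv') as f: matches, freq = search_and_count(f, [...]).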

Answer 3

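You want your function to return a dictionary here. Iterate word by word for efficiency: each word then costs a single dictionary lookup instead of a substring search for every search string.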

def search_multiple_strings_in_file(file_name, list_of_strings):
    mapping = {s: [] for s in list_of_strings}

    with open(file_name, 'r') as read_obj:
        for i, line in enumerate(read_obj):
            for word in line.split():  # split() on any whitespace also drops the trailing newline
                if word in mapping:
                    mapping[word].append(i)

    return mapping

Demo

with open('fakefile.txt', 'w') as f:
    f.write("Lorem ipsum dolor sit amet\n")
    f.write("consectetur adipiscing elit\n")
    f.write("sed do eiusmod tempor incididunt ut labore et dolore magna aliqua\n")
    f.write("Lorem ipsum dolor sit amet\n")


search_multiple_strings_in_file('fakefile.txt', ['ipsum', 'adipiscing', 'dolor'])
# {'ipsum': [0, 3], 'adipiscing': [1], 'dolor': [0, 3]}
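Each matching word appends its line index, so a word that appears twice on one line is recorded twice and the length of each list is that word's occurrence count. The sketch below derives the counts from the demo's mapping and also illustrates a caveat of whole-word splitting, using a made-up line: multi-word entries from the question such as 'annual limit' never match a single token, and punctuation left attached to a word (common in CSV rows) prevents a match.

```python
# The mapping returned by the demo above.
mapping = {'ipsum': [0, 3], 'adipiscing': [1], 'dolor': [0, 3]}

# One list entry per occurrence, so the list length is the count.
counts = {word: len(line_indices) for word, line_indices in mapping.items()}

# Caveat: str.split() yields single tokens with punctuation still attached.
line = "annual limit applies, see renewal,\n"
tokens = line.split()
phrase_found = 'annual limit' in tokens  # False: the phrase spans two tokens
word_found = 'renewal' in tokens  # False: the token is 'renewal,'
```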


3 solutions

Solution 1
1 (accepted) 2021-04-22 02:15:09

Solution 2
0 2021-04-22 02:14:59

Solution 3
0 2021-04-22 02:15:59
