
Python - Finding string frequencies of list of strings in text file

I am trying to find all occurrences of strings in a text file, where each string is located on a new line in the file.

For example, a file may look like this:

jump start
jump go
feet start
jump go

The target tally would be 1 for every string except "jump go", which would be 2.

So far, I have been successful at finding individual word counts using this code:

import re
import collections
with open('file.txt') as f:
    text = f.read()
words = re.findall(r'\w+',text)
counts = collections.Counter(words)
print(counts)

However, this only gives per-word counts like: jump = 3, start = 2, go = 2, feet = 1

Not sure if this matters, but the file will have around 5 million lines and around 12,000 distinct strings.

Thank you for any help!

I got this to work:

import collections

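# Strip surrounding whitespace (including the newline) from each line;
# the with block closes the file when done.
with open('results.txt') as f:
    lines = [line.strip() for line in f]
counts = collections.Counter(lines)
print(counts)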

Output on a small test file (the list of stripped lines is shown first for reference):

['Sam', 'sam', 'johm go', 'johm go', 'johm for']
Counter({'johm go': 2, 'sam': 1, 'Sam': 1, 'johm for': 1})

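Instead of using the regex, read the file with words = f.readlines(). You'll end up with a list of strings, one per line. Then build the counter from that list. Note that readlines() keeps the trailing newline on each line, so strip the lines first; otherwise a final line with no newline would be counted as a different string.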
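
A minimal sketch of the line-based counting described in the answers, assuming the file is UTF-8 and that blank lines should be ignored. The helper name count_lines is hypothetical. It iterates over the file object rather than building a list, so a file of around 5 million lines is never held in memory at once, and it strips each line so that a missing final newline does not produce a separate key:

```python
import collections
import os
import tempfile


def count_lines(path):
    """Return a Counter mapping each distinct non-empty line to its frequency."""
    with open(path, encoding="utf-8") as f:
        # Iterating over the file yields one line at a time.
        stripped = (line.strip() for line in f)
        # The Counter consumes the generator before the file is closed.
        return collections.Counter(line for line in stripped if line)


# Demo using the sample data from the question, written to a temporary file.
sample = "jump start\njump go\nfeet start\njump go\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

counts = count_lines(tmp.name)
os.remove(tmp.name)

print(counts["jump go"])     # 2
print(counts["jump start"])  # 1
```

With about 12,000 distinct strings the resulting Counter stays small regardless of how many lines the file has; counts.most_common(10) would list the ten most frequent lines.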


 