I am trying to count how many times each string occurs in a text file, where each string is on its own line.
For example, a file may look like this:
jump start
jump go
feet start
jump go
The target tally would be 1 for every string except "jump go", which would have 2.
So far, I have been successful at finding individual word counts using this code:
import re
import collections
with open('file.txt') as f:
    text = f.read()
words = re.findall(r'\w+',text)
counts = collections.Counter(words)
print(counts)
However, this only gives output like: jump = 3, start = 2, go = 2, feet = 1
Not sure if this matters, but the file will have around 5 million lines, with around 12,000 distinct strings.
Thank you for any help!
I got this to work:
import collections
# Read every line, stripping the trailing newline and surrounding whitespace
with open('results.txt') as f:
    lines = [line.strip() for line in f]
counts = collections.Counter(lines)
print(counts)
Output:
['Sam', 'sam', 'johm go', 'johm go', 'johm for']
Counter({'johm go': 2, 'sam': 1, 'Sam': 1, 'johm for': 1})
Instead of using the regex, read the file with words = f.readlines(). You'll end up with a list of strings, one per line. Note that each string keeps its trailing newline, so strip it before counting. Then build the counter from that list.
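For a file of about 5 million lines, it may be worth skipping the intermediate list and feeding the lines to the counter directly. Below is a minimal sketch of that idea, not a definitive implementation. The helper name count_lines is hypothetical, and it assumes that leading and trailing whitespace on a line should be ignored.

```python
import collections


def count_lines(path):
    """Return a Counter mapping each distinct line to its number of occurrences.

    count_lines is a hypothetical helper name used for illustration only.
    """
    with open(path) as f:
        # Iterating over the file object yields one line at a time, so the
        # whole file is never stored as a list. strip() removes the trailing
        # newline (assumption: surrounding whitespace is not significant).
        return collections.Counter(line.strip() for line in f)


# Example usage (assumes a file named 'file.txt' exists):
# counts = count_lines('file.txt')
# for line, n in counts.most_common(10):
#     print(line, n)
```

With roughly 12,000 distinct strings the counter itself stays small, and most_common(n) returns the n most frequent lines in descending order of count. Blank lines, if any, would be counted under the empty string.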