Python - Finding string frequencies of list of strings in text file

Question

I am trying to count how many times each string occurs in a text file, where each string is on its own line.

For example, a file may look like this:

jump start
jump go
feet start
jump go

The target tally would be 1 for every string except "jump go", which would have 2.

So far, I have been successful at finding individual word counts using this code:

import re
import collections
with open('file.txt') as f:
    text = f.read()
words = re.findall(r'\w+',text)
counts = collections.Counter(words)
print(counts)

However, this counts individual words rather than whole lines, giving output like: jump = 3, start = 2, go = 2, feet = 1

Not sure if this matters, but the file will have around 5 million lines, with around 12,000 distinct strings.

Thank you for any help!

Answer 1

I got this to work:

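import collections

# Read every line, stripping the trailing newline and surrounding whitespace
with open('results.txt') as f:
    lines = [line.strip() for line in f]
print(lines)

counts = collections.Counter(lines)
print(counts)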

Output:

['Sam', 'sam', 'johm go', 'johm go', 'johm for']
Counter({'johm go': 2, 'sam': 1, 'Sam': 1, 'johm for': 1})
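Since the question mentions around 5 million lines, it may be worth not building the full list in memory first. A minimal sketch, assuming the same one-string-per-line format: the hypothetical helper count_lines feeds a generator straight into Counter, so only the distinct strings and their counts are kept. io.StringIO stands in for the real file so the example runs as-is.

```python
import collections
import io

def count_lines(f):
    """Hypothetical helper: count identical lines, reading one line at a time."""
    # Iterating over the file object yields lines lazily, so the whole
    # file is never held in memory; only the Counter grows.
    return collections.Counter(line.strip() for line in f)

# io.StringIO stands in for open('results.txt') so the example is self-contained.
sample = io.StringIO("jump start\njump go\nfeet start\njump go\n")
counts = count_lines(sample)
print(counts.most_common(1))  # [('jump go', 2)]
```

With a real file this would be `with open('results.txt') as f: counts = count_lines(f)`, and `counts.most_common(10)` would then list the ten most frequent strings.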

Answer 2

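Instead of using the regex, read the file with words = f.readlines(). You'll end up with a list of strings, one per line. Then build the Counter from that list. Note that readlines() keeps the trailing newline on each string, so strip it before counting; otherwise a final line without a newline is counted separately from identical earlier lines.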
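A minimal sketch of the readlines() approach, using io.StringIO as a stand-in for the real file. The sample text deliberately has no final newline, to show how the unstripped and stripped counts differ.

```python
import collections
import io

# io.StringIO stands in for open('file.txt'); the last line has no trailing newline.
f = io.StringIO("jump start\njump go\nfeet start\njump go")
words = f.readlines()
print(words)  # ['jump start\n', 'jump go\n', 'feet start\n', 'jump go']

# Without stripping, 'jump go\n' and 'jump go' are counted as different strings.
raw = collections.Counter(words)

# Removing the newline first gives the intended tally.
clean = collections.Counter(w.rstrip('\n') for w in words)
print(clean['jump go'])  # 2
```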

2 answers

solution1: score 2, posted 2015-03-11 23:27:43
solution2: score 0, accepted, posted 2015-03-11 23:24:33