How to get a word count for a text file

Question

I need to find the number of words in a file. A word is any sequence of one or more alphanumeric characters, after any leading and trailing non-alphanumeric characters have been removed.

Here is the code I have so far:

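num_words = 0

# Read the whole file; "with" closes it automatically.
with open('gettysburg.txt', 'r') as f:
    textfile = f.read()

words = textfile.split()  # split on any run of whitespace
for word in words:
    if len(word) >= 1:  # always true: split() never returns empty strings
        num_words += 1

print(num_words)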

The counter gives me 268, but there are 271 words in the text. There are four words that are separated by dashes or "--" which are being counted as 2 words. How do I strip the non-letter characters so that these four words are counted correctly?
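A minimal sketch of the word definition above, assuming that whitespace and any run of dashes both separate words, and that leading and trailing non-alphanumeric characters are stripped from each piece. Under this assumption "hyphenated-together" counts as two words. The helper names `strip_edges` and `count_words` are hypothetical, not from the original post.

```python
import re


def strip_edges(token: str) -> str:
    """Remove leading and trailing characters that are not letters or digits."""
    start, end = 0, len(token)
    while start < end and not token[start].isalnum():
        start += 1
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[start:end]


def count_words(text: str) -> int:
    """Count words, treating whitespace and runs of '-' as separators (assumption)."""
    count = 0
    for token in re.split(r"[\s\-]+", text):
        # Skip empty pieces (from leading/trailing separators) and pure
        # punctuation such as "." that is empty once its edges are stripped.
        if strip_edges(token):
            count += 1
    return count


print(count_words("we can not dedicate--we can not consecrate"))  # prints 8
```

To count the file, pass its contents in, e.g. `with open('gettysburg.txt') as f: print(count_words(f.read()))`.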

Answer 1

I don't think you want to strip the hyphens, you just want them noted as characters that can make a word. You might use a regular expression.

import re

re.findall(r'[\w\-]+', 'words in sentence. some hyphenated-together.')

gives

['words', 'in', 'sentence', 'some', 'hyphenated-together']
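Turning this into a count for a whole text is a small step. One caveat: `[\w\-]+` also matches a standalone "--" as if it were a word. The sketch below swaps in a different pattern (an alternative, not the one used in this answer), `\w+(?:-\w+)*`, which keeps single-hyphen compounds together but treats "--" as a separator. Note that `\w` also matches digits and underscores. `WORD_RE` and `count_words` are hypothetical names.

```python
import re

# Hypothetical variant of the pattern above: a run of word characters,
# optionally followed by "-word" pieces. A lone "--" never matches, and
# "word--word" yields two separate matches.
WORD_RE = re.compile(r"\w+(?:-\w+)*")


def count_words(text: str) -> int:
    """Return the number of WORD_RE matches in text."""
    return len(WORD_RE.findall(text))


print(WORD_RE.findall("words in sentence. some hyphenated-together."))
# ['words', 'in', 'sentence', 'some', 'hyphenated-together']
print(count_words("we can not dedicate -- we can not consecrate"))  # 8
```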

Answer 2

Hey, you are incredibly close.

The str.split() method takes an optional separator parameter, sep, which defaults to None; in that case the string is split on any run of whitespace. You can also pass a different separator, such as "-", as the first argument: word.split("-").
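A quick illustration of the difference between the default split and an explicit separator. Splitting on "-" turns a double dash into an empty string, which would inflate any count based on `len()` unless the empty pieces are filtered out.

```python
text = "we can not dedicate--we can not consecrate"

# Default (sep=None): split on runs of whitespace; no empty strings.
print(text.split())
# ['we', 'can', 'not', 'dedicate--we', 'can', 'not', 'consecrate']

# Explicit separator: every "-" is a split point, so "--" leaves an empty string.
print("dedicate--we".split("-"))
# ['dedicate', '', 'we']

# Filtering out empty pieces gives the intended two words.
print([part for part in "dedicate--we".split("-") if part])
# ['dedicate', 'we']
```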

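num_words = 0
with open('gettysburg.txt', 'r') as f:
    textfile = f.read()
words = textfile.split()
for word in words:
    # Split each whitespace token on "-"; drop the empty strings that "--" produces.
    count = len([part for part in word.split("-") if part])
    num_words += count
print(num_words)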

Python Tutorials has a nice description of the function.

2 answers

solution1
1 2017-05-12 07:45:28

solution2
0 2017-05-12 07:20:53