
How to set a word count in a text file

I need to find the number of words in a file. Any sequence of alphanumeric characters of length >= 1, with leading and trailing non-alphanumeric characters removed, counts as a word.

Here is the code I have so far:

num_words = 0

textfile = open('gettysburg.txt', 'r').read()
words = textfile.split()
for word in words:
    if len(word) >= 1:
        num_words += 1

print(num_words)

The counter gives me 268, but there are 271 words in the text. There are four words that are separated by dashes or "--" which are being counted as 2 words. How do I strip the non-letter characters to display these 4 words?

I don't think you want to strip the hyphens; you just want to treat them as characters that can be part of a word. You could use a regular expression:

import re
re.findall(r'[\w-]+', 'words in sentence. some hyphenated-together.')

gives

['words', 'in', 'sentence', 'some', 'hyphenated-together']
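To turn that into a count, something like the following might work. This is only a rough sketch: the function name count_words and the keep_hyphens flag are made up for illustration, and note that \w also matches underscores. It shows how the character class decides whether a hyphenated pair counts as one word or two.

```python
import re

def count_words(text, keep_hyphens=True):
    # Hypothetical helper, not part of any library.
    # With keep_hyphens=True, "-" is treated as part of a word;
    # otherwise it acts as a separator like any other punctuation.
    pattern = r"[\w-]+" if keep_hyphens else r"\w+"
    return len(re.findall(pattern, text))

sample = "words in sentence. some hyphenated-together."
print(count_words(sample))                      # 5
print(count_words(sample, keep_hyphens=False))  # 6
```

One thing to watch for: with hyphens kept in the character class, a free-standing "--" surrounded by spaces is itself matched and counted as a word.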

Hey, you are incredibly close.

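str.split() takes an optional separator argument, sep, which defaults to splitting on runs of whitespace. You can pass a different string for it to split on instead:

num_words = 0
textfile = open('gettysburg.txt', 'r').read()
words = textfile.split()
for word in words:
    # split each whitespace-separated token on hyphens; "a--b" yields an
    # empty string between the two dashes, so count only non-empty parts
    count = len([part for part in word.split("-") if part])
    num_words += count
print(num_words)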

Python Tutorials has a nice description about the function.
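If you want to follow the definition in the question more literally, here is a minimal sketch. It assumes every run of dashes separates two words and that "alphanumeric" means ASCII letters and digits; count_alnum_words is just an illustrative name.

```python
import re

def count_alnum_words(text):
    # Hypothetical helper; assumes ASCII-only "alphanumeric" characters.
    count = 0
    for token in text.split():
        # Treat any run of dashes ("-" or "--") as a separator.
        for piece in re.split(r"-+", token):
            # Drop leading and trailing non-alphanumeric characters.
            core = re.sub(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "", piece)
            if core:
                count += 1
    return count

print(count_alnum_words("one two--three, four-five -- six."))  # 6
```

For the file in the question, you would pass in the result of open('gettysburg.txt', 'r').read(). Whether the total matches the expected 271 depends on how the text actually uses dashes and punctuation, so treat this as a starting point.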


 