Python: Using .isalpha() to count specific words/characters in a word count

Question

I've created a function which can count specific words or characters in a text file.

But I want to add a condition so that the function only counts a character if it is surrounded by letters. For example, suppose the text file contains:

'This test is an example, this text doesn't have any meaning. It is only an example.'

If I run this text through my function to count apostrophes ('), it returns 3. However, I want it to return 1: it should count only apostrophes between two letters (e.g. isn't or won't) and ignore every other apostrophe, such as single quotes, that is not surrounded by letters.

I've tried to use the .isalpha() method but am having trouble with the syntax.

Answer 1

I think regular expressions would be better for this, but if you must use isalpha(), something like this works:

s = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
sum(s[i-1].isalpha() and s[i]=="'" and s[i+1].isalpha() for i in range(1,len(s)-1))

returns 1.
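The same isalpha() check can be wrapped in a reusable function that takes the character to count as a parameter and, as in the question, reads from a text file. This is a minimal sketch: the names count_enclosed and count_enclosed_in_file are hypothetical, and the file is assumed to be UTF-8 encoded.

```python
def count_enclosed(text: str, char: str) -> int:
    """Count occurrences of char that have a letter immediately on both sides."""
    count = 0
    # Skip the first and last positions: they cannot have a neighbour on both sides.
    for i in range(1, len(text) - 1):
        if text[i] == char and text[i - 1].isalpha() and text[i + 1].isalpha():
            count += 1
    return count


def count_enclosed_in_file(path: str, char: str) -> int:
    """Read a whole text file (assumed UTF-8) and apply count_enclosed to it."""
    with open(path, encoding="utf-8") as f:
        return count_enclosed(f.read(), char)


sample = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(count_enclosed(sample, "'"))  # 1
```

Because every position is checked independently, a word such as rock'n'roll counts both of its apostrophes.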

Answer 2

If you just want to discount the quotes that are enclosing the string itself, the easiest way might be just to strip those off the string before counting.

>>> text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
>>> text.strip("'").count("'")
1
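Note that strip() only removes quotes at the very start and end of the string, so quotes around a word in the middle of the text are still counted. A quick check of that caveat, using a made-up sentence:

```python
text = "Text that has a 'quote' in it."
# Nothing to strip at either end here, so both inner quotes are still counted.
print(text.strip("'").count("'"))  # 2
```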

Another way is a regular expression such as \w'\w, i.e. a word character (letter, digit, or underscore), followed by ', followed by another word character:

>>> import re
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
1

This also ignores quotes around words inside the string:

>>> text = "Text that has a 'quote' in it."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0

But it will also miss apostrophes that are not followed by a letter, such as plural possessives:

>>> text = "All the houses' windows were broken."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0
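One more edge case: a pattern like \w'\w consumes the characters on both sides of each apostrophe, so in a word with two apostrophes separated by a single letter, such as rock'n'roll, the second match is lost. Lookbehind (?<=\w) and lookahead (?=\w) assertions check the neighbours without consuming them. A sketch, with a made-up sample sentence:

```python
import re

text = "I like rock'n'roll."

# Consuming pattern: after matching "k'n", the "n" is used up,
# so the second apostrophe has no word character left in front of it.
consuming = len(re.findall(r"\w'\w", text))

# Lookaround pattern: only the apostrophe itself is matched,
# so a neighbouring letter can be shared between two matches.
lookaround = len(re.findall(r"(?<=\w)'(?=\w)", text))

print(consuming, lookaround)  # 1 2
```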

Answer 3

As xnx already noted, the proper way to do this is with regular expressions:

import re

text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"

print(len(re.findall("[a-zA-Z]'[a-zA-Z]", text)))  # prints 1

Here the apostrophe in the pattern is surrounded by the set of (ASCII) English letters; there are also a number of predefined character classes, such as \w and \d, described in the re module documentation.
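[a-zA-Z] only covers unaccented ASCII letters. If the text may contain accented letters, one common idiom is the class [^\W\d_]: in Python 3, str patterns are Unicode-aware by default, so this matches word characters other than digits and the underscore, which in practice means letters. A sketch comparing the two on a made-up French phrase:

```python
import re

text = "L'été, c'est là."

# ASCII-only letters: misses "L'é" because "é" is outside a-z.
ascii_only = re.findall(r"[a-zA-Z]'[a-zA-Z]", text)

# Not a non-word character, not a digit, not an underscore: effectively any letter.
any_letter = re.findall(r"[^\W\d_]'[^\W\d_]", text)

print(ascii_only)  # ["c'e"]
print(any_letter)  # ["L'é", "c'e"]
```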

Answer 4

You should just use regex:

import re

text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"

wordWrappedApos = re.compile(r"\w'\w")  # word character, apostrophe, word character
found = wordWrappedApos.findall(text)
print(found)
print(len(found))

Replace "\w" with "[A-Za-z]" if you want to make sure only letters are matched (\w also matches digits and underscores).
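To see what that substitution changes, here is a sketch on a made-up sentence that has a digit in front of an apostrophe:

```python
import re

text = "Music from the 1990's wasn't all bad."

# \w also matches digits (and underscores), so "0's" is counted.
word_chars = re.compile(r"\w'\w")
# Restricting both sides to ASCII letters excludes the "0's" match.
letters_only = re.compile(r"[A-Za-z]'[A-Za-z]")

print(word_chars.findall(text))    # ["0's", "n't"]
print(letters_only.findall(text))  # ["n't"]
```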

4 answers

solution1
0 2019-10-21 12:34:39

solution2
0 2019-10-21 12:42:12

solution3
0 2019-10-21 12:42:51

solution4
0 2019-10-21 12:43:40