I've created a function which can count specific words or characters in a text file.
But I want to add a condition so that the function only counts a character if it is surrounded by letters. For example, take this text file:
'This test is an example, this text doesn't have any meaning. It is only an example.'
If I run this text through my function, counting apostrophes ('), it returns 3. However, I want it to return 1: it should count only apostrophes that sit between two letters (e.g. isn't or won't) and ignore every other apostrophe, such as single quotes, that isn't surrounded by letters.
I've tried to use the .isalpha() method but am having trouble with the syntax.
I think regular expressions would be better for this, but if you must use isalpha, something like:
s = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
sum(s[i-1].isalpha() and s[i]=="'" and s[i+1].isalpha() for i in range(1,len(s)-1))
returns 1.
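The same isalpha check can be wrapped in a reusable function for any single character, which is closer to the function described in the question. This is only a sketch: the name count_between_letters is hypothetical, and it walks (previous, current, next) triples built with zip over shifted copies of the string instead of indexing with range:

```python
def count_between_letters(text: str, ch: str) -> int:
    """Count occurrences of the single character `ch` with a letter on both sides.

    Hypothetical helper for illustration; `ch` is assumed to be one character.
    """
    # zip(text, text[1:], text[2:]) yields every (previous, current, next) triple
    return sum(
        prev.isalpha() and cur == ch and nxt.isalpha()
        for prev, cur, nxt in zip(text, text[1:], text[2:])
    )


s = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(count_between_letters(s, "'"))              # 1
print(count_between_letters("rock'n'roll", "'"))  # 2
```

Because every position is checked independently, apostrophes that share a neighbouring letter, as in rock'n'roll, are both counted.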
If you just want to discount the quotes that enclose the string itself, the easiest way might be to strip those off the string before counting:
>>> text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
>>> text.strip("'").count("'")
1
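Keep in mind that strip only removes characters at the very start and end of the string, so quotes that appear inside the text are still counted. A small check of that behaviour:

```python
text = "Text that has a 'quote' in it."
# Nothing to strip here (the string starts with "T" and ends with "."),
# so both inner quotes are still counted
print(text.strip("'").count("'"))  # 2
```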
Another way would be with a regular expression like \w'\w, i.e. a word character (letter, digit or underscore), followed by ', followed by another word character:
>>> import re
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
1
This also works for quotes inside the string:
>>> text = "Text that has a 'quote' in it."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0
But it will also miss apostrophes that are not followed by another letter:
>>> text = "All the houses' windows were broken."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0
As xnx already noted, the proper way to do this is with regular expressions:
import re
text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(len(re.findall("[a-zA-Z]'[a-zA-Z]", text)))
"""
Out:
1
"""
Here the apostrophe in the pattern is surrounded by a character class of English letters, [a-zA-Z]. The re module also provides a number of predefined character classes (such as \w and \d); see the re documentation for details.
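One caveat with findall and a pattern like [a-zA-Z]'[a-zA-Z]: matches cannot overlap, so the letters on either side of the apostrophe are consumed. In a word such as rock'n'roll, the n is used up by the first match and the second apostrophe is missed. A sketch using lookbehind and lookahead assertions, which check the neighbouring letters without consuming them:

```python
import re

# Consumes the letters on both sides, so adjacent matches can't share a letter
consuming = re.compile(r"[a-zA-Z]'[a-zA-Z]")
# (?<=...) and (?=...) only look at the neighbours; they don't consume them
lookaround = re.compile(r"(?<=[a-zA-Z])'(?=[a-zA-Z])")

word = "rock'n'roll"
print(len(consuming.findall(word)))   # 1: the shared "n" is consumed by the first match
print(len(lookaround.findall(word)))  # 2

text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(len(lookaround.findall(text)))  # 1
```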
You should just use regex:
import re
text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
wordWrappedApos = re.compile(r"\w'\w")  # word character, apostrophe, word character
found = wordWrappedApos.findall(text)
print(found)
print(len(found))
Replace "\w" with "[A-Za-z]" if you want to make sure no digits (or underscores) are matched.
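Note that [A-Za-z] only covers unaccented English letters, so an apostrophe in a word like l'été would not be counted. If that matters, one option is the class [^\W\d_], meaning a word character that is neither a digit nor an underscore; in Python 3 str patterns this matches letters, including accented and other non-ASCII ones (it is a close approximation rather than an exact letters-only class). A small sketch comparing the two:

```python
import re

# [^\W\d_] = "not a non-word character, not a digit, not an underscore",
# i.e. (roughly) any Unicode letter when the pattern is a str
letters_only = re.compile(r"[^\W\d_]'[^\W\d_]")
ascii_only = re.compile(r"[A-Za-z]'[A-Za-z]")

print(len(letters_only.findall("aujourd'hui l'été")))  # 2
print(len(ascii_only.findall("aujourd'hui l'été")))    # 1 (misses l'été)
print(len(letters_only.findall("the 90's")))           # 0 (digit before the apostrophe)
```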