I have a sentence that I want to parse to check for some conditions:
a) If there is a period followed by a whitespace followed by a lowercase letter
b) If there is a period internal to a sequence of letters with no adjacent whitespace (e.g. www.abc.com)
c) If there is a period followed by a whitespace followed by an uppercase letter and preceded by one of a short list of titles (e.g. Mr., Dr., Mrs.)
Currently I am iterating through the string (line) and using the next() function to see whether the next character is a space or lowercase, etc. And then I just loop through the line. But how would I check to see what the next, next character would be? And how would I find the previous ones?
line = "This is line.1 www.abc.com. Mr."
t = iter(line)
b = next(t)
for i in line[:len(line)-1]:
    a = next(t)
    if i == "." and (a.isdigit()):  # for example, this checks to see if the value after the period is a number
        print("True")
Any help would be appreciated. Thank you.
Regular expressions are what you want.
Since you're going to check for a pattern in a string, you can use Python's built-in support for regular expressions through the re library.
Example:
# Check for a period inside a sequence of letters with no adjacent whitespace
import re
text = 'www.google.com'  # avoid naming a variable str, which shadows the built-in
pattern = r'[A-Za-z]\.[A-Za-z]'  # a letter, a literal period, then another letter
obj = re.compile(pattern)
if obj.search(text):
    print("Pattern matched")
Similarly, write patterns for the other conditions you want to check in your string.
# A period followed by a whitespace character followed by a lowercase letter
regex = r'\.\s[a-z]'
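As a rough sketch of how the three conditions from the question might be checked together, the snippet below compiles one pattern per condition. The function name check_conditions, the pattern names, and the title list (Mr, Mrs, Dr) are assumptions made for illustration; extend the title list to suit your data.

```python
import re

# Hypothetical pattern names; the title list is an assumption and can be extended.
LOWER_AFTER_PERIOD = re.compile(r"\.\s[a-z]")              # condition a
INTERNAL_PERIOD = re.compile(r"[A-Za-z]\.[A-Za-z]")        # condition b
TITLE_BEFORE_UPPER = re.compile(r"\b(?:Mr|Mrs|Dr)\.\s[A-Z]")  # condition c


def check_conditions(line):
    """Return a dict saying which of the three conditions occur in line."""
    return {
        "a": bool(LOWER_AFTER_PERIOD.search(line)),
        "b": bool(INTERNAL_PERIOD.search(line)),
        "c": bool(TITLE_BEFORE_UPPER.search(line)),
    }


print(check_conditions("Visit www.abc.com. then ask Dr. Smith."))
```

Using search rather than match means the pattern may occur anywhere in the line, so no leading or trailing .* is needed.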
You can build and test your regular expressions with an online regex tester.
The documentation for the re library covers the syntax in more detail.
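You can use next() on more than one iterator to look further ahead. Calling next(t) twice on the same iterator inside the loop would consume two characters per pass and run out early, so use a separate iterator for each offset:
line = "This is line.1 www.abc.com. Mr."
t = iter(line)
u = iter(line)
next(t)           # t now runs one character ahead of i
next(u); next(u)  # u now runs two characters ahead of i
for i in line[:len(line)-2]:
    a = next(t)  # the next character
    c = next(u)  # the next, next character
    if i == "." and a.isdigit():  # for example, this checks to see if the value after the period is a number
        print("True")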
You can get the previous characters by saving each character to a temporary list as you iterate.
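A minimal sketch of that idea, assuming you only need the one character before each period and up to two after it. The function name period_contexts and the list name seen are hypothetical. The look-ahead here uses slicing instead of next(), which returns an empty string rather than raising an error at the end of the line.

```python
def period_contexts(text):
    """Return (previous, next, next_next) for every period in text.

    A missing neighbour at either end of the string is reported as "".
    """
    seen = []      # temporary list of characters already visited
    results = []
    for pos, ch in enumerate(text):
        if ch == ".":
            prev = seen[-1] if seen else ""
            nxt = text[pos + 1:pos + 2]   # slicing never raises IndexError
            nxt2 = text[pos + 2:pos + 3]
            results.append((prev, nxt, nxt2))
        seen.append(ch)
    return results


line = "This is line.1 www.abc.com. Mr."
for prev, nxt, nxt2 in period_contexts(line):
    print(repr(prev), repr(nxt), repr(nxt2))
```

If only the single previous character is ever needed, a plain variable updated at the end of each pass would do the same job as the list.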