I am trying to figure out how I can use the full stop (.) as a delimiter when converting a sentence string into a list of words. The following is my code:
import re

def convert_to_word_list(text):
    word_list = re.split(' |\\, |\\; |\\? |\\.', text)
    print(word_list)
    to_lower_case_list = [word.lower() for word in word_list]
    return to_lower_case_list

print(convert_to_word_list("Hello. my; name, is? Mad Max"))
Now if you were to run this code, it would return a list of all the words in lower case, excluding the special characters, but wherever I add a full stop it produces an empty string instead. For instance, the print statement here prints the following:
['hello', '', 'my', 'name', 'is', 'mad', 'max']
There is an empty string between 'hello' and 'my', where the full stop followed "Hello". This happens pretty much anywhere I add a full stop.
Thank you in advance
Your other regex fragments are followed by a space, e.g. '\\, '. You could change '|\\.' to '|\\. ' (with a trailing space); however, none of this will work if there are no spaces, e.g. "Hello.Fred" will result in ['Hello.Fred'], not ['Hello', 'Fred'].
Your existing code will also fail with leading whitespace, trailing whitespace, and trailing word separators.
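To make these cases concrete, here is a minimal sketch that runs a pattern equivalent to the one in the question against a few short inputs. The sample strings are made up for illustration; only the pattern comes from the question.

```python
import re

# Equivalent to the question's pattern: '.' is the only punctuation
# alternative that does not also consume the following space.
pattern = r' |, |; |\? |\.'

# The '.' matches on its own, then the space after it matches again,
# so an empty string is left between the two matches.
print(re.split(pattern, "Hello. my"))   # ['Hello', '', 'my']

# Leading whitespace and a trailing separator also leave empty strings.
print(re.split(pattern, " Hello"))      # ['', 'Hello']
print(re.split(pattern, "Hello."))      # ['Hello', '']

# Requiring a space after '.' removes the empty string but misses "Hello.Fred".
print(re.split(r' |, |; |\? |\. ', "Hello.Fred"))   # ['Hello.Fred']
```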
You can try the following:
import re

def convert_to_word_list(text):
    word_list = re.split("\\s+|\\,\\s*|\\;\\s*|\\?\\s*|\\.\\s*", text.strip())
    to_lower_case_list = [word.lower() for word in word_list]
    # filter(None, ...) drops the empty strings left by trailing separators
    return list(filter(None, to_lower_case_list))

print(convert_to_word_list("Hello.my; name, is? Mad Max"))
# result: ['hello', 'my', 'name', 'is', 'mad', 'max']
print(convert_to_word_list(" Hello.Fred."))
# result: ['hello', 'fred']
But a better option might be just to replace every character that is neither a word character nor whitespace with a space, then split on whitespace, for example:
def convert_to_word_list(s):
    return [w.lower() for w in re.sub(r"[^\w\s]", " ", s).split()]
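As a quick check of this approach, here is a small self-contained sketch. Note that \w matches only letters, digits, and underscores, so apostrophes and hyphens are also treated as separators, which may or may not be what you want. The sample sentences are made up for illustration.

```python
import re

def convert_to_word_list(s):
    # Replace anything that is neither a word character nor whitespace
    # with a space, then split on runs of whitespace.
    return [w.lower() for w in re.sub(r"[^\w\s]", " ", s).split()]

print(convert_to_word_list(" Hello.Fred. "))
# ['hello', 'fred']
print(convert_to_word_list("Don't over-think it"))
# ['don', 't', 'over', 'think', 'it']
```

Because str.split() with no arguments discards leading, trailing, and repeated whitespace, no extra filtering step is needed here.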
Try splitting on the delimiters plus any additional spaces:
re.split(r'\,\s*|\;\s*|\?\s*|\.\s*', text)
This gives:
import re

def convert_to_word_list(text):
    # raw string, so the backslashes reach the regex engine unchanged
    word_list = re.split(r'\,\s*|\;\s*|\?\s*|\.\s*', text)
    print(word_list)
    to_lower_case_list = [word.lower() for word in word_list]
    return to_lower_case_list

print(convert_to_word_list("Hello. my; name, is? Mad Max"))
Output:
['Hello', 'my', 'name', 'is', 'Mad Max']
['hello', 'my', 'name', 'is', 'mad max']
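Note that the output above keeps 'mad max' as a single element, because this pattern does not treat plain whitespace as a delimiter. If you also want to split on spaces, one possible variation (a sketch, not the only way to do it) is a character class that treats any run of whitespace or punctuation as one separator. The function name split_words is hypothetical.

```python
import re

def split_words(text):  # hypothetical helper name, for illustration only
    # Any run of whitespace, commas, semicolons, question marks,
    # or full stops counts as a single separator.
    parts = re.split(r"[\s,;?.]+", text)
    # Drop the empty strings produced by leading or trailing separators.
    return [p.lower() for p in parts if p]

print(split_words("Hello. my; name, is? Mad Max"))
# ['hello', 'my', 'name', 'is', 'mad', 'max']
print(split_words(" Hello.Fred."))
# ['hello', 'fred']
```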