
How can I use a full stop as one of multiple delimiters in Python?

I am trying to figure out how I can use the full stop (.) as a delimiter in a sentence string that I am converting into a list. The following is my code:

import re

def convert_to_word_list(text):
    word_list = re.split(' |\\, |\\; |\\? |\\.', text)
    print(word_list)
    to_lower_case_list = [word.lower() for word in word_list]
    return to_lower_case_list

print(convert_to_word_list("Hello. my; name, is? Mad Max"))

If you run this code, it returns a list of all the words in lower case with the special characters removed, but wherever there is a full stop it produces an empty string instead. For instance, the call above returns the following:

['hello', '', 'my', 'name', 'is', 'mad', 'max']

There is an empty string between hello and my, where the full stop followed hello, and this happens pretty much anywhere I add a full stop.

Thank you in advance

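Your other regex fragments are followed by a space, e.g. "\\, ", but the full-stop fragment is not, so the full stop and the space after it are matched as two separate delimiters with an empty string between them. You could change "|\\." to "|\\. " (with a trailing space); however, none of this will work if there are no spaces, e.g. "Hello.Fred" will result in ['Hello.Fred'], not ['Hello', 'Fred'].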

Your existing code will also fail with leading whitespace, trailing whitespace, and trailing word separators.
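These failure modes can be reproduced with a short sketch. It assumes the question's pattern with the trailing space added after the full stop; the helper name split_words is made up for illustration.

```python
import re

# The question's pattern, with a trailing space added after the full stop.
PATTERN = r" |, |; |\? |\. "

def split_words(text):  # hypothetical helper, not part of the original code
    return re.split(PATTERN, text)

print(split_words("Hello. my; name, is? Mad Max"))
# ['Hello', 'my', 'name', 'is', 'Mad', 'Max']
print(split_words("Hello.Fred"))  # no space after the full stop: no split
# ['Hello.Fred']
print(split_words(" Hello"))      # leading whitespace
# ['', 'Hello']
print(split_words("Hello "))      # trailing whitespace
# ['Hello', '']
print(split_words("Hello."))      # trailing separator stays attached
# ['Hello.']
```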

You can try the following:

import re

def convert_to_word_list(text):
    word_list = re.split("\\s+|\\,\\s*|\\;\\s*|\\?\\s*|\\.\\s*", text.strip())
    to_lower_case_list = [word.lower() for word in word_list]
    return list(filter(None, to_lower_case_list))

print(convert_to_word_list("Hello.my; name, is? Mad Max"))
# result: ['hello', 'my', 'name', 'is', 'mad', 'max']

print(convert_to_word_list("  Hello.Fred."))
# result: ['hello', 'fred']
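As a possible simplification, the alternation can be collapsed into a single character class. This is only a sketch: the name convert_to_word_list_compact is made up here, and it should give the same results as the version above once empty strings are dropped.

```python
import re

def convert_to_word_list_compact(text):  # hypothetical name
    # Split on runs of whitespace, commas, semicolons, question marks
    # and full stops. Inside [...] the characters ? and . are literal.
    words = re.split(r"[\s,;?.]+", text)
    # Leading or trailing delimiters leave empty strings; drop them.
    return [w.lower() for w in words if w]

print(convert_to_word_list_compact("Hello.my; name, is? Mad Max"))
# ['hello', 'my', 'name', 'is', 'mad', 'max']
print(convert_to_word_list_compact("  Hello.Fred."))
# ['hello', 'fred']
```

Because the character class is followed by +, a run such as "; " or ". " is consumed as one delimiter, so no strip() call is needed.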

But a better option might be to just replace non-word characters with whitespace, then split on whitespace, for example:

def convert_to_word_list(s):
    return [w.lower() for w in re.sub(r"[^\w\s]", " ", s).split()]
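A quick sketch of how this behaves on the inputs above, plus two edge cases worth knowing about: apostrophes are treated as separators, and underscores count as word characters. The re.findall variant is an equivalent formulation added for comparison (the name words_via_findall is made up), not something from the answer itself.

```python
import re

def convert_to_word_list(s):
    # Replace anything that is neither a word character nor whitespace
    # with a space, then split on whitespace.
    return [w.lower() for w in re.sub(r"[^\w\s]", " ", s).split()]

def words_via_findall(s):  # hypothetical alternative
    # Collect maximal runs of word characters directly.
    return [w.lower() for w in re.findall(r"\w+", s)]

print(convert_to_word_list("Hello. my; name, is? Mad Max"))
# ['hello', 'my', 'name', 'is', 'mad', 'max']
print(convert_to_word_list("  Hello.Fred."))
# ['hello', 'fred']
print(convert_to_word_list("Don't stop"))   # apostrophe splits the word
# ['don', 't', 'stop']
print(convert_to_word_list("snake_case"))   # underscore is matched by \w
# ['snake_case']
print(words_via_findall("Hello. my; name, is? Mad Max"))
# ['hello', 'my', 'name', 'is', 'mad', 'max']
```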

Try splitting on the delimiters plus any additional spaces:

re.split(r',\s*|;\s*|\?\s*|\.\s*', text)

This gives:

import re

def convert_to_word_list(text):
    # Raw string, so the backslashes reach the regex engine unchanged.
    word_list = re.split(r',\s*|;\s*|\?\s*|\.\s*', text)
    print(word_list)
    to_lower_case_list = [word.lower() for word in word_list]
    return to_lower_case_list

print(convert_to_word_list("Hello. my; name, is? Mad Max"))

Output:

['Hello', 'my', 'name', 'is', 'Mad Max']
['hello', 'my', 'name', 'is', 'mad max']
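Note that this pattern has no alternative for plain whitespace, which is why "Mad Max" stays together as one item, and a trailing delimiter still leaves an empty string at the end. The sketch below illustrates both points; split_keep_phrases is a made-up helper that assumes keeping phrases together is the desired behaviour and simply drops the empty strings.

```python
import re

# Punctuation delimiters only, each followed by optional whitespace.
PUNCT = r",\s*|;\s*|\?\s*|\.\s*"

print(re.split(PUNCT, "Hello. my; name, is? Mad Max"))
# ['Hello', 'my', 'name', 'is', 'Mad Max']
print(re.split(PUNCT, "Hello. Fred."))  # trailing full stop
# ['Hello', 'Fred', '']

def split_keep_phrases(text):  # hypothetical helper
    # Strip outer whitespace, split on punctuation, drop empty strings.
    return [part.lower() for part in re.split(PUNCT, text.strip()) if part]

print(split_keep_phrases("Hello. Fred."))
# ['hello', 'fred']
```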
