
How to drop special characters and numbers when using re.split on a string?

I want to split a string so that the spaces are kept as separate items, but special characters and digits are dropped.

The result should look like this:

sentence = "jak3 love$ $b0x1n%"
list_after_split = ["jak", " ", "love", " ", "bxn"]

I want to use re.split(), but I am not sure what pattern to write.

Try filtering the unwanted characters out first:

>>> import re
>>> sentence = "jak3 love$ $b0x1n%"
>>> sentence_filtered = re.sub(r'[^a-zA-Z\s]+', '', sentence)
>>> # Alternative: sentence_filtered = ''.join(ch for ch in sentence if ch.isalpha() or ch.isspace())
>>> sentence_filtered
'jak love bxn'
>>> re.split(r'(\s+)', sentence_filtered)
['jak', ' ', 'love', ' ', 'bxn']
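
One thing to watch with the re.split approach: if the filtered string starts or ends with whitespace, the result contains empty strings at the edges. A possible alternative, sketched below, is to collect the tokens with re.findall instead. The function name tokenize_letters_and_spaces is just a placeholder for illustration, not something from the question.

```python
import re

def tokenize_letters_and_spaces(text):
    # Hypothetical helper: drop everything that is not an ASCII letter or whitespace.
    filtered = re.sub(r'[^a-zA-Z\s]+', '', text)
    # Collect runs of letters and runs of whitespace as separate tokens.
    return re.findall(r'[a-zA-Z]+|\s+', filtered)

print(tokenize_letters_and_spaces("jak3 love$ $b0x1n%"))
# ['jak', ' ', 'love', ' ', 'bxn']

print(tokenize_letters_and_spaces(" jak3 love$ "))
# [' ', 'jak', ' ', 'love', ' ']
```

For the sentence in the question both approaches give the same list, so this only matters if the input may have leading or trailing whitespace.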

If you also want to condense runs of spaces into a single space:

import re

# String with multi-spaces, tab(s), and newline(s).
s = 'Jak3     \t love$s \n  $D0ax1t3e90r%.'
print(s)
# Jak3         love$s 
#   $D0ax1t3e90r%.

# First, remove all characters which aren't letters or a space.
letters_only = re.sub(r'[^a-zA-Z ]+', '', s)
# Second, condense spaces together into a single space.
condensed = re.sub(r' +', ' ', letters_only)
# Third, split into desired list.
print(re.split(r'( )', condensed))
# ['Jak', ' ', 'loves', ' ', 'Daxter']
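
Note that the pattern [^a-zA-Z ]+ also deletes tabs and newlines, so two words separated only by a tab or newline (no plain space) would be glued together. If that matters for your input, a variant along these lines might work: keep all whitespace in the first step and collapse it with \s+ afterwards. This is only a sketch, and split_condensed is a made-up name.

```python
import re

def split_condensed(text):
    # Hypothetical helper: keep ASCII letters and any whitespace (spaces, tabs, newlines).
    letters_and_ws = re.sub(r'[^a-zA-Z\s]+', '', text)
    # Collapse every whitespace run into one space and trim the ends.
    single_spaced = re.sub(r'\s+', ' ', letters_and_ws).strip()
    # Split on single spaces, keeping each space as its own item.
    return re.split(r'( )', single_spaced)

print(split_condensed('Jak3\tlove$s\n$D0ax1t3e90r%.'))
# ['Jak', ' ', 'loves', ' ', 'Daxter']
```

The strip() call is a design choice: it avoids empty strings at the start or end of the list when the input has surrounding whitespace.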
