
Python - Splitting a string by special characters and numbers

I have a string that I want to split at every number, keeping consecutive digits together (so "12" stays one element). I also want to split that same string at "(" and ")".

import re

myStr = "H12(O1H2)2O2C1"
list1 = re.split(r'(\d+)', myStr)
print(list1)
list1 = re.split(r'(\W)', myStr)
print(list1)

I want the result to be ['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1'].

After:

re.split(r'(\d+)', myStr)

I get:

['H', '12', '(O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1', '']

I now want to separate the open parenthesis and the "O" into individual elements. Splitting the list a second time, after it has already been split this way, didn't work for me. Also, "myStr" will eventually be user input, so indexing into a known string (as in this example) won't solve the problem. Open to suggestions.

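You have to use a pattern that also matches the parentheses. Change (\d+) to something like ([\d]+|[\(\)]). The alternation matches either a run of digits or a single "(" or ")", and because the whole alternation is inside a capturing group, re.split keeps every match in the result.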

import re

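myStr = "H12(O1H2)2O2C1"
# Capture runs of digits, or a single "(" or ")"
list1 = re.split(r'([\d]+|[\(\)])', myStr)
# print(list1)  # unfiltered result still contains empty strings

# filter(None, ...) drops the empty strings
noempty_list = list(filter(None, list1))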
print(noempty_list)

Output:

['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1']

You also have to match the "(" and ")" characters; without them, the result contains '(O'. re.split also returns an empty string wherever two matches are adjacent, or where the string starts or ends with a match, so filter those out.
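To see where those empty strings come from, here is a minimal sketch that prints the unfiltered re.split result for the question's input. It uses the pattern \d+|[()], which matches the same strings as the answer's [\d]+|[\(\)]. Empty strings appear between adjacent matches (for example between '12' and '(') and at the end, because the input ends with a digit.

```python
import re

my_str = "H12(O1H2)2O2C1"

# Capture digit runs or a single parenthesis; the capturing group
# makes re.split keep the matched delimiters in the output list.
raw = re.split(r'(\d+|[()])', my_str)
print(raw)
# ['H', '12', '', '(', 'O', '1', 'H', '2', '', ')', '', '2', 'O', '2', 'C', '1', '']

# filter(None, ...) keeps only truthy items, i.e. drops the empty strings
tokens = list(filter(None, raw))
print(tokens)
# ['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1']
```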

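([\d]+|[A-Z]) works too: the parentheses are never matched, so they come back as the text between matches. However, re.split then returns even more empty strings in the list. (Note that [AZ] without the hyphen matches only the letters "A" and "Z".)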
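The sketch below compares the two split patterns on the question's input and counts the empty strings each one produces. It also shows an alternative that is not from the answer above: re.findall, with a third branch [^\d()]+ that collects everything that is neither a digit nor a parenthesis. findall returns only the matched pieces, so no empty strings appear and no filtering step is needed.

```python
import re

my_str = "H12(O1H2)2O2C1"

# Answer's main pattern: digit runs or a single parenthesis
paren_raw = re.split(r'(\d+|[()])', my_str)

# Variant: digit runs or a single uppercase letter A-Z.
# "(" and ")" are never matched; they survive as the text between matches.
upper_raw = re.split(r'(\d+|[A-Z])', my_str)

# Number of empty strings each pattern produces
print(paren_raw.count(''), upper_raw.count(''))  # 4 10

upper_tokens = [t for t in upper_raw if t]  # same effect as filter(None, ...)
print(upper_tokens)

# Alternative (not from the answer): findall returns only the matches,
# so the result needs no filtering.
found = re.findall(r'\d+|[()]|[^\d()]+', my_str)
print(found)
```

One difference in behaviour: with [A-Z], a two-letter symbol such as "Ca" comes back as 'C', 'a'. The parenthesis-based split and the findall version keep it as 'Ca'.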
