
How to remove special characters from a list using Python?

I have a list like this.

z=[']\'What type of humans arrived on the Indian subcontinent from Africa?\', \'When did humans first arrive on the Indian subcontinent?\', \'What subcontinent did humans first arrive on?\', \'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?\',\kingdoms were established in Southeast Asia?Indianized\']']

I want to convert it into a simple 2D list.

z= [['What type of humans arrived on the Indian subcontinent from Africa?', 'When did humans first arrive on the Indian subcontinent?', 'What subcontinent did humans first arrive on?', 'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?','kingdoms were established in Southeast Asia?Indianized']]

So how do I convert this list into a 2D list?

The logic is not fully clear. I'd approach it by splitting on runs of two or more characters that are not letters, digits, or question marks, then dropping the empty pieces:

import re
[[x for x in re.split(r'[^a-z0-9\?]{2,}', s, flags=re.I) if x] for s in z]

Output:

[['What type of humans arrived on the Indian subcontinent from Africa?',
  'When did humans first arrive on the Indian subcontinent?',
  'What subcontinent did humans first arrive on?',
  'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?',
  'kingdoms were established in Southeast Asia?Indianized']]
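The same idea can be wrapped in a small helper. This is only a sketch: the names `SEPARATOR` and `split_questions` and the shortened sample string are made up for illustration and are not part of the answer above.

```python
import re

# Two or more consecutive characters that are not letters, digits, or "?".
# A single space between words is only one such character, so it survives.
SEPARATOR = re.compile(r"[^a-z0-9?]{2,}", flags=re.IGNORECASE)


def split_questions(items):
    """Split each string on separator runs and drop the empty pieces."""
    return [[part for part in SEPARATOR.split(s) if part] for s in items]


# Shortened stand-in for the list in the question (hypothetical data).
sample = ["]'What is this?', 'Who is that?',\\last one']"]
result = split_questions(sample)
print(result)  # [['What is this?', 'Who is that?', 'last one']]
```

One thing to watch: any run of two such characters inside a sentence also counts as a separator, so a question containing a comma followed by a space (for example "Well, then") would be cut in two.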

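You can use the re library. re.sub replaces every match of the pattern, here every run of characters other than letters, digits, and spaces, with an empty string. The space at the end of the character class (after the 9) keeps the spaces; remove it if you don't want them. Note that this pattern also strips the question marks.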

import re
re.sub('[^A-Za-z0-9 ]+', '', mystring)
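To see what the character class does, here is a small sketch on one fragment of the question's data. The value of `mystring` and the variable names are assumptions for illustration; the third pattern, which adds `?` to the class, is a variation and not part of the answer above.

```python
import re

# One fragment taken from the question's string (assumed sample input).
mystring = "]'What subcontinent did humans first arrive on?'"

# Keep letters, digits, and spaces; everything else, including "?", is removed.
with_spaces = re.sub(r"[^A-Za-z0-9 ]+", "", mystring)

# Without the space in the class, the spaces are removed as well.
without_spaces = re.sub(r"[^A-Za-z0-9]+", "", mystring)

# Variation: adding "?" to the class keeps the question marks.
keep_question = re.sub(r"[^A-Za-z0-9 ?]+", "", mystring)

print(with_spaces)     # What subcontinent did humans first arrive on
print(without_spaces)  # Whatsubcontinentdidhumansfirstarriveon
print(keep_question)   # What subcontinent did humans first arrive on?
```

This cleans a single string but does not split it, so on its own it would merge all the questions in the original list element into one string.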
