
How to remove special characters from a list using Python?

I have a list like this.

z=[']\'What type of humans arrived on the Indian subcontinent from Africa?\', \'When did humans first arrive on the Indian subcontinent?\', \'What subcontinent did humans first arrive on?\', \'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?\',\kingdoms were established in Southeast Asia?Indianized\']']

I want to convert it into a simple 2D list.

z= [['What type of humans arrived on the Indian subcontinent from Africa?', 'When did humans first arrive on the Indian subcontinent?', 'What subcontinent did humans first arrive on?', 'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?','kingdoms were established in Southeast Asia?Indianized']]

So how can I convert this list into a 2D list?

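The logic isn't entirely clear. I would use a regex to split each string on runs of 2 or more characters that are not letters, digits, or a question mark, dropping the empty pieces: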

import re

[[x for x in re.split(r'[^a-z0-9\?]{2,}', s, flags=re.I) if x] for s in z]

Output:

[['What type of humans arrived on the Indian subcontinent from Africa?',
  'When did humans first arrive on the Indian subcontinent?',
  'What subcontinent did humans first arrive on?',
  'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?',
  'kingdoms were established in Southeast Asia?Indianized']]
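For reference, here is a minimal self-contained sketch of the same split, assuming the input strings look like the one in the question. The names split_questions, SEPARATOR and sample are hypothetical and only used for illustration; the pattern is the one from the answer above.

```python
import re

# Runs of 2+ characters that are not letters, digits, or '?'
# (same pattern as in the answer; the constant name is hypothetical).
SEPARATOR = re.compile(r"[^a-z0-9?]{2,}", flags=re.IGNORECASE)


def split_questions(strings):
    """Split each string on separator runs and drop empty pieces."""
    return [[part for part in SEPARATOR.split(s) if part] for s in strings]


# Shortened sample in the same shape as z from the question.
sample = ["['What subcontinent did humans first arrive on?', 'When did humans first arrive?']"]
print(split_questions(sample))
```

One caveat with this approach: any run of two non-alphanumeric characters inside a sentence, such as a comma followed by a space, is also treated as a separator, so a question containing ", " would be split in two.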

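You can use the re library. The call below removes every character that matches the regex, that is, every character other than a letter, a digit, or a space. There is a space at the end of the character class (after the 9), so spaces are kept. Remove it if you don't want to keep spaces.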

import re
re.sub('[^A-Za-z0-9 ]+', '', mystring)
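As a rough illustration of how that call behaves, the sketch below wraps it in a small helper. The function name strip_special and its keep parameter are hypothetical, not part of the answer; keep lists the extra characters to preserve besides letters and digits.

```python
import re


def strip_special(text, keep=" "):
    """Remove every character that is not a letter, a digit, or listed in keep."""
    # re.escape makes characters such as '?' or ']' safe inside the character class.
    pattern = "[^A-Za-z0-9" + re.escape(keep) + "]+"
    return re.sub(pattern, "", text)


print(strip_special("When did humans first arrive?'"))   # question mark is removed too
print(strip_special("on?', 'When", keep=" ?"))           # question mark is preserved
```

Note that this only deletes characters. Applied to the single long string in z, it would not separate the questions into list items, and with the default character class it also strips the question marks.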

