Python: removing elements (foreign characters) from a list

Question

I have a Python list containing foreign characters, which are represented by Unicode escape values:

python_list = ['to', 'shrink', u'\u7e2e\u3080', u'\u3061\u3062\u3080', 'chijimu', 'tizimu', 'tidimu', 'to', 'continue', u'\u7d9a\u304f', u'\u3064\u3065\u304f', 'tsuzuku', 'tuzuku', 'tuduku', u'\u30ed\u30fc\u30de\u5b57\uff08\u30ed\u30fc\u30de\u3058\uff09\u3068\u306f\u3001\u4eee\u540d\u6587\u5b57\u3092\u30e9\u30c6\u30f3\u6587\u5b57\u306b\u8ee2\u5199\u3059\u308b\u969b\u306e\u898f\u5247\u5168\u822c\uff08\u30ed\u30fc\u30de\u5b57\u8868\u8a18\u6cd5\uff09\u3001\u307e\u305f\u306f\u30e9\u30c6\u30f3\u6587\u5b57\u3067\u8868\u8a18\u3055\u308c\u305f\u65e5\u672c\u8a9e\uff08\u30ed\u30fc\u30de\u5b57\u3064\u3065\u308a\u306e\u65e5\u672c\u8a9e\uff09\u3092\u8868\u3059\u3002']

I need to remove all the items that contain '\u7e2e' or other similar characters. If an item in the list contains even one ASCII letter or word, it should not be excluded. For example, 'China\u3062' should be kept. I referred to this question and realized the issue has something to do with values greater than 128. I tried different approaches, such as this one:

new_list = [item for item in python_list if ord(item) < 128]

But this raises an error:

TypeError: ord() expected a character, but string of length 2 found

Expected output:

new_list = ['to', 'shrink','chijimu', 'tizimu', 'tidimu', 'to', 'continue','tsuzuku', 'tuzuku', 'tuduku']

How should I go about this?

Answer 1

If you wish to keep all words that contain at least one ASCII letter, the code below will do this:

from string import ascii_letters

python_list = ['to', 'shrink', u'\u7e2e\u3080', u'\u3061\u3062\u3080', 
               'chijimu','china,', 'tizimu', 'tidimu', 'to', 'continue', 
               u'\u7d9a\u304f', u'\u3064\u3065\u304f', 'tsuzuku', 'tuzuku', 'tuduku', u'china\u3061']

allowed = set(ascii_letters)

output = [word for word in python_list if any(letter in allowed for letter in word)]
print(output)
# ['to',
#  'shrink',
#  'chijimu',
#  'china,',
#  'tizimu',
#  'tidimu',
#  'to',
#  'continue',
#  'tsuzuku',
#  'tuzuku',
#  'tuduku',
#  u'china\u3061']

This iterates through each letter of each word; if any single letter is contained in allowed, the word is added to the output list.
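The same check can be wrapped in a small helper, which may be easier to reuse and test. This is only a sketch: the name has_ascii_letter is hypothetical, the sample words are illustrative, and it assumes Python 3 strings.

```python
from string import ascii_letters

ALLOWED = set(ascii_letters)  # a-z and A-Z only; digits and punctuation excluded


def has_ascii_letter(word):
    """Return True if at least one character of word is an ASCII letter.

    Hypothetical helper name, not part of the original answer.
    """
    return any(ch in ALLOWED for ch in word)


words = ['to', '\u7e2e\u3080', 'china,', 'china\u3061', '123']
kept = [w for w in words if has_ascii_letter(w)]
print(kept)
```

Because only letters are in the allowed set, an item made purely of digits or punctuation is dropped along with the purely non-ASCII ones.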

Answer 2

You can handle it like this, since you want to keep the plain strings and remove the unicode ones:

new_list = [item for item in python_list if isinstance(item, str)]
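This filter relies on Python 2's separate str and unicode types. In Python 3 every string is a str, so the same check keeps every item. A rough Python 3 counterpart is sketched below, assuming Python 3.7+ for str.isascii(). Note that it keeps only purely ASCII items, so a mixed item such as 'China\u3062', which the question wants kept, would be dropped.

```python
words = ['to', 'shrink', '\u7e2e\u3080', 'chijimu', 'China\u3062']

# In Python 3 every item is a str, so this filter removes nothing.
by_type = [w for w in words if isinstance(w, str)]

# str.isascii() (Python 3.7+) is True only when every character is ASCII.
ascii_only = [w for w in words if w.isascii()]

print(len(by_type) == len(words))
print(ascii_only)
```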

Answer 3

Here's one way:

import string
python_list = ['to', 'shrink', u'\u7e2e\u3080', u'\u3061\u3062\u3080', 'chijimu', 'tizimu', 'tidimu', 'to', 'continue', u'\u7d9a\u304f', u'\u3064\u3065\u304f', 'tsuzuku', 'tuzuku', 'tuduku', u'\u30ed\u30fc\u30de\u5b57\uff08\u30ed\u30fc\u30de\u3058\uff09\u3068\u306f\u3001\u4eee\u540d\u6587\u5b57\u3092\u30e9\u30c6\u30f3\u6587\u5b57\u306b\u8ee2\u5199\u3059\u308b\u969b\u306e\u898f\u5247\u5168\u822c\uff08\u30ed\u30fc\u30de\u5b57\u8868\u8a18\u6cd5\uff09\u3001\u307e\u305f\u306f\u30e9\u30c6\u30f3\u6587\u5b57\u3067\u8868\u8a18\u3055\u308c\u305f\u65e5\u672c\u8a9e\uff08\u30ed\u30fc\u30de\u5b57\u3064\u3065\u308a\u306e\u65e5\u672c\u8a9e\uff09\u3092\u8868\u3059\u3002']
filtered = [s for s in python_list if all(c in string.ascii_letters for c in s)]
print(filtered)

Output:

['to', 'shrink', 'chijimu', 'tizimu', 'tidimu', 'to', 'continue', 'tsuzuku', 'tuzuku', 'tuduku']
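Note that all() is stricter than the rule stated in the question: it requires every character to be an ASCII letter, so items with punctuation or a mix of ASCII and non-ASCII characters are dropped. The comparison below is a sketch with illustrative sample words, assuming Python 3.

```python
import string

words = ['to', 'china,', 'China\u3062', '\u7e2e\u3080']

# all(): every character must be an ASCII letter.
strict = [w for w in words if all(c in string.ascii_letters for c in w)]

# any(): a single ASCII letter is enough to keep the item.
lenient = [w for w in words if any(c in string.ascii_letters for c in w)]

print(strict)
print(lenient)
```

For the list in the question both variants give the same result, because it contains no mixed or punctuated items.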

Answer 4

Yet another way:

new_list=[]
for word in python_list:
    if word.encode('utf-8').decode('ascii','ignore') !='':
        new_list.append(word)
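The round trip works because UTF-8 encodes every non-ASCII character as bytes of 0x80 or above, which decode('ascii', 'ignore') silently skips. Whatever survives is the ASCII part of the word, and the word is kept if that part is non-empty. The sketch below makes the intermediate value visible; the helper name ascii_part is hypothetical and Python 3 is assumed.

```python
def ascii_part(word):
    """Return word with every non-ASCII character removed (hypothetical helper)."""
    # Non-ASCII characters become bytes >= 0x80 in UTF-8, and the
    # 'ignore' error handler drops those bytes while decoding as ASCII.
    return word.encode('utf-8').decode('ascii', 'ignore')


words = ['to', '\u7e2e\u3080', 'China\u3062', '\uff08']
for w in words:
    print(repr(ascii_part(w)))

new_list = [w for w in words if ascii_part(w) != '']
print(new_list)
```

Unlike the letter-based checks, this keeps an item if it contains any ASCII character at all, including digits, spaces, and ASCII punctuation.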


4 answers

Answer 1
3 Accepted 2014-10-23 12:19:32

Answer 2
2 2014-10-23 06:53:24

Answer 3
1 2014-10-23 06:59:27

Answer 4
1 2014-10-23 07:17:38
