Python - Convert list item into unicode if item is string
I have a list that can contain a mix of str and unicode strings:
lst = ['string1', u'string2', 'string3', u'string4']
I need to convert every list item to unicode if the item is a str. To convert a str to unicode I use:
s = s.decode('utf-8')
The problem is that if the string is already unicode and contains a non-ASCII character, trying to decode it raises UnicodeEncodeError: 'ascii' codec can't encode character ... (in Python 2, calling .decode() on a unicode object first encodes it with the default ascii codec, and that implicit step is what fails).
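To see where that message comes from, the sketch below performs the implicit ascii encode step explicitly. Because it calls .encode('ascii') directly, it raises the same kind of error on both Python 2 and Python 3; the sample string is illustrative only.

```python
# A unicode string containing one non-ASCII character (U+00E9, "é").
text = u'caf\xe9'

try:
    # This is the step Python 2 performs implicitly inside unicode.decode():
    # encoding with the ascii codec, which only covers code points 0-127.
    text.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    # The message starts with "'ascii' codec can't encode character".
    print(exc)
```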
So I tried something like:
lst = [i.decode('utf-8') for i in lst if isinstance(i, str)]
But this actually removes the unicode strings from the list.
Try this:
lst = [i.decode('utf-8') if isinstance(i, str) else i for i in lst]
You are filtering (removing non-matching elements); you need to use a conditional expression instead:
lst = [i.decode('utf-8') if isinstance(i, str) else i for i in lst]
The <true> if <condition> else <false> expression here always produces an output. Here that is the decoded string, or the original object unchanged if it is not a str object.
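The difference between the two comprehension forms can be checked directly: a trailing if clause filters items out, while a conditional expression in the output position maps every item to something. The sketch below tests for bytes rather than str, an assumption made so it runs unchanged on Python 2 (where bytes is an alias of str) and on Python 3 (where encoded text is bytes); the sample list is illustrative.

```python
# Mixed list: encoded byte strings and already-decoded text.
lst = [b'string1', u'string2', b'caf\xc3\xa9', u'string4']

# Filtering form from the question: the trailing "if" drops every item
# that is not a byte string, so the unicode items disappear.
filtered = [i.decode('utf-8') for i in lst if isinstance(i, bytes)]

# Conditional-expression form: every item produces an output, either the
# decoded text or the original object unchanged.
converted = [i.decode('utf-8') if isinstance(i, bytes) else i for i in lst]

print(len(filtered))   # 2
print(len(converted))  # 4
```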
While you could use a ternary expression in your list comprehension to convert the elements correctly, in my opinion it is cleaner to extract the logic into a separate helper function:
def convert_to_unicode(s):
    """
    Convert `s` to unicode. If `s` is already
    unicode, return `s` as is.
    """
    if isinstance(s, str):
        return s.decode('utf-8')
    else:
        return s
Then you can simply call the function on each element of your list:
lst = [convert_to_unicode(i) for i in lst]
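If the same code also has to run on Python 3, where str is already text and encoded data is bytes, the helper can test for bytes instead. The version below is a sketch under that assumption; the name to_text and its encoding/errors parameters are hypothetical additions, not part of the answer above. Since bytes is an alias of str on Python 2.6+, it behaves like convert_to_unicode there.

```python
def to_text(s, encoding='utf-8', errors='strict'):
    """Hypothetical helper: decode byte strings, return anything else as is.

    On Python 2, bytes is an alias for str, so this matches
    convert_to_unicode; on Python 3 it decodes bytes and leaves str alone.
    """
    if isinstance(s, bytes):
        # Only encoded data is decoded; text objects never reach .decode().
        return s.decode(encoding, errors)
    return s


lst = [b'string1', u'string2', b'caf\xc3\xa9', u'string4']
lst = [to_text(i) for i in lst]
```

Passing errors='replace' substitutes U+FFFD for undecodable bytes instead of raising UnicodeDecodeError.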