How can I slice a substring from a unicode string with Python?

I have a unicode string as a result: u'splunk>\xae\uf001'

How can I get the substring 'uf001' as a simple string in Python?

The characters uf001 are not actually present in the string, so you can't just slice them off. The string ends with the single character U+F001, and uf001 only appears in its escaped representation. To get that text, you can do

repr(s)[-6:-1]

or

'u' + hex(ord(s[-1]))[2:]
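
Both one-liners can be checked side by side. This is a minimal sketch, assuming Python 3 (where the u prefix is optional and repr() does not add it) and assuming s holds the string from the question:

```python
s = u'splunk>\xae\uf001'

# Option 1: slice the escape out of repr(). U+F001 is not printable,
# so repr() writes it as the six characters \uf001 before the closing quote.
from_repr = repr(s)[-6:-1]

# Option 2: build the text from the code point of the last character.
from_ord = 'u' + hex(ord(s[-1]))[2:]

print(from_repr)  # uf001
print(from_ord)   # uf001
```

Note that hex() does not zero-pad, so for code points below U+1000 the second form yields fewer than four hex digits; 'u%04x' % ord(s[-1]) pads it.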

Since you want the actual string (as seen from the comments), just take the last character with index [-1]. Example:

>>> a = u'splunk>\xae\uf001'
>>> print(a)
splunk>®
>>> a[-1]
'\uf001'
>>> print(a[-1])


If you want the escaped unicode representation (\uf001), then take repr(a[-1]). Example:

>>> repr(a[-1])
"'\\uf001'"

\uf001 is a single unicode character (not a sequence of several characters), so you can get that character directly, as above.

You see \ because you are checking the results of repr() on the string, if you print it, or use it somewhere else (like for files, etc) it will be the correct \ character. 之所以会看到\是因为您正在检查字符串上repr()的结果,如果打印它,或者在其他地方使用它(如文件等),它将是正确的\字符。

u'' is how a Unicode string is represented in Python source code. The REPL uses this representation by default to display unicode objects:

>>> u'splunk>\xae\uf001'
u'splunk>\xae\uf001'
>>> print(u'splunk>\xae\uf001')
splunk>®
>>> print(u'splunk>\xae\uf001'[-1])


If your terminal is not configured to display Unicode, or if you are on a narrow build (e.g., as is likely for Python 2 on Windows), the result may be different.

A Unicode string is an immutable sequence of Unicode code points in Python. len(u'\uf001') == 1: the string does not contain the text uf001 (5 characters). You could also write it with the literal character between the quotes, u'' (on Python 2 you must declare the character encoding of your source file if you use non-ASCII characters):

>>> u'\uf001' == u''
True

It is just a different way to represent exactly the same Unicode character (a single code point in this case).
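
Other spellings of the same character compare equal too. This is a minimal sketch, assuming Python 3 (on Python 2, unichr() would take the place of chr()); the variable names are only for illustration:

```python
escaped = u'\uf001'                         # 4-digit escape
long_escape = u'\U0000f001'                 # 8-digit escape
from_number = chr(0xF001)                   # built from the code point
decoded = b'\xef\x80\x81'.decode('utf-8')   # decoded from its UTF-8 bytes

# All four expressions produce the same one-character string.
print(escaped == long_escape == from_number == decoded)  # True
print(len(escaped), hex(ord(escaped)))                   # 1 0xf001
```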

Note: some user-perceived characters may span several Unicode code points, e.g.:

>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'ё')
u'\u0435\u0308'
>>> print(unicodedata.normalize('NFKD', u'ё'))
ё
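
This matters for slicing, because indexing works on code points, not on user-perceived characters. This is a minimal sketch, assuming Python 3 and using NFD/NFC normalization to switch between the two forms:

```python
import unicodedata

composed = u'\u0451'  # ё as a single code point
decomposed = unicodedata.normalize('NFD', composed)  # е + combining diaeresis

print(len(composed), len(decomposed))   # 1 2

# Taking the last element of the decomposed form returns only the
# combining mark, not the whole visible letter.
print(decomposed[-1] == u'\u0308')      # True

# Normalizing to NFC first restores the single-code-point form.
print(unicodedata.normalize('NFC', decomposed) == composed)  # True
```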
