
How do I remove subscript/superscript in Python

I have some strings that use subscript and superscript characters.

Is there any way I can remove them while keeping the rest of my string?

Here is an example: ¹ºUnless otherwise indicated. How can I remove the superscript ¹º?

Thanks in advance!

The ordinal values of ASCII characters are in range(128), and subscript/superscript characters are not in the ASCII table. Note that range(128) excludes its upper bound and, since no lower bound is given, starts at 0, so it covers the numbers 0 through 127. You can therefore strip out any character whose ordinal value is not in this range. Be aware that this also removes every other non-ASCII character, such as accented letters:

>>> x = '¹ºUnless otherwise indicated'
>>> y = ''.join([i for i in x if ord(i) < 128])
>>> y
'Unless otherwise indicated'

This iterates over all of the characters of x, excludes any that are not in the ASCII range, and then joins the resulting list of characters back into a str.
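As a rough sketch, the same filter can be wrapped in a small function to make its side effect visible. The name keep_ascii is hypothetical and not part of the original answer; the second call shows that accented letters are dropped along with the superscript.

```python
def keep_ascii(text: str) -> str:
    """Return text with every character outside the ASCII range (0-127) removed."""
    return "".join(ch for ch in text if ord(ch) < 128)


print(keep_ascii("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
print(keep_ascii("x² café"))                       # x caf
```

If the input may contain legitimate non-ASCII text, a narrower filter is probably a better fit.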

The only sure way to do this is to enumerate every superscript and subscript symbol that might occur and remove the characters that match this set.
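A minimal sketch of that enumeration approach, using str.translate, might look like the following. The character lists here are an illustrative assumption and are not exhaustive; the ordinal indicators ª and º are included because the question treats º as a superscript. The name strip_scripts is hypothetical.

```python
# Characters to delete. These lists are assumptions for illustration only;
# extend them with whatever symbols actually appear in your data.
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿⁱªº"
SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎"

# The third argument of str.maketrans lists characters to map to None (deletion).
_REMOVE_TABLE = str.maketrans("", "", SUPERSCRIPTS + SUBSCRIPTS)


def strip_scripts(text: str) -> str:
    """Delete only the enumerated superscript/subscript characters."""
    return text.translate(_REMOVE_TABLE)


print(strip_scripts("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
print(strip_scripts("H₂O café"))                      # HO café
```

Unlike an ASCII-only filter, this leaves other non-ASCII characters such as é untouched.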

If your strings are not too unusual, you can instead try filtering on the Unicode general categories "Letter, other" (Lo) and "Number, other" (No). These categories cover other characters in addition to superscripts and subscripts, so the filter may remove more than you intend. For example:

import unicodedata
s = "¹ºUnless otherwise indicated"
cleaned = "".join(c for c in s if unicodedata.category(c) not in ["No", "Lo"])
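A narrower alternative, offered here as a sketch and not as part of the original answer, is to check each character's decomposition in the Unicode database. Superscript and subscript forms carry a `<super>` or `<sub>` tag there, while CJK letters (category Lo) and fractions such as ½ (category No) do not. The function names below are hypothetical.

```python
import unicodedata


def is_script_char(ch: str) -> bool:
    """True if the Unicode database tags ch as a superscript or subscript form."""
    # unicodedata.decomposition returns e.g. '<super> 0031' for '¹',
    # or an empty string when the character has no decomposition.
    return unicodedata.decomposition(ch).startswith(("<super>", "<sub>"))


def strip_scripts_by_decomposition(text: str) -> str:
    """Remove characters whose decomposition is tagged <super> or <sub>."""
    return "".join(ch for ch in text if not is_script_char(ch))


print(strip_scripts_by_decomposition("¹ºUnless otherwise indicated"))
print(strip_scripts_by_decomposition("H₂O 中文 ½"))
```

The category filter above would also delete 中文 and ½ from the second string, because they fall in the Lo and No categories; the decomposition check keeps them.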
