
How do I remove subscript/superscript characters in Python?

I have some strings that use subscript and superscript characters.

Is there any way I can remove them while keeping the rest of my string?

Here is an example: ¹ºUnless otherwise indicated. How can I remove the superscript characters ¹º?

Thanks in advance!

The ordinal values of ASCII characters are in range(128), and subscript/superscript characters are not in the ASCII table. Note that range(128) excludes its upper bound (and when no lower bound is given, 0 is assumed), so it covers all of the numbers from 0 to 127. So you can strip out any characters whose ordinal value is not in this range:

>>> x = '¹ºUnless otherwise indicated'
>>> y = ''.join([i for i in x if ord(i) < 128])
>>> y
'Unless otherwise indicated'

This iterates over all of the characters of x, excludes any that are not in the ASCII range, and then joins the resulting list of characters back into a str.
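A shorter variant of the same idea, shown as a sketch (the helper name strip_non_ascii is hypothetical), encodes the string to ASCII and tells the codec to ignore characters it cannot encode. Like the list comprehension, it removes every non-ASCII character, not only superscripts and subscripts, so accented letters are lost too:

```python
def strip_non_ascii(text: str) -> str:
    # Encoding to ASCII with errors="ignore" silently drops every character
    # whose code point is >= 128; decoding turns the bytes back into a str.
    return text.encode("ascii", errors="ignore").decode("ascii")


print(strip_non_ascii("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
print(strip_non_ascii("café"))  # caf  (the accented é is dropped as well)
```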

The only sure way to do this is to enumerate all the superscript and subscript symbols that might occur and remove the characters that match this set.
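A minimal sketch of that approach, assuming the hand-picked lists below are enough for your data: they cover the common superscript and subscript digits and signs, ⁱ and ⁿ, plus the ordinal indicators ª and º from the example, and are not exhaustive. The names SUPERSCRIPTS, SUBSCRIPTS, ORDINALS and strip_scripts are hypothetical. str.maketrans with a third argument builds a table that deletes those characters, and str.translate applies it:

```python
# Hand-picked, NOT exhaustive: extend these strings for your own data.
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁱⁿ"
SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎"
# U+00AA (ª) and U+00BA (º) are ordinal indicators rather than true
# superscripts, but º appears in the question's example string.
ORDINALS = "\u00aa\u00ba"

# Third argument of str.maketrans: characters that map to None (deleted).
REMOVE_TABLE = str.maketrans("", "", SUPERSCRIPTS + SUBSCRIPTS + ORDINALS)


def strip_scripts(text: str) -> str:
    """Remove every character listed in REMOVE_TABLE and keep everything else."""
    return text.translate(REMOVE_TABLE)


print(strip_scripts("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
print(strip_scripts("H₂O and x² in café"))  # HO and x in café
```

Because only the listed characters are removed, accented and non-Latin letters are left untouched; the trade-off is that any superscript or subscript missing from the lists survives.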

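If your strings are not too unusual, you can instead filter out characters in the Unicode categories "Number, other" (No) and "Letter, other" (Lo); in the example, ¹ is in No and º is in Lo. These categories also cover many characters other than superscripts and subscripts (Lo, for instance, includes CJK ideographs and Arabic and Hebrew letters). Something like this: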

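import unicodedata

s = "¹ºUnless otherwise indicated"
# Drop characters in the "Number, other" (No) and "Letter, other" (Lo) categories
cleaned = "".join(c for c in s if unicodedata.category(c) not in ("No", "Lo"))
print(cleaned)  # Unless otherwise indicated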
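A different technique, not from either answer, is to look at each character's Unicode compatibility decomposition: characters defined as superscript or subscript forms carry a <super> or <sub> tag, which unicodedata.decomposition() returns as the first token. Unlike the category filter, this keeps CJK and other Lo letters and also catches superscript modifier letters such as ᵃ (category Lm). This is a sketch; the helper names is_super_or_subscript and strip_super_sub are hypothetical, and the result depends on the Unicode data bundled with your Python version:

```python
import unicodedata


def is_super_or_subscript(ch: str) -> bool:
    # decomposition() returns e.g. '<super> 0031' for '¹', '<sub> 0032' for '₂',
    # and '' for characters that have no decomposition mapping.
    tag = unicodedata.decomposition(ch).split(" ", 1)[0]
    return tag in ("<super>", "<sub>")


def strip_super_sub(text: str) -> str:
    # Keep every character that is not tagged as a superscript or subscript form.
    return "".join(ch for ch in text if not is_super_or_subscript(ch))


print(strip_super_sub("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
```

º (U+00BA) is removed here because the Unicode data gives it the decomposition <super> 006F, while é decomposes canonically (no tag) and is kept.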

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM