How to simulate Python 2 str.lower() in Python 3

There appears to be a difference between how Python 2.7.15 and Python 3.7.2 perform the lowercase operation.

I have a large dictionary and a large list which were written using Python 2, but which I want to use in Python 3 (imported from file using pickle). For each item in the list of strings, the dict has a key equal to the item's Python 2 lower() result. Unfortunately, these keys are not the same as the Python 3 lower() results.

How can I get what Python 2 would have returned from unicode.lower() while running in Python 3?

An example of a string in the list from Python 3 is 'İle', whose lowercase is 'i̇le' (which, incidentally, is NOT the ASCII 'ile'). This is not in the dictionary. What Python 3 reads from the pickle as "İle" is read into Python 2 as u'\u0130le', whose lowercase is "ile" (the ASCII string), which is in the dict. And that's what I need to return.

To clarify, I'm adding an example (where the right-hand side of the comparison is the ASCII string).

Python 2.7:

>>> u"\u0130le".lower() == "ile"
True

Python 3.7:

>>> u"\u0130le".lower() == "ile"
False
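To see exactly where the Python 3 result diverges, it can help to print the code points of the lowercased string. The following is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original question.

```python
import unicodedata

s = "\u0130le"   # LATIN CAPITAL LETTER I WITH DOT ABOVE, then "le"
low = s.lower()  # Python 3 lowercasing

# List each resulting character as a code point and as a Unicode name.
codepoints = ["U+%04X" % ord(c) for c in low]
names = [unicodedata.name(c) for c in low]

print(codepoints)
print(names)
```

In Python 3 the single character U+0130 lowercases to two characters, an ASCII "i" followed by U+0307 (COMBINING DOT ABOVE), which is why the comparison with "ile" fails.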

You can use the Unidecode library.

This library converts Unicode to its closest ASCII equivalent, which appears to be what you want.

>>> from unidecode import unidecode
>>> unidecode(u'\u0130le'.lower()) == 'ile'
True

EDIT: As pointed out by user2357112, this does not match Python 2.7's unicode.lower(). Python 2.7 uses the C library function towlower, so to exactly match that function you will need to use some interface to C (such as Python 2.7 itself, as in mkiever's answer). If you don't need to keep any non-ASCII symbols, however, this should work.
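If installing a third-party package is not an option, a rough standard-library approximation of the ASCII-only case might look like the sketch below. This is not how Unidecode works: Unidecode transliterates, whereas this simply decomposes the lowercased string and discards every non-ASCII character. The helper name ascii_lower is made up for illustration.

```python
import unicodedata

def ascii_lower(s):
    """Lowercase s, then drop everything that is not plain ASCII.

    Illustrative helper: combining marks and any other non-ASCII
    characters are discarded, not transliterated.
    """
    decomposed = unicodedata.normalize("NFKD", s.lower())
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_lower("\u0130le") == "ile")
```

Like the Unidecode approach, this only helps when the dictionary keys are pure ASCII; any key that legitimately contains non-ASCII characters would be mangled.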

Brute force solution.

Create a lowercase map in Python 2 and then use it in Python 3.

Python 2 program to create the map:

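# Python 2: write the lowercase form of every BMP code point, in order.
import io

# newline='' stops '\n' from being translated on write.
with io.open('py2_lower_map', 'w', encoding='utf-8', newline='') as f:
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            low = u'?'  # lone surrogates cannot be stored as valid UTF-8
        else:
            low = unichr(cp).lower()
        f.write(low)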

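Demo of how to use the map in Python 3:

# Read the map written by the Python 2 program.
# newline='' keeps '\r' and '\n' exactly as they were written.
with open('py2_lower_map', 'r', encoding='utf-8', newline='') as f:
    _py2_lower_map = f.read()

def py2_lower(u):
    # The map only covers code points below its length (the BMP);
    # anything beyond that falls back to Python 3's own lower().
    return ''.join(
        _py2_lower_map[ord(c)] if ord(c) < len(_py2_lower_map) else c.lower()
        for c in u
    )

low = py2_lower('İle')
print(low)
print([ord(c) for c in low])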

To be honest, this is quick and dirty and might have rough edges, but it mostly does the correct thing. It works on one example ;-)
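To go beyond a single example, one could list every code point where the map disagrees with Python 3's own per-character lower(). The sketch below is only an illustration: diff_against_py3 is a hypothetical helper, and toy_map is a small stand-in built in memory so the snippet runs without the real py2_lower_map file.

```python
def diff_against_py3(lower_map):
    """Return the code points whose mapped value differs from Python 3's lower()."""
    return [cp for cp, low in enumerate(lower_map) if chr(cp).lower() != low]

# Stand-in for the real map: the first 0x200 code points lowercased by
# Python 3, except U+0130, which is forced to plain 'i' as Python 2 would.
toy_map = "".join("i" if cp == 0x130 else chr(cp).lower() for cp in range(0x200))

print([hex(cp) for cp in diff_against_py3(toy_map)])
```

Running the same function on the real _py2_lower_map string would show every character affected, including differences that come from the two interpreters shipping different Unicode database versions.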
