ArcPy and Python encoding messing up?

I am seeing strange behavior involving ArcPy and Python encodings. I work in Visual Studio 2010 Shell with Python Tools for Visual Studio (PTVS) installed. I isolated the problem in a simple script file, which contains the commands shown below. In Visual Studio, I set « Advanced Save Options... » to « UTF-8 without signature ». The script prints an accented string to the screen, imports the arcpy module, and then prints the same string again. Importing arcpy seems to change Python's encoding setup. I don't know why, and I would like to restore it, because it causes problems throughout the original script.


I checked the Python « encodings » folder and erased every pyc file. Then I ran the script, and it generated 3 pyc files:

  1. cp850.pyc (which corresponds to my stdout.encoding)
  2. cp1252.pyc (which corresponds to my Windows environment encoding)
  3. utf_8.pyc (which fits the encoding of my script)

When ArcPy is imported, something changes the encoding, and this affects variables defined before the import.

Why?

Is it possible, with some Python command, to find where the cp1252 encoding used by ArcPy is set and read it, so that I can write a function that deals with it?

# -*- coding: utf-8 -*-
import sys
print ('Loaded encoding : %(t)s'%{'t':sys.getdefaultencoding()})
reload(sys) # See stackoverflow question 2276200
sys.setdefaultencoding('utf-8')
print ('Set default encoding : %(t)s'%{'t':sys.getdefaultencoding()})
print ''

texte = u'Récuperation des données'
print ('Original type : %(t)s'%{'t':type(texte)})
print ('Original text : %(t)s'%{'t':texte})
print ''

import arcpy
print ('imported arcpy')
print ('Loaded encoding : %(t)s'%{'t':sys.getdefaultencoding()})
print ''

print ('arcpy mess up original type : %(t)s'%{'t':type(texte)})
print ('arcpy mess up original text : %(t)s'%{'t':texte})
print ''

print ('arcpy mess up reencoded with cp1252 type : %(t)s'%{'t':type(texte.encode('cp1252'))})
print ('arcpy mess up reencoded with cp1252 text : %(t)s'%{'t':texte.encode('cp1252')})

raw_input()

When I run the script, I get these results:

Loaded encoding : ascii
Set encoding : utf-8

Original type : <type 'unicode'>
Original text : Récuperation des données <--- This is right

import arcpy
Loaded encoding : utf-8

arcpy mess up original type : <type 'unicode'>
arcpy mess up original text : R'cuperation des donn'es <--- This is wrong
arcpy mess up ReEncode with cp1252 type : <type 'str'>
arcpy mess up ReEncode with cp1252 text : Récuperation des données <--- This fits the original unicode

Answering my own question.

From ESRI support, I got this information:

By default, Python in the command line is not going to change the code page to a UTF-8 based text for print statements to show up in Unicode. ArcGIS, on the other hand, specifically allows Unicode values to be passed to it and has changed the code page within the command line so that the values you see printed are the values ArcGIS is using. This is why the command line should be the only environment where import sys followed by import arcpy gives you a different printed value.
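One way to read this explanation is that Python keeps encoding its output with the console's original code page while the console, once arcpy is loaded, interprets those bytes with a different one. The sketch below only simulates such a mismatch. The function name `simulate_display` is hypothetical, and the cp850/cp1252 pairing is an assumption taken from the encodings reported in the question, not something ESRI confirmed.

```python
def simulate_display(text, written_as, displayed_as):
    """Return what a console would show if `text` were written using one
    code page but rendered using another (hypothetical helper)."""
    raw = text.encode(written_as, 'replace')     # bytes Python sends to stdout
    return raw.decode(displayed_as, 'replace')   # how the console renders them


# Assumption: Python writes cp850 bytes, the console renders them as cp1252.
garbled = simulate_display(u'R\xe9cuperation des donn\xe9es', 'cp850', 'cp1252')
```

Under that assumption, each é is written as byte 0x82, which cp1252 maps to U+201A (a low single quotation mark). That resembles the R'cuperation des donn'es line in the output above.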

My application runs scripts that do not always need arcpy, depending on what I want it to do. To solve my problem, I wrote a generic function that deals with the encoding whether or not arcpy has been imported, using the information provided by:

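import locale
import sys

Coding_CMD_Window = sys.stdout.encoding      # console encoding seen by Python (cp850 here)
Coding_OS = locale.getpreferredencoding()    # Windows environment encoding (cp1252 here)
Coding_Script = sys.getdefaultencoding()     # Python default encoding
# Use the console encoding, unless arcpy has been imported: then use the OS encoding.
Coding2Use = Coding_CMD_Window
if any('arcpy' in importedmodules for importedmodules in sys.modules):
    Coding2Use = Coding_OS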
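The generic function itself is not shown, so the following is only a sketch of how the selection above might be wrapped. The names `choose_output_encoding` and `encode_for_console` are hypothetical, the module check is a stricter match than the substring test above, and falling back to the OS encoding when `sys.stdout.encoding` is empty is an added assumption.

```python
import locale
import sys


def choose_output_encoding(modules=None, stdout_encoding=None, os_encoding=None):
    """Pick the console encoding, or the OS encoding once arcpy is loaded.

    Hypothetical helper; the optional arguments exist only to make the
    rule easy to exercise without importing arcpy.
    """
    if modules is None:
        modules = sys.modules
    if stdout_encoding is None:
        # sys.stdout.encoding may be None when output is piped or redirected.
        stdout_encoding = getattr(sys.stdout, 'encoding', None)
    if os_encoding is None:
        os_encoding = locale.getpreferredencoding()
    arcpy_loaded = any(
        name == 'arcpy' or name.startswith('arcpy.') for name in modules
    )
    if arcpy_loaded or not stdout_encoding:
        return os_encoding
    return stdout_encoding


def encode_for_console(text, encoding=None):
    """Encode unicode text to bytes, replacing characters that do not fit."""
    if encoding is None:
        encoding = choose_output_encoding()
    return text.encode(encoding, 'replace')
```

In Python 2, the result of `encode_for_console(texte)` can be passed straight to `print`, which is what the cp1252 re-encoding line in the question's script does by hand.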

I also made sure that all of my scripts were saved as UTF-8 without signature.

Hope this helps someone.

For those in doubt, try something like the following (e.g., in a .py file):

import codecs
#import arcpy

f = codecs.open('utf.file.txt', encoding='utf-8-sig') #assuming a BOM present
l = f.readlines()
print u''.join(l)

Then run the same code once more, but first remove the hash comment from the arcpy line. It will take about 6 seconds longer.

What I get is perfectly fine text when running the first version, and gibberish when arcpy is allowed to load.

ArcGIS for Desktop version used: 10.2.1
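The test above needs an input file that starts with a BOM. A minimal sketch for producing one follows; the helper names are hypothetical, and the sample text in the usage note is borrowed from the question rather than from this answer.

```python
import codecs


def write_sample_file(path, text):
    """Write `text` as UTF-8 with a leading BOM (signature)."""
    f = codecs.open(path, 'w', encoding='utf-8-sig')
    try:
        f.write(text)
    finally:
        f.close()


def read_sample_file(path):
    """Read the file back, letting the utf-8-sig codec strip the BOM."""
    f = codecs.open(path, encoding='utf-8-sig')
    try:
        return f.read()
    finally:
        f.close()
```

For example, `write_sample_file('utf.file.txt', u'Récuperation des données\n')` creates a file the snippet above can read.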
