Unicode error in str.format()

Question

I am trying to run the following script, which scans for *.csproj files and checks for project dependencies in Visual Studio solutions, but I am getting the following error. I have already tried all sorts of codec, encode/decode, and u'' combinations, to no avail...

(The diacritics are intended and I plan to keep them.)

Traceback (most recent call last):
  File "E:\00 GIT\SolutionDependencies.py", line 44, in <module>
    references = GetProjectReferences("MiotecGit")
  File "E:\00 GIT\SolutionDependencies.py", line 40, in GetProjectReferences
    outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 19: ordinal not in range(128)

import glob
import os
import fnmatch
import re
import subprocess
import codecs

gvtemplate = """
digraph g {

rankdir = "LR"

#####

}
""".strip()

def GetProjectFiles(rootFolder):
    result = []
    for root, dirnames, filenames in os.walk(rootFolder):
        for filename in fnmatch.filter(filenames, "*.csproj"):
            result.append(os.path.join(root, filename))
    return result

def GetProjectName(path):
    result = os.path.splitext(os.path.basename(path))[0]
    return result

def GetProjectReferences(rootFolder):
    result = []
    projectFiles = GetProjectFiles(rootFolder)
    for projectFile in projectFiles:
        projectName = GetProjectName(projectFile)
        with codecs.open(projectFile, 'r', "utf-8") as pfile:
            content = pfile.read()
            references = re.findall("<ProjectReference.*?</ProjectReference>", content, re.DOTALL)
            for reference in references:
                referenceProject = re.search('"([^"]*?)"', reference).group(1)
                referenceName = GetProjectName(referenceProject)
                outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
                result.append(outputline)
    return result

references = GetProjectReferences("MiotecGit")

# join() takes a single iterable; unpacking with * would pass each line as a separate argument
output = u"\n".join(references)

with codecs.open("output.gv", "w", 'utf-8') as outputfile:
    outputfile.write(gvtemplate.replace("#####", output))


graphvizpath = glob.glob(r"C:\Program Files*\Graphviz*\bin\dot.*")[0]
command = '{} -Gcharset=latin1 -T pdf -o "output.pdf" "output.gv"'.format(graphvizpath)
subprocess.call(command)

Answer 1

When Python 2.x tries to use a byte string in a Unicode context, it automatically tries to decode the byte string to a Unicode string using the ascii codec. The ascii codec is a safe default, but it fails on any byte above 0x7F, such as the 0xed in your traceback.
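To make the failure concrete, here is a minimal sketch that performs explicitly the ASCII decode Python 2 attempts behind the scenes. The project name and the helper `implicit_decode` are hypothetical, made up for illustration; only the byte 0xed comes from the traceback.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical project name; 0xed is "í" in the Windows-1252 code page.
raw_name = b"Miotec.Cl\xednica"


def implicit_decode(byte_string):
    """Mimic the decode step Python 2 performs implicitly: ASCII only."""
    return byte_string.decode("ascii")


try:
    implicit_decode(raw_name)
    error = None
except UnicodeDecodeError as exc:
    # Same exception type as in the traceback; the position differs
    # because this made-up name is shorter than the real one.
    error = exc
    print(exc)
```

Pure-ASCII names pass through this step unchanged, which is presumably why the script only breaks on projects whose names contain diacritics.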

For Windows environments, the mbcs codec selects the code page that Windows uses for 8-bit characters. You can decode the string yourself explicitly:

outputline = u'"{}" -> "{}"'.format(projectName.decode('mbcs'), referenceName.decode('mbcs'))
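If some names may already be Unicode strings, calling .decode() on them would trigger the same implicit ASCII step again, so a small guard can help. The following is only a sketch: `to_text` and `FS_CODEC` are hypothetical names, the sample project names are made up, and because the mbcs codec exists only on Windows, cp1252 is used as a stand-in elsewhere purely for illustration.

```python
# -*- coding: utf-8 -*-
import sys

# Assumption: "mbcs" is available only on Windows; cp1252 is an
# illustrative stand-in on other platforms, not an equivalent.
FS_CODEC = "mbcs" if sys.platform == "win32" else "cp1252"


def to_text(value, encoding=FS_CODEC):
    """Return value as a Unicode string, decoding byte strings explicitly."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value  # already Unicode, leave untouched


# Hypothetical names: one byte string with a diacritic, one Unicode string.
project_name = to_text(b"Miotec.Cl\xednica")
reference_name = to_text(u"Miotec.Core")

# Both arguments are Unicode now, so no implicit ASCII decode takes place.
outputline = u'"{}" -> "{}"'.format(project_name, reference_name)
```

In the script from the question, the same helper could wrap projectName and referenceName before the format call.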

Answer 1 (accepted, 2016-08-17 21:45:25)
