
Unicode error in `str.format()`

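I'm trying to run the following script, which scans `*.csproj` files and checks the project dependencies in a Visual Studio solution, but I get the error below. I've tried various codecs as well as combinations of `encode`/`decode` and `u''`, to no avail...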

(The diacritics are intentional; I plan to keep them.)

```
Traceback (most recent call last):
  File "E:\00 GIT\SolutionDependencies.py", line 44, in <module>
    references = GetProjectReferences("MiotecGit")
  File "E:\00 GIT\SolutionDependencies.py", line 40, in GetProjectReferences
    outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 19: ordinal not in range(128)
```
```python
import glob
import os
import fnmatch
import re
import subprocess
import codecs

# DOT template; "#####" is replaced by the generated edge lines
gvtemplate = """
digraph g {

rankdir = "LR"

#####

}
""".strip()

def GetProjectFiles(rootFolder):
    # Recursively collect every *.csproj file below rootFolder
    result = []
    for root, dirnames, filenames in os.walk(rootFolder):
        for filename in fnmatch.filter(filenames, "*.csproj"):
            result.append(os.path.join(root, filename))
    return result

def GetProjectName(path):
    # r"Foo\Bar.csproj" -> "Bar"
    result = os.path.splitext(os.path.basename(path))[0]
    return result

def GetProjectReferences(rootFolder):
    # One DOT edge ("Project" -> "Reference") per <ProjectReference> element
    result = []
    projectFiles = GetProjectFiles(rootFolder)
    for projectFile in projectFiles:
        projectName = GetProjectName(projectFile)
        with codecs.open(projectFile, 'r', "utf-8") as pfile:
            content = pfile.read()
            references = re.findall("<ProjectReference.*?</ProjectReference>", content, re.DOTALL)
            for reference in references:
                # First quoted value, normally the Include="..." path of the referenced project
                referenceProject = re.search('"([^"]*?)"', reference).group(1)
                referenceName = GetProjectName(referenceProject)
                outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
                result.append(outputline)
    return result

references = GetProjectReferences("MiotecGit")

# join() takes a single iterable, so pass the list itself, not *references
output = u"\n".join(references)

with codecs.open("output.gv", "w", 'utf-8') as outputfile:
    outputfile.write(gvtemplate.replace("#####", output))


graphvizpath = glob.glob(r"C:\Program Files*\Graphviz*\bin\dot.*")[0]
# An argument list avoids quoting problems with spaces in "Program Files";
# output.gv is written as UTF-8, so tell Graphviz the same charset
subprocess.call([graphvizpath, "-Gcharset=UTF-8", "-Tpdf", "-o", "output.pdf", "output.gv"])
```

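When Python 2.x has to use a byte string in a Unicode context, it automatically tries to decode the byte string to a Unicode string with the `ascii` codec. The `ascii` codec is a safe choice because ASCII is a subset of most 8-bit encodings, but it fails on any byte above 0x7F. In this script, `projectName` is built from the paths returned by `os.walk`, which are byte strings because the root `"MiotecGit"` is a byte string. Formatting it into `u'"{}" -> "{}"'` triggers that implicit decode, which fails on byte `0xed`.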
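The implicit conversion can be reproduced by decoding with `ascii` explicitly. This is a minimal sketch: `raw_name` is a made-up value standing in for `projectName`, and the snippet runs unchanged on Python 2 and 3.

```python
# Hypothetical project name as raw bytes, as Python 2's os.walk returns it for a
# byte-string root; 0xED is "í" in Windows-1252 and Latin-1.
raw_name = b"Proj\xedtest"

try:
    # Python 2 performs this ASCII decode implicitly when a byte string is
    # formatted into a unicode template such as u'"{}" -> "{}"'.
    raw_name.decode("ascii")
    message = "decoded without error"
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

This prints `'ascii' codec can't decode byte 0xed in position 4: ordinal not in range(128)`, the same error as in the traceback, only at a different position because the sample name is shorter.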

On Windows, the `mbcs` codec uses the code page Windows uses for 8-bit characters (the ANSI code page). You can decode the string explicitly yourself:

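```python
# projectName is a byte string (from os.walk); referenceName is already unicode,
# since it was parsed from content read with codecs.open(..., "utf-8")
outputline = u'"{}" -> "{}"'.format(projectName.decode('mbcs'), referenceName)
```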
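Because only one of the two values is a byte string, a small guard that decodes bytes and passes text through unchanged keeps the formatting safe either way. This is a sketch, not part of the original answer: `to_text` is a hypothetical helper, and `cp1252` stands in for `mbcs` because `mbcs` exists only on Windows (on a Western-European Windows install the ANSI code page is usually cp1252, but that depends on the system locale).

```python
def to_text(value, codec="cp1252"):
    """Return value as unicode text.

    Byte strings are decoded with an explicit codec (assumed cp1252 here; use
    'mbcs' on Windows). Values that are already text are returned unchanged,
    because calling .decode() on a Python 2 unicode object first encodes it
    with ascii and can fail again.
    """
    if isinstance(value, bytes):
        return value.decode(codec)
    return value


project_name = b"Proj\xedtest"       # hypothetical: byte string from os.walk
reference_name = u"Refer\u00e9nce"   # hypothetical: already unicode (codecs.open)

outputline = u'"{}" -> "{}"'.format(to_text(project_name), to_text(reference_name))
print(outputline)
```

The `isinstance(value, bytes)` check works on both versions, since `bytes` is an alias for `str` in Python 2.6+.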

