Python script doesn't write Chinese characters to XML file
I'm making a mod for a game where the majority of the files are XMLs, the text of which is Simplified Chinese. My goal is to replace all of the Simplified Chinese in the files with Traditional, followed by an English translation. I'm using the Cloud Translate API from Google to do that part, and it all works fine. At first I was just doing a find and replace on the Chinese text and then adding the English to the end of the string, but the issue with that is that I get extra English translations whenever the same Chinese text occurs more than once.
In an effort to fix that I read more of the XML documentation for Python, and I started trying to use tree.write, but that's where I'm getting stuck. When I use it, the XML file contains numeric character codes (such as &#25293;) for the Chinese characters, rather than the actual characters. If I open the file in a web browser, the characters render correctly, but at this point I'm unsure whether they'll still work with the game if they're not written into the XML normally.
Here's an example XML I'm working with:
<Texts Type="Story">
<List>
<Text Name="TradeAuction">
<DisplayName>拍卖会</DisplayName>
<Desc>[NAME]来到了[PLACE],发现此地有个拍卖行。</Desc>
<Selections.0.Display>参与拍卖</Selections.0.Display>
<Selections.1.Display>离去</Selections.1.Display>
</Text>
</List>
</Texts>
My code, which works but sometimes duplicates English translations:
import lxml.etree as ET
from google.cloud import translate_v2 as translate
import pinyin

translator = translate.Client()
tgt = "zh-TW"
tt = "en"

with open('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml', 'r', encoding="utf-8") as f:
    tree = ET.parse(f)
    root = tree.getroot()
    for elem in root.iter('Text'):
        print(elem.text)
        for child in elem:
            txt = child.text
            ttxt = translator.translate(txt, target_language=tgt)
            etxt = translator.translate(txt, target_language=tt)
            with open('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml', 'r') as n:
                new = n.read().replace(txt, ttxt['translatedText'] + '(' + etxt['translatedText'] + ')', 1)
            with open('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml', 'w') as n:
                n.write(new)
The output of that looks like this:
<Texts Type="Story">
<List>
<Text Name="TradeAuction">
<DisplayName>拍賣會(auctions)</DisplayName>
<Desc>[NAME]來到了[PLACE],發現此地有個拍賣行。([NAME] came to [PLACE] and found an auction house here.)</Desc>
<Selections.0.Display>參與拍賣(Participate in the auction)</Selections.0.Display>
<Selections.1.Display>離去(Leave)</Selections.1.Display>
</Text>
</List>
</Texts>
And here's my tree.write code:
import lxml.etree as ET
from google.cloud import translate_v2 as translate
import pinyin

translator = translate.Client()
tgt = "zh-TW"
tt = "en"

with open('/home/dave/zh-TW/Settings/MapStories/MapStory_Auction.xml', 'r', encoding="utf-8") as f:
    tree = ET.parse(f)
    root = tree.getroot()
    for elem in root.iter('Text'):
        print(elem.text)
        for child in elem:
            print(child.text)
            txt = child.text
            ttxt = translator.translate(txt, target_language=tgt)
            etxt = translator.translate(txt, target_language=tt)
            child.text = ttxt['translatedText'] + "(" + etxt['translatedText'] + ")"

tree.write('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml')
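And the output from that looks like this (in the file itself, each Chinese character is written as a numeric character reference such as &#25293;; the references appear decoded here):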
<Texts Type="Story">
<List>
<Text Name="TradeAuction">
<DisplayName>拍賣會(auctions)</DisplayName>
<Desc>[NAME]來到了[PLACE],發現此地有個拍賣行。([NAME] came to [PLACE] and found an auction house here.)</Desc>
<Selections.0.Display>參與拍賣(Participate in the auction)</Selections.0.Display>
<Selections.1.Display>離去(Leave)</Selections.1.Display>
</Text>
</List>
</Texts>
Any help would be appreciated. I think once I figure this out I should be able to fly through the rest of the translating.
tree.write('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml')
Per the documentation:
write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True)
...
The output is either a string (str) or binary (bytes). This is controlled by the encoding argument. If encoding is "unicode", the output is a string; otherwise, it's binary. Note that this may conflict with the type of file if it's an open file object; make sure you do not try to write a string to a binary stream and vice versa.
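As a small illustration of that passage, the sketch below pairs each kind of encoding value with a matching stream type. It uses the standard-library xml.etree.ElementTree, which is what the quoted signature describes, and a made-up one-element document; the in-memory streams stand in for real files.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-element document, used only for illustration.
tree = ET.ElementTree(ET.fromstring("<DisplayName>拍卖会</DisplayName>"))

# encoding="unicode" produces str, so the target must be a text stream.
text_stream = io.StringIO()
tree.write(text_stream, encoding="unicode")
text_output = text_stream.getvalue()

# Any other encoding produces bytes, so the target must be a binary stream.
binary_stream = io.BytesIO()
tree.write(binary_stream, encoding="utf-8")
binary_output = binary_stream.getvalue()

print(type(text_output).__name__, text_output)
print(type(binary_output).__name__, binary_output.decode("utf-8"))
```

When a file name is passed instead of an open file object, as in the question, the file is opened by write itself, so this mismatch cannot arise.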
So we just need to set the encoding parameter appropriately. Writing as ASCII means that non-ASCII characters need to be entity-escaped (&#25293; etc.). (It still writes to the file without a problem, of course, because the UTF-8 encoding specified for the file is ASCII-transparent.)
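A minimal sketch of the difference, assuming the standard-library xml.etree.ElementTree and a cut-down copy of the sample XML written to temporary files (the file names are placeholders). lxml's tree.write also takes an encoding argument, so the same change should carry over to the code in the question, though that is not tested here.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Cut-down version of the sample XML from the question.
SAMPLE = (
    '<Texts Type="Story"><List><Text Name="TradeAuction">'
    '<DisplayName>拍卖会</DisplayName>'
    '</Text></List></Texts>'
)
tree = ET.ElementTree(ET.fromstring(SAMPLE))

with tempfile.TemporaryDirectory() as tmp:
    ascii_path = os.path.join(tmp, "ascii.xml")  # placeholder name
    utf8_path = os.path.join(tmp, "utf8.xml")    # placeholder name

    # Default encoding (us-ascii): Chinese characters become &#NNNNN; references.
    tree.write(ascii_path)

    # Explicit UTF-8: the characters are written as themselves.
    tree.write(utf8_path, encoding="utf-8", xml_declaration=True)

    with open(ascii_path, "rb") as f:
        ascii_bytes = f.read()
    with open(utf8_path, "rb") as f:
        utf8_bytes = f.read()

print(ascii_bytes.decode("ascii"))
print(utf8_bytes.decode("utf-8"))

# Both files parse back to the same text; only the on-disk form differs.
same_text = (
    ET.fromstring(ascii_bytes).find(".//DisplayName").text
    == ET.fromstring(utf8_bytes).find(".//DisplayName").text
)
```

Both forms are equivalent to a conforming XML parser, which is why the browser renders the escaped file correctly; writing UTF-8 directly only matters if the game reads the file with something less forgiving.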