Python script doesn't write Chinese characters to XML file

I'm making a mod for a game where the majority of the files are XMLs, the text of which is Simplified Chinese. My goal is to replace all of the Simplified Chinese in the files with Traditional, followed by an English translation. I'm using the Cloud Translate API from Google to do that part, and it all works fine. At first I was just doing a find and replace on the Chinese text and then adding English to the end of the string, but the issue with that is that I'm getting extra English translations whenever the Chinese text occurs more than once.

In an effort to fix that I read more of the XML documentation for Python, and I started trying to use tree.write, but that's where I'm getting stuck. When I use it, the XML file has numeric character codes for the Chinese characters, rather than the actual characters. If I open the file in a web browser, the characters render correctly, but at this point I'm just unsure if they'll still work with the game if they're not written into the XML normally.

Here's an example XML I'm working with:

<Texts Type="Story">
  <List>
    <Text Name="TradeAuction">
      <DisplayName>拍卖会</DisplayName>
      <Desc>[NAME]来到了[PLACE],发现此地有个拍卖行。</Desc>
      <Selections.0.Display>参与拍卖</Selections.0.Display>
      <Selections.1.Display>离去</Selections.1.Display>
    </Text>
  </List>
</Texts>

My code, which works but sometimes duplicates English translations:

import lxml.etree as ET
from google.cloud import translate_v2 as translate
import pinyin
translator = translate.Client()
tgt = "zh-TW"
tt = "en"
with open('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml', 'r', encoding="utf-8") as f:
    tree = ET.parse(f)
    root = tree.getroot()
    for elem in root.iter('Text'):
        print(elem.text)
        for child in elem:
            txt = child.text
            ttxt = translator.translate(txt, target_language=tgt)
            etxt = translator.translate(txt, target_language=tt)
            with open('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml', 'r') as n:
                new = n.read().replace(txt, ttxt['translatedText'] + '(' + etxt['translatedText'] + ')', 1)
            with open('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml', 'w') as n:
                n.write(new)

The output of that looks like this:

<Texts Type="Story">
  <List>
    <Text Name="TradeAuction">
      <DisplayName>拍賣會(auctions)</DisplayName>
      <Desc>[NAME]來到了[PLACE],發現此地有個拍賣行。([NAME] came to [PLACE] and found an auction house here.)</Desc>
      <Selections.0.Display>參與拍賣(Participate in the auction)</Selections.0.Display>
      <Selections.1.Display>離去(Leave)</Selections.1.Display>
    </Text>
  </List>
</Texts>

And here's my tree.write code:

import lxml.etree as ET
from google.cloud import translate_v2 as translate
import pinyin
translator = translate.Client()
tgt = "zh-TW"
tt = "en"
with open('/home/dave/zh-TW/Settings/MapStories/MapStory_Auction.xml', 'r', encoding="utf-8") as f:
    tree = ET.parse(f)
    root = tree.getroot()
    for elem in root.iter('Text'):
        print(elem.text)
        for child in elem:
            print(child.text)
            txt = child.text
            ttxt = translator.translate(txt, target_language=tgt)
            etxt = translator.translate(txt, target_language=tt)
            child.text = ttxt['translatedText'] + "(" + etxt['translatedText'] + ")"
        tree.write('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml')

And the output from that looks like this:

<Texts Type="Story">
  <List>
    <Text Name="TradeAuction">
      <DisplayName>&#25293;&#36067;&#26371;(auctions)</DisplayName>
      <Desc>[NAME]&#20358;&#21040;&#20102;[PLACE]&#65292;&#30332;&#29694;&#27492;&#22320;&#26377;&#20491;&#25293;&#36067;&#34892;&#12290;([NAME] came to [PLACE] and found an auction house here.)</Desc>
      <Selections.0.Display>&#21443;&#33287;&#25293;&#36067;(Participate in the auction)</Selections.0.Display>
      <Selections.1.Display>&#38626;&#21435;(Leave)</Selections.1.Display>
    </Text>
  </List>
</Texts>

Any help would be appreciated. I think once I figure this out I should be able to fly through the rest of the translating.

tree.write('/home/dave/zh-TW-final/Settings/MapStories/MapStory_Auction.xml')

Per the documentation:

write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True)

...

The output is either a string (str) or binary (bytes). This is controlled by the encoding argument. If encoding is "unicode", the output is a string; otherwise, it's binary. Note that this may conflict with the type of file if it's an open file object; make sure you do not try to write a string to a binary stream and vice versa.

So we just need to set the encoding parameter appropriately, for example encoding="utf-8". Writing as ASCII, which is the default, means that non-ASCII characters need to be entity-escaped (&#25293; etc.). (It still writes to the file without a problem, of course, because the UTF-8 encoding specified for the file is ASCII-transparent.)
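
To make the difference concrete, here is a minimal sketch that writes the same tree twice, once with the default encoding and once with encoding="utf-8". It uses the standard library's xml.etree.ElementTree (the module the quoted documentation describes) rather than lxml, and the sample element and temporary file names are made up for illustration. lxml's tree.write also takes an encoding argument, so the same change should carry over to the code in the question, but that part is an assumption to check against your lxml version.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sample element modelled on the question's output; not the real game file.
root = ET.fromstring("<Texts><DisplayName>拍賣會(auctions)</DisplayName></Texts>")
tree = ET.ElementTree(root)

# Hypothetical output locations in a throwaway directory.
out_dir = tempfile.mkdtemp()
ascii_path = os.path.join(out_dir, "ascii.xml")
utf8_path = os.path.join(out_dir, "utf8.xml")

# Default encoding is "us-ascii": non-ASCII characters become &#NNNNN; references.
tree.write(ascii_path)

# Explicit UTF-8: the characters are written as themselves.
tree.write(utf8_path, encoding="utf-8")

with open(ascii_path, encoding="utf-8") as f:
    ascii_text = f.read()
with open(utf8_path, encoding="utf-8") as f:
    utf8_text = f.read()

print(ascii_text)
print(utf8_text)
```

Both files parse back to the same text, so an XML-aware reader should treat them identically; the UTF-8 version only matters if the game reads the files without resolving character references.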
