Python: writing to a file while iterating re.sub over a dict only writes the last occurrence

Question

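I can't figure out how to write and save all of the re.sub iterations from a dictionary. Only the last occurrence is saved to the file. I have a translation worksheet in CSV format: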

locale, lang, Foo-v2, Bar-v2
de_DE, German, German-Foo-v2, German-Bar-v2
zh_CN, Chinese, 零件-Foo-v2, 零件-Bar-v2

There is a folder containing a file for each language, e.g. target/de_DE_v123.xml

Contents of one file:

<trans-unit id="14_de_DE" resname="documentGroup.translation">
            <source xml:lang="en-GB">Foo-v2</source>
            <target xml:lang="de-DE">German-Foo-v1</target>
         </trans-unit>      
         <trans-unit id="1759_de_DE" resname="documentGroup.translation">
            <source xml:lang="en-GB">Bar-v2</source>
            <target xml:lang="de-DE">German-Bar-v1</target>
</trans-unit>

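The goal is to go into each translation file and update all of the target texts. A regex has to be used, because the target translation text must be overwritten regardless of what it currently contains.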

import glob
import pandas as pd
import re

data = pd.read_csv('translate-worksheet.csv', sep=',', header=0)
englishTranslation = data.columns[2:] #get English text
for k, v in data.iterrows():
    locale = v[0]
    docGroup = v[2:]
    findnreplace = dict(zip(englishTranslation,docGroup)) #{english source: translated target}
    print("Working on language:"+locale)
    for propFile in glob.glob('target\\*'+locale+'*.xml'):
        print("  xliff file:"+propFile)
        with open(propFile, 'r+', encoding='utf-8') as f:
            content = f.read()
            for source, target in findnreplace.items():
                print("   Replacing:"+source+", with:"+target)
                match = re.sub(r'(?<='+source+'<\/source>)[\r\n]+([^\r\n]+)\>(.*?)\<',r"\1"+">"+target+"<", content,flags=re.MULTILINE)
                f.seek(0)
                f.write(match)
            print(match)

Expected output:

<trans-unit id="14_de_DE" resname="documentGroup.translation">
   <source xml:lang="en-GB">Foo-v2</source>
   <target xml:lang="de-DE">German-Foo-v2</target>
</trans-unit>
<trans-unit id="1759_de_DE" resname="documentGroup.translation">
   <source xml:lang="en-GB">Bar-v2</source>           
   <target xml:lang="de-DE">German-Bar-v2</target>
 </trans-unit>

Actual output:

<trans-unit id="14_de_DE" resname="documentGroup.translation">
   <source xml:lang="en-GB">Foo-v2</source>
   <target xml:lang="de-DE">German-Foo-v1</target>
</trans-unit>
<trans-unit id="1759_de_DE" resname="documentGroup.translation">
    <source xml:lang="en-GB">Bar-v2</source>            <target xml:lang="de-DE">German-Bar-v2</target>
 </trans-unit>

I'm new to Python, so any criticism that improves the overall code is welcome.

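Update with a solution: this is probably very inefficient code, since it opens the file, modifies it and closes it on every iteration, but it works and my files are only 15 KB each. I changed it from "open the file and, for each source and target in the dict, do something" to "for each source and target in the dict, open the file and do something".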

for propFile in glob.glob('target\\*'+locale+'*.xml'):
        print("  xliff file:"+propFile)
        for source, target in findnreplace.items():
            with open(propFile, 'r+', encoding='utf-8') as f:
                content = f.read()
                f.seek(0)
                print("   Replacing:"+source+", with:"+target)
                match = re.sub(r'(?<='+source+'<\/source>)[\r\n]+([^\r\n]+)\>(.*?)\<',r"\1"+">"+target+"<", content,flags=re.MULTILINE)
                f.write(match)
                f.truncate()
        print(match)

Answer 1

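Judging by your code, you want to use a regex to replace a block of text in an existing text file. The basic logic for that is:

Find the text to be replaced
Store the existing file text that comes before this text
Store the existing file text that comes after this text
Create the replacement text to use in the updated file
Rewrite the file with the "before" text, the replacement text and the "after" text

Without your actual data I can't confirm that this updated code works, but it should be close: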

for source, target in findnreplace.items():
    print("   Replacing:"+source+", with:"+target)
    pattern = r'(?<='+re.escape(source)+r'</source>)[\r\n]+([^\r\n]+)>(.*?)<'
    # find start/end index of the text to be replaced
    srch = re.search(pattern, content)
    if srch is None:
        continue # source text not found in this file
    startidx, endidx = srch.span() # position of the matched block within content
    # build the replacement text: the captured opening tag plus the new translation
    match = srch.group(1)+">"+target+"<"
    preblk = content[:startidx] # all text before the replaced block
    postblk = content[endidx:] # all text after the replaced block
    content = preblk+match+postblk # keep the change for the next iteration
    f.seek(0) # restart from the beginning of the file
    f.truncate(0) # clear file contents
    f.write(content)
print(content)
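
The before/replacement/after logic described in this answer can also be tried on a plain string, without touching a file. The following is only a minimal sketch: `splice_target` and the sample string are hypothetical, and the pattern is a variant of the one in the question that assumes the translation sits in a `<target ...>` element directly after the `</source>` tag. It splices only the captured translation text, so the line break and the opening tag around it stay as they were.

```python
import re

# Hypothetical sample, modelled on the XLIFF snippet in the question.
content = (
    '<source xml:lang="en-GB">Foo-v2</source>\n'
    '<target xml:lang="de-DE">German-Foo-v1</target>\n'
)

def splice_target(content, source, target):
    """Replace the text of the <target> element that follows `source`."""
    # Group 1: line break(s), indentation and the opening <target ...> tag.
    # Group 2: the current translation text, whatever it is.
    pattern = (r'(?<=' + re.escape(source) + r'</source>)'
               r'([\r\n]+[^\r\n<]*<target[^>]*>)(.*?)<')
    found = re.search(pattern, content)
    if found is None:
        return content                 # nothing to replace
    start, end = found.span(2)         # span of the old translation only
    before = content[:start]           # everything before the old text
    after = content[end:]              # everything after the old text
    return before + target + after

print(splice_target(content, "Foo-v2", "German-Foo-v2"))
```

Slicing the string that was already read avoids mixing character indices from the regex with file offsets, which may differ once the file contains non-ASCII text.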

Answer 2

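This is probably very inefficient code, since it opens the file, modifies it and closes it on every iteration, but it works, and my files are only 15 KB each. I changed it from "open the file and, for each source and target in the dict, do something" to "for each source and target in the dict, open the file and do something".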

for propFile in glob.glob('target\\*'+locale+'*.xml'):
        print("  xliff file:"+propFile)
        for source, target in findnreplace.items():
            with open(propFile, 'r+', encoding='utf-8') as f:
                content = f.read()
                f.seek(0)
                print("   Replacing:"+source+", with:"+target)
                match = re.sub(r'(?<='+source+'<\/source>)[\r\n]+([^\r\n]+)\>(.*?)\<',r"\1"+">"+target+"<", content,flags=re.MULTILINE)
                f.write(match)
                f.truncate()
        print(match)
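
The original version failed because every re.sub call ran against the same unmodified `content`, so each write replaced the previous one and only the last substitution survived. Reopening the file per pair works around that; feeding the result of each substitution into the next one should do the same with a single read and a single write. This is a sketch under assumptions, not a tested drop-in: `update_targets` and `update_file` are hypothetical names, and the pattern assumes each `<source>` is immediately followed by its `<target ...>` element.

```python
import re

def update_targets(content, replacements):
    """Apply every {source: target} pair to content and return the result."""
    for source, target in replacements.items():
        # The leading '>' makes sure the whole <source> text is matched.
        # Group 1 keeps the whitespace and the opening <target ...> tag.
        pattern = (r'(?<=>' + re.escape(source) + r'</source>)'
                   r'(\s*<target[^>]*>).*?(?=</target>)')
        # A function replacement keeps backslashes in `target` literal.
        content = re.sub(pattern,
                         lambda m, t=target: m.group(1) + t,
                         content, flags=re.DOTALL)
    return content

def update_file(path, replacements):
    """Read the file once, apply all replacements, write it back once."""
    with open(path, 'r+', encoding='utf-8') as f:
        content = update_targets(f.read(), replacements)
        f.seek(0)
        f.write(content)
        f.truncate()  # drop leftovers if the new text is shorter
```

Inside the question's loop this would be called as `update_file(propFile, findnreplace)` for each file returned by glob.glob.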

Answer 1
0 2020-07-30 16:18:30

Answer 2
0 Accepted 2020-07-30 19:40:54
