How to search and replace a string in a DOTM file using Python

Question

I'm working on a project where I want to search and replace a specific string in a Word DOTM file. I got searching within DOTM files to work with docx2python, but replacing the matched word is still a headache. Can replacing be done in DOTM files?

Answer 1

Paragraphs in a docx file are made of text runs. MS Word will break up text runs arbitrarily, often in the middle of a word:

<w:r>
    <w:t>work to im</w:t>
</w:r>
<w:r>
    <w:t>prove docx2python</w:t>
</w:r>

These breaks are due to style differences, version differences, spell-check state, etc. This makes things like algorithmic search-and-replace problematic. I often use docx templates with placeholders (e.g., #CATEGORY_NAME#), then replace those placeholders with data. This won't work if your placeholders are broken up (e.g., #CAT, E, GORY_NAME#).
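To see the problem concretely, here is a small stdlib-only sketch that parses a paragraph with `xml.etree.ElementTree` and checks where the placeholder can be found. The paragraph XML is hand-written for illustration (not taken from a real file), and `run_texts` is a hypothetical helper name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of the <w:...> elements in WordprocessingML (document.xml)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written paragraph in which Word has split a placeholder over three runs
PARAGRAPH_XML = f"""<w:p xmlns:w="{W_NS}">
  <w:r><w:t>Category: #CAT</w:t></w:r>
  <w:r><w:t>E</w:t></w:r>
  <w:r><w:t>GORY_NAME#</w:t></w:r>
</w:p>"""


def run_texts(paragraph_xml: str) -> list[str]:
    """Return the text of every <w:t> element, in document order."""
    paragraph = ET.fromstring(paragraph_xml)
    return [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]


texts = run_texts(PARAGRAPH_XML)
placeholder = "#CATEGORY_NAME#"
found_in_any_run = any(placeholder in t for t in texts)  # False: no single run has it
found_in_paragraph = placeholder in "".join(texts)  # True: the joined text does
```

A run-by-run replace never sees the whole placeholder; only the joined paragraph text contains it.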

Docx2python v2 merges such runs in the XML as a pre-processing step. Specifically, Docx2Python merges runs with identical formatting AS DOCX2PYTHON SEES FORMATTING; that is, Docx2Python will ignore version data, spell-check state, etc., but respect supported formatting elements like bold, italics, font size, etc.
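The idea of "merge runs whose visible formatting matches" can be illustrated with a toy model. This is not docx2python's implementation: `Run`, `merge_runs`, and the `bold`/`italic`/`rsid` fields are hypothetical stand-ins (real run properties in `<w:rPr>` are much richer). The `rsid` field represents the kind of editing-session metadata that differs between runs but is invisible to a reader, so it is left out of the grouping key.

```python
from __future__ import annotations

from itertools import groupby
from typing import NamedTuple


class Run(NamedTuple):
    text: str
    bold: bool = False
    italic: bool = False
    rsid: str = ""  # editing-session id: not part of visible formatting


def merge_runs(runs: list[Run]) -> list[Run]:
    """Merge adjacent runs whose visible formatting (bold, italic) is identical."""
    merged: list[Run] = []
    # groupby only groups *adjacent* items with equal keys, which is what we want
    for (bold, italic), group in groupby(runs, key=lambda r: (r.bold, r.italic)):
        merged.append(Run("".join(r.text for r in group), bold, italic))
    return merged


# Runs differing only in rsid collapse into one run containing the placeholder
split = [Run("#CAT", rsid="00A1"), Run("E", rsid="00B2"), Run("GORY_NAME#", rsid="00C3")]
print(merge_runs(split))  # [Run(text='#CATEGORY_NAME#', bold=False, italic=False, rsid='')]

# A bold letter in the middle keeps the word in three runs, like "Pe<b>a</b>rs"
print(len(merge_runs([Run("Pe"), Run("a", bold=True), Run("rs")])))  # 3
```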

With the argument html=False, Docx2Python will merge nearly all runs (some, like links, are kept separate intentionally) so that most paragraphs become a single run.
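Continuing the toy model, a rough sketch of the html=False behaviour: formatting is simply not tracked, so any two adjacent plain runs merge, while hyperlink runs stay separate. `TextRun` and `merge_plain_runs` are hypothetical names, the URL is a placeholder, and docx2python's real handling of links may differ in detail.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class TextRun(NamedTuple):
    text: str
    link: Optional[str] = None  # hyperlink target, or None for ordinary text


def merge_plain_runs(runs: list[TextRun]) -> list[TextRun]:
    """Merge adjacent non-link runs regardless of formatting; keep links separate."""
    merged: list[TextRun] = []
    for run in runs:
        if run.link is None and merged and merged[-1].link is None:
            # Extend the previous plain-text run instead of starting a new one
            merged[-1] = TextRun(merged[-1].text + run.text)
        else:
            merged.append(run)
    return merged


runs = [
    TextRun("Pe"),
    TextRun("a"),  # was bold; that information is dropped in this model
    TextRun("rs and "),
    TextRun("docs", link="https://example.com"),
    TextRun(" here"),
]
print(merge_plain_runs(runs))
```

This is why, in the tests below, html=False lets every "Pears" be replaced, while html=True leaves "Pe<b>a</b>rs" alone: the bold "a" keeps that word split across runs.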

These examples should make everything clear. Check out replace_docx_text and other functions in the Docx2Python utilities.py module.

from docx2python.main import docx2python
from docx2python.utilities import get_links, replace_docx_text, get_headings


class TestSearchReplace:
    def test_search_and_replace(self) -> None:
        """Apples -> Pears, Pears -> Apples

        Ignore html differences when html is False"""
        html = False
        input_filename = "apples_and_pears.docx"
        output_filename = "pears_and_apples.docx"
        assert docx2python(input_filename, html=html).text == (
            "Apples and Pears\n\nPears and Apples\n\n"
            "Apples and Pears\n\nPears and Apples"
        )
        replace_docx_text(
            input_filename,
            output_filename,
            ("Apples", "Bananas"),
            ("Pears", "Apples"),
            ("Bananas", "Pears"),
            html=html,
        )
        assert docx2python(output_filename, html=html).text == (
            "Pears and Apples\n\nApples and Pears\n\n"
            "Pears and Apples\n\nApples and Pears"
        )

    def test_search_and_replace_html(self) -> None:
        """Apples -> Pears, Pears -> Apples

        Exchange strings when formatting is consistent across the string. Leave
        alone otherwise.
        """
        html = True
        input_filename = "apples_and_pears.docx"
        output_filename = "pears_and_apples.docx"
        assert docx2python(input_filename, html=html).text == (
            "Apples and Pears\n\n"
            "Pears and Apples\n\n"
            'Apples and <span style="background-color:green">Pears</span>\n\n'
            "Pe<b>a</b>rs and Apples"
        )
        replace_docx_text(
            input_filename,
            output_filename,
            ("Apples", "Bananas"),
            ("Pears", "Apples"),
            ("Bananas", "Pears"),
            html=html,
        )
        assert docx2python(output_filename, html=html).text == (
            "Pears and Apples\n\n"
            "Apples and Pears\n\n"
            'Pears and <span style="background-color:green">Apples</span>\n\n'
            "Pe<b>a</b>rs and Pears"
        )

Answer 1 (score 0, 2021-12-22 13:52:39)