Working with a project where i want to search and replace a specific string in word DOTM file. However searching within DOTM files i got to work with docx2python but replacing the searched word is still a headache. Can replacing be done in DOTM files?
Paragraphs in a docx file are made of text runs
. MS Word will break up text runs arbitrarily, often in the middle of a word.
<w:r>
<w:t>work to im</w:t>
</w:r>
<w:r>
<w:t>prove docx2python</w:t>
</w:r>
These breaks are due to style differences, version differences, spell-check state, etc. This makes things like algorithmic search-and-replace problematic. I often use docx templates with placeholders (eg, #CATEGORY_NAME#
) then replace those placeholders with data. This won't work if your placeholders are broken up (eg, #CAT
, E
, GORY_NAME#
).
Docx2python v2 merges such runs in the XML as a pre-processing step. Specifically, Docx2Python merges runs with identical formatting AS DOCX2PYTHON SEES FORMATTING, that is, Docx2Python will ignore version data, spell-check state, etc. but respect supported formatting elements like bold, italics, font-size, etc.
With argument html=False
, Docx2Python will merge nearly all runs (some like links are kept separate intentionally) to make most paragraphs one run.
These examples should make everything clear. Check out replace_docx_text
and other functions in the Docx2Python utilities.py
module.
from docx2python.main import docx2python
from docx2python.utilities import get_links, replace_docx_text, get_headings
class TestSearchReplace:
def test_search_and_replace(self) -> None:
"""Apples -> Pears, Pears -> Apples
Ignore html differences when html is False"""
html = False
input_filename = "apples_and_pears.docx"
output_filename = "pears_and_apples.docx"
assert docx2python(input_filename, html=html).text == (
"Apples and Pears\n\nPears and Apples\n\n"
"Apples and Pears\n\nPears and Apples"
)
replace_docx_text(
input_filename,
output_filename,
("Apples", "Bananas"),
("Pears", "Apples"),
("Bananas", "Pears"),
html=html,
)
assert docx2python(output_filename, html=html).text == (
"Pears and Apples\n\nApples and Pears\n\n"
"Pears and Apples\n\nApples and Pears"
)
def test_search_and_replace_html(self) -> None:
"""Apples -> Pears, Pears -> Apples
Exchange strings when formatting is consistent across the string. Leave
alone otherwise.
"""
html = True
input_filename = "apples_and_pears.docx"
output_filename = "pears_and_apples.docx"
assert docx2python(input_filename, html=html).text == (
"Apples and Pears\n\n"
"Pears and Apples\n\n"
'Apples and <span style="background-color:green">Pears</span>\n\n'
"Pe<b>a</b>rs and Apples"
)
replace_docx_text(
input_filename,
output_filename,
("Apples", "Bananas"),
("Pears", "Apples"),
("Bananas", "Pears"),
html=html,
)
assert docx2python(output_filename, html=html).text == (
"Pears and Apples\n\n"
"Apples and Pears\n\n"
'Pears and <span style="background-color:green">Apples</span>\n\n'
"Pe<b>a</b>rs and Pears"
)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.