Python-docx ignores non-Unicode symbols such as "greater than or equal to"

Question

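When I read a Word docx file containing tables and text into Python using python-docx, all of the symbols are dropped. These symbols were all created with the normal Insert Symbol steps. Word reports the character as coming from font Symbol, character code 179, from Symbol (decimal).

Python-docx just shows it as ''. The same happens with the "plus or minus" symbol to its left.

When reading text from a paragraph (not text in a table), I use the following code: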

import docx

def listText():
    test = docx.Document('Problem.docx')
    testp = test.paragraphs[0]  # The only paragraph in the test docx
    stringThatShouldHaveSymbol = testp.text

    print(stringThatShouldHaveSymbol)

    return stringThatShouldHaveSymbol

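For a document that contains only these symbols, this returns just ''. If the document has the symbol followed by 10, it returns only 10.

I also tried this XML approach, but even that returns ''.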

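import docx
from xml.etree.ElementTree import XML

# Fully qualified tag name of a WordprocessingML text element (w:t)
WORD_NAMESPACE = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEXT = WORD_NAMESPACE + "t"

def get_accepted_text(p):
    """Return text of a paragraph after accepting all changes"""
    xml = p._p.xml
    if "w:del" in xml or "w:ins" in xml:
        tree = XML(xml)
        # Element.iter() replaces getiterator(), which was removed in Python 3.9
        runs = (node.text for node in tree.iter(TEXT) if node.text)
        return "".join(runs)
    else:
        return p.text

doc = docx.Document('Problem.docx')
for p in doc.paragraphs:
    print(p.text)
    print("---")
    print(get_accepted_text(p))
    print("=========")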

How can I extract the data from these documents? Is there a way to programmatically convert these Symbol (decimal) codes to Unicode (hex)?

Answer 1

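Try this:

Click the symbol dropdown and select (normal text).
Now select your special symbol.

If you read the docx file now, you should get your symbol.

I'm not sure why the Symbol font doesn't work. In Arial, 179 is a superscript 3.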
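
If re-inserting the symbols by hand is not practical, the following is a minimal sketch of a programmatic alternative. It rests on an assumption that is not confirmed by the question: that Word stored these characters as w:sym elements (with w:font and w:char attributes) instead of as w:t text, which would explain why paragraph.text comes back empty. The function name text_with_symbols and the SYMBOL_TO_UNICODE table are made up for illustration, and the table covers only the two symbols mentioned here.

```python
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hand-written, partial map from Symbol-font character codes to Unicode.
# Extend it for any other symbols your documents use.
SYMBOL_TO_UNICODE = {
    0xB1: "\u00b1",  # 177: plus or minus
    0xB3: "\u2265",  # 179: greater than or equal to
}


def text_with_symbols(paragraph_xml):
    """Return paragraph text, translating w:sym elements where possible."""
    root = ET.fromstring(paragraph_xml)
    parts = []
    for node in root.iter():
        if node.tag == W_NS + "t":
            parts.append(node.text or "")
        elif node.tag == W_NS + "sym":
            font = node.get(W_NS + "font")
            code = int(node.get(W_NS + "char"), 16)
            # Symbol-font codes are often stored shifted into the
            # private use area, e.g. F0B3 for character code B3 (179).
            if code >= 0xF000:
                code -= 0xF000
            if font == "Symbol" and code in SYMBOL_TO_UNICODE:
                parts.append(SYMBOL_TO_UNICODE[code])
            else:
                # Unknown font or code: keep a visible placeholder.
                parts.append("\ufffd")
    return "".join(parts)


# Hypothetical paragraph XML: a Symbol-font character followed by "10".
sample = (
    '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:r><w:sym w:font="Symbol" w:char="F0B3"/></w:r>'
    '<w:r><w:t>10</w:t></w:r>'
    '</w:p>'
)
print(text_with_symbols(sample))
```

With python-docx, the same function could be fed the paragraph XML that the question already reads, for example text_with_symbols(p._p.xml) for each p in doc.paragraphs. Printing p._p.xml first is a quick way to check whether your file really contains w:sym elements.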
