python-docx ignores non-Unicode symbols such as "greater than or equal to"

Question

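When I read a Word .docx file that contains tables and text into Python using python-docx, all the symbols are dropped. The symbols were all created with the normal Insert Symbol steps. Word says the character is from the Symbol font, character code 179, from: Symbol (decimal).

python-docx just shows it as ''. The same happens with the "plus or minus" symbol on the left.

When reading text from a paragraph (not text in a table), I use the following code: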

import docx

def listText():
    test = docx.Document('Problem.docx')
    testp = test.paragraphs[0]  # The only paragraph in the test docx
    stringThatShouldHaveSymbol = testp.text

    print(stringThatShouldHaveSymbol)

    return stringThatShouldHaveSymbol

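For a document that contains nothing but these symbols, this returns only ''. If the paragraph contains the symbol followed by 10, it returns just 10.

I also tried this XML approach, but it still returns ''.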

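import docx
from xml.etree.ElementTree import XML

# Fully qualified tag name of a WordprocessingML text element (w:t)
WORD_NAMESPACE = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEXT = WORD_NAMESPACE + "t"

def get_accepted_text(p):
    """Return text of a paragraph after accepting all changes"""
    xml = p._p.xml
    if "w:del" in xml or "w:ins" in xml:
        tree = XML(xml)
        # Element.getiterator() was removed in Python 3.9; iter() replaces it
        runs = (node.text for node in tree.iter(TEXT) if node.text)
        return "".join(runs)
    else:
        return p.text

doc = docx.Document('Problem.docx')
for p in doc.paragraphs:
    print(p.text)
    print("---")
    print(get_accepted_text(p))
    print("=========")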

How can I extract the data from these documents? Is there a way to programmatically convert these Symbol (decimal) codes to Unicode (hex)?

Answer 1

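Try this:

In the Symbol dialog, click the dropdown and select "(normal text)".
Now select your special symbol.

If you read the docx file now, you should get your symbol.

I don't know why the Symbol font doesn't work. In Arial, 179 is a superscript 3.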
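
One possible explanation, offered as an assumption rather than something confirmed in this thread: Word usually stores a character inserted from a symbol font as a w:sym element (with w:font and w:char attributes) instead of as w:t text, and Paragraph.text does not look at w:sym. If that is what Problem.docx contains, the symbols can be recovered from the paragraph XML. The sketch below uses only the standard library; the function name, the sample XML, and the two-entry mapping table are illustrative, and the table covers only the two symbols from the question (Symbol-font codes 177 and 179).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Partial, hand-written map from Symbol-font character codes to Unicode.
# Only the two symbols mentioned in the question are listed; extend as needed.
SYMBOL_FONT_TO_UNICODE = {
    0xB1: "\u00b1",  # 177: plus-or-minus
    0xB3: "\u2265",  # 179: greater than or equal to
}

def paragraph_text_with_symbols(paragraph_xml):
    """Return the text of one <w:p> element, including <w:sym> symbols."""
    root = ET.fromstring(paragraph_xml)
    parts = []
    for node in root.iter():
        if node.tag == W + "t":
            parts.append(node.text or "")
        elif node.tag == W + "sym":
            font = node.get(W + "font", "")
            code = int(node.get(W + "char", "0"), 16)
            # Word often stores symbol-font codes offset into the private
            # use area (0xF000 + code); strip that offset if present.
            if 0xF000 <= code <= 0xF0FF:
                code -= 0xF000
            if font == "Symbol" and code in SYMBOL_FONT_TO_UNICODE:
                parts.append(SYMBOL_FONT_TO_UNICODE[code])
            else:
                # Unmapped font or code: fall back to the raw code point,
                # which may not be the glyph the font actually shows.
                parts.append(chr(code))
    return "".join(parts)

# Hypothetical paragraph XML: a Symbol-font "greater than or equal to"
# followed by the text "10", as described in the question.
sample = (
    '<w:p xmlns:w="%s"><w:r><w:sym w:font="Symbol" w:char="F0B3"/></w:r>'
    '<w:r><w:t>10</w:t></w:r></w:p>' % W_NS
)
print(paragraph_text_with_symbols(sample))
```

For the sample above this prints ≥10. With python-docx, the same function could be called as paragraph_text_with_symbols(p._p.xml) for each paragraph, assuming p._p.xml returns the paragraph XML as a string, as the question's second snippet also assumes. The fallback branch also shows why the same number means different things in different fonts: chr(179) is '³', the superscript 3 seen in Arial.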
