Regular expression search in LibreOffice Writer documents using pyuno is extremely greedy

I have a LibreOffice Writer document that contains text snippets of the form prefix<...>. In Writer I can easily locate them with a regular expression search:

[Screenshot: regular expression search in Writer]

Now I would like to build a Python list of all these occurrences using pyuno, in a standalone Python script running outside LibreOffice.

The code I have collected from a variety of sources looks like this and seems to work so far:

import uno, os, time

SOCKET = 'socket,host=localhost,port=2002;urp;'
file = '/home/jochen/Dokumente/regexp_find_test.odt'
office_proc = os.popen('/usr/lib/libreoffice/program/soffice ' + file + ' --accept="' + SOCKET + 'StarOffice.ServiceManager"')
time.sleep(3)

localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)

try:
    context = resolver.resolve('uno:' + SOCKET + 'StarOffice.ComponentContext')
except Exception as exc:
    # keep the original UNO error (e.g. NoConnectException) as the cause
    raise Exception("failed to connect to LibreOffice with socket {}".format(SOCKET)) from exc
loffice_desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)
comp = loffice_desktop.getCurrentComponent()
search_descr = comp.createSearchDescriptor()
search_descr.SearchRegularExpression = True
search_descr.setSearchString('prefix<[a-z_]+>')
res = comp.findAll(search_descr)
print(len(res))
for n in range(len(res)):
    print(40*'-')
    print(res[n].Text.getText().getString())

The output I am getting surprises me, since I use the same expression as in Writer:

12
----------------------------------------
prefix<vorname> prefix<name>
prefix<ort> prefix<strasse> prefix<haus_nummer>

Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. prefix<name> Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in prefix<ort> culpa qui officia deserunt mollit anim id est laborum.

Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis prefix<vorname> dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.

Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo prefix<vorname> consequat. Duis autem vel prefix<name> eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.

prefix<name> prefix<unterschrift>
----------------------------------------
prefix<vorname> prefix<name>
prefix<ort> prefix<strasse> prefix<haus_nummer>

I expected something like this:

12
----------------------------------------
prefix<vorname>
----------------------------------------
prefix<name>
----------------------------------------
prefix<ort>
[...]

Obviously the expression behaves extremely greedily. Are there any suggestions for overcoming this, or am I doing something completely wrong?

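It's not greediness, just incorrect processing of the search results. The count of 12 is already correct: findAll() returns one text range per match. But res[n].Text is the text object that contains the range (here, the whole document body), so calling getString() on it prints the entire document for every match instead of the matched snippet.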
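To check the greediness hypothesis without UNO at all, the pattern from the question can be run through Python's re module on the first two lines of the sample document. This is only an illustration: Python's re engine is not the ICU engine LibreOffice uses, but for a plain character class like [a-z_] both behave the same way. Because the class cannot match > or a space, each match ends at the first closing bracket.

```python
import re

# Same pattern as in the question. "+" is greedy, but [a-z_] cannot match
# ">" or whitespace, so a match can never run past the first closing ">".
PATTERN = re.compile(r"prefix<[a-z_]+>")

# Sample taken from the first two lines of the document in the question.
sample = (
    "prefix<vorname> prefix<name>\n"
    "prefix<ort> prefix<strasse> prefix<haus_nummer>"
)

matches = PATTERN.findall(sample)
for match in matches:
    print(match)
```

Each printed line is a single placeholder, so the regular expression itself is not the problem.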

The line

print(res[n].Text.getText().getString())

must be changed to

print(res[n].String)
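A minimal sketch of collecting all matches into a Python list, which is what the question was after. It assumes the argument is the object returned by findAll() and uses the XIndexAccess methods getCount() and getByIndex() directly rather than relying on pyuno's len()/indexing support. The names collect_matches, FakeRange and FakeFound are hypothetical; the fakes only stand in for UNO objects so the snippet runs without LibreOffice.

```python
def collect_matches(found):
    """Return the matched text of every range in `found`.

    `found` is assumed to be the XIndexAccess returned by
    XSearchable.findAll(); each element is an XTextRange.
    """
    matches = []
    for i in range(found.getCount()):
        text_range = found.getByIndex(i)
        # getString() returns only the text covered by this range.
        # text_range.Text.getString() would instead return the whole text
        # object containing the range, which is what the question printed.
        matches.append(text_range.getString())
    return matches


# Hypothetical stand-ins so the helper can be exercised without a running
# LibreOffice; they mimic only the two methods the helper calls.
class FakeRange:
    def __init__(self, string):
        self._string = string

    def getString(self):
        return self._string


class FakeFound:
    def __init__(self, ranges):
        self._ranges = ranges

    def getCount(self):
        return len(self._ranges)

    def getByIndex(self, index):
        return self._ranges[index]


demo = FakeFound([FakeRange("prefix<vorname>"), FakeRange("prefix<name>")])
result = collect_matches(demo)
print(result)
```

In the question's script, collect_matches(res) would return the list of placeholders directly; res[n].String, as used above, reads the same value as res[n].getString().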

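Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.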

 