re.sub in Python 3

Question

I have the following kinds of text:

1. DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe: 
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
4. DIMENSIONS:  | ORIGIN:
5. Review attribution | DIMENSIONS:  | ORIGIN:
6. Pipe: | DIMENSIONS:  | ORIGIN: 2010 PureData Survey

Desired output:

1. Position corrected and IL (0) was changed based on RPS: 3482 -230
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm
4. 
5. Review attribution
6. ORIGIN: 2010 PureData Survey

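Basically, I want to get rid of any key that has an empty value, such as DIMENSIONS, ORIGIN, Pipe, etc.

I think we have to do this for each key separately... I would prefer that, because there are more keys to handle.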

According to https://regex101.com/r/OX1W3b/6, the pattern

(.*)DIMENSIONS:  \|(.*)

works, but I am not sure how to use it in Python:

import re
str='DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230'
x=re.sub(".*DIMENSIONS.*","(.*)DIMENSIONS:  \|(.*)",str)
print(x)

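The result is just the second argument of re.sub repeated, because re.sub expects a replacement string there rather than a regex.

In Google Sheets, I would use =REGEXEXTRACT(A1,"(.*)DIMENSIONS: \\|(.*)")

Is there something similar in Python? re.sub needs a replacement value, but I am getting mine from the regex capture groups.

Note that this is similar to my question on GIS SE; I am asking it here because it is more of a Python question than a GIS question.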

Answer 1

I would just split each line on | into separate fields, check whether each field has no value, and then rejoin the remaining fields with |:

s = '''DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe: 
DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
DIMENSIONS:  | ORIGIN:
Review attribution | DIMENSIONS:  | ORIGIN:
Pipe: | DIMENSIONS:  | ORIGIN: 2010 PureData Survey'''.splitlines()

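result = []
for line in s:
    # Split the line into its "KEY: value" fields.
    fields = line.split('|')
    kept = []
    for field in fields:
        # A field that ends with ':' once stripped has a key but no value.
        if not field.strip().endswith(':'):
            kept.append(field)
    # Rejoin the surviving fields and trim the outer whitespace.
    result.append('|'.join(kept).strip())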

Or, as a one-liner:

result = ['|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip() for line in s]

Note that this gives you a list of lines. You can rejoin them with '\n'.join(result) if needed.
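The split, filter and rejoin steps above can also be wrapped in a small helper. This is only a sketch: the name `drop_empty_keys` and its `sep` parameter are hypothetical, and unlike the loop above it strips each field and rejoins with ' | ', so spacing around the separator is normalized. It also assumes that no real value ends with a colon.

```python
def drop_empty_keys(line: str, sep: str = "|") -> str:
    """Remove fields such as 'ORIGIN:' that have a key but no value."""
    # Strip every field so the check does not depend on surrounding spaces.
    fields = (field.strip() for field in line.split(sep))
    # Keep non-empty fields that do not end with a bare colon.
    kept = [field for field in fields if field and not field.endswith(":")]
    return f" {sep} ".join(kept)


lines = [
    "DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe: ",
    "DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:",
    "DIMENSIONS:  | ORIGIN:",
]
cleaned = [drop_empty_keys(line) for line in lines]

# Rejoin the cleaned lines into a single block of text.
print("\n".join(cleaned))
```

A line in which every key is empty comes back as an empty string, which matches item 4 of the desired output.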

This is the part that parses each line:

'|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip()

DIMENSIONS: 3 x 375 RCP | Pipe: 35mm
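As a side note on the re.sub attempt in the question: re.sub takes its arguments in the order (pattern, replacement, string), and the replacement can refer to capture groups with \1, \2 and so on. The sketch below reuses the pattern from the question, with `\s*` swapped in for the two literal spaces (an assumption, to tolerate any amount of whitespace). It removes only the empty DIMENSIONS key, so it would have to be repeated per key, which is why the split-based approach scales better.

```python
import re

line = "DIMENSIONS:  | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230"
pattern = r"(.*)DIMENSIONS:\s*\|(.*)"

# Pattern first, replacement second, input string third.
# \1 and \2 insert whatever the two capture groups matched.
without_dimensions = re.sub(pattern, r"\1\2", line)
print(without_dimensions)

# Closest analogue to REGEXEXTRACT: read the capture groups directly.
match = re.search(pattern, line)
if match:
    before, after = match.groups()
    print(repr(before), repr(after))
```

If the pattern does not match, re.sub returns the input unchanged and re.search returns None, so lines with a populated DIMENSIONS value are left alone.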

Solution 1
2 Accepted 2019-04-16 02:25:33