re.sub in Python 3
I have the following kinds of text:
1. DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe:
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
4. DIMENSIONS: | ORIGIN:
5. Review attribution | DIMENSIONS: | ORIGIN:
6. Pipe: | DIMENSIONS: | ORIGIN: 2010 PureData Survey
Desired output:
1. Position corrected and IL (0) was changed based on RPS: 3482 -230
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm
4.
5. Review attribution
6. ORIGIN: 2010 PureData Survey
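Basically, I want to get rid of any empty keys, such as DIMENSIONS, ORIGIN, Pipe, etc.
I think we will have to do this for each key separately, which I would prefer, because there are more keys that need to be handled.
According to https://regex101.com/r/OX1W3b/6, the pattern
(.*)DIMENSIONS: \|(.*)
works, but I am not sure how to use it in Python.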
import re
str='DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230'
x=re.sub(".*DIMENSIONS.*","(.*)DIMENSIONS: \|(.*)",str)
print(x)
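The result is just a repeat of the second value passed to re.sub, because it expects a string there rather than a regular expression.
In Google Sheets I would use =REGEXEXTRACT(A1,"(.*)DIMENSIONS: \|(.*)")
Is there something similar in Python? re.sub needs a replacement value, but I am getting that value from the regex capture groups.
Note that this is similar to a question I asked on GIS SE; I am asking it here because it is more of a Python question than a GIS question.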
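On the literal question of using capture groups from Python, here is a minimal sketch. In re.sub the pattern comes first and the replacement second, and the replacement may refer to capture groups as \1, \2. The names `text`, `without_dimensions`, and `groups` are my own choices for illustration (`text` avoids shadowing the built-in `str`).

```python
import re

text = 'DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230'

# Pattern first, replacement second; \1 and \2 refer to the two capture groups.
without_dimensions = re.sub(r'(.*)DIMENSIONS: \|(.*)', r'\1\2', text)
print(without_dimensions)

# Closest analogue of REGEXEXTRACT: run a search, then read the groups.
match = re.search(r'(.*)DIMENSIONS: \|(.*)', text)
groups = match.groups() if match else ()
print(groups)
```

This only removes the empty DIMENSIONS key, so one such call would be needed per key.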
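I would simply split each line on | into separate fields, check each field for a missing value, and then re-join the remaining fields with |: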
s = '''DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe:
DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
DIMENSIONS: | ORIGIN:
Review attribution | DIMENSIONS: | ORIGIN:
Pipe: | DIMENSIONS: | ORIGIN: 2010 PureData Survey'''.splitlines()
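result = []
for line in s:
    line = line.split('|')
    lst = []
    for field in line:
        # a field that ends with ':' is a key without a value, so skip it
        if not field.strip().endswith(':'):
            lst.append(field)
    # re-join the surviving fields and trim the outer whitespace
    result.append('|'.join(lst).strip())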
Or, as a one-liner:
result = ['|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip() for line in s]
Note that this gives you a list of lines. If needed, you can re-join them with '\n'.join(result).
This is the part that parses each line:
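As a sketch, the same logic can be wrapped in a small helper and the cleaned lines joined back into one string. The function name `drop_empty_fields` and its `sep` parameter are hypothetical, chosen only for this example.

```python
def drop_empty_fields(line, sep='|'):
    """Remove fields that end with ':' (a key with no value)."""
    kept = [field for field in line.split(sep)
            if not field.strip().endswith(':')]
    return sep.join(kept).strip()


lines = [
    'DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:',
    'DIMENSIONS: | ORIGIN:',
    'Pipe: | DIMENSIONS: | ORIGIN: 2010 PureData Survey',
]

# One cleaned line per input line; a line with only empty keys becomes ''.
cleaned = '\n'.join(drop_empty_fields(line) for line in lines)
print(cleaned)
```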
'|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip()
For example, if line is
DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
this gives us:
DIMENSIONS: 3 x 375 RCP | Pipe: 35mm
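One caveat: the endswith(':') test also drops a field whose value happens to end with a colon. If that matters for your data, a stricter check is possible with a regular expression that matches only a bare key followed by a colon. This is a sketch under that assumption; `EMPTY_FIELD` and `drop_empty_keys` are hypothetical names.

```python
import re

# A field counts as empty only if it is "<key>:" with nothing but whitespace after it.
EMPTY_FIELD = re.compile(r'\s*[^|:]+:\s*')


def drop_empty_keys(line):
    """Drop fields that consist solely of a key and a colon."""
    kept = [field for field in line.split('|')
            if not EMPTY_FIELD.fullmatch(field)]
    return '|'.join(kept).strip()


print(drop_empty_keys('DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:'))
print(drop_empty_keys('Note: see table: | ORIGIN:'))
```

The second call keeps "Note: see table:", which the endswith(':') version would remove.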