I have the following types of text
1. DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe:
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
4. DIMENSIONS: | ORIGIN:
5. Review attribution | DIMENSIONS: | ORIGIN:
6. Pipe: | DIMENSIONS: | ORIGIN: 2010 PureData Survey
REQUIRED OUTPUT
1. Position corrected and IL (0) was changed based on RPS: 3482 -230
2. DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
3. DIMENSIONS: 3 x 375 RCP | Pipe: 35mm
4.
5. Review attribution
6. ORIGIN: 2010 PureData Survey
Basically I want to get rid of any blank keys like Dimensions, Origin, Pipe etc.
I think we have to do this separately for each key...I would prefer this as there are lots more keys I need to use it for.
According to https://regex101.com/r/OX1W3b/6
(.*)DIMENSIONS: \|(.*)
works but I am not sure how to use it in python
import re
str='DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230'
x=re.sub(".*DIMENSIONS.*","(.*)DIMENSIONS: \|(.*)",str)
print(x)
This just prints the second argument back unchanged, because re.sub treats it as a literal replacement string and not as a regex.
In Google Sheets I would use =REGEXEXTRACT(A1,"(.*)DIMENSIONS: \\|(.*)")
Is there something similar in Python? re.sub needs the value to replace with, but I want to build that value from the regex capture groups.
Note: this is similar to my question on GIS SE; I am asking here because it's more of a Python question than a GIS question.
I'd say just split each line on | into separate fields, check if there's no value, and then rejoin on |:
s = '''DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230 | Pipe:
DIMENSIONS: 2 x 1350 RCP | ORIGIN: PCD13180 | Position corrected and IL (0) was changed based on RPS: 1390 -20800/1350RCP
DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:
DIMENSIONS: | ORIGIN:
Review attribution | DIMENSIONS: | ORIGIN:
Pipe: | DIMENSIONS: | ORIGIN: 2010 PureData Survey'''.splitlines()
result = []
for line in s:
    line = line.split('|')
    lst = []
    for field in line:
        # keep only fields that have a value after the key
        if not field.strip().endswith(':'):
            lst.append(field)
    result.append('|'.join(lst).strip())
Or, in one line:
result = ['|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip() for line in s]
Note that this gives you a list of lines. You can rejoin them with '\n'.join(result) if necessary.
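If it helps with reuse, here is a minimal sketch of the same idea wrapped in a function and rejoined into one string. The name drop_blank_keys is made up for illustration, and it assumes a "blank key" is any field that ends in a colon once surrounding whitespace is stripped.

```python
def drop_blank_keys(line):
    """Remove '|'-separated fields that are only a key with no value."""
    # A field such as ' ORIGIN: ' ends with ':' once stripped, so it is dropped.
    kept = [field for field in line.split('|')
            if not field.strip().endswith(':')]
    return '|'.join(kept).strip()


lines = [
    'DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:',
    'Pipe: | DIMENSIONS: | ORIGIN: 2010 PureData Survey',
    'DIMENSIONS: | ORIGIN:',
]
cleaned = [drop_blank_keys(line) for line in lines]
text = '\n'.join(cleaned)  # back to a single multi-line string
print(text)
```

Because nothing in the function names a specific key, it should work unchanged for any other keys, as long as they follow the same KEY: value layout.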
In the one-liner, this is the part that parses each line:
'|'.join([field for field in line.split('|') if not field.strip().endswith(':')]).strip()
For example, if line is 'DIMENSIONS: 3 x 375 RCP | Pipe: 35mm | ORIGIN:', that gives us this:
DIMENSIONS: 3 x 375 RCP | Pipe: 35mm
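If you would rather stay with re.sub as in the question, the argument order is pattern first, replacement second, and the capture groups are available in the replacement as \1 and \2. The sketch below shows that, plus one possible generic pattern for blank fields. BLANK_FIELD and drop_blank_keys_re are hypothetical names, and the pattern assumes keys never contain '|' or ':'.

```python
import re

# The pattern from the question; re.sub takes (pattern, replacement, string).
s = 'DIMENSIONS: | ORIGIN: | Position corrected and IL (0) was changed based on RPS: 3482 -230'
x = re.sub(r'(.*)DIMENSIONS: \|(.*)', r'\1\2', s)  # \1 and \2 are the capture groups

# Assumed generic pattern: a field that is only "KEY:" with nothing after it,
# together with the '|' in front of it (or the start of the line).
BLANK_FIELD = re.compile(r'(?:^|\|)\s*[^|:]*:\s*(?=\||$)')


def drop_blank_keys_re(line):
    # Remove every blank field, then trim any leftover leading '|' and spaces.
    return BLANK_FIELD.sub('', line).strip(' |')


print(x.strip())
print(drop_blank_keys_re('Pipe: | DIMENSIONS: | ORIGIN: 2010 PureData Survey'))
```

The split-and-filter version is probably easier to read and maintain; the regex is only worth it if you need to do the cleanup inside a larger regex-based pipeline.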