
Python conditional text search

I am a complete novice at Python programming and have been using it to speed up some of the tedious tasks I have at work. One such task is taking a report in the format output by one piece of software and translating it into a format that another piece of software can read for further processing.

So far I have done fairly well using what I've been able to find here on Stack Overflow and in various other resources. But I've come up against a problem that I'm not having much luck cracking, and I was hoping for some advice or a pointer in the right direction.

My original data is something like this:

BR6.FLD T: Tue Nov 07 15:22:25 2017


// 
   // Generated by 12dField - Setout
   // 11.0C1m
   // Surveyor: gm
   Coordinate:  Name: CH9583R TT X: 414638.4070 Y: 827823.6220 Z: 88.0290
   Station:  Name: CH9583R TT Ht: 1.4240
   Target Height:  0.4000
   Target Height:  0.4000
   PPM Correction:  O: 0.00000000
   Measurement:  H:   20° 24' 28" V:   92° 44'  9" S: 115.9559
   Attribute Set: Attribute Set Start:  N:12D Field
   Attribute Set: Attribute Set Start:  N:Basic Pickup
   Attribute: Real Attribute for Vertex:  N:so_cs_raw_3d_ch V:0.0000000000000000
   Attribute Set: Attribute Set End:  N:Basic Pickup
   Attribute Set: Attribute Set Start:  N:Product Details
   Attribute: Integer Attribute for Vertex:  N:12d_product_version V:11
   Attribute: Integer Attribute for Vertex:  N:12d_major_version V:1
   Attribute: Integer Attribute for Vertex:  N:12d_minor_version V:13
   Attribute: Integer Attribute for Vertex:  N:12d_build_version V:6
   Attribute: Integer Attribute for Vertex:  N:version V:23
   Attribute Set: Attribute Set End:  N:Product Details
   Attribute Set: Attribute Set Start:  N:Inst Stat Setup
   Attribute: Real Attribute for Vertex:  N:is_x V:414638.4070000000100000
   Attribute: Real Attribute for Vertex:  N:is_y V:827823.6219999999700000
   Attribute: Real Attribute for Vertex:  N:is_z V:88.0289999999999960
   Attribute: Real Attribute for Vertex:  N:is_hi V:1.4239999999999999
   Attribute: Real Attribute for Vertex:  N:is_bearing_swing V:2.1483160800061616

...which continues for a long time depending on the number of observations made in the field.

Through a series of list comprehensions I've weeded through this to output a more friendly file that looks like this:

Station:


CH9583R TT Ht: 1.4240

Measurement:
  H:   20-24-28 V:   92-44- 9 S: 115.9559
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP1

Measurement:
  H:   17-49-10 V:   91- 8-14 S: 172.6005
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP1

Measurement:
  H:   48-48-29 V:   91-10-11 S: 167.7516
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP3

The next step is to convert that into a JSON object so that I can access its properties in code that outputs the final form.

Currently, this is what I'm able to output:

{
"Stations":[

{ "Station":" CH9583R TT " , "Ht": 1.4240

,"Measurements": [ {
 "H":  "20-24-28"  ,"V":  "92-44-09"   ,"S":" 115.9559" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP1"} 

{
 "H":  "17-49-10"  ,"V":  "91-08-14"   ,"S":" 172.6005" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP1"} 

{
 "H":  "48-48-29"  ,"V":  "91-10-11"   ,"S":" 167.7516" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP3"} 

{ "Station":" CH9504L TT " , "Ht": 1.4110

,"Measurements": [ {
 "H":  "307-01-10"  ,"V":  "90-02-25"   ,"S":" 120.6765" 
  ,"Prism_Constant":"0.0175000000000000" 
  ,"Target_Height":"0.4000000000000000" 
  ,"Name":"CP1A"} 

{

This is not quite valid JSON, so it cannot be read back in. My main problem is that I'm unsure how to search the string for my insertion points. I want to say something like:

if a_sequence_of_characters is_followed_by(another_sequence):
    insert(',',location)

And use that to finish out formatting the data.

Sorry for the length of post. Any suggestions are welcome and thank you in advance for your help.
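
On the literal question of inserting text wherever one sequence of characters is followed by another: a regular expression with a lookahead can express that test. The following is only a minimal sketch. The `broken` string is a made-up fragment shaped like the output above, and `insert_commas` is a hypothetical helper name, not something from the original code.

```python
import json
import re

# Hypothetical fragment in the same shape as the broken output above:
# two objects with no comma between them.
broken = '''{"Name":"CP1"} 

{
 "Name":"CP3"} 
'''

# Match a closing brace only when the next non-whitespace character is an
# opening brace. The lookahead (?=...) tests what follows without consuming it.
between_objects = re.compile(r'\}(?=\s*\{)')

def insert_commas(text):
    """Insert a comma after every '}' that is followed by '{'."""
    return between_objects.sub('},', text)

fixed = insert_commas(broken)

# Wrapped in brackets, the repaired fragment now parses as a JSON list.
print(json.loads('[' + fixed + ']'))
```

This only supplies the commas between adjacent objects. The output above is also missing the closing `]` and `}` for each station, which a substitution like this does not add.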

I might understand enough about your data to hazard some guesses about how the .fld file can be transformed into the .dat file. If I'm anywhere near right about this, I wonder if it might be easier to go directly from the former to the latter.

Here's what I have so far.

import re

# Coordinate line: station name followed by its X, Y and Z values.
coords_line = re.compile(
    r'Coordinate:\s+Name:\s+(?P<name>[a-z0-9]+)[^X]+X:\s+(?P<X>[0-9.]+)[^Y]+Y:\s+(?P<Y>[0-9.]+)[^Z]+Z:\s+(?P<Z>[0-9.]+)', re.I)
# Measurement line: horizontal angle, vertical angle (degrees, minutes, seconds) and distance S.
measurement_line = re.compile(
    r'''Measurement:\s+H:\s+(?P<h_degrees>[0-9]+).\s+(?P<h_minutes>[0-9]+)'\s+(?P<h_seconds>[0-9]+)"\s+V:\s+(?P<v_degrees>[0-9]+).\s+(?P<v_minutes>[0-9]+)'\s+(?P<v_seconds>[0-9]+)"\s+S:\s+(?P<S>[0-9.]+)''')
# Attribute line that carries the name of the observed point.
attributes_line = re.compile(
    r'''Attribute: Text Attribute for Vertex:\s+N:store_pt_string_name\s+V:(?P<attribute>[a-z0-9]+)''', re.I)

with open('greg_out.txt', 'w') as greg_out:
    # First pass: write one 'C' record per distinct station name.
    coords_info = []
    with open('greg_in.txt') as greg:
        for line in greg:
            m = coords_line.search(line)
            if m:
                if m.group('name') not in coords_info:
                    coords_info.append(m.group('name'))
                    print('C', m.group('name'), m.group('X'), m.group('Y'), m.group('Z'), file=greg_out)
        print(file=greg_out)

    # Second pass: write a DB ... DE block for each station setup.
    current_coords = None
    with open('greg_in.txt') as greg:
        for line in greg:
            m = coords_line.search(line)
            if m:
                # A new Coordinate line closes the previous block, if any.
                if current_coords:
                    print('DE\n', file=greg_out)
                current_coords = m.group('name')
                print('DB', m.group('name'), file=greg_out)
            m = measurement_line.search(line)
            if m:
                # Remember the latest measurement: horizontal angle, distance, vertical angle.
                recent_horizontal = (m.group('h_degrees'), m.group('h_minutes'), m.group('h_seconds'), m.group('S'), m.group('v_degrees'), m.group('v_minutes'), m.group('v_seconds'))
            m = attributes_line.search(line)
            if m:
                attribute = m.group('attribute')
                # Point names that start with a digit get a 'CH' prefix.
                if attribute[0] in '0123456789':
                    attribute = 'CH' + attribute
                print('DM', attribute, '{}-{}-{} {} {}-{}-{}'.format(*recent_horizontal), file=greg_out)

        # Close the final block.
        print('DE\n', file=greg_out)

This is what it produces.

C CH9583R 414638.4070 827823.6220 88.0290
C CH9504L 414775.1470 827859.5190 82.5870
C CH9360R 414672.4040 828056.2440 87.2310
C CP2 414691.2159 827987.9097 85.6298

DB CH9583R
DM CP1 20-24-28 115.9559 92-44-9
DM CP1 17-49-10 172.6005 91-8-14
DM CP3 48-48-29 167.7516 91-10-11
DE

DB CH9504L
DM CP1A 307-1-10 120.6765 90-2-25
DM CP2A 326-49-38 153.4059 89-7-51
DM CP3A 351-57-33 75.3264 88-27-17
DM BS2 255-17-27 141.4947 87-47-33
DM CP1B 307-1-13 120.6767 90-2-26
DM BS2B 255-17-27 141.4771 87-47-36
DM CP2B 326-49-43 153.4090 89-7-52
DM CP3B 351-57-34 75.3262 88-27-17
DM BS2 255-17-27 141.4769 87-47-34
DM CP1 307-1-15 120.6772 90-2-26
DM CP2 326-49-43 153.4065 89-7-50
DM CP3 351-57-35 75.3266 88-27-17
DM BS2 255-17-26 141.4769 87-47-35
DE

DB CH9583R
DM CP1 20-24-31 115.9544 92-44-5
DM BS 75-17-25 141.4892 92-9-14
DM CP2 17-49-11 172.5993 91-8-12
DM CP3 48-48-30 167.7499 91-10-7
DM BS1 75-17-25 141.4715 92-9-14
DM BS1 75-17-25 141.4716 92-9-13
DM CP1 20-24-30 115.9553 92-44-8
DM CP2 17-49-11 172.6006 91-8-11
DM CP3 48-48-31 167.7485 91-10-8
DM BS1 75-17-27 141.4711 92-9-13
DM CP1 20-24-32 115.9559 92-44-5
DM CP2 17-49-10 172.6002 91-8-13
DM CP3 48-48-34 167.7505 91-10-8
DM BS1 75-17-29 141.4711 92-9-13
DE

DB CH9360R
DM CP1 177-2-58 124.3272 92-14-30
DM BS2 188-18-50 235.0917 89-51-44
DM CP2 164-36-28 70.9176 91-58-36
DM BS2 188-18-44 235.0915 89-51-39
DM CP1 177-2-54 124.3264 92-14-30
DM CP2 164-36-23 70.9163 91-58-33
DM BS2 188-18-42 235.0917 89-51-40
DM CP1 177-2-54 124.3240 92-14-30
DM CP2 164-36-31 70.9200 91-58-38
DE

DB CP2
DM CH9360 344-36-29 70.8914 88-45-37
DM CH9583R 197-49-11 172.5688 89-36-21
DM CP1 192-33-48 57.1951 93-19-41
DM CH9504L 146-49-35 153.4204 91-9-37
DM CP3 126-15-34 91.0306 90-45-37
DM CH9360R 344-36-27 70.8914 88-45-37
DM CH9583R 197-49-10 172.5682 89-36-23
DM CH9504L 146-49-37 153.4203 91-9-38
DM C3 126-15-31 91.0292 90-45-37
DM CH9360R 344-36-27 70.8913 88-45-39
DM CH9583R 197-49-4 172.5685 89-36-22
DM CP1 192-33-45 57.1953 93-19-40
DM CH9504L 146-49-30 153.4206 91-9-37
DM CP3 126-15-26 91.0301 90-45-38
DE

Some items are clearly missing, in most cases because I don't know how they're calculated and don't want to invest effort for nothing.
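
If the JSON route is still wanted, one way to sidestep the comma problem entirely is to build ordinary Python dictionaries and lists and let the standard `json` module produce the punctuation. The sketch below assumes the intermediate "friendly" file looks exactly like the sample in the question. The names `parse`, `pad_angle`, `station_re` and `angles_re` are made up for illustration, values other than `Ht` are kept as strings, and a line that does not fit the expected shape is not handled.

```python
import json
import re

# Shortened sample in the intermediate format shown in the question.
sample = """\
Station:

CH9583R TT Ht: 1.4240

Measurement:
  H:   20-24-28 V:   92-44- 9 S: 115.9559
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP1

Measurement:
  H:   17-49-10 V:   91- 8-14 S: 172.6005
   Prism Constant:0.0175000000000000
   Target height:0.4000000000000000
   Name:CP1
"""

station_re = re.compile(r'^(?P<name>.+?)\s+Ht:\s*(?P<ht>[0-9.]+)$')
angles_re = re.compile(
    r'^H:\s*(?P<H>[-0-9 ]+?)\s+V:\s*(?P<V>[-0-9 ]+?)\s+S:\s*(?P<S>[0-9.]+)$')

def pad_angle(text):
    """Turn '92-44- 9' into '92-44-09' by zero-padding each part."""
    return '-'.join(part.strip().zfill(2) for part in text.split('-'))

def parse(text):
    """Build a nested dict from the intermediate text (assumes well-formed input)."""
    stations = []
    expect_station = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == 'Station:':
            expect_station = True
        elif expect_station:
            # First non-blank line after 'Station:' holds the name and height.
            m = station_re.match(line)
            stations.append({'Station': m.group('name'),
                             'Ht': float(m.group('ht')),
                             'Measurements': []})
            expect_station = False
        elif line == 'Measurement:':
            stations[-1]['Measurements'].append({})
        elif line.startswith('H:'):
            m = angles_re.match(line)
            stations[-1]['Measurements'][-1].update(
                H=pad_angle(m.group('H')),
                V=pad_angle(m.group('V')),
                S=m.group('S'))
        else:
            # Remaining lines are 'Key:value' pairs belonging to the current measurement.
            key, _, value = line.partition(':')
            stations[-1]['Measurements'][-1][key.strip().replace(' ', '_')] = value.strip()
    return {'Stations': stations}

result = parse(sample)
print(json.dumps(result, indent=2))
```

Because `json.dumps` writes every comma, bracket and brace itself, the output can always be read back with `json.loads`, and no search for insertion points is needed.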
