
How To Parse XML-Like File Streamed Text and Export into CSV

I have log files that appear to be file-streamed text containing C++ namespace artifacts (hinted at by the double colons, ::) with some XML content embedded. I have loaded a log and displayed it in a browser application, which separates the content from the Unix timestamps like so:

1564002293071 INFO:  ToGroundMessageFilter::addSubscriptionAddress staged subscribe address [uxas.messages.uxnative.KillSer
1564002293073 INFO:  *** INITIALIZING:: Service[ToGroundMessageFilter] Service Id[64] with working directory [] *** 
1564002293082 INFO:  WorldviewTransformationService::configure Location offsets = (lat:0, lon:0, alt:0)
1564002311397 INFO:  WatchdogManagerService::<WaypointActual Series="TACE"><Waypoint><Waypoint Series="TACE"><RemediationId>1</RemediationId><StatePlatformId>58</StatePlatformId><LatLonAlt><LatLonAlt Series="TACE"><Altitude>366</Altitude><Latitude>34.97866</Latitude><Longitude>-117.85169</Longitude></LatLonAlt></LatLonAlt><Speed>0</Speed><Heading>0</Heading><Roll>0</Roll><Pitch>0</Pitch><Yaw>0</Yaw></Waypoint></Waypoint><SenderID>68</SenderID><ActualTime>0</ActualTime><PerceivedTime>0</PerceivedTime><SenderPlatformWorld>Constructive</SenderPlatformWorld><SenderPlatformType>Other</SenderPlatformType><Comment></Comment></WaypointActual>
1564002312386 INFO:  ProximityConstraintService::<WatchdogConstraintViolation Series="TACE"><ConstraintId>-2</ConstraintId><Latching>true</Latching><Priority>1</Priority><RequestedRemediationId>1</RequestedRemediationId><HasViolation>false</HasViolation><ConstraintName>Proximity</ConstraintName><SenderID>72</SenderID><ActualTime>1564002312385</ActualTime><PerceivedTime>1564002312385</PerceivedTime><SenderPlatformWorld>Live</SenderPlatformWorld><SenderPlatformType>Air</SenderPlatformType><Comment></Comment></WatchdogConstraintViolation>

The goal now is to parse this log and save it as a CSV using JavaScript. Unfortunately, I am not entirely sure how to approach this problem. JS has XML parsers, but how is this done when the content of each line is not entirely XML? I'd like to have a column for the Unix timestamp, the namespace name, and other details (see the sample table at the bottom).

In addition, I have the format of each event namespace. Here are two examples. These XML-like configurations change over time as the software is updated. I have 24+ XML-structured "services" defined like the ones below. Is there a way to have the parser "load XML configs" depending on the service name?

WorldViewTransformationService

<AutonomyWaypointActual Series="TACE">
    <Waypoint>
        <Waypoint Series="TACE">
            <RemediationId></RemediationId>
            <StatePlatformId></StatePlatformId>
            <LatLonAlt>
                <LatLonAlt Series="TACE">
                    <Altitude></Altitude>
                    <Latitude></Latitude>
                    <Longitude></Longitude>
                </LatLonAlt>
            </LatLonAlt>
            <Speed></Speed>
            <Heading></Heading>
            <Roll></Roll>
            <Pitch></Pitch>
            <Yaw></Yaw>
        </Waypoint>
    </Waypoint>
    <SenderID></SenderID>
    <ActualTime></ActualTime>
    <PerceivedTime></PerceivedTime>
    <SenderPlatformWorld></SenderPlatformWorld>
    <SenderPlatformType></SenderPlatformType>
    <Comment></Comment>
</AutonomyWaypointActual>

ProximityConstraintService

<ProximityConstraint Series="TACE">
    <Radius></Radius>
    <OtherPlatformId></OtherPlatformId>
    <ConstraintId></ConstraintId>
    <PlatformId></PlatformId>
    <Latching></Latching>
    <Priority></Priority>
    <RequestedRemediationId></RequestedRemediationId>
    <ConstraintName></ConstraintName>
</ProximityConstraint>

Example of output to a CSV (note that events such as ProximityConstraintService do not hold any altitude, pitch, or yaw info):

unix           |  event                         | altitude | pitch | yaw | Priority | Latching
1564002293071    ToGroundMessageFilter               -         -      -       -          -
1564002293073    INITIALIZING                        -         -      -       -          -
1564002293082    WorldviewTransformationService     100        15     4       -          -
1564002300983    WorldviewTransformationService     220        16     2       -          -
1564002312386    ProximityConstraintService          -         -      -       3          1

There are multiple ways this could be done.

If you want to parse the XML properly (which would be the right way to do it), I would probably do it the lazy way in two passes: first convert the log file to proper XML, then use standard XML parsing/querying tools to produce CSV from it.

The devil is in the details, so if this can't work for you, let us know, but I would probably:

1) grep the log to keep only the lines I am interested in;

2) Wrap all log messages I am interested in in a generic logentry XML element, under some root tag, so it would look like this:

<logfile>
  ...
  <logentry timestamp="1564002311397" level="INFO" source="WatchdogManagerService">
    <WaypointActual Series="TACE"><Waypoint>...</WaypointActual>
  </logentry>
  <logentry ...>...</logentry>
  ...
</logfile>

3) Parse the resulting XML into CSV as a proper XML file with normal XSLT or another XML processor.
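
Since the question asks for JavaScript, steps 1 and 2 could be sketched roughly like this. It is only a sketch: the names LINE_RE, toLogEntry and wrapLog are made up for illustration, and the line layout ("timestamp LEVEL:  Source::<xml>") is inferred from the sample log in the question. It also assumes the embedded payload is already well-formed XML and that the source name needs no attribute escaping.

```javascript
// Assumed line layout, inferred from the sample log:
//   <unix ms> <LEVEL>:  <Source>::<xml payload>
const LINE_RE = /^(\d+)\s+(\w+):\s+(\w+)::(<.+>)\s*$/;

// Step 1 ("grep"): lines without an XML payload return null and are dropped.
// Step 2: matching lines are wrapped in a generic <logentry> element.
function toLogEntry(line) {
  const m = LINE_RE.exec(line);
  if (m === null) return null;
  const [, timestamp, level, source, payload] = m;
  return (
    `  <logentry timestamp="${timestamp}" level="${level}" source="${source}">\n` +
    `    ${payload}\n` +
    `  </logentry>`
  );
}

// Wrap every kept entry under a single root tag so the result is one XML document.
function wrapLog(text) {
  const entries = text
    .split(/\r?\n/)
    .map(toLogEntry)
    .filter((entry) => entry !== null);
  return `<logfile>\n${entries.join("\n")}\n</logfile>`;
}
```

Plain log lines such as the "configure Location offsets" one are skipped by this filter, so if you also want them in the CSV you would need a second, looser pattern for them.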

The only thing you can't do is process those 24+ XML formats automagically so that you get proper output in your CSV. For each different XML format, you might have to write a different XSLT template that applies to that type of logentry content. But they can all be combined into one XSLT file, so you can apply a single XSLT to the whole log file in one go.
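
If you would rather stay in JavaScript than write XSLT, the same "one template per format" idea could be approximated with a lookup table keyed by the root element name. This is a swapped-in technique, not XSLT, and only a rough sketch: TEMPLATES, leafText and toCsvRow are hypothetical names, the field lists are guesses based on the sample log and table in the question, and the regex only handles simple leaf elements (a real XML parser such as the browser's DOMParser would be more robust). Values are not CSV-escaped.

```javascript
// Hypothetical per-format "templates": root element name -> leaf tags to export.
const TEMPLATES = new Map([
  ["WaypointActual", ["Altitude", "Pitch", "Yaw"]],
  ["WatchdogConstraintViolation", ["Priority", "Latching"]],
]);

// CSV columns: the union of all template fields, in first-seen order.
const COLUMNS = [...new Set([...TEMPLATES.values()].flat())];

// Text of the first <tag>...</tag> leaf in xml, or "-" when it is absent.
function leafText(xml, tag) {
  const m = new RegExp(`<${tag}>([^<]*)</${tag}>`).exec(xml);
  return m === null ? "-" : m[1];
}

// entry: { timestamp, source, payload } for one log line (assumed shape).
function toCsvRow(entry) {
  const root = /^<(\w+)/.exec(entry.payload);
  const fields = root ? TEMPLATES.get(root[1]) ?? [] : [];
  const cells = COLUMNS.map((column) =>
    fields.includes(column) ? leafText(entry.payload, column) : "-"
  );
  return [entry.timestamp, entry.source, ...cells].join(",");
}

// Usage: one row for a constraint-violation entry.
console.log(
  toCsvRow({
    timestamp: "1564002312386",
    source: "ProximityConstraintService",
    payload:
      '<WatchdogConstraintViolation Series="TACE"><Latching>true</Latching>' +
      "<Priority>1</Priority></WatchdogConstraintViolation>",
  })
);
```

When a format changes or a new service is added, only the table needs a new entry, which is roughly the role the per-format XSLT templates would play.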

