
Extract fields from a text file of XML data

I have a text file that I read into SAS and cleaned to the point where each row contains something like this:

<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>

I need to extract the value of name and the value of type. So in this case I need to grab ReportingUnit and ReportingUnit_def.

Any help would be much appreciated. Thanks

An XSD file is XML. Unfortunately, it is not XML structured in a way that the XMLV2 libname engine can read natively. If the XSD is as clean as you state, a plain INPUT statement using @'character-string' pointer control will pull out the data you want.
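As a minimal sketch of that pointer control applied to the row from the question: @'name=' moves the input pointer to just past that string, list input with DSD then reads the quoted value and drops the quotes, and SCAN keeps only the part of type after the last colon. The data set name want is made up for illustration, and the step assumes the default NOCARDIMAGE setting so that the data line, which is longer than 80 characters, is read whole.

```sas
data want;
  /* DSD strips the quotes; a blank, slash or > ends a value */
  infile datalines dsd dlm=' />' missover;
  length name type $100;
  /* jump past each attribute name, then read the quoted value */
  input @'name=' name @'type=' type;
  /* drop the namespace prefix: reportingunit:ReportingUnit_def -> ReportingUnit_def */
  type = scan(type, -1, ':');
datalines;
<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>
;
run;
```

Because the pointer searches forward, this relies on name= appearing before type= on each row, as it does in the question.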

Sample code

filename myxsd temp;

* example xsd from https://docs.microsoft.com/en-us/visualstudio/xml-tools/sample-xsd-file-simple-schema?view=vs-2015;
data _null_;
  file myxsd;
  input;
  put _infile_;
datalines;
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"   
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"   
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"   
           elementFormDefault="qualified">  
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>  
 <xsd:complexType name="PurchaseOrderType">  
  <xsd:sequence>  
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>  
   <xsd:element name="BillTo" type="tns:USAddress"/>  
  </xsd:sequence>  
  <xsd:attribute name="OrderDate" type="xsd:date"/>  
 </xsd:complexType>  

 <xsd:complexType name="USAddress">  
  <xsd:sequence>  
   <xsd:element name="name"   type="xsd:string"/>  
   <xsd:element name="street" type="xsd:string"/>  
   <xsd:element name="city"   type="xsd:string"/>  
   <xsd:element name="state"  type="xsd:string"/>  
   <xsd:element name="zip"    type="xsd:integer"/>  
  </xsd:sequence>  
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>  
 </xsd:complexType>  
</xsd:schema>  
run;

libname myxsd xmlv2;

proc copy in=myxsd out=work;
run;

data weak_parse;
  infile myxsd dsd dlm=" />" missover;
  length name type $100;
  input @"name=" name @"type=" type;
run;

The log shows an ERROR when PROC COPY tries to read the XSD through the XMLV2 libname, but the INPUT statement reads the same file without trouble:

536  libname myxsd xmlv2;
NOTE: Libref MYXSD was successfully assigned as follows:
      Engine:        XMLV2
      Physical Name: C:\Users\Richard\AppData\Local\Temp\SAS Temporary
      Files\_TD2764_HELIUM_\#LN00053
537
538  proc copy in=myxsd out=work;
539  run;

ERROR: XML data is not in a format supported natively by the XML libname engine. Files of this
       type may require an XMLMap to be input properly.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE COPY used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

NOTE: The SAS System stopped processing this step because of errors.
540


541  data weak_parse;
542    infile myxsd dsd dlm=" />" missover;
543    length name type $100;
544    input @"name=" name @"type=" type;
545  run;

NOTE: The infile MYXSD is:
      Filename=C:\Users\Richard\AppData\Local\Temp\SAS Temporary Files\_TD2764_HELIUM_\#LN00053,
      RECFM=V,LRECL=32767,File Size (bytes)=1968,
      Last Modified=21Sep2018:22:51:56,
      Create Time=21Sep2018:22:51:56

NOTE: 24 records were read from the infile MYXSD.
      The minimum record length was 80.
      The maximum record length was 80.
NOTE: The data set WORK.WEAK_PARSE has 24 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

The resulting data set looks like this (lines without a name= or type= attribute produce blank values):

The SAS System

Obs    name                 type

  1
  2
  3
  4
  5    PurchaseOrder        tns:PurchaseOrderType
  6    PurchaseOrderType
  7
  8    ShipTo               tns:USAddress
  9    BillTo               tns:USAddress
 10
 11    OrderDate            xsd:date
 12
 13
 14    USAddress
 15
 16    name                 xsd:string
 17    street               xsd:string
 18    city                 xsd:string
 19    state                xsd:string
 20    zip                  xsd:integer
 21
 22    country              xsd:NMTOKEN
 23
 24
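The question asks only for rows that carry both attributes, with the type stripped of its namespace prefix. One possible way to get there from weak_parse is sketched below; the output name name_type is hypothetical.

```sas
data name_type;
  set weak_parse;
  /* keep only lines where both name= and type= were found */
  where name ne ' ' and type ne ' ';
  /* keep the text after the last colon, e.g. tns:USAddress -> USAddress */
  type = scan(type, -1, ':');
run;
```

For the sample schema this keeps the ten rows that show both values in the listing above. Note that it keeps xsd:attribute lines as well as xsd:element lines, because the parse never looks at the tag name.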

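I had the same issue; this worked for me:

data want;
  infile "M:\some\path\ihave\favorites.xml";
  /* a $100 variable would truncate longer rows such as the one in the question */
  length line $400;
  input;
  /* store the whole raw line; name and type still have to be parsed out of it */
  line = _infile_;
run;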

