
Extract fields from a text file of XML data

I have a text file that I read into SAS and cleaned to the point where each row contains something like this:

<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>

I need to extract the value of name and the value of type. So in this case I need to grab ReportingUnit and ReportingUnit_def.

Any help would be much appreciated. Thanks.

An xsd file is XML. Unfortunately, it is not XML formatted in a way that the xmlv2 engine supports. If the xsd is as clean as you state, a flat input using @'character-string' pointer control will pull out the data you want. Each @'character-string' moves the input pointer to just past the next occurrence of that string on the line, so the value that follows can be read as an ordinary delimited field.

Sample code:

filename myxsd temp;

* example xsd from https://docs.microsoft.com/en-us/visualstudio/xml-tools/sample-xsd-file-simple-schema?view=vs-2015;
data _null_;
  file myxsd;
  input;
  put _infile_;
datalines;
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"   
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"   
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"   
           elementFormDefault="qualified">  
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>  
 <xsd:complexType name="PurchaseOrderType">  
  <xsd:sequence>  
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>  
   <xsd:element name="BillTo" type="tns:USAddress"/>  
  </xsd:sequence>  
  <xsd:attribute name="OrderDate" type="xsd:date"/>  
 </xsd:complexType>  

 <xsd:complexType name="USAddress">  
  <xsd:sequence>  
   <xsd:element name="name"   type="xsd:string"/>  
   <xsd:element name="street" type="xsd:string"/>  
   <xsd:element name="city"   type="xsd:string"/>  
   <xsd:element name="state"  type="xsd:string"/>  
   <xsd:element name="zip"    type="xsd:integer"/>  
  </xsd:sequence>  
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>  
 </xsd:complexType>  
</xsd:schema>  
run;

libname myxsd xmlv2;

proc copy in=myxsd out=work;
run;

data weak_parse;
  infile myxsd dsd dlm=" />" missover;
  length name type $100;
  input @"name=" name @"type=" type;
run;

A log ERROR will occur when proc copy attempts to read the xsd through the libname, but the input statement runs without trouble:

536  libname myxsd xmlv2;
NOTE: Libref MYXSD was successfully assigned as follows:
      Engine:        XMLV2
      Physical Name: C:\Users\Richard\AppData\Local\Temp\SAS Temporary
      Files\_TD2764_HELIUM_\#LN00053
537
538  proc copy in=myxsd out=work;
539  run;

ERROR: XML data is not in a format supported natively by the XML libname engine. Files of this
       type may require an XMLMap to be input properly.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE COPY used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

NOTE: The SAS System stopped processing this step because of errors.
540


541  data weak_parse;
542    infile myxsd dsd dlm=" />" missover;
543    length name type $100;
544    input @"name=" name @"type=" type;
545  run;

NOTE: The infile MYXSD is:
      Filename=C:\Users\Richard\AppData\Local\Temp\SAS Temporary Files\_TD2764_HELIUM_\#LN00053,
      RECFM=V,LRECL=32767,File Size (bytes)=1968,
      Last Modified=21Sep2018:22:51:56,
      Create Time=21Sep2018:22:51:56

NOTE: 24 records were read from the infile MYXSD.
      The minimum record length was 80.
      The maximum record length was 80.
NOTE: The data set WORK.WEAK_PARSE has 24 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

The data read in will be:

The SAS System

Obs    name                 type

  1
  2
  3
  4
  5    PurchaseOrder        tns:PurchaseOrderType
  6    PurchaseOrderType
  7
  8    ShipTo               tns:USAddress
  9    BillTo               tns:USAddress
 10
 11    OrderDate            xsd:date
 12
 13
 14    USAddress
 15
 16    name                 xsd:string
 17    street               xsd:string
 18    city                 xsd:string
 19    state                xsd:string
 20    zip                  xsd:integer
 21
 22    country              xsd:NMTOKEN
 23
 24
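
The blank rows in that listing come from lines that carry no name= attribute, and type still includes its namespace prefix (tns:, xsd:), whereas the question asks for ReportingUnit_def rather than reportingunit:ReportingUnit_def. The following is a minimal sketch of one way to handle both, assuming the myxsd fileref from the sample code is still assigned. The data set name parsed and the variables rx_name and rx_type are made up for illustration. It swaps the pointer-control input for Perl regular expressions (PRXPARSE, PRXMATCH, PRXPOSN), so the order of the attributes on a line should not matter.

```sas
data parsed;
  infile myxsd;                 /* assumes the fileref from the sample code */
  input;                        /* load the raw line into _infile_          */
  length name type $100;
  retain rx_name rx_type;       /* hypothetical names for the regex ids     */

  /* compile both patterns once, on the first iteration */
  if _n_ = 1 then do;
    rx_name = prxparse('/\bname="([^"]*)"/');
    rx_type = prxparse('/\btype="([^"]*)"/');
  end;

  /* capture group 1 is the text between the double quotes */
  if prxmatch(rx_name, _infile_) then name = prxposn(rx_name, 1, _infile_);
  if prxmatch(rx_type, _infile_) then type = prxposn(rx_type, 1, _infile_);

  /* keep only lines that actually had a name attribute */
  if missing(name) then delete;

  /* drop the namespace prefix: keep the text after the last colon */
  type = scan(type, -1, ':');

  drop rx_:;
run;
```

For the line in the question, this should give name=ReportingUnit and type=ReportingUnit_def. Elements without a type attribute, such as the complexType lines, keep a blank type.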

I had the same issue. Try this (it worked for me); note that it only reads each line of the file into a variable and does not split out name and type:

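data want;
  infile "M:\some\path\ihave\favorites.xml";
  /* lines longer than 100 characters are truncated; increase the length if needed */
  length line $100;
  input;             /* read the next record into the _infile_ buffer */
  line = _infile_;   /* copy the raw record into a data set variable  */
run;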
