Extract fields from a text file of XML data

I have a text file that I read into SAS and cleaned up to the point where each line contains something like this:

<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>

I need to extract the value of name and the value of type. So in this case, I need to get ReportingUnit and ReportingUnit_def.

Any help would be greatly appreciated. Thanks.

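An xsd is XML but, unfortunately, not XML laid out in a form that the xmlv2 engine supports. If the xsd is as clean as you say, a plain input statement using the @'character-string' pointer control will pull out the data you want.

Sample code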

filename myxsd temp;

* example xsd from https://docs.microsoft.com/en-us/visualstudio/xml-tools/sample-xsd-file-simple-schema?view=vs-2015;
data _null_;
  file myxsd;
  input;
  put _infile_;
datalines;
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"   
           xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"   
           targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"   
           elementFormDefault="qualified">  
 <xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>  
 <xsd:complexType name="PurchaseOrderType">  
  <xsd:sequence>  
   <xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>  
   <xsd:element name="BillTo" type="tns:USAddress"/>  
  </xsd:sequence>  
  <xsd:attribute name="OrderDate" type="xsd:date"/>  
 </xsd:complexType>  

 <xsd:complexType name="USAddress">  
  <xsd:sequence>  
   <xsd:element name="name"   type="xsd:string"/>  
   <xsd:element name="street" type="xsd:string"/>  
   <xsd:element name="city"   type="xsd:string"/>  
   <xsd:element name="state"  type="xsd:string"/>  
   <xsd:element name="zip"    type="xsd:integer"/>  
  </xsd:sequence>  
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>  
 </xsd:complexType>  
</xsd:schema>  
run;

libname myxsd xmlv2;

proc copy in=myxsd out=work;
run;

data weak_parse;
  infile myxsd dsd dlm=" />" missover;
  length name type $100;
  input @"name=" name @"type=" type;
run;

When proc copy tries to read the xsd through the libname, an error is written to the log. The input statement, however, runs fine.

536  libname myxsd xmlv2;
NOTE: Libref MYXSD was successfully assigned as follows:
      Engine:        XMLV2
      Physical Name: C:\Users\Richard\AppData\Local\Temp\SAS Temporary
      Files\_TD2764_HELIUM_\#LN00053
537
538  proc copy in=myxsd out=work;
539  run;

ERROR: XML data is not in a format supported natively by the XML libname engine. Files of this
       type may require an XMLMap to be input properly.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE COPY used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

NOTE: The SAS System stopped processing this step because of errors.
540


541  data weak_parse;
542    infile myxsd dsd dlm=" />" missover;
543    length name type $100;
544    input @"name=" name @"type=" type;
545  run;

NOTE: The infile MYXSD is:
      Filename=C:\Users\Richard\AppData\Local\Temp\SAS Temporary Files\_TD2764_HELIUM_\#LN00053,
      RECFM=V,LRECL=32767,File Size (bytes)=1968,
      Last Modified=21Sep2018:22:51:56,
      Create Time=21Sep2018:22:51:56

NOTE: 24 records were read from the infile MYXSD.
      The minimum record length was 80.
      The maximum record length was 80.
NOTE: The data set WORK.WEAK_PARSE has 24 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

The data read in will be

The SAS System

Obs    name                 type

  1
  2
  3
  4
  5    PurchaseOrder        tns:PurchaseOrderType
  6    PurchaseOrderType
  7
  8    ShipTo               tns:USAddress
  9    BillTo               tns:USAddress
 10
 11    OrderDate            xsd:date
 12
 13
 14    USAddress
 15
 16    name                 xsd:string
 17    street               xsd:string
 18    city                 xsd:string
 19    state                xsd:string
 20    zip                  xsd:integer
 21
 22    country              xsd:NMTOKEN
 23
 24
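
The question asks for ReportingUnit_def rather than the prefixed reportingunit:ReportingUnit_def, while the listing above keeps the namespace prefixes and the blank rows. A minimal sketch of one way to tidy that up, assuming the same myxsd fileref as in the sample code; the data set name want_clean is only illustrative:

```sas
data want_clean;
  infile myxsd dsd dlm=" />" missover;
  length name type $100;
  input @"name=" name @"type=" type;

  /* keep only lines that supplied both attributes */
  if missing(name) or missing(type) then delete;

  /* drop any namespace prefix: tns:USAddress becomes USAddress */
  type = scan(type, -1, ':');
run;
```

The SCAN call with a count of -1 returns the last colon-delimited token, so a type with no prefix is left unchanged. Lines such as the complexType declarations, which have a name but no type, are dropped by the filter; remove that statement if they are needed.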

I had the same problem; please try this (it worked for me):

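data want;
  infile "M:\some\path\ihave\favorites.xml";
  length line $100;
  input;            /* load the next record into the input buffer */
  line = _infile_;  /* copy the raw record; text beyond 100 characters is truncated */
run;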
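
Reading each record as raw text, as in the step above, still leaves name and type to be pulled out of the line. A hedged sketch of how that could be done with SAS's Perl regular expression functions, assuming the myxsd fileref from the earlier sample code and attribute values wrapped in double quotes; the data set name prx_parse and the variables rx_name and rx_type are illustrative only:

```sas
data prx_parse;
  infile myxsd truncover;
  input;                              /* record is available in _infile_ */
  length name type $100;
  retain rx_name rx_type;
  if _n_ = 1 then do;
    /* compile each pattern once; group 1 captures the attribute value */
    rx_name = prxparse('/\bname="([^"]*)"/');
    /* the optional non-capturing group skips a namespace prefix such as tns: */
    rx_type = prxparse('/\btype="(?:[^":]*:)?([^"]*)"/');
  end;
  if prxmatch(rx_name, _infile_) then name = prxposn(rx_name, 1, _infile_);
  if prxmatch(rx_type, _infile_) then type = prxposn(rx_type, 1, _infile_);
  if not missing(name) and not missing(type);  /* keep complete rows only */
  drop rx_name rx_type;
run;
```

Unlike the @"name=" ... @"type=" pointer controls, this approach does not depend on name appearing before type on the line.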
