Extract fields from a text file of XML data
I have a text file that I read into SAS and cleaned to the point where each row contains something like this:
<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>
I need to extract the value of name and the value of type. So in this case I need to grab ReportingUnit and ReportingUnit_def.
Any help would be much appreciated. Thanks
xsd is xml. Unfortunately, it is not xml formatted in a way that the xmlv2 engine supports. If the xsd is as clean as you state, flat input using @'character-string' pointer control will pull out the data you want.
Sample code
filename myxsd temp;
* example xsd from https://docs.microsoft.com/en-us/visualstudio/xml-tools/sample-xsd-file-simple-schema?view=vs-2015;
data _null_;
file myxsd;
input;
put _infile_;
datalines;
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:tns="http://tempuri.org/PurchaseOrderSchema.xsd"
targetNamespace="http://tempuri.org/PurchaseOrderSchema.xsd"
elementFormDefault="qualified">
<xsd:element name="PurchaseOrder" type="tns:PurchaseOrderType"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="ShipTo" type="tns:USAddress" maxOccurs="2"/>
<xsd:element name="BillTo" type="tns:USAddress"/>
</xsd:sequence>
<xsd:attribute name="OrderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:integer"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>
</xsd:schema>
run;
libname myxsd xmlv2;
proc copy in=myxsd out=work;
run;
data weak_parse;
infile myxsd dsd dlm=" />" missover;
length name type $100;
input @"name=" name @"type=" type;
run;
A log ERROR will occur when proc copy attempts to read the xsd through the libname. But the input statement sails along fine.
536 libname myxsd xmlv2;
NOTE: Libref MYXSD was successfully assigned as follows:
Engine: XMLV2
Physical Name: C:\Users\Richard\AppData\Local\Temp\SAS Temporary
Files\_TD2764_HELIUM_\#LN00053
537
538 proc copy in=myxsd out=work;
539 run;
ERROR: XML data is not in a format supported natively by the XML libname engine. Files of this
type may require an XMLMap to be input properly.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE COPY used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
NOTE: The SAS System stopped processing this step because of errors.
540
541 data weak_parse;
542 infile myxsd dsd dlm=" />" missover;
543 length name type $100;
544 input @"name=" name @"type=" type;
545 run;
NOTE: The infile MYXSD is:
Filename=C:\Users\Richard\AppData\Local\Temp\SAS Temporary Files\_TD2764_HELIUM_\#LN00053,
RECFM=V,LRECL=32767,File Size (bytes)=1968,
Last Modified=21Sep2018:22:51:56,
Create Time=21Sep2018:22:51:56
NOTE: 24 records were read from the infile MYXSD.
The minimum record length was 80.
The maximum record length was 80.
NOTE: The data set WORK.WEAK_PARSE has 24 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
The data read in will be
The SAS System

Obs    name                 type
  1
  2
  3
  4
  5    PurchaseOrder        tns:PurchaseOrderType
  6    PurchaseOrderType
  7
  8    ShipTo               tns:USAddress
  9    BillTo               tns:USAddress
 10
 11    OrderDate            xsd:date
 12
 13
 14    USAddress
 15
 16    name                 xsd:string
 17    street               xsd:string
 18    city                 xsd:string
 19    state                xsd:string
 20    zip                  xsd:integer
 21
 22    country              xsd:NMTOKEN
 23
 24
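The question asks for ReportingUnit_def rather than reportingunit:ReportingUnit_def, and the output above still carries blank rows and namespace prefixes. A possible follow-up step is sketched below. It assumes the weak_parse data set from the sample code; the data set name want and the variable type_local are made up for illustration.

data want;
  set weak_parse;
  * keep only rows where both attributes were found;
  if not missing(name) and not missing(type);
  length type_local $100;
  * take the last colon-delimited piece, dropping any namespace prefix;
  type_local = scan(type, -1, ':');
run;

A negative count makes scan work from the right, so a type value with no colon passes through unchanged.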
Had the same issue, kindly try this (it worked for me):
data want;
infile "M:\some\path\ihave\favorites.xml";
length line $100;
input;
line = _infile_;
run;
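The step above only loads each record into a variable; it does not yet pull out the attributes. One way to finish the job is a pair of regular expressions, sketched here against the single line from the question. The data set name parsed and the variables re_name and re_type are hypothetical, and the patterns assume both attribute values are double-quoted and sit on one record.

data parsed;
  length name type $100;
  retain re_name re_type;
  * compile the patterns once, on the first iteration;
  if _n_ = 1 then do;
    re_name = prxparse('/\bname="([^"]*)"/');
    re_type = prxparse('/\btype="([^"]*)"/');
  end;
  input;
  * capture group 1 holds the text between the quotes;
  if prxmatch(re_name, _infile_) then name = prxposn(re_name, 1, _infile_);
  if prxmatch(re_type, _infile_) then type = prxposn(re_type, 1, _infile_);
  drop re_name re_type;
  datalines;
<xsd:element name="ReportingUnit" type="reportingunit:ReportingUnit_def" minOccurs="1" maxOccurs="1"/>
;
run;

To read a real file, replace the datalines block with an infile statement pointing at it. Matching against _infile_ rather than a $100 copy avoids truncating records longer than 100 characters.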