How to read a base64 numerical vector from an XML file?
I am trying to read the Ydata from an XML file. The Base64 string containing the Ydata is a 1245-element numerical vector and is stored in data$ATR. The file has been encoded according to the gaml.org standards.
I have tried many things all day, but none of them work, and I won't post all the failed attempts here. I am out of ideas. How can I convert it to an R numeric vector?
library(XML)
x = XML::xmlTreeParse("http://utsav.podzone.net/T0011VAP1.0.xml")
xmltop = xmlRoot(x)
data = xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
(data$ATR)
From an independent source, I know that the first 10 elements of Ydata are:
-0.0903
-0.0751
-0.0605
-0.0471
-0.0353
-0.0249
-0.0159
-0.0082
-0.0017
0.0035
I don't have access to that URL (company filtering), but this example may help:
The actual sample XML is
<root>
<value>123</value>
<value>234</value>
<value>345</value>
</root>
encoded as
PHJvb3Q+DQo8dmFsdWU+MTIzPC92YWx1ZT4NCjx2YWx1ZT4yMzQ8L3ZhbHVlPg0KPHZhbHVlPjM0NTwvdmFsdWU+DQo8L3Jvb3Q+
and the code to access the XML:
library("RCurl")
library("XML")
tmp <- "PHJvb3Q+DQo8dmFsdWU+MTIzPC92YWx1ZT4NCjx2YWx1ZT4yMzQ8L3ZhbHVlPg0KPHZhbHVlPjM0NTwvdmFsdWU+DQo8L3Jvb3Q+";
xml <- base64(tmp, encode=FALSE)
x = XML::xmlTreeParse(xml)
xmltop = xmlRoot(x)
data = xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
That is one ugly file ;-) It looks to be output from these guys.
Using the numvalues from this XML tag in the data file:
<values byteorder="INTEL" format="FLOAT32" numvalues="1245">
if you do:
library(base64enc)
head(readBin(base64decode(data$ATR), "double", 1245, 4), 10)
I made the assumption that the 1245 for numvalues is the number of floating point values in the binary field, which turned out to be a good assumption.
it gives:
## [1] -0.090307593 -0.075070500 -0.060486197 -0.047122478 -0.035274029 -0.024934530 -0.015949965 -0.008214951
## [9] -0.001725793 0.003505349
That output tracks nicely with your 10 known elements.
The readBin call says to use double as the mode (data type) for the vector it's returning, and the 4 is the number of bytes per element, which matches format="FLOAT32" in the tag.
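To make the role of the mode and size arguments concrete, here is a minimal base-R sketch of the same round trip. It does not touch the real file: the three values in `known` are stand-ins taken from the question, and the byte layout (4-byte little-endian floats) is an assumption based on the format="FLOAT32" and byteorder="INTEL" attributes.

```r
# Synthetic stand-in for the decoded binary field (not the real file's data).
known <- c(-0.0903, -0.0751, -0.0605)

# Pack the values as 4-byte little-endian floats, as FLOAT32/INTEL suggests.
bytes <- writeBin(known, raw(), size = 4, endian = "little")

# Each element occupies 4 bytes, so the element count follows from the length.
n <- length(bytes) %/% 4

# "double" is the type of the returned R vector; size = 4 is the stored width.
decoded <- readBin(bytes, what = "double", n = n, size = 4, endian = "little")

# Single precision keeps about 7 significant digits, so compare with a tolerance.
isTRUE(all.equal(decoded, known, tolerance = 1e-6))
```

The same length check can be applied to the real data: if the numvalues assumption holds, the raw vector returned by base64decode(data$ATR) should be 4 * 1245 bytes long.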
You may have to add endian="little" as a parameter to readBin depending on your architecture: the tag declares byteorder="INTEL" (little-endian), while readBin defaults to the platform's native byte order. (At least I think that might be the case; I found readBin to be less than deterministic per-OS when using it to read color swatch files, but I'm sure the CRAN guaRdians would berate me for questioning its behavior.)
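The effect of the endian argument can be seen in a small base-R sketch. The value used is only an illustrative stand-in, packed by hand as a little-endian 4-byte float; nothing here reads the actual file.

```r
# Synthetic example value, packed explicitly as a little-endian 4-byte float.
bytes <- writeBin(-0.0903, raw(), size = 4, endian = "little")

# Stating the byte order explicitly gives the same answer on any platform.
little <- readBin(bytes, what = "double", n = 1, size = 4, endian = "little")

# Reading the same bytes with the wrong byte order yields a different number.
big <- readBin(bytes, what = "double", n = 1, size = 4, endian = "big")

# The byte order readBin falls back to when endian is omitted.
.Platform$endian
```

On a little-endian machine the default already matches the file, which is likely why the call above works without the argument; passing endian="little" explicitly just removes the dependence on the platform.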