How to build a `data.frame` from related columns with different numbers of rows (not `by`)
Here is a sample of the XML format in my dataset:
<info>
<a>1990-01-02T06:58:12+08:00</a>
<b>120.980</b>
<c>23.786</c>
<d>18.7</d>
<e>2</e>
</info>
<info>
<a>1990-02-02T06:58:12+08:00</a>
<b>120.804</b>
<c>23.790</c>
</info>
But the number of tags is not the same for every field: for example, there are 4000 rows for tags a, b, c, and only 3950 rows for tags d, e.
Here is my code in R:
library(xml2)
# xml_data is the XML document, already parsed
df <- data.frame(Time = xml_text(xml_find_all(xml_data, ".//a")),
                 Num = xml_text(xml_find_all(xml_data, ".//b")),
                 Dist = xml_text(xml_find_all(xml_data, ".//c")),
                 Gap = xml_text(xml_find_all(xml_data, ".//d")),
                 Type = xml_text(xml_find_all(xml_data, ".//e")),
                 stringsAsFactors = FALSE)
The error message is (I knew this would happen):
arguments imply differing number of rows
The output I want looks like the table below:
Time                       Num      Dist    Gap   Type
1990-01-02T06:58:12+08:00  120.980  23.786  18.7  2
1990-02-02T06:58:12+08:00  120.804  23.790  <NA>  <NA>
...
1993-03-03T08:42:15+08:00  120.412  23.523  <NA>  1
Which function or library should I try for this?
Thanks for helping me!!
I have tried some other methods, like map_if.
Finally I found the solution!!
When working with an XML file, be sure to get the root node of each record first.
Here I will show you how it works.
Take this XML file for example (name it test.xml):
<dataset>
<dataset_info>
<data_count>2</data_count>
<status>Actual</status>
</dataset_info>
<data>
<time>2019-06-01</time>
<event>event1</event>
<describe>describe for event1</describe>
</data>
<data>
<time>2019-06-02</time>
<event>event2</event>
</data>
</dataset>
We know that the tag describe is missing in event2, but we still want to make a data frame from this XML data. I was taught to use the function xml2::xml_find_all to get the values in the selected tag, with R code like this:
# library import
library(xml2)
# file reading
xml <- read_xml("path/where/the/file/is/test.xml")
# search the whole document for each tag (fails: 2 time, 2 event, 1 describe)
data.frame(Time = xml_text(xml_find_all(xml, ".//time")),
           Event = xml_text(xml_find_all(xml, ".//event")),
           Describe = xml_text(xml_find_all(xml, ".//describe"))
)
Then we get the error message:
arguments imply differing number of rows
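To see where the mismatch comes from, it can help to count the matches per tag. The snippet below is only a sketch: it inlines the contents of test.xml as a string so that it runs without the file, and the variable name counts is just for illustration.

```r
library(xml2)

# Inline copy of test.xml so the snippet is self-contained
xml <- read_xml("<dataset>
  <dataset_info><data_count>2</data_count><status>Actual</status></dataset_info>
  <data><time>2019-06-01</time><event>event1</event><describe>describe for event1</describe></data>
  <data><time>2019-06-02</time><event>event2</event></data>
</dataset>")

# Number of matches for each tag across the whole document
counts <- sapply(c("time", "event", "describe"),
                 function(tag) length(xml_find_all(xml, paste0(".//", tag))))
print(counts)
```

Here time and event each match twice but describe matches only once, so data.frame receives columns of length 2, 2 and 1 and refuses to combine them.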
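So what we need to do is get the root node of each record first!! Then look up each tag inside every record with xml_find_first, as in the code below:
# library import
library(xml2)
# file reading
xml <- read_xml("path/where/the/file/is/test.xml")
# one node per <data> record
record <- xml_find_all(xml, ".//data")
# xml_find_first returns exactly one result per record (a missing node if the tag is absent)
data.frame(Time = xml_text(xml_find_first(record, ".//time")),
           Event = xml_text(xml_find_first(record, ".//event")),
           Describe = xml_text(xml_find_first(record, ".//describe"))
)
After adding record <- xml_find_all(xml, ".//data") and searching inside each record with xml_find_first, we no longer get the error caused by the differing counts. xml_find_first always returns one result per record, and xml_text turns a missing describe into NA. (Calling xml_find_all on record would still return only the tags that exist, so the lengths would still differ.)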
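The same idea can be applied to the XML from the question. This is a minimal sketch, not a definitive implementation: the wrapper element <root> and the helper name field are assumptions made for illustration, since the question only shows the <info> records. It uses xml_find_first because, as far as I understand xml2, that function returns one result per input node (a missing node when there is no match), whereas xml_find_all returns only the nodes that actually exist.

```r
library(xml2)

# Hypothetical wrapper <root>; the question only shows the <info> records
xml_data <- read_xml("<root>
  <info><a>1990-01-02T06:58:12+08:00</a><b>120.980</b><c>23.786</c><d>18.7</d><e>2</e></info>
  <info><a>1990-02-02T06:58:12+08:00</a><b>120.804</b><c>23.790</c></info>
</root>")

# One node per record
records <- xml_find_all(xml_data, ".//info")

# Illustrative helper: text of the first matching child in each record, NA if absent
field <- function(tag) xml_text(xml_find_first(records, paste0("./", tag)))

df <- data.frame(Time = field("a"), Num = field("b"), Dist = field("c"),
                 Gap = field("d"), Type = field("e"),
                 stringsAsFactors = FALSE)
print(df)
```

The second record has no d or e tag, so Gap and Type are NA in that row, which matches the table asked for in the question. All columns come back as character; convert them with as.numeric where needed.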
Hope this helps!!