
xml document into data.frame in R

I have a problem. I have an XML document and I need to get it into a data.frame in R. So far I have managed to load a simple XML file into a data.frame using the XML and plyr packages with:

dataframe=ldply(xmlToList("file.xml"), data.frame)

But when I run it on this XML:

    <BusinessUnitList>
    <BusinessUnit id="000000195">
      <User id="897654322" firstName="Rick" lastName="Test" middleName="R" defaultLanguageName="English">
        <RoleList>
          <Role id="worker"/>
        </RoleList>
        <OrgList>
          <Organization id="1111"/>
        </OrgList>
        <Address country="Italy"/>
        <Employee badgeNumber="575757" Date="2017-01-01" DateNew="2017-01-02" birthDate="1999-01-01">
          <Availability val1="5" val2="n" val3="6" HoursPerWeek="33.75" HoursBetweenShifts="10" minHoursPerWeek="00.00"/>
        </Employee>
      </User>
    </BusinessUnit>
    <BusinessUnit id="000000111">
      <User id="897652222" firstName="TERI" lastName="tst2" middleName="D" defaultLanguageName="English">
        <RoleList>
          <Role id="worker"/>
        </RoleList>
        <OrgList>
          <Organization id="2222"/>
        </OrgList>
        <Address country="Portugal"/>
        <Employee badgeNumber="575757" Date="2017-02-02" DateNew="2017-02-02" birthDate="1998-01-01">
          <Availability val1="5" val2="n" val3="6" HoursPerWeek="33.75" HoursBetweenShifts="10" minHoursPerWeek="00.00"/>
        </Employee>
      </User>
    </BusinessUnit>
    </BusinessUnitList>

I receive an error: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 9, 7.

You are trying to combine a list whose elements have different lengths, like this:

list(a=1:2, b=3:5)
$a
[1] 1 2

$b
[1] 3 4 5

data.frame( list(a=1:2, b=3:5))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 3
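
To see which elements are uneven before calling data.frame, it can help to look at lengths(). Here is a minimal base R sketch using the same toy list; the variable names x, err and flat are only for illustration.

```r
x <- list(a = 1:2, b = 3:5)

# Element lengths differ, which is what data.frame() complains about
lengths(x)

# Capture the error message instead of stopping the script
err <- tryCatch(data.frame(x), error = function(e) conditionMessage(e))

# Flattening first gives one named vector, which becomes a one-row data.frame
flat <- data.frame(t(unlist(x)))
names(flat)
```

unlist() numbers the elements of each list entry, so every value ends up in its own column.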

I would probably unlist the xmlToList result and then clean up the column names.

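library(XML)
doc <- xmlParse("file.xml")
# Flatten the nested list into one named vector, then into a one-row data.frame
x <- data.frame(t(unlist(xmlToList(doc))))
# Rename attribute paths ending in "id" to "<element>_id"
names(x) <- gsub("(..attrs)?.id$", "_id", names(x))
names(x) <- gsub(".*\\.", "", names(x))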

  Role_id Organization_id country val1 val2 val3 HoursPerWeek HoursBetweenShifts minHoursPerWeek badgeNumber       Date    DateNew  birthDate   User_id firstName lastName middleName defaultLanguageName BusinessUnit_id
1  worker            1111   Italy    5    n    6        33.75                 10           00.00      575757 2017-01-01 2017-01-02 1999-01-01 897654322      Rick     Test          R             English       000000195
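
Flattening the whole document always gives a single row. If you want one row per BusinessUnit, one option is to flatten each node separately and stack the results. This is only a sketch under some assumptions: the XML below is a trimmed-down, hypothetical stand-in for file.xml, flatten_unit is a helper name I made up, and rbind only works here because every unit has the same set of columns.

```r
library(XML)

# Trimmed-down stand-in for file.xml (hypothetical sample with fewer attributes)
xml_text <- paste0(
  '<BusinessUnitList>',
  '<BusinessUnit id="000000195">',
  '<User id="897654322" firstName="Rick">',
  '<Address country="Italy"/>',
  '</User>',
  '</BusinessUnit>',
  '<BusinessUnit id="000000111">',
  '<User id="897652222" firstName="TERI">',
  '<Address country="Portugal"/>',
  '</User>',
  '</BusinessUnit>',
  '</BusinessUnitList>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# Flatten one <BusinessUnit> node into a one-row data.frame
flatten_unit <- function(unit) {
  v <- unlist(xmlToList(unit))
  # Prefix the element name so the paths look like those from the whole document
  n <- paste0(xmlName(unit), ".", names(v))
  # Same two name clean-up steps as above
  n <- gsub("(..attrs)?.id$", "_id", n)
  n <- gsub(".*\\.", "", n)
  names(v) <- n
  as.data.frame(t(v), stringsAsFactors = FALSE)
}

# One row per BusinessUnit, stacked into a single data.frame
units <- getNodeSet(doc, "//BusinessUnit")
x <- do.call(rbind, lapply(units, flatten_unit))
```

If some units lack elements that others have, the columns will not line up for rbind; plyr's rbind.fill is a possible replacement in that case, since it fills missing columns with NA.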
