
XML document into a data.frame in R

I have a problem. I have an XML document and I need to get it into a data.frame in R. So far I have managed to load a simple XML file into a data.frame using the packages XML and plyr, like this:

    dataframe <- ldply(xmlToList("file.xml"), data.frame)

But when I run it on this XML:

    <BusinessUnitList>
    <BusinessUnit id="000000195">
      <User id="897654322" firstName="Rick" lastName="Test" middleName="R" defaultLanguageName="English">
        <RoleList>
          <Role id="worker"/>
        </RoleList>
        <OrgList>
          <Organization id="1111"/>
        </OrgList>
        <Address country="Italy"/>
        <Employee badgeNumber="575757" Date="2017-01-01" DateNew="2017-01-02" birthDate="1999-01-01">
          <Availability val1="5" val2="n" val3="6" HoursPerWeek="33.75" HoursBetweenShifts="10" minHoursPerWeek="00.00"/>
        </Employee>
      </User>
    </BusinessUnit>
    <BusinessUnit id="000000111">
      <User id="897652222" firstName="TERI" lastName="tst2" middleName="D" defaultLanguageName="English">
        <RoleList>
          <Role id="worker"/>
        </RoleList>
        <OrgList>
          <Organization id="2222"/>
        </OrgList>
        <Address country="Portugal"/>
        <Employee badgeNumber="575757" Date="2017-02-02" DateNew="2017-02-02" birthDate="1998-01-01">
          <Availability val1="5" val2="n" val3="6" HoursPerWeek="33.75" HoursBetweenShifts="10" minHoursPerWeek="00.00"/>
        </Employee>
      </User>
    </BusinessUnit>
    </BusinessUnitList>

I receive an error:

    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 9, 7

The error means you are trying to build a data.frame from list elements whose lengths cannot be recycled to a common length, like this:

    list(a=1:2, b=3:5)
    $a
    [1] 1 2

    $b
    [1] 3 4 5

    data.frame( list(a=1:2, b=3:5))
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
      arguments imply differing number of rows: 2, 3
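Using the same toy list, here is a minimal base-R sketch of why the call fails and how `unlist()` sidesteps it. The variable names are only illustrative.

```r
# Toy list: two elements of lengths 2 and 3
l <- list(a = 1:2, b = 3:5)

# data.frame() only recycles columns whose lengths divide the longest one;
# 2 and 3 do not, so it stops with "arguments imply differing number of rows"
res <- tryCatch(data.frame(l), error = function(e) conditionMessage(e))
print(res)

# unlist() flattens the list into one named vector instead (names a1 a2 b1 b2 b3)
flat <- unlist(l)
print(names(flat))

# t() turns that vector into a 1 x 5 matrix, so every value becomes its own column
one_row <- data.frame(t(flat))
print(one_row)
```

The price of flattening is that everything ends up in a single row, with the original list structure encoded only in the column names.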

One option is to `unlist()` the `xmlToList()` result, which flattens it into a single named vector, and then clean up the column names:

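    library(XML)
    doc <- xmlParse("file.xml")
    # flatten the nested list into one named vector, then turn it into a one-row data.frame
    x <- data.frame(t(unlist(xmlToList(doc))))
    # e.g. "BusinessUnit.User.RoleList.Role.id" -> "BusinessUnit.User.RoleList.Role_id"
    names(x) <- gsub("(..attrs)?.id$", "_id", names(x))
    # keep only the text after the last dot, e.g. "Role_id"
    names(x) <- gsub(".*\\.", "", names(x))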

      Role_id Organization_id country val1 val2 val3 HoursPerWeek HoursBetweenShifts minHoursPerWeek badgeNumber       Date    DateNew  birthDate   User_id firstName lastName middleName defaultLanguageName BusinessUnit_id
    1  worker            1111   Italy    5    n    6        33.75                 10           00.00      575757 2017-01-01 2017-01-02 1999-01-01 897654322      Rick     Test          R             English       000000195
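Because `t(unlist(...))` always yields a single row, both BusinessUnit elements land side by side in that one row. If you want one row per BusinessUnit, a possible variant is to flatten each unit separately and stack the results. The sketch below uses only base R and a hand-built list `units` that I am assuming mirrors the shape `xmlToList()` returns for this file (one element per `<BusinessUnit>`, attributes of elements with children under `.attrs`), trimmed to a few fields. The helper name `clean_names` is made up.

```r
# Hypothetical stand-in for xmlToList(xmlParse("file.xml")): one list element per
# <BusinessUnit>, attributes of elements with children stored under ".attrs".
# Only a few fields are kept to keep the sketch short.
units <- list(
  BusinessUnit = list(
    User = list(Address = c(country = "Italy"),
                .attrs = c(id = "897654322", firstName = "Rick")),
    .attrs = c(id = "000000195")
  ),
  BusinessUnit = list(
    User = list(Address = c(country = "Portugal"),
                .attrs = c(id = "897652222", firstName = "TERI")),
    .attrs = c(id = "000000111")
  )
)

# Same renaming rules as in the answer, wrapped in a helper (name is made up).
# Note that "." in these patterns matches any character, not only a literal dot.
clean_names <- function(nms) {
  nms <- gsub("(..attrs)?.id$", "_id", nms)  # "...User..attrs.id" -> "...User_id"
  gsub(".*\\.", "", nms)                      # keep only the text after the last dot
}

# Flatten each BusinessUnit on its own; units[i] keeps the "BusinessUnit" prefix
rows <- lapply(seq_along(units), function(i) {
  flat <- unlist(units[i])
  names(flat) <- clean_names(names(flat))
  data.frame(t(flat), stringsAsFactors = FALSE)
})

# Stack the one-row data.frames; this needs identical columns in every unit
df <- do.call(rbind, rows)
rownames(df) <- NULL
print(df)
```

With the real file you would replace `units` by `xmlToList(xmlParse("file.xml"))`. If some units lack a field, `rbind` will fail on the mismatched columns; `plyr::rbind.fill(rows)` (plyr is already loaded in the question) pads the missing columns with `NA` instead.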
