How to delete root nodes of XML strings in R

I want to combine multiple XML strings (> 1000) into one string in R. This can, for example, be done with xml_add_sibling from the xml2 package. However, I would like to get rid of the intermediate root nodes ("positions" in my example).

Input:

library(xml2)
position1 <- read_xml("<positions>
  <moneyMarket>
    <positionName>1</positionName>
    <notional>10000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>")

position2 <- read_xml("<positions>
  <moneyMarket>
    <positionName>2</positionName>
    <notional>40000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>")

position3 <- read_xml("<positions>
  <moneyMarket>
    <positionName>3</positionName>
    <notional>50000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>")

Code:

combined_XML <- xml_add_sibling(position1,position2)
combined_XML <- xml_add_sibling(combined_XML,position3)

Actual results:

<positions>
  <moneyMarket>
    <positionName>1</positionName>
    <notional>10000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>
<positions>
  <moneyMarket>
    <positionName>2</positionName>
    <notional>40000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>
<positions>
  <moneyMarket>
    <positionName>3</positionName>
    <notional>50000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>

Expected results:

<positions>
  <moneyMarket>
    <positionName>1</positionName>
    <notional>10000</notional>
    <currency>EUR</currency>
  </moneyMarket>
  <moneyMarket>
    <positionName>2</positionName>
    <notional>40000</notional>
    <currency>EUR</currency>
  </moneyMarket>
  <moneyMarket>
    <positionName>3</positionName>
    <notional>50000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>

I used the example data, which consists of three XML documents named position1, position2 and position3. Because the names share the prefix position followed by a number, I use get() to look each document up by name. I set i <- 3 because there are three XML documents.

If you have 1000 XML documents, set i <- 1000. This assumes the documents are stored in variables named position1, position2, position3, position4, ..., position1000.
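
As an aside, numbered variables like these can also be gathered into a single list with base R's mget(), which is often easier to loop over than repeated get() calls. This is only a sketch: the shortened XML strings, the variable n and the list name positions are assumptions made for illustration.

```r
# Two shortened example strings (assumed for illustration only)
position1 <- "<positions><moneyMarket><positionName>1</positionName></moneyMarket></positions>"
position2 <- "<positions><moneyMarket><positionName>2</positionName></moneyMarket></positions>"

# Number of numbered variables to collect (assumption: 2 here)
n <- 2

# Build the variable names "position1", "position2", ...
position_names <- paste0("position", seq_len(n))

# mget() looks up several variables by name and returns them as a named list
positions <- mget(position_names, envir = environment())
```

The resulting list can then be indexed as positions[[1]], positions[[2]] and so on, without building variable names inside a loop.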

The code below adds the children of each document to the one before it, so that every child ends up in the first document, position1. At the end, running xmlParse(position1) shows the result.

  library(xml2)  
  library(XML)

  position1 <- "<positions>
                  <moneyMarket>
                    <positionName>1</positionName>
                    <notional>10000</notional>
                    <currency>EUR</currency>
                  </moneyMarket>
                </positions>"

  position2 <- "<positions>
                  <moneyMarket>
                    <positionName>2</positionName>
                    <notional>40000</notional>
                    <currency>EUR</currency>
                  </moneyMarket>
                </positions>"

  position3 <- "<positions>
                  <moneyMarket>
                    <positionName>3</positionName>
                    <notional>50000</notional>
                    <currency>EUR</currency>
                  </moneyMarket>
                </positions>"


 position1 <- read_xml(position1)
 position2 <- read_xml(position2)
 position3 <- read_xml(position3)


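 # Number of documents (position1 ... position3)
 i <- 3

 # Work backwards: copy the children of position<i> into position<i-1>,
 # so that position1 ends up holding every <moneyMarket> node
 while(i>1) {

     mychildren <- xml_children(get(paste0("position",i)))

     for (child in mychildren) {

        xml_add_child(get(paste0("position",i-1)), child)

     }

     i <- i-1

 }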

 xmlParse(position1)

Output:

  <?xml version="1.0" encoding="UTF-8"?>
  <positions>
     <moneyMarket>
       <positionName>1</positionName>
       <notional>10000</notional>
       <currency>EUR</currency>
     </moneyMarket>
     <moneyMarket>
       <positionName>2</positionName>
       <notional>40000</notional>
       <currency>EUR</currency>
     </moneyMarket>
     <moneyMarket>
       <positionName>3</positionName>
       <notional>50000</notional>
       <currency>EUR</currency>
     </moneyMarket>
 </positions>
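
If the XML strings are held in a character vector or list rather than in numbered variables, the same idea can be written without get(). The following is a minimal sketch, not part of the original answer: the helper name merge_positions and the shortened example strings are hypothetical, and it assumes every document shares the same root element. It copies each child only once, directly into the first document, instead of cascading copies through the intermediate documents.

```r
library(xml2)

# Hypothetical helper: append the children of every document to the first one.
# Assumes all inputs are XML strings with the same root element.
merge_positions <- function(xml_strings) {
  docs <- lapply(xml_strings, read_xml)
  combined <- docs[[1]]

  for (doc in docs[-1]) {
    kids <- xml_children(doc)
    for (j in seq_along(kids)) {
      # xml_add_child() copies the node and appends it under the root of `combined`
      xml_add_child(combined, kids[[j]])
    }
  }

  combined
}

# Shortened example input (assumed for illustration)
xml_strings <- c(
  "<positions><moneyMarket><positionName>1</positionName></moneyMarket></positions>",
  "<positions><moneyMarket><positionName>2</positionName></moneyMarket></positions>",
  "<positions><moneyMarket><positionName>3</positionName></moneyMarket></positions>"
)

result <- merge_positions(xml_strings)

# Serialize the combined document back to a single string
combined_string <- as.character(result)
```

Calling cat(combined_string) prints the merged document, with a single positions root containing all moneyMarket nodes.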
