I am having difficulty importing XML files with a specific structure into a data frame in R.
An example of the XML file can be found here: XML to import
The end result should be a data frame that is flattened down to the deepest nesting level (Adms) of the XML file.
This means that data from the higher levels is repeated, but that is not a problem.
I've tried multiple solutions found on StackOverflow for XML import, but because of the structure of my XML file I cannot get any of them to work.
For the moment I have to go through Excel and transform the XML to CSV with the "GetData" option, but as I have hundreds of these XML files to process I would like to automate this task.
Thank you in advance for your help!
This will give you one patient per row:
library(xml2)
library(XML)
library(tidyverse)

# Parse the file with the XML package
xml <- xmlParse("~/Downloads/00000123456_0000071234567123_20150922101212_TH.XML")

# One row per <Patient> node; its child elements become columns
xmlToDataFrame(nodes = getNodeSet(xml, "//Patient")) %>%
  as_tibble()
#> # A tibble: 12 x 15
#> Id Name Firstname HomeID Location1 Location2 Location3 Location4
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 360923… Van den … Freya 002 VILLA TRAN… 1e verdie… Sectie 1 104
#> 2 211209… Verhofst… Guy 004 VILLA TRAN… 1e verdie… Sectie 1 102
#> 3 410630… Hanckeli… Laurette 005 VILLA TRAN… 1e verdie… Sectie 1 102
#> 4 251019… Smaak Antoinet… 007 VILLA TRAN… 1e verdie… Sectie 1 101
#> 5 281213… Areno Marie 008 VILLA TRAN… 1e verdie… Sectie 1 103
#> 6 190219… De Waen Patrick 010 VILLA TRAN… 2e verdie… Sectie 2 203
#> 7 271023… Vande Ma… Johan 012 VILLA TRAN… 2e verdie… Sectie 2 201
#> 8 311215… Dirupa Elise 014 VILLA TRAN… 2e verdie… Sectie 2 202
#> 9 320704… Zomers Bertha 015 VILLA TRAN… 2e verdie… Sectie 1 202
#> 10 100112… Daerdenne Micheline 019 VILLA TRAN… 2e verdie… Sectie 2 204
#> 11 461217… Schoeppe Antoine 021 VILLA TRAN… 1e verdie… Sectie 1 101
#> 12 201114… Vanrompu… Germain 022 VILLA TRAN… 2e verdie… Sectie 2 206
#> # … with 7 more variables: Location5 <chr>, Birthdate <chr>, DoctorName <chr>,
#> # DoctorMedRegNr <chr>, PatientUnidose <chr>, Shortstay <chr>, Products <chr>
Created on 2022-02-16 by the reprex package (v2.0.0)
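To cover the "hundreds of files" part of the question, the same call can be wrapped in a function and applied to every file in a folder. This is only a sketch: `read_patients`, `xml_dir` and the two toy files written to a temporary folder are made up for illustration, and the real files would need the same `//Patient` layout.

```r
library(XML)
library(dplyr)

# Hypothetical helper: one row per <Patient> for a single file
read_patients <- function(path) {
  doc <- xmlParse(path)
  xmlToDataFrame(nodes = getNodeSet(doc, "//Patient"))
}

# Toy stand-in for the real export folder (assumption: all files share one layout)
xml_dir <- file.path(tempdir(), "xml_demo")
dir.create(xml_dir, showWarnings = FALSE)
writeLines("<Root><Patient><Id>1</Id><Name>A</Name></Patient></Root>",
           file.path(xml_dir, "a.XML"))
writeLines(paste0("<Root><Patient><Id>2</Id><Name>B</Name></Patient>",
                  "<Patient><Id>3</Id><Name>C</Name></Patient></Root>"),
           file.path(xml_dir, "b.XML"))

# Collect every .xml/.XML file in the folder
files <- list.files(xml_dir, pattern = "\\.xml$", ignore.case = TRUE,
                    full.names = TRUE)

# Read each file, name the results by file, and stack them;
# source_file records which file each row came from
per_file <- lapply(files, read_patients)
names(per_file) <- basename(files)
all_patients <- bind_rows(per_file, .id = "source_file")
```

Keeping the file name in a `source_file` column makes it easier to trace a suspicious row back to the export it came from.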
Since the nested Products cannot be spread over the columns of the patients table, you can keep them together in a single list-column:
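read_xml("~/Downloads/00000123456_0000071234567123_20150922101212_TH.XML") %>%
  xml_find_all("//Patient") %>%
  as_list() %>%
  map(~ {
    .x %>%
      # one row per child element of the patient
      enframe() %>%
      # handle the nested Products separately
      filter(name != "Products") %>%
      unnest_auto(value) %>%
      # one column per field, one row per patient
      pivot_wider() %>%
      # keep the nested products as a list-column
      mutate(Products = list(.x$Products))
  }) %>%
  bind_rows()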
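If the goal is the fully flattened table from the question (one row per entry at the deepest level, with patient and product data repeated), one possible approach is to start from the deepest nodes and look upwards with XPath `ancestor::` steps. The sketch below runs on a toy document, because the real file layout is not shown here: only `Patient`, `Products` and `Adms` are names from the post, while `Product`, `Adm`, `ProductName`, `Time` and `Dose` are assumptions, and `leaf_fields` is a hypothetical helper.

```r
library(xml2)
library(tibble)
library(dplyr)

# Toy document; element names other than Patient, Products and Adms are assumed
doc <- read_xml("<Root>
  <Patient>
    <Id>1</Id><Name>A</Name>
    <Products>
      <Product>
        <ProductName>P1</ProductName>
        <Adms>
          <Adm><Time>08:00</Time><Dose>1</Dose></Adm>
          <Adm><Time>20:00</Time><Dose>2</Dose></Adm>
        </Adms>
      </Product>
    </Products>
  </Patient>
  <Patient>
    <Id>2</Id><Name>B</Name>
    <Products>
      <Product>
        <ProductName>P2</ProductName>
        <Adms>
          <Adm><Time>12:00</Time><Dose>1</Dose></Adm>
        </Adms>
      </Product>
    </Products>
  </Patient>
</Root>")

# Hypothetical helper: the children of a node that have no child elements
# of their own, returned as a one-row tibble (element name -> text)
leaf_fields <- function(node) {
  kids <- xml_children(node)
  leaves <- kids[xml_length(kids) == 0]
  as_tibble(as.list(setNames(xml_text(leaves), xml_name(leaves))))
}

# Start at the deepest level and pull in the enclosing product and patient
adms <- xml_find_all(doc, "//Adm")
rows <- lapply(seq_along(adms), function(i) {
  adm <- adms[[i]]
  patient <- xml_find_first(adm, "ancestor::Patient")
  product <- xml_find_first(adm, "ancestor::Product")
  bind_cols(leaf_fields(patient), leaf_fields(product), leaf_fields(adm))
})
flat <- bind_rows(rows)
```

On the toy data, `flat` has one row per `Adm`, with the patient and product fields repeated on each row. If the same field name occurs at more than one level in the real files, the columns would need to be renamed or prefixed before `bind_cols()`.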