简体   繁体   中英

Extracting specific rows from a nested data frame using map with an if else condition

I have a data frame with some nested data and I would like to flatten the data and extract specific cells from the nested data.

The nested data is in the column MetaData which looks like the following:

[[1]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                     Nombre Codigo
1     72           3           Tipo de dato                                  Dato base       
2   5457          19             Municipios             MUN                     Abrera  08001
3 274520         260 Conceptos Demográficos                 Edad media de la población       

[[2]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                              Nombre  Codigo
1     72           3           Tipo de dato                                           Dato base        
2 366833         260 Conceptos Demográficos                 Porcentaje de hogares unipersonales        
3 327739         846              Distritos            DIST                Badalona distrito 02 0801502

[[3]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                                 Nombre     Codigo
1     72           3           Tipo de dato                                              Dato base           
2 366833         260 Conceptos Demográficos                    Porcentaje de hogares unipersonales           
3 331103         847              Secciones            SECC Santa Coloma de Gramenet sección 05009 0824505009

I want to extract -

  • From [[1]] the municipios , MUN , Abrera and 08001

  • From [[2]] the Distritos , DIST , Badalona distrito 02 and 0801502

  • From [[3]] the Secciones , SECC , Santa Columa de Gramenet sección 05009 and 0824505009 .

However, just using the cell location does not work since the location for the MUN data is in a slightly different location for the DIST and SECC data - ie for MUN under the column Nombre and Codigo the cell location is in row 2, whereas for the DIST and SECC data the cell location is in row 3.

I have the following code which can extract the data for the MUN without problem.

data2 <- data %>% 
  mutate(MetaDataWider = map(MetaData, ~ {
    v1 <- .x[cbind(c(2, 3, 2, 2, 3), c(3, 3, 4, 5, 5))]
    names(v1) <- c("type", "contable", "type_code", "region", "variable")
    as_tibble_row(v1)
  })
  )

I now want to add an ifelse statement to the map / map_if in order to correctly extract the data for the DIST and SECC observations.

Alternatively, create the ifelse statment to change the order of the DIST and SECC observations. ie

if Variable.Codigo contains DIST | SECC shift row 3 to row 2, else nothing.

Then I can use the code I already have to extract the data.

Data:

data <- structure(list(COD = c("ADRH7218704", "ADRH7013747", "ADRH6909920"
), Nombre = c("Abrera. Edad media de la población. Dato base. ", 
"Badalona distrito 02. Porcentaje de hogares unipersonales. Dato base. ", 
"Santa Coloma de Gramenet sección 05009. Porcentaje de hogares unipersonales. Dato base. "
), T3_Unidad = c("Años", "Porcentaje", "Porcentaje"), T3_Escala = c(" ", 
" ", " "), MetaData = list(structure(list(Id = c(72L, 5457L, 
274520L), Variable = structure(list(Id = c(3L, 19L, 260L), Nombre = c("Tipo de dato", 
"Municipios", "Conceptos Demográficos"), Codigo = c("", "MUN", 
"")), class = "data.frame", row.names = c(NA, 3L)), Nombre = c("Dato base", 
"Abrera", "Edad media de la población"), Codigo = c("", "08001", 
"")), class = "data.frame", row.names = c(NA, 3L)), structure(list(
    Id = c(72L, 366833L, 327739L), Variable = structure(list(
        Id = c(3L, 260L, 846L), Nombre = c("Tipo de dato", "Conceptos Demográficos", 
        "Distritos"), Codigo = c("", "", "DIST")), class = "data.frame", row.names = c(NA, 
    3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales", 
    "Badalona distrito 02"), Codigo = c("", "", "0801502")), class = "data.frame", row.names = c(NA, 
3L)), structure(list(Id = c(72L, 366833L, 331103L), Variable = structure(list(
    Id = c(3L, 260L, 847L), Nombre = c("Tipo de dato", "Conceptos Demográficos", 
    "Secciones"), Codigo = c("", "", "SECC")), class = "data.frame", row.names = c(NA, 
3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales", 
"Santa Coloma de Gramenet sección 05009"), Codigo = c("", "", 
"0824505009")), class = "data.frame", row.names = c(NA, 3L))), 
    Data = list(structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00", 
    "2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00", 
    "2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo", 
    "Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A", 
    "A", "A", "A"), Anyo = 2018:2015, Valor = c(39.7, 39.5, 39.2, 
    38.8)), class = "data.frame", row.names = c(NA, 4L)), structure(list(
        Fecha = c("2018-01-01T00:00:00.000+01:00", "2017-01-01T00:00:00.000+01:00", 
        "2016-01-01T00:00:00.000+01:00", "2015-01-01T00:00:00.000+01:00"
        ), T3_TipoDato = c("Definitivo", "Definitivo", "Definitivo", 
        "Definitivo"), T3_Periodo = c("A", "A", "A", "A"), Anyo = 2018:2015, 
        Valor = c(25.5, 25.7, 25.5, 25.8)), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00", 
    "2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00", 
    "2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo", 
    "Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A", 
    "A", "A", "A"), Anyo = 2018:2015, Valor = c(24.1, 23.6, 22.2, 
    20.9)), class = "data.frame", row.names = c(NA, 4L)))), row.names = c(NA, 
-3L), class = "data.frame")

How about using purrr::map_df ?

purrr::map_df(data$MetaData, ~ {
  .x[.x$Variable$Codigo != '', ]
})

We can use rbindlist

library(data.table)
rbindlist(lapply(data$MetaData, function(x) {
                     do.call(data.frame, subset(x, Variable$Codigo != ""))
       }))

-output

  Id Variable.Id Variable.Nombre Variable.Codigo                                 Nombre     Codigo
1:   5457          19      Municipios             MUN                                 Abrera      08001
2: 327739         846       Distritos            DIST                   Badalona distrito 02    0801502
3: 331103         847       Secciones            SECC Santa Coloma de Gramenet sección 05009 0824505009

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM