Extracting specific rows from a nested data frame using map with an if else condition

I have a data frame with a column of nested data frames. I would like to flatten the data and extract specific cells from each nested data frame.

The nested data is in the column MetaData, which looks like this:

[[1]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                     Nombre Codigo
1     72           3           Tipo de dato                                  Dato base       
2   5457          19             Municipios             MUN                     Abrera  08001
3 274520         260 Conceptos Demográficos                 Edad media de la población       

[[2]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                              Nombre  Codigo
1     72           3           Tipo de dato                                           Dato base        
2 366833         260 Conceptos Demográficos                 Porcentaje de hogares unipersonales        
3 327739         846              Distritos            DIST                Badalona distrito 02 0801502

[[3]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                                 Nombre     Codigo
1     72           3           Tipo de dato                                              Dato base           
2 366833         260 Conceptos Demográficos                    Porcentaje de hogares unipersonales           
3 331103         847              Secciones            SECC Santa Coloma de Gramenet sección 05009 0824505009

I want to extract:

  • From [[1]]: Municipios, MUN, Abrera and 08001

  • From [[2]]: Distritos, DIST, Badalona distrito 02 and 0801502

  • From [[3]]: Secciones, SECC, Santa Coloma de Gramenet sección 05009 and 0824505009

However, indexing by fixed cell position does not work, because the row of interest is not in the same place in every element: for the MUN data the values under Nombre and Codigo are in row 2, whereas for the DIST and SECC data they are in row 3.

I have the following code, which extracts the data for the MUN case without problems.

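library(dplyr)   # %>%, mutate
library(purrr)   # map
library(tibble)  # as_tibble_row

data2 <- data %>% 
  mutate(MetaDataWider = map(MetaData, ~ {
    # (row, column) pairs; these positions only match the MUN layout
    v1 <- .x[cbind(c(2, 3, 2, 2, 3), c(3, 3, 4, 5, 5))]
    names(v1) <- c("type", "contable", "type_code", "region", "variable")
    as_tibble_row(v1)
  })
  )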

I now want to add an ifelse condition to the map / map_if call so that the data for the DIST and SECC observations is extracted correctly.

Alternatively, I could write an ifelse statement that changes the row order of the DIST and SECC observations, i.e.:

if Variable.Codigo contains DIST | SECC, move row 3 to row 2; otherwise do nothing.

Then I can use the code I already have to extract the data.
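The row-reordering idea described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper names make_meta and move_geo_to_row2 are hypothetical, the toy entries copy the strings of the question's first two MetaData elements but use placeholder Id values, and each entry is assumed to contain exactly one row whose Variable$Codigo is MUN, DIST or SECC.

```r
# Hypothetical helper: builds a small data frame with a nested `Variable`
# data frame column, mirroring the shape of one MetaData element.
# The Id values are placeholders, not the real ones from the question.
make_meta <- function(var_nombre, var_codigo, nombre, codigo) {
  x <- data.frame(Id = seq_along(nombre))
  x$Variable <- data.frame(Id = seq_along(nombre), Nombre = var_nombre,
                           Codigo = var_codigo, stringsAsFactors = FALSE)
  x$Nombre <- nombre
  x$Codigo <- codigo
  x
}

meta_mun <- make_meta(
  c("Tipo de dato", "Municipios", "Conceptos Demográficos"),
  c("", "MUN", ""),
  c("Dato base", "Abrera", "Edad media de la población"),
  c("", "08001", "")
)
meta_dist <- make_meta(
  c("Tipo de dato", "Conceptos Demográficos", "Distritos"),
  c("", "", "DIST"),
  c("Dato base", "Porcentaje de hogares unipersonales", "Badalona distrito 02"),
  c("", "", "0801502")
)

# Hypothetical helper: moves the geographic row (MUN / DIST / SECC) to
# position 2 and keeps the remaining rows in their original order.
move_geo_to_row2 <- function(x, geo_codes = c("MUN", "DIST", "SECC")) {
  geo <- which(x$Variable$Codigo %in% geo_codes)
  if (length(geo) != 1L) return(x)  # assumption violated: leave untouched
  others <- setdiff(seq_len(nrow(x)), geo)
  x[append(others, geo, after = 1L), ]
}

fixed <- lapply(list(meta_mun, meta_dist), move_geo_to_row2)
```

After this step every element has its geographic row in position 2, so position-based extraction written for the MUN layout should line up for the other layouts as well. Inside the existing map() call, the equivalent would be to apply such a helper to .x before indexing.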

Data:

data <- structure(list(COD = c("ADRH7218704", "ADRH7013747", "ADRH6909920"
), Nombre = c("Abrera. Edad media de la población. Dato base. ", 
"Badalona distrito 02. Porcentaje de hogares unipersonales. Dato base. ", 
"Santa Coloma de Gramenet sección 05009. Porcentaje de hogares unipersonales. Dato base. "
), T3_Unidad = c("Años", "Porcentaje", "Porcentaje"), T3_Escala = c(" ", 
" ", " "), MetaData = list(structure(list(Id = c(72L, 5457L, 
274520L), Variable = structure(list(Id = c(3L, 19L, 260L), Nombre = c("Tipo de dato", 
"Municipios", "Conceptos Demográficos"), Codigo = c("", "MUN", 
"")), class = "data.frame", row.names = c(NA, 3L)), Nombre = c("Dato base", 
"Abrera", "Edad media de la población"), Codigo = c("", "08001", 
"")), class = "data.frame", row.names = c(NA, 3L)), structure(list(
    Id = c(72L, 366833L, 327739L), Variable = structure(list(
        Id = c(3L, 260L, 846L), Nombre = c("Tipo de dato", "Conceptos Demográficos", 
        "Distritos"), Codigo = c("", "", "DIST")), class = "data.frame", row.names = c(NA, 
    3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales", 
    "Badalona distrito 02"), Codigo = c("", "", "0801502")), class = "data.frame", row.names = c(NA, 
3L)), structure(list(Id = c(72L, 366833L, 331103L), Variable = structure(list(
    Id = c(3L, 260L, 847L), Nombre = c("Tipo de dato", "Conceptos Demográficos", 
    "Secciones"), Codigo = c("", "", "SECC")), class = "data.frame", row.names = c(NA, 
3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales", 
"Santa Coloma de Gramenet sección 05009"), Codigo = c("", "", 
"0824505009")), class = "data.frame", row.names = c(NA, 3L))), 
    Data = list(structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00", 
    "2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00", 
    "2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo", 
    "Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A", 
    "A", "A", "A"), Anyo = 2018:2015, Valor = c(39.7, 39.5, 39.2, 
    38.8)), class = "data.frame", row.names = c(NA, 4L)), structure(list(
        Fecha = c("2018-01-01T00:00:00.000+01:00", "2017-01-01T00:00:00.000+01:00", 
        "2016-01-01T00:00:00.000+01:00", "2015-01-01T00:00:00.000+01:00"
        ), T3_TipoDato = c("Definitivo", "Definitivo", "Definitivo", 
        "Definitivo"), T3_Periodo = c("A", "A", "A", "A"), Anyo = 2018:2015, 
        Valor = c(25.5, 25.7, 25.5, 25.8)), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00", 
    "2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00", 
    "2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo", 
    "Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A", 
    "A", "A", "A"), Anyo = 2018:2015, Valor = c(24.1, 23.6, 22.2, 
    20.9)), class = "data.frame", row.names = c(NA, 4L)))), row.names = c(NA, 
-3L), class = "data.frame")

How about using purrr::map_df? Keeping only the rows where Variable$Codigo is non-empty selects the geographic row regardless of its position:

purrr::map_df(data$MetaData, ~ {
  .x[.x$Variable$Codigo != '', ]
})

We can use rbindlist from data.table; do.call(data.frame, ...) flattens the nested Variable column first:

library(data.table)
rbindlist(lapply(data$MetaData, function(x) {
                     do.call(data.frame, subset(x, Variable$Codigo != ""))
       }))

Output:

       Id Variable.Id Variable.Nombre Variable.Codigo                                 Nombre     Codigo
1:   5457          19      Municipios             MUN                                 Abrera      08001
2: 327739         846       Distritos            DIST                   Badalona distrito 02    0801502
3: 331103         847       Secciones            SECC Santa Coloma de Gramenet sección 05009 0824505009
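To get from the filtered rows to the named columns the question asks for, one possible follow-up is to pick the fields by name and bind them next to the outer data frame. The sketch below uses base R only; make_meta and extract_geo are hypothetical helper names, the toy entries reuse the strings of the first two MetaData elements with placeholder Id values, and it assumes exactly one row per entry has a non-empty Variable$Codigo (otherwise the final cbind would not line up).

```r
# Hypothetical helper: builds one MetaData-like data frame with a nested
# `Variable` data frame column. Id values are placeholders.
make_meta <- function(var_nombre, var_codigo, nombre, codigo) {
  x <- data.frame(Id = seq_along(nombre))
  x$Variable <- data.frame(Id = seq_along(nombre), Nombre = var_nombre,
                           Codigo = var_codigo, stringsAsFactors = FALSE)
  x$Nombre <- nombre
  x$Codigo <- codigo
  x
}

toy <- data.frame(COD = c("ADRH7218704", "ADRH7013747"),
                  stringsAsFactors = FALSE)
toy$MetaData <- list(
  make_meta(c("Tipo de dato", "Municipios", "Conceptos Demográficos"),
            c("", "MUN", ""),
            c("Dato base", "Abrera", "Edad media de la población"),
            c("", "08001", "")),
  make_meta(c("Tipo de dato", "Conceptos Demográficos", "Distritos"),
            c("", "", "DIST"),
            c("Dato base", "Porcentaje de hogares unipersonales",
              "Badalona distrito 02"),
            c("", "", "0801502"))
)

# Hypothetical helper: keeps the row with a non-empty Variable$Codigo and
# returns its fields under flat, descriptive column names.
extract_geo <- function(x) {
  hit <- x[x$Variable$Codigo != "", , drop = FALSE]
  data.frame(type        = hit$Variable$Nombre,
             type_code   = hit$Variable$Codigo,
             region      = hit$Nombre,
             region_code = hit$Codigo,
             stringsAsFactors = FALSE)
}

geo <- do.call(rbind, lapply(toy$MetaData, extract_geo))

# One geographic row per entry is assumed, so rows align with `toy`.
result <- cbind(toy["COD"], geo)
```

Selecting by column name rather than by cell position means the result does not depend on where the geographic row sits inside each nested data frame.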
