Extracting specific rows from a nested data frame using map with an if else condition
I have a data frame with some nested data, and I would like to flatten it and extract specific cells from the nested data.
The nested data is in the column MetaData, which looks like the following:
[[1]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                     Nombre Codigo
1     72           3           Tipo de dato                                  Dato base
2   5457          19             Municipios             MUN                     Abrera  08001
3 274520         260 Conceptos Demográficos                 Edad media de la población

[[2]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                              Nombre  Codigo
1     72           3           Tipo de dato                                           Dato base
2 366833         260 Conceptos Demográficos                 Porcentaje de hogares unipersonales
3 327739         846              Distritos            DIST                Badalona distrito 02 0801502

[[3]]
      Id Variable.Id        Variable.Nombre Variable.Codigo                                 Nombre     Codigo
1     72           3           Tipo de dato                                              Dato base
2 366833         260 Conceptos Demográficos                    Porcentaje de hogares unipersonales
3 331103         847              Secciones            SECC Santa Coloma de Gramenet sección 05009 0824505009
I want to extract:
From [[1]]: Municipios, MUN, Abrera and 08001.
From [[2]]: Distritos, DIST, Badalona distrito 02 and 0801502.
From [[3]]: Secciones, SECC, Santa Coloma de Gramenet sección 05009 and 0824505009.
However, indexing by cell position alone does not work, because the MUN data sits in a different row from the DIST and SECC data: for MUN, the values under the columns Nombre and Codigo are in row 2, whereas for DIST and SECC they are in row 3.
I have the following code, which extracts the data for MUN without problem:
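library(dplyr)
library(purrr)
library(tibble)

data2 <- data %>%
  mutate(MetaDataWider = map(MetaData, ~ {
    # (row, column) pairs into the nested frame; row 2 is assumed to hold the MUN record
    v1 <- .x[cbind(c(2, 3, 2, 2, 3), c(3, 3, 4, 5, 5))]
    names(v1) <- c("type", "contable", "type_code", "region", "variable")
    as_tibble_row(v1)
  })
  )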
I now want to add an ifelse statement to the map / map_if call in order to correctly extract the data for the DIST and SECC observations.
Alternatively, I could create an ifelse statement that changes the row order for the DIST and SECC observations, i.e. if Variable.Codigo contains DIST | SECC, shift row 3 to row 2, else do nothing.
Then I can use the code I already have to extract the data.
Data:
data <- structure(list(COD = c("ADRH7218704", "ADRH7013747", "ADRH6909920"
), Nombre = c("Abrera. Edad media de la población. Dato base. ",
"Badalona distrito 02. Porcentaje de hogares unipersonales. Dato base. ",
"Santa Coloma de Gramenet sección 05009. Porcentaje de hogares unipersonales. Dato base. "
), T3_Unidad = c("Años", "Porcentaje", "Porcentaje"), T3_Escala = c(" ",
" ", " "), MetaData = list(structure(list(Id = c(72L, 5457L,
274520L), Variable = structure(list(Id = c(3L, 19L, 260L), Nombre = c("Tipo de dato",
"Municipios", "Conceptos Demográficos"), Codigo = c("", "MUN",
"")), class = "data.frame", row.names = c(NA, 3L)), Nombre = c("Dato base",
"Abrera", "Edad media de la población"), Codigo = c("", "08001",
"")), class = "data.frame", row.names = c(NA, 3L)), structure(list(
Id = c(72L, 366833L, 327739L), Variable = structure(list(
Id = c(3L, 260L, 846L), Nombre = c("Tipo de dato", "Conceptos Demográficos",
"Distritos"), Codigo = c("", "", "DIST")), class = "data.frame", row.names = c(NA,
3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales",
"Badalona distrito 02"), Codigo = c("", "", "0801502")), class = "data.frame", row.names = c(NA,
3L)), structure(list(Id = c(72L, 366833L, 331103L), Variable = structure(list(
Id = c(3L, 260L, 847L), Nombre = c("Tipo de dato", "Conceptos Demográficos",
"Secciones"), Codigo = c("", "", "SECC")), class = "data.frame", row.names = c(NA,
3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales",
"Santa Coloma de Gramenet sección 05009"), Codigo = c("", "",
"0824505009")), class = "data.frame", row.names = c(NA, 3L))),
Data = list(structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00",
"2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00",
"2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo",
"Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A",
"A", "A", "A"), Anyo = 2018:2015, Valor = c(39.7, 39.5, 39.2,
38.8)), class = "data.frame", row.names = c(NA, 4L)), structure(list(
Fecha = c("2018-01-01T00:00:00.000+01:00", "2017-01-01T00:00:00.000+01:00",
"2016-01-01T00:00:00.000+01:00", "2015-01-01T00:00:00.000+01:00"
), T3_TipoDato = c("Definitivo", "Definitivo", "Definitivo",
"Definitivo"), T3_Periodo = c("A", "A", "A", "A"), Anyo = 2018:2015,
Valor = c(25.5, 25.7, 25.5, 25.8)), class = "data.frame", row.names = c(NA,
4L)), structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00",
"2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00",
"2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo",
"Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A",
"A", "A", "A"), Anyo = 2018:2015, Valor = c(24.1, 23.6, 22.2,
20.9)), class = "data.frame", row.names = c(NA, 4L)))), row.names = c(NA,
-3L), class = "data.frame")
How about using purrr::map_df? It keeps only the rows whose Variable$Codigo is not empty, so the row position no longer matters:
purrr::map_df(data$MetaData, ~ {
.x[.x$Variable$Codigo != '', ]
})
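The same idea can be pushed a step further to produce named columns next to the identifier of each observation. The following is only a sketch under stated assumptions: `make_meta` and `extract_geo` are hypothetical helpers, `toy` is a cut-down stand-in for the question's `data`, the output column names are illustrative, and each nested frame is assumed to contain exactly one row with a non-empty Variable$Codigo.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: builds one nested frame shaped like an element of MetaData
make_meta <- function(var_nombre, var_codigo, nombre, codigo) {
  md <- data.frame(Nombre = nombre, Codigo = codigo, stringsAsFactors = FALSE)
  md$Variable <- data.frame(Nombre = var_nombre, Codigo = var_codigo,
                            stringsAsFactors = FALSE)
  md
}

# Cut-down stand-in for the question's data (values copied from the question)
toy <- tibble(
  COD = c("ADRH7218704", "ADRH7013747"),
  MetaData = list(
    make_meta(
      var_nombre = c("Tipo de dato", "Municipios", "Conceptos Demográficos"),
      var_codigo = c("", "MUN", ""),
      nombre     = c("Dato base", "Abrera", "Edad media de la población"),
      codigo     = c("", "08001", "")
    ),
    make_meta(
      var_nombre = c("Tipo de dato", "Conceptos Demográficos", "Distritos"),
      var_codigo = c("", "", "DIST"),
      nombre     = c("Dato base", "Porcentaje de hogares unipersonales",
                     "Badalona distrito 02"),
      codigo     = c("", "", "0801502")
    )
  )
)

# Hypothetical helper: locate the coded row by content, not by position
extract_geo <- function(md) {
  geo <- which(md$Variable$Codigo != "")
  stopifnot(length(geo) == 1)  # assumption: exactly one coded row per frame
  tibble(
    type        = md$Variable$Nombre[geo],
    type_code   = md$Variable$Codigo[geo],
    region      = md$Nombre[geo],
    region_code = md$Codigo[geo]
  )
}

# One output row per observation, keeping COD alongside the extracted cells
result <- bind_cols(
  select(toy, COD),
  bind_rows(map(toy$MetaData, extract_geo))
)
print(result)
```

Because the row is found by its content, the same function should handle MUN, DIST and SECC frames alike, provided the one-coded-row assumption holds.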
We can use rbindlist from data.table:
library(data.table)
rbindlist(lapply(data$MetaData, function(x) {
do.call(data.frame, subset(x, Variable$Codigo != ""))
}))
Output:
       Id Variable.Id Variable.Nombre Variable.Codigo                                 Nombre     Codigo
1:   5457          19      Municipios             MUN                                 Abrera      08001
2: 327739         846       Distritos            DIST                   Badalona distrito 02    0801502
3: 331103         847       Secciones            SECC Santa Coloma de Gramenet sección 05009 0824505009
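The alternative raised in the question, moving the DIST or SECC row into position 2 so that position-based extraction still works, could look roughly like this. It is a sketch only: `move_geo_to_row2` is a hypothetical helper, and `md` is a toy frame shaped like the second MetaData element.

```r
# Toy nested frame shaped like MetaData[[2]]: the DIST row is in position 3
md <- data.frame(
  Nombre = c("Dato base", "Porcentaje de hogares unipersonales",
             "Badalona distrito 02"),
  Codigo = c("", "", "0801502"),
  stringsAsFactors = FALSE
)
md$Variable <- data.frame(
  Nombre = c("Tipo de dato", "Conceptos Demográficos", "Distritos"),
  Codigo = c("", "", "DIST"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: place the DIST/SECC row second, leave other frames as-is
move_geo_to_row2 <- function(md) {
  geo <- which(md$Variable$Codigo %in% c("DIST", "SECC"))
  if (length(geo) == 1) {
    others <- setdiff(seq_len(nrow(md)), geo)
    md <- md[append(others, geo, after = 1), ]  # insert the coded row at position 2
    rownames(md) <- NULL
  }
  md
}

fixed <- move_geo_to_row2(md)
print(fixed$Nombre)
print(fixed$Variable$Codigo)
```

With the question's data, this could presumably be applied as `mutate(MetaData = map(MetaData, move_geo_to_row2))` before the existing position-based code. Filtering on Variable$Codigo, as in the answers above, avoids the reordering step altogether.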