
Apply a function to certain columns of a nested variable in R

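I want to apply a vectorized operation to certain columns of a nested variable. The function I want to apply computes the total number of missing values in the numeric features (e.g. `weight` and `calories`). The data frame I have is as follows: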

library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(country = c("US", "US", "UK", "PAK"),
                 name = c("David", "James", "Junaid", "Ali"),
                 fruit = c("Apple", "banana", "orange", "melon"),
                 weight = c(90, 110, 120, NA),
                 calories = c(NA, 20, NA, NA))

  country   name  fruit weight calories
1      US  David  Apple     90       NA
2      US  James banana    110       20
3      UK Junaid orange    120       NA
4     PAK    Ali  melon     NA       NA
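Before any nesting, the quantity described above (the number of missing values in each numeric feature) can be computed on the flat data frame. A minimal base-R sketch, assuming "numeric features" means every column for which `is.numeric()` is `TRUE`; the names `numeric_cols` and `na_counts` are illustrative, not from the question:

```r
df <- data.frame(country = c("US", "US", "UK", "PAK"),
                 name = c("David", "James", "Junaid", "Ali"),
                 fruit = c("Apple", "banana", "orange", "melon"),
                 weight = c(90, 110, 120, NA),
                 calories = c(NA, 20, NA, NA))

# Keep only the numeric columns (here: weight and calories)
numeric_cols <- df[vapply(df, is.numeric, logical(1))]

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE (missing) cells in each column
na_counts <- colSums(is.na(numeric_cols))
na_counts  # weight: 1, calories: 3
```

The nested version in the rest of the thread applies the same idea per country instead of per column.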

When I nest the data frame:

nested_df <- df %>% group_by(country) %>% nest()


# A tibble: 3 × 2
  country             data
   <fctr>           <list>
1      US <tibble [2 × 4]>
2      UK <tibble [1 × 4]>
3     PAK <tibble [1 × 4]>
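Each element of the `data` list-column is an ordinary tibble holding one country's rows without the grouping column, so a function passed to `map()` receives a tibble it can index by column name. A short sketch for inspecting one element, assuming dplyr and tidyr are installed; `us_data` is an illustrative name:

```r
library(dplyr)
library(tidyr)

df <- data.frame(country = c("US", "US", "UK", "PAK"),
                 name = c("David", "James", "Junaid", "Ali"),
                 fruit = c("Apple", "banana", "orange", "melon"),
                 weight = c(90, 110, 120, NA),
                 calories = c(NA, 20, NA, NA))

nested_df <- df %>% group_by(country) %>% nest()

# Pull out the nested tibble for the US rows (looked up by value, not position)
us_data <- nested_df$data[[which(nested_df$country == "US")]]
us_data                           # columns: name, fruit, weight, calories
us_data[, c("weight", "calories")]  # the two columns to check
```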

I tried the following syntax, but to no avail:

nested_df %>% mutate(missings = map(data, c("weight", "calories")) %>%
                             map_lgl(function(x) sum(!is.na(x))/length(x) == 1))

My expected result is as follows:

# A tibble: 3 × 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 × 4]>    FALSE
2      UK <tibble [1 × 4]>    FALSE
3     PAK <tibble [1 × 4]>     TRUE
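The expected `missings` column is consistent with the rule "more than half of a country's `weight` and `calories` values are missing", which is how the answer interprets it (for this data it also matches "all of them are missing"). Assuming that rule, a base-R cross-check without nesting can confirm the expected values; `per_country` and `missing_share` are illustrative names:

```r
df <- data.frame(country = c("US", "US", "UK", "PAK"),
                 name = c("David", "James", "Junaid", "Ali"),
                 fruit = c("Apple", "banana", "orange", "melon"),
                 weight = c(90, 110, 120, NA),
                 calories = c(NA, 20, NA, NA))
colstocheck <- c("weight", "calories")

# split() returns one data frame per country, ordered by sorted country value
per_country <- split(df[colstocheck], df$country)

# mean() of the logical NA matrix is the share of missing cells per country
missing_share <- sapply(per_country, function(d) mean(is.na(d)))
missings <- missing_share > 0.5
missings  # PAK: TRUE, UK: FALSE, US: FALSE
```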

However, what I get is:

# A tibble: 3 × 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 × 4]>       NA
2      UK <tibble [1 × 4]>       NA
3     PAK <tibble [1 × 4]>       NA
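The `NA` values come from the first `map()` call. In purrr, passing a character vector where a function is expected is shorthand for plucking a nested path, so `map(data, c("weight", "calories"))` tries to extract `x[["weight"]][["calories"]]` from each tibble instead of selecting two columns. That element does not exist, so each result is `NULL`; for `NULL` the ratio is `0/0`, i.e. `NaN`, and `NaN == 1` is `NA`. A small reproduction, assuming purrr is installed; `us` is an illustrative stand-in for one nested tibble:

```r
library(purrr)

# Stand-in for one element of nested_df$data (the two US rows)
us <- data.frame(name = c("David", "James"), fruit = c("Apple", "banana"),
                 weight = c(90, 110), calories = c(NA, 20))

# A character vector is treated as a pluck path: us[["weight"]][["calories"]]
plucked <- map(list(us), c("weight", "calories"))
is.null(plucked[[1]])  # the weight vector has no "calories" element

# For NULL, sum(!is.na(x)) is 0 and length(x) is 0, so the ratio is NaN
ratio_is_one <- map_lgl(plucked, function(x) sum(!is.na(x)) / length(x) == 1)
ratio_is_one
```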

The following checks whether more than 50% of the values in the selected columns are NA:

colstocheck <- c("weight", "calories")
# mean() of the logical NA matrix is the share of missing cells;
# length() of a tibble would count columns, not cells
nested_df %>% mutate(missings = map_lgl(data,
                function(x) mean(is.na(x[, colstocheck])) > 0.5))

# A tibble: 3 x 3
  country             data missings
   <fctr>           <list>    <lgl>
1      US <tibble [2 x 4]>    FALSE
2      UK <tibble [1 x 4]>    FALSE
3     PAK <tibble [1 x 4]>     TRUE
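The question originally asks for the number of missing values rather than a threshold test. The same `map_*()` pattern returns an integer count per country if the function sums the logical NA matrix instead of comparing its mean to 0.5. A sketch, assuming dplyr, tidyr and purrr are installed; the column name `n_missing` is illustrative:

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(country = c("US", "US", "UK", "PAK"),
                 name = c("David", "James", "Junaid", "Ali"),
                 fruit = c("Apple", "banana", "orange", "melon"),
                 weight = c(90, 110, 120, NA),
                 calories = c(NA, 20, NA, NA))
colstocheck <- c("weight", "calories")

result <- df %>%
  group_by(country) %>%
  nest() %>%
  ungroup() %>%
  # sum() of a logical matrix counts the TRUE (missing) cells as an integer
  mutate(n_missing = map_int(data, function(x) sum(is.na(x[, colstocheck]))))

result
```

Using `map_int()` rather than `map()` makes the new column a plain integer vector instead of a list-column.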


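Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.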

© 2020-2024 STACKOOM.COM