Subset tibble columns based on values in another tibble

Question

I've searched as best I could, but I'm still struggling with my problem. I am trying to subset columns in a tibble based on the values from another tibble.

More specifically, I have a tibble of socio-economic indicators (tibble_1):

cname   year  ccodealp  wdi_lfpr  wdi_lfprf
Turkey  2010  TUR       51.611    29.592
Turkey  2011  TUR       52.781    30.995
Turkey  2012  TUR       52.809    31.676
Turkey  2013  TUR       53.874    33.125
Turkey  2014  TUR       54.597    33.446
Turkey  2015  TUR       55.594    34.858

I have a separate tibble (Tibble 2) with two columns: the indicator name and the % missingness of that indicator within Tibble 1.

tibble_2
col         value
who_dwtot   100         
who_dwrur   100         
who_dwurb   100

What I want to do is subset tibble_1 so that it keeps only the columns that meet a certain criterion in tibble_2. Namely, I want to retain only columns that have less than 90% missingness (the "value" column in tibble_2). I'm having trouble going about this in the tidyverse. This is the code I've tried:

tibble_1 %>% select(tibble_2, "value" < 90)

Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type `tbl_df< col : character value: double >`.
i It must be numeric or character.
Run `rlang::last_error()` to see where the error occurred.

I know this is probably a trivial problem, but I'm not an expert in the tidyverse and can't figure out how to fix this.

Thanks for any help.

Answer 1

We can filter 'tibble_2' on the 'value' column and pull the 'col' column, which gives the column names to select in tibble_1:

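library(dplyr)
tibble_1 %>%
     # the braces evaluate to a character vector of column names
     select({tibble_2 %>%
                 filter(value < 90) %>%   # keep indicators below 90% missing
                 pull(col)})              # extract their names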

Or, if we use base R:

subset(tibble_1, select = subset(tibble_2, value < 90, select = col)$col)
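
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The toy data below is made up for illustration (the `who_dwtot` values are hypothetical and not from the question), and the names `keep`, `result_dplyr` and `result_base` are likewise just placeholders. It also assumes a dplyr version that provides `all_of()` (1.0.0 or later), which makes it explicit that the names come from an external character vector.

```r
library(dplyr)

# Toy stand-in for tibble_1 (hypothetical values, for illustration only)
tibble_1 <- tibble(
  cname     = c("Turkey", "Turkey", "Turkey"),
  year      = c(2010, 2011, 2012),
  wdi_lfpr  = c(51.611, 52.781, 52.809),
  who_dwtot = c(NA_real_, NA_real_, NA_real_)  # entirely missing
)

# Build tibble_2: one row per column of tibble_1 with its % missingness
tibble_2 <- tibble(
  col   = names(tibble_1),
  value = unname(colMeans(is.na(tibble_1)) * 100)
)

# Names of the columns with less than 90% missingness
keep <- tibble_2 %>%
  filter(value < 90) %>%
  pull(col)

# dplyr: select by an external character vector
result_dplyr <- tibble_1 %>% select(all_of(keep))

# base R: index the columns by name directly
result_base <- tibble_1[, tibble_2$col[tibble_2$value < 90]]
```

Note that `all_of()` throws an error if tibble_2 lists a name that is not a column of tibble_1. If that can happen in your data, `any_of(keep)` silently skips the missing names instead.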
