I have two tibbles
data
A tibble: 6,358,584 x 3
Date Name Key
<date> <chr> <chr>
treated_group
A tibble: 6,051 x 1
Key
<chr>
The key identifies my treated group and I would like to subset the bigger tibble for all treated objects. However by using filter
data %>% filter(Key == treated_group)
I run into the error:
Error in filter_impl(.data, quo) : Result must have length 6358584, not 6051
I recognize that I can use filter only for a 1x1 and thus I would have used a workaround where I loop through the rows of treated_group and filter for the data for every row, but this is very inefficient and I would like to stay within the dplyr framework.
Any hint and help is appreciated!
head(data)
#> # A tibble: 6 x 3
#> TIMESTAMP_UTC ENTITY_NAME ENS_KEY
#> <date> <chr> <chr>
#> 1 2000-01-04 3M Co. E73F64B685D3E70AFE8DFC37C33825F7
#> 2 2000-01-04 3M Co. 62D1EE4BF4DF6EDD38F95E4033B4E687
#> 3 2000-01-05 3M Co. 24EFCCD1828DDBB164A7CDED15696EC9
#> 4 2000-01-05 3M Co. 62D1EE4BF4DF6EDD38F95E4033B4E687
#> 5 2000-01-10 3M Co. BF24EB30E19607DD73C0BC51F9EF2DF4
#> 6 2000-01-10 3M Co. 940F168DB3203A028350BC4989EBDE17
head(treated_data)
#> # A tibble: 6 x 1
#> ENS_KEY
#> <chr>
#> 1 2CDDC73CD6247E41244EE82B3BD2AB14
#> 2 940F168DB3203A028350BC4989EBDE17
#> 3 1D9944BA5D170684910D3F5E56C2990B
#> 4 8431C047CFA3920042325B28B238E335
#> 5 606FAF396319C78ABC9CAD17C49E52D9
#> 6 3B277F9151290346EF7E05EC046121D9
filter(data,ENS_KEY %in% treated_data)
#> # A tibble: 0 x 3
#> # ... with 3 variables: TIMESTAMP_UTC <date>, ENTITY_NAME <chr>,
#> # ENS_KEY <chr>
Created on 2019-07-31 by the reprex package (v0.3.0)
As you can see entry 6 of my data and entry 2 of my treated_data match, but the output is an empty tibble!
How about something like this?
The pull
function just takes the values in the column and puts them in to a vector. You can use this with %in%
when you filter.
td <- treated_data %>%
pull #just gets the values
data %>%
filter(ENS_KEY %in% td)
and you get:
# A tibble: 1 x 3
TIMESTAMP_UTC ENTITY_NAME ENS_KEY
<chr> <chr> <chr>
1 10/01/2000 3M Co. 940F168DB3203A028350BC4989EBDE17
Another option, which will give you the same result:
data %>%
inner_join(treated_data, by = "ENS_KEY")
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.