
Error message when using dplyr to filter on 3 or more levels of a factor

I'm trying to filter on some levels of a factor in dplyr, but instead of manually writing out the ones I want, like c("Blue", "Green", "White") etc., I figured something like

levels(df$factor.variable)[1:3]

might prove faster. But if I try to select more than 2 levels using the following code, I get the message "longer object length is not a multiple of shorter object length" and a big chunk of the data doesn't come through. With my dummy data below, 2/3 of the data disappears.

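library(dplyr)

a <- 1:20
b <- rep(c("Blue", "Green", "White", "Grey"), 5)
# stringsAsFactors = TRUE keeps colours a factor (this was the default before R 4.0.0)
df <- data.frame(Numbers = a, colours = b, stringsAsFactors = TRUE)
df %>%
  select(Numbers, colours) %>%
  filter(colours == levels(df$colours)[1:3])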

Note that if you select only 1 or 2 of the levels above (as in [1] or [1:2], not [1:3]), the problem doesn't occur. Also, if I remove one of the colours (levels), I don't have the problem anymore.

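a <- 1:15
b <- rep(c("Blue", "Green", "White"), 5)
# stringsAsFactors = TRUE keeps colours a factor (this was the default before R 4.0.0)
df <- data.frame(Numbers = a, colours = b, stringsAsFactors = TRUE)
df %>%
  select(Numbers, colours) %>%
  filter(colours == levels(df$colours)[1:3])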

Which objects have the longer and shorter lengths? And why does 2/3 of the data disappear?

The mistake is in the comparison inside filter. Using %in% instead of == resolves the problem.

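library(dplyr)

a <- 1:20
b <- rep(c("Blue", "Green", "White", "Grey"), 5)
# stringsAsFactors = TRUE keeps colours a factor (this was the default before R 4.0.0)
df <- data.frame(Numbers = a, colours = b, stringsAsFactors = TRUE)
str(df)

# %in% tests membership, so every row whose colour is one of the three levels is kept
df2 <- df %>%
  select(Numbers, colours) %>%
  filter(colours %in% levels(df$colours)[1:3])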

It's actually not a dplyr issue.

As others mentioned, a == b checks whether each pair of elements is identical, i.e. a[1] == b[1], a[2] == b[2], and so on. (Take a look at ?Comparison.) You're comparing vectors of unequal lengths: the 20 values of colours against the 3 selected levels. R recycles the shorter vector to match the longer one, and because 20 is not a multiple of 3 the recycling doesn't come out evenly, which is the reason for the warning you got. The comparison is still positional, so a row is kept only when its colour happens to line up with the recycled value at the same position; that is why most of the rows disappear.
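To see which objects differ in length and where the rows go, the recycling can be spelled out by hand in base R. This is only a sketch using the question's data; the names `colours`, `wanted`, `recycled` and `matches` are introduced here for illustration.

```r
# The question's column, built directly as a factor
colours <- factor(rep(c("Blue", "Green", "White", "Grey"), 5))
wanted <- levels(colours)[1:3]   # "Blue" "Green" "Grey"

length(colours)   # 20, the longer object
length(wanted)    # 3, the shorter object; 20 is not a multiple of 3

# What == effectively compares against: `wanted` repeated out to length 20
recycled <- rep(wanted, length.out = length(colours))

# Positions where the colour happens to line up with the recycled value
matches <- as.character(colours) == recycled
which(matches)
#> [1]  1  2 12 13 14
```

Only 5 of the 15 rows with a wanted colour survive, which is the missing two thirds. With [1:2], or with only three colours in the data, the two cycles happen to line up, so == returns the expected rows by coincidence.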

Instead, a %in% b checks whether each element of a exists somewhere in b, and returns TRUE or FALSE for each element of a.
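The difference between the two operators shows up even on a tiny example. The vectors below are made up for illustration and are not the question's data.

```r
a <- c("Blue", "Green", "White", "Grey")
b <- c("Grey", "Blue")

# Positional comparison: b is recycled to "Grey" "Blue" "Grey" "Blue"
a == b
#> [1] FALSE FALSE FALSE FALSE

# Membership test: the order and length of b do not matter
a %in% b
#> [1]  TRUE FALSE FALSE  TRUE
```

No warning appears for a == b here because 4 is a multiple of 2, yet the result is still not a membership test.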

To illustrate with your data:

library(dplyr)

a <- 1:20
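b <- rep(c("Blue", "Green", "White", "Grey"), 5)
# stringsAsFactors = TRUE keeps colours a factor (this was the default before R 4.0.0)
df <- data.frame(Numbers = a, colours = b, stringsAsFactors = TRUE)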

In the a %in% b representation, this is your b:

levels(df$colours)[1:3]
#> [1] "Blue"  "Green" "Grey"

Checking whether each element of colours is in that set of values yields a logical vector:

df$colours %in% levels(df$colours)[1:3]
#>  [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
#> [12]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE

The base R equivalent of dplyr::filter looks like this, taking the elements of df$colours for which the previous operation yields TRUE:

df$colours[df$colours %in% levels(df$colours)[1:3]]
#>  [1] Blue  Green Grey  Blue  Green Grey  Blue  Green Grey  Blue  Green
#> [12] Grey  Blue  Green Grey 
#> Levels: Blue Green Grey White
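The line above subsets only the colours vector. To keep whole rows, as dplyr::filter does, the same logical vector can index the rows of the data frame. A small sketch, assuming colours is stored as a factor; `keep` and `kept` are names introduced here.

```r
a <- 1:20
b <- rep(c("Blue", "Green", "White", "Grey"), 5)
df <- data.frame(Numbers = a, colours = b, stringsAsFactors = TRUE)

# Logical vector: TRUE where the row's colour is one of the first three levels
keep <- df$colours %in% levels(df$colours)[1:3]

# Row subsetting with the logical vector; the empty column slot keeps all columns
kept <- df[keep, ]
nrow(kept)
#> [1] 15
```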

In dplyr, non-standard evaluation removes the need for df$, but you're doing essentially the same thing within dplyr::filter: finding whether each element of colours is in the subset of values levels(colours)[1:3], and then keeping only those rows corresponding to a TRUE.

df %>%
  filter(colours %in% levels(colours)[1:3])
#>    Numbers colours
#> 1        1    Blue
#> 2        2   Green
#> 3        4    Grey
#> 4        5    Blue
#> 5        6   Green
#> 6        8    Grey
#> 7        9    Blue
#> 8       10   Green
#> 9       12    Grey
#> 10      13    Blue
#> 11      14   Green
#> 12      16    Grey
#> 13      17    Blue
#> 14      18   Green
#> 15      20    Grey
