How do I determine the length of a column equal to a certain value in R?

Question

I'm trying to find how many data points are present for each categorical value in the Genotype column. So far, the two lines below return the same value, even though the first should return roughly 1/3 of the second.

length(CYP$Genotype == "CYP1B1 KO")
length(CYP$Genotype)

Answer 1

As mentioned in the comments, use sum instead of length to count how often a value occurs. Calling length on the condition returns the number of elements in the resulting logical vector, which is 8 in this case, regardless of how many of them are TRUE.

CYP$Genotype == "CYP1B1 KO"
#[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE

length(CYP$Genotype == "CYP1B1 KO")
# [1] 8

Instead, if you use sum, it counts the number of TRUE values, because TRUE is treated as 1 and FALSE as 0.

sum(CYP$Genotype == "CYP1B1 KO")
# [1] 4
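One caveat worth sketching: if the column contains missing values, the comparison yields NA for those rows and sum returns NA unless na.rm = TRUE is set. The vector below is hypothetical (the original data has no NA values) and only illustrates that behavior.

```r
# Hypothetical vector with one missing value (not part of the original data)
genotype <- c("CYP1B1 KO", "CYP1B1 KO", NA, "RGB2B1 G1")

# Comparing against NA gives NA, so the result has three possible states
is_ko <- genotype == "CYP1B1 KO"

n_total  <- length(is_ko)             # number of elements, whatever their value
n_ko_raw <- sum(is_ko)                # NA, because one comparison is NA
n_ko     <- sum(is_ko, na.rm = TRUE)  # counts only the TRUE values
```

With na.rm = TRUE the missing row is simply ignored, which is usually what is wanted when counting matches.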

As mentioned by @dcarlson, you can use table to get the frequency of each distinct value in the column, and then put the result back into a data frame.

data.frame(n = cbind(table(CYP$Genotype)))
#          n
#CYP1B1 KO 4
#GRB3C2 F2 1
#RGB2B1 G1 3
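If only one genotype's count is needed, the table can also be indexed by name. The following is a minimal sketch using the same data as in the Data section; the variable names counts, n_ko, and props are illustrative, and prop.table is shown as one way to get relative frequencies.

```r
# Same data as in the Data section, rebuilt here so the example is self-contained
CYP <- data.frame(Genotype = c("CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO",
                               "RGB2B1 G1", "RGB2B1 G1", "RGB2B1 G1", "GRB3C2 F2"))

# Frequency of every distinct genotype
counts <- table(CYP$Genotype)

# Pull out the count for a single genotype by name
n_ko <- counts[["CYP1B1 KO"]]

# Relative frequencies (each count divided by the total)
props <- prop.table(counts)
```

Here n_ko matches the result of sum(CYP$Genotype == "CYP1B1 KO").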

Or you can use count from dplyr:

library(dplyr)

CYP %>% 
  count(Genotype)

#   Genotype n
#1 CYP1B1 KO 4
#2 GRB3C2 F2 1
#3 RGB2B1 G1 3

Data

CYP <- structure(list(Genotype = c("CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO", 
"CYP1B1 KO", "RGB2B1 G1", "RGB2B1 G1", "RGB2B1 G1", "GRB3C2 F2"
)), class = "data.frame", row.names = c(NA, -8L))
