How do I determine the length of a column equal to a certain value in R?
I'm trying to find how many data points are present for each level of the categorical variable in the column Genotype. So far, both lines of the following code return the same value, when the first line should return roughly 1/3 of the value returned by the second line.
length(CYP$Genotype == "CYP1B1 KO")
length(CYP$Genotype)
As mentioned in the comments, you want to use sum instead of length to get the frequency of a value. If you use length on the condition, it returns the number of elements in the resulting logical vector, which is 8 in this case, no matter how many of them are TRUE.
CYP$Genotype == "CYP1B1 KO"
#[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
length(CYP$Genotype == "CYP1B1 KO")
# [1] 8
Instead, if you use sum, it counts the number of TRUE values (each TRUE is counted as 1, whereas each FALSE is counted as 0).
sum(CYP$Genotype == "CYP1B1 KO")
# [1] 4
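To make the coercion explicit, here is a minimal sketch using the same genotype values as the Data section. The names genotype, is_ko and with_na are made up for illustration, and the NA case is a hypothetical that does not occur in the example data.

```r
# Same values as the Genotype column in the Data section.
genotype <- c("CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO",
              "RGB2B1 G1", "RGB2B1 G1", "RGB2B1 G1", "GRB3C2 F2")

is_ko <- genotype == "CYP1B1 KO"

as.integer(is_ko)   # logicals coerce to 1 (TRUE) and 0 (FALSE)
# [1] 1 1 1 1 0 0 0 0

sum(is_ko)          # so the sum is the number of TRUE values
# [1] 4

mean(is_ko)         # and the mean is the proportion of TRUE values
# [1] 0.5

# Hypothetical case: a missing value makes the comparison NA,
# and sum() then returns NA unless na.rm = TRUE is set.
with_na <- c(genotype, NA)
sum(with_na == "CYP1B1 KO")
# [1] NA
sum(with_na == "CYP1B1 KO", na.rm = TRUE)
# [1] 4
```

If the real column may contain missing values, na.rm = TRUE is probably the safer default.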
As mentioned by @dcarlson, you can use table to get the frequency of each distinct value in the column, which you could put back into a data frame.
data.frame(n = cbind(table(CYP$Genotype)))
# n
#CYP1B1 KO 4
#GRB3C2 F2 1
#RGB2B1 G1 3
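If you would rather have the genotype as a regular column than as row names, one possible variant (a sketch, not part of the original answer) is as.data.frame on the table. The column names Genotype and n are chosen here to match the other outputs, and the counts object is made up for illustration.

```r
# Same data as in the Data section.
CYP <- data.frame(Genotype = c("CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO",
                               "CYP1B1 KO", "RGB2B1 G1", "RGB2B1 G1",
                               "RGB2B1 G1", "GRB3C2 F2"))

# as.data.frame() on a table gives one row per distinct value;
# the default column names are Var1 and Freq, so rename them.
counts <- as.data.frame(table(CYP$Genotype), stringsAsFactors = FALSE)
names(counts) <- c("Genotype", "n")
counts
#    Genotype n
# 1 CYP1B1 KO 4
# 2 GRB3C2 F2 1
# 3 RGB2B1 G1 3

# A single count can also be pulled from the table by name.
table(CYP$Genotype)[["CYP1B1 KO"]]
# [1] 4
```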
Or you can use count from dplyr:
library(dplyr)
CYP %>%
  count(Genotype)
# Genotype n
#1 CYP1B1 KO 4
#2 GRB3C2 F2 1
#3 RGB2B1 G1 3
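As a possible extension (not in the original answer), count can sort by frequency, and a proportion column can be added with mutate. The column name prop and the object name result are made up for illustration.

```r
library(dplyr)

# Same data as in the Data section.
CYP <- data.frame(Genotype = c("CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO",
                               "CYP1B1 KO", "RGB2B1 G1", "RGB2B1 G1",
                               "RGB2B1 G1", "GRB3C2 F2"))

result <- CYP %>%
  count(Genotype, sort = TRUE) %>%   # most frequent genotype first
  mutate(prop = n / sum(n))          # share of all rows per genotype

result
# Rows come out in the order CYP1B1 KO (n = 4), RGB2B1 G1 (n = 3),
# GRB3C2 F2 (n = 1), with prop equal to 0.5, 0.375 and 0.125.
```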
Data
CYP <- structure(list(Genotype = c("CYP1B1 KO", "CYP1B1 KO", "CYP1B1 KO",
"CYP1B1 KO", "RGB2B1 G1", "RGB2B1 G1", "RGB2B1 G1", "GRB3C2 F2"
)), class = "data.frame", row.names = c(NA, -8L))