Summarize counts based on multiple conditions
I am trying to get a summary of my data based on combinations of two variables. The following code used to work on the data:
df <- data_frame(fc = runif(1000, -5, 5),
                 padj = runif(1000, 0, 1))
df %>%
  summarise(
    dn_red  = count(fc < -1.5, padj <= 0.1),
    dn_pink = count(fc < -1.5, padj >= 0.1),
    dn_blue = count(fc > -1.5 & fc < 0, padj <= 0.1),
    dn_grey = count(fc > -1.5 & fc < 0, padj >= 0.1),
    up_red  = count(fc > 1.5, padj <= 0.1),
    up_pink = count(fc > 1.5, padj >= 0.1),
    up_blue = count(fc < 1.5 & fc > 0, padj <= 0.1),
    up_grey = count(fc < 1.5 & fc > 0, padj >= 0.1)
  )
Running it a couple of months after writing it throws the following error:
Error: Problem with `summarise()` input `dn_red`.
x no applicable method for 'count' applied to an object of class "logical"
ℹ Input `dn_red` is `count(fc < -1.5, padj <= 0.1)`.
I can see that count outputs a tibble with logical vectors corresponding to the conditions. What I am trying to get out of it is a summary of the counts where both conditions are TRUE. The code above used to do just that...
You perhaps want sum instead of count! Summing a logical vector counts its TRUE values.
library(dplyr)

set.seed(1)
df <- data.frame(fc = runif(1000, -5, 5),
                 padj = runif(1000, 0, 1))
df %>%
  summarise(
    dn_red  = sum(fc < -1.5, padj <= 0.1),
    dn_pink = sum(fc < -1.5, padj >= 0.1),
    dn_blue = sum(fc > -1.5 & fc < 0, padj <= 0.1),
    dn_grey = sum(fc > -1.5 & fc < 0, padj >= 0.1),
    up_red  = sum(fc > 1.5, padj <= 0.1),
    up_pink = sum(fc > 1.5, padj >= 0.1),
    up_blue = sum(fc < 1.5 & fc > 0, padj <= 0.1),
    up_grey = sum(fc < 1.5 & fc > 0, padj >= 0.1)
  )
  dn_red dn_pink dn_blue dn_grey up_red up_pink up_blue up_grey
1    494    1250     269    1025    458    1214     267    1023
But this is creating overlaps: sum(x, y) adds the TRUE values of x to the TRUE values of y, so a row can be counted twice, which is why the totals exceed 1000. So you need to replace the , within the logical conditions with either & or |, as the case may be. See:
df %>%
  summarise(
    dn_red  = sum(fc < -1.5 & padj <= 0.1),
    dn_pink = sum(fc < -1.5 & padj >= 0.1),
    dn_blue = sum(fc > -1.5 & fc < 0 & padj <= 0.1),
    dn_grey = sum(fc > -1.5 & fc < 0 & padj >= 0.1),
    up_red  = sum(fc > 1.5 & padj <= 0.1),
    up_pink = sum(fc > 1.5 & padj >= 0.1),
    up_blue = sum(fc < 1.5 & fc > 0 & padj <= 0.1),
    up_grey = sum(fc < 1.5 & fc > 0 & padj >= 0.1)
  )
  dn_red dn_pink dn_blue dn_grey up_red up_pink up_blue up_grey
1     44     328      20     127     40     296      18     127
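To make the difference between , and & inside sum() concrete, here is a minimal base R sketch. The four-element vectors are made up for illustration and are not taken from the question's data.

```r
# Toy vectors, invented for this illustration only.
fc   <- c(-2.0, -3.0, -1.0, 2.0)
padj <- c(0.05, 0.50, 0.05, 0.05)

a <- fc < -1.5     # TRUE  TRUE  FALSE FALSE
b <- padj <= 0.1   # TRUE  FALSE TRUE  TRUE

sum(a, b)    # adds the TRUEs of a (2) and of b (3): 5
sum(a & b)   # rows where both conditions hold: 1
sum(a | b)   # rows where at least one condition holds: 4
```

With only four rows, sum(a, b) already returns 5, which is the same kind of double counting that pushes the comma version above 1000.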
If this is what you expected, then it is advisable to assign each of the 1000 data points to one of the eight colors and count the groups. Use this code instead:
df %>%
  mutate(new = case_when(
    fc < -1.5 & padj <= 0.1          ~ 'dn_red',
    fc < -1.5 & padj >= 0.1          ~ 'dn_pink',
    fc > -1.5 & fc < 0 & padj <= 0.1 ~ 'dn_blue',
    fc > -1.5 & fc < 0 & padj >= 0.1 ~ 'dn_grey',
    fc > 1.5 & padj <= 0.1           ~ 'up_red',
    fc > 1.5 & padj >= 0.1           ~ 'up_pink',
    fc < 1.5 & fc > 0 & padj <= 0.1  ~ 'up_blue',
    fc < 1.5 & fc > 0 & padj >= 0.1  ~ 'up_grey',
    TRUE                             ~ 'others'
  )) %>%
  count(new)
      new   n
1 dn_blue  20
2 dn_grey 127
3 dn_pink 328
4  dn_red  44
5 up_blue  18
6 up_grey 127
7 up_pink 296
8  up_red  40
Or, better, use janitor to get a frequency table with percentages and a total row:
df %>%
  mutate(new = case_when(
    fc < -1.5 & padj <= 0.1          ~ 'dn_red',
    fc < -1.5 & padj >= 0.1          ~ 'dn_pink',
    fc > -1.5 & fc < 0 & padj <= 0.1 ~ 'dn_blue',
    fc > -1.5 & fc < 0 & padj >= 0.1 ~ 'dn_grey',
    fc > 1.5 & padj <= 0.1           ~ 'up_red',
    fc > 1.5 & padj >= 0.1           ~ 'up_pink',
    fc < 1.5 & fc > 0 & padj <= 0.1  ~ 'up_blue',
    fc < 1.5 & fc > 0 & padj >= 0.1  ~ 'up_grey',
    TRUE                             ~ 'others'
  )) %>%
  janitor::tabyl(new) %>%
  janitor::adorn_totals()
     new    n percent
 dn_blue   20   0.020
 dn_grey  127   0.127
 dn_pink  328   0.328
  dn_red   44   0.044
 up_blue   18   0.018
 up_grey  127   0.127
 up_pink  296   0.296
  up_red   40   0.040
   Total 1000   1.000
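As a cross-check without dplyr or janitor, the same kind of split can be sketched in base R with cut() and table(). This is a swapped-in alternative, not the method used above. The bin labels (dn_strong, dn_weak, up_weak, up_strong) and the sig labels are hypothetical names chosen for this example. Note also that cut() uses right-closed intervals by default, so values exactly on a threshold are binned slightly differently than with the strict < and > comparisons above.

```r
set.seed(1)
df <- data.frame(fc = runif(1000, -5, 5),
                 padj = runif(1000, 0, 1))

# Bin fc at the same thresholds (-1.5, 0, 1.5).
# Intervals are right-closed by default: (-Inf,-1.5], (-1.5,0], (0,1.5], (1.5,Inf].
fc_bin <- cut(df$fc,
              breaks = c(-Inf, -1.5, 0, 1.5, Inf),
              labels = c("dn_strong", "dn_weak", "up_weak", "up_strong"))

# Split padj at 0.1 into two mutually exclusive groups.
sig <- ifelse(df$padj <= 0.1, "padj<=0.1", "padj>0.1")

# Cross-tabulate: each row of df lands in exactly one cell.
tab <- table(fc_bin, sig)
tab
sum(tab)
```

Because every row falls into exactly one cell, sum(tab) equals the number of rows in df, which is a quick way to confirm that the categories do not overlap.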