How to count the number of observations in each column of a grouped data frame in R
I have a data frame of geochemical sample results that includes the following variables:
Year, Zone, *48 analyzed elements*, *more information*.
I want to know how many samples were collected each year, in each zone, for every element. Basically, I would like a table that looks like this:
Year,Zone,Ag_ppm, ..., Zr_ppm
1981, ZoneA, 0, ..., 0
1981, ZoneB, 20, ..., 0
1983, ZoneA, 0, ..., 150
I have tried the following:
Elt_count <- SoilGeology %>%
group_by(Year, Zone) %>%
summarise_at(vars(Ag_ppm:Zr_ppm),funs(sum)) %>%
select(Year, Zone, Ag_ppm:Zr_ppm)
It works, but it does not give me the information I want (I don't want the cumulative sum of the samples, but a count of the samples). I then tried:
Elt_count <- SoilGeology %>%
group_by(Year, Zone) %>%
summarise_at(vars(Ag_ppm:Zr_ppm),funs(n)) %>%
select(Year, Zone, Ag_ppm:Zr_ppm)
But I get the following error:
Error in summarise_impl(.data, dots) :
n() does not take arguments
I have also tried:
d <- SoilGeology %>%
group_by(Year, Zone) %>%
summarise_all(n) %>%
select(Year, Zone, Ag_ppm:Zr_ppm)
But I get the same error as above:
Error in summarise_impl(.data, dots) :
n() does not take arguments
I also tried with count:
Elt_count <- SoilGeology %>%
group_by(Year, Zone) %>%
count(Au_ppm:Zr_ppm, na.rm = TRUE) %>%
select(Year, Zone, Ag_ppm:Zr_ppm)
But I get the error:
Error in mutate_impl(.data, dots) : Evaluation error: NA/NaN argument.
In addition: Warning messages:
1: In Au_ppm:Zr_ppm :
numerical expression has 52 elements: only the first used
2: In Au_ppm:Zr_ppm :
numerical expression has 52 elements: only the first used
Does someone have an explanation for these errors? Or a better solution for my problem?
Thanks!
Perhaps the following code is what you desire.
library(dplyr)
# a column range needs across() inside count() (dplyr >= 1.0)
count(SoilGeology, Year, Zone, across(Ag_ppm:Zr_ppm))
Or, equivalently, with the pipe operator:
SoilGeology %>% count(Year, Zone, across(Ag_ppm:Zr_ppm))
Alternatively,
SoilGeology %>% group_by(Year, Zone, across(Ag_ppm:Zr_ppm)) %>% summarise(number = n())
Or
SoilGeology %>% group_by(Year, Zone, across(Ag_ppm:Zr_ppm)) %>% tally()
If errors persist, check the class() of your variables. Values may need to be coerced to numeric. If needed, try
variable <- as.numeric(variable)
and try again.
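The class check suggested above can be sketched in base R as follows. The data frame here is invented for illustration (it is not the asker's SoilGeology), and it assumes one element column was read in as a factor. Going through as.character() first matters in that case, because as.numeric() applied directly to a factor returns the internal level codes rather than the printed values.

```r
# Hypothetical data in which one element column was read in as text/factor.
soil <- data.frame(
  Year   = c(1981, 1983),
  Ag_ppm = c("0.5", "1.2"),
  Zr_ppm = c(150, 90),
  stringsAsFactors = TRUE
)

# Show the class of every column to spot non-numeric ones
sapply(soil, class)

# Convert via character first: as.numeric() on a factor returns level codes
element_cols <- c("Ag_ppm", "Zr_ppm")
soil[element_cols] <- lapply(
  soil[element_cols],
  function(x) as.numeric(as.character(x))
)
```

After this step every column named in element_cols is numeric, so comparisons such as x > 0 behave as expected.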
sum adds numbers; n() and count() count rows. If numbers greater than 0 have special meaning for you, you need to tell R that. The classic way to count the number of things meeting a condition is sum(..test for condition..), so if you want the number of elements of x that are greater than 0, sum(x > 0) will do it. This is the function you want to apply to all columns:
# reproducible example on built-in data
mtcars %>%
group_by(cyl) %>%
summarize_at(vars(disp:carb), function(x) sum(x > 5))
# for your data
Elt_count <- SoilGeology %>%
group_by(Year, Zone) %>%
summarise_at(vars(Ag_ppm:Zr_ppm), function(x) sum(x > 0))
I don't know your data. You may want to change it to sum(x != 0) if there are negative numbers you want to count too. If there are missing values, use sum(x > 0, na.rm = TRUE) (if you look at ?sum, it does take an na.rm argument).
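To see the difference between these variants, here is a minimal sketch on a made-up data frame. The name soil and all of its values are assumptions for illustration only. It also shows why the n() attempts in the question failed: summarise_at() calls the supplied function with each column as its argument, and n() accepts none, whereas an anonymous function of x works.

```r
library(dplyr)

# Hypothetical stand-in for SoilGeology; the values are invented.
soil <- data.frame(
  Year   = c(1981, 1981, 1981, 1983, 1983),
  Zone   = c("ZoneA", "ZoneB", "ZoneB", "ZoneA", "ZoneA"),
  Ag_ppm = c(0, 1.2, 0.4, NA, 0),
  Zr_ppm = c(0, 0, NA, 150, 90)
)

# Number of values greater than 0 per group, ignoring missing values
positive_count <- soil %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm), function(x) sum(x > 0, na.rm = TRUE))

# Number of non-missing values per group, whatever their value
measured_count <- soil %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm), function(x) sum(!is.na(x)))
```

Which of the two is the right "count of samples" depends on how unanalysed elements are recorded: if they are stored as 0, the first form fits; if they are stored as NA, the second is probably closer to what the question asks for.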