How to count the number of observations in each column of a grouped data frame in R

Question

I have a data frame of geochemical sample results that includes the following variables:

Year, Zone, *48 analyzed elements*, *more information*.

I want to know how many samples were collected each year, in each zone, for every element. So basically, I would like a table that looks like this:

Year,Zone,Ag_ppm, ..., Zr_ppm
1981, ZoneA, 0, ..., 0 
1981, ZoneB, 20, ..., 0
1983, ZoneA, 0, ..., 150

I have tried the following:

Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm),funs(sum)) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

It works, but it does not give me the information I want (I don't want the cumulative sum of the sample values, but a count of the samples). I then tried:

Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm),funs(n)) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

But I get the following error: Error in summarise_impl(.data, dots) : n() does not take arguments

I have also tried:

d <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_all(n) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

But I get the same error as above: Error in summarise_impl(.data, dots) : n() does not take arguments

I also tried with count:

Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  count(Au_ppm:Zr_ppm, na.rm = TRUE) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

But I get the error:

Error in mutate_impl(.data, dots) : Evaluation error: NA/NaN argument.
In addition: Warning messages:
1: In Au_ppm:Zr_ppm :
  numerical expression has 52 elements: only the first used
2: In Au_ppm:Zr_ppm :
  numerical expression has 52 elements: only the first used

Does someone have an explanation for these errors? Or a better solution for my problem?

Thanks!

Answer 1

Perhaps the following code is what you want. It counts the rows in each Year and Zone group. Note that count() and group_by() take individual grouping columns, so a range such as Ag_ppm:Zr_ppm cannot be passed to them directly.

library(dplyr)
count(SoilGeology, Year, Zone)

Or equivalently with the pipe operator:

SoilGeology %>% count(Year, Zone)

Alternatively,

SoilGeology %>% group_by(Year, Zone) %>% summarise(number = n())

Or

SoilGeology %>% group_by(Year, Zone) %>% tally()
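To make the row-counting variants concrete, here is a minimal sketch on made-up data. The `soil` data frame and its values are hypothetical stand-ins for SoilGeology, with only two element columns. All three calls should return the same number of rows per Year and Zone, which is not the same thing as a per-element count.

```r
library(dplyr)

# Hypothetical toy data standing in for SoilGeology
soil <- data.frame(
  Year   = c(1981, 1981, 1981, 1983, 1983),
  Zone   = c("ZoneA", "ZoneB", "ZoneB", "ZoneA", "ZoneA"),
  Ag_ppm = c(0, 1.2, 0.4, NA, 0),
  Zr_ppm = c(0, 0, NA, 150, 90)
)

# Three ways to count the rows in each Year/Zone group
by_count   <- soil %>% count(Year, Zone)
by_n       <- soil %>% group_by(Year, Zone) %>% summarise(n = n(), .groups = "drop")
by_tally   <- soil %>% group_by(Year, Zone) %>% tally()

print(by_count)
```

Each result has one row per Year/Zone combination and a single count column `n`, regardless of what the element columns contain.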

If errors persist, check the class() of your variables. Values may need to be coerced to numeric. If needed, try variable <- as.numeric(variable) and try again.
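As a rough illustration of that check, the sketch below uses a hypothetical data frame in which one element column was read in as text (for example because of a "<0.1" detection-limit entry, which is an assumption about how such data might look, not something stated in the question). It uses base R only.

```r
# Hypothetical data: Ag_ppm was read in as text, Zr_ppm as numbers
soil <- data.frame(
  Ag_ppm = c("0.5", "1.2", "<0.1"),
  Zr_ppm = c(10, 150, 90),
  stringsAsFactors = FALSE
)

# Inspect the class of every column
col_classes <- sapply(soil, class)
print(col_classes)

# Coerce the text column to numeric. Entries that are not plain numbers
# (here "<0.1") become NA; suppressWarnings() hides the coercion warning.
soil$Ag_ppm <- suppressWarnings(as.numeric(soil$Ag_ppm))
print(sapply(soil, class))
```

Any value that cannot be parsed turns into NA, so it is worth checking how many NAs the coercion introduced before counting anything.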

Answer 2

sum adds numbers; n() and count() count rows. If numbers greater than 0 have special meaning for you, you need to tell R that. The classic way to count the number of things meeting a condition is sum(..test for condition..), so if you want the number of elements of x that are greater than 0, sum(x > 0) will do it. This is the function you want to apply to all columns:

# reproducible example on built-in data
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarize_at(vars(disp:carb), function(x) sum(x > 5))

# for your data
Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm), function(x) sum(x > 0))

I don't know your data. You may want to change it to sum(x != 0) if there are negative numbers you want to count too. If there are missing values, use sum(x > 0, na.rm = TRUE) (if you look at ?sum, it does take an na.rm argument).
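Here is a small sketch of the na.rm variant on made-up data, in case it helps to see the output shape. The `soil` data frame is hypothetical, and the code uses across(), which assumes dplyr 1.0.0 or later; with an older dplyr the summarise_at() form shown earlier does the same job.

```r
library(dplyr)

# Hypothetical toy data standing in for SoilGeology
soil <- data.frame(
  Year   = c(1981, 1981, 1981, 1983, 1983),
  Zone   = c("ZoneA", "ZoneB", "ZoneB", "ZoneA", "ZoneA"),
  Ag_ppm = c(0, 1.2, 0.4, NA, 0),
  Zr_ppm = c(0, 0, NA, 150, 90)
)

# For each Year/Zone group, count the values above 0 in every element column,
# ignoring missing values
elt_count <- soil %>%
  group_by(Year, Zone) %>%
  summarise(across(Ag_ppm:Zr_ppm, ~ sum(.x > 0, na.rm = TRUE)), .groups = "drop")

print(elt_count)
```

If "a sample was analyzed" means "the value is not missing" rather than "the value is above 0", swapping the function for `~ sum(!is.na(.x))` counts non-missing entries instead.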
