
How to count the number of observations in each column of a grouped data frame in R

I have a data frame of geochemical sample results that includes the following variables:

Year, Zone, *48 analyzed elements*, *more information*.

I want to know how many samples were collected each year, in each zone, for every element. So basically, I would like a table that looks like this:

Year,Zone,Ag_ppm, ..., Zr_ppm
1981, ZoneA, 0, ..., 0 
1981, ZoneB, 20, ..., 0
1983, ZoneA, 0, ..., 150 

I have tried the following:

 Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm),funs(sum)) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

It runs, but it does not give me the information I want: I don't want the cumulative sum of the sample values, but a count of the samples. I then tried:

Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm),funs(n)) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

But I get the following error: Error in summarise_impl(.data, dots) : n() does not take arguments

I have also tried:

d <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_all(n) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

But I get the same error as above: Error in summarise_impl(.data, dots) : n() does not take arguments

I also tried with count():

Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  count(Au_ppm:Zr_ppm, na.rm = TRUE) %>%
  select(Year, Zone, Ag_ppm:Zr_ppm)

But I get the error:

Error in mutate_impl(.data, dots) : Evaluation error: NA/NaN argument.
In addition: Warning messages:
1: In Au_ppm:Zr_ppm :
  numerical expression has 52 elements: only the first used
2: In Au_ppm:Zr_ppm :
  numerical expression has 52 elements: only the first used

Does someone have an explanation for these errors? Or a better solution to my problem?

Thanks!

Perhaps the following code is what you are after.

library(dplyr)
count(SoilGeology, Year, Zone, Ag_ppm:Zr_ppm)

Or, equivalently, with the pipe operator:

SoilGeology %>% count(Year, Zone, Ag_ppm:Zr_ppm)

Alternatively:

SoilGeology %>% group_by(Year, Zone, Ag_ppm:Zr_ppm) %>% summarise(number = n())

Or:

SoilGeology %>% group_by(Year, Zone, Ag_ppm:Zr_ppm) %>% tally()
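To see what these three forms return, here is a minimal sketch on the built-in mtcars data, with cyl and gear standing in for Year and Zone (an assumption made only so the example runs without SoilGeology). It groups by plain column names only and shows that all three count rows per group rather than values per column.

```r
library(dplyr)

# Three ways to count rows per group; cyl and gear stand in for Year and Zone.
by_count <- mtcars %>%
  count(cyl, gear)

by_summarise <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(number = n())

by_tally <- mtcars %>%
  group_by(cyl, gear) %>%
  tally()

# Each result has one row per (cyl, gear) combination and a single count column.
print(by_count)
```

The count column is named n for count() and tally(), and number for the summarise() call, because that is the name given to n() there.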

If errors persist, check the class() of your variables. Values may need to be coerced to numeric. If needed, try variable <- as.numeric(variable) and run it again.
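A rough sketch of that check, using a small made-up data frame in place of SoilGeology (the values, and the idea that Ag_ppm was read in as text, are assumptions for illustration only):

```r
# Toy stand-in for SoilGeology; all values are invented for illustration.
soil <- data.frame(
  Year   = c(1981, 1981, 1983),
  Zone   = c("ZoneA", "ZoneB", "ZoneA"),
  Ag_ppm = c("0.5", "1.2", "<0.1"),  # stored as text, e.g. after a CSV import
  Zr_ppm = c(150, 0, 75),
  stringsAsFactors = FALSE
)

# Inspect the class of every column at once.
col_classes <- sapply(soil, class)
print(col_classes)

# Coerce the text column to numeric. Entries that cannot be parsed, such as
# "<0.1", become NA; as.numeric() warns about that, so the warning is
# suppressed here on purpose.
soil$Ag_ppm <- suppressWarnings(as.numeric(soil$Ag_ppm))
print(soil$Ag_ppm)
```

If a column turns out to be a factor, as.numeric() returns the internal level codes rather than the printed values, so as.numeric(as.character(variable)) is the usual workaround.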

sum() adds numbers; n() and count() count rows. If numbers greater than 0 have a special meaning for you, you need to tell R that. The classic way to count the number of things meeting a condition is sum(..test for condition..), so if you want the number of elements of x that are greater than 0, sum(x > 0) will do it. This is the function you want to apply to all columns:

# reproducible example on built-in data
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarize_at(vars(disp:carb), function(x) sum(x > 5))

# for your data
Elt_count <- SoilGeology %>%
  group_by(Year, Zone) %>%
  summarise_at(vars(Ag_ppm:Zr_ppm), function(x) sum(x > 0))

I don't know your data. You may want to change it to sum(x != 0) if there are negative numbers you want to count too. If there are missing values, use sum(x > 0, na.rm = TRUE) (if you look at ?sum, you will see it does take an na.rm argument).
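The variants can be compared side by side on a small made-up data frame. This is only a sketch: the soil data, the helper names count_positive, count_nonzero and count_measured, and the third variant (counting non-missing values, in case a "sample" means any recorded measurement) are assumptions, not part of the original data or answer.

```r
library(dplyr)

# Toy stand-in for SoilGeology with a missing value, a zero and a negative.
soil <- data.frame(
  Year   = c(1981, 1981, 1981, 1983),
  Zone   = c("ZoneA", "ZoneA", "ZoneB", "ZoneA"),
  Ag_ppm = c(0.4, NA, 0, 1.1),
  Zr_ppm = c(-2, 30, NA, 0)
)

# Hypothetical helpers, one per counting rule.
count_positive <- function(x) sum(x > 0, na.rm = TRUE)   # values above 0
count_nonzero  <- function(x) sum(x != 0, na.rm = TRUE)  # negatives count too
count_measured <- function(x) sum(!is.na(x))             # any non-missing value

grouped <- soil %>% group_by(Year, Zone)

positive <- grouped %>% summarise_at(vars(Ag_ppm:Zr_ppm), count_positive)
nonzero  <- grouped %>% summarise_at(vars(Ag_ppm:Zr_ppm), count_nonzero)
measured <- grouped %>% summarise_at(vars(Ag_ppm:Zr_ppm), count_measured)

# The same positive count written with across(), which assumes dplyr >= 1.0.
positive_across <- soil %>%
  group_by(Year, Zone) %>%
  summarise(across(Ag_ppm:Zr_ppm, count_positive), .groups = "drop")

print(positive)
print(nonzero)
print(measured)
```

Which rule is right depends on how zeros and missing values are recorded in the data: if 0 means "not analysed", the positive count is the one to use; if zeros are real measurements, the non-missing count is closer to a count of samples.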
