How to bin data based on values in one column, and sum occurrences from another column in R?
I have a dataframe df and want to bin rows using data from column A, and then for each bin, count the number of times that a value is present in another column B. Here is an example using only 2 columns (although my real example has many columns):
A B
5.4
4.6 36_8365
2.4
3.6
0.6
8.9 83_7433
4
7.6
4.7 54_3874
1.5 54_8364
I want to look in column A and find all values less than 1, at least 1 but less than 2, and so on. For each bin, I want to count the number of times that a value appears in column B. For the table above, this would give the following results:
Class Number
<1 0
1<=A<2 1
2<=A<3 0
3<=A<4 0
4<=A<5 2
5<=A<6 0
6<=A<7 0
7<=A<8 0
8<=A<9 1
9<=A<10 0
The following is close, but it sums the values when I just want to count them:
with(df, sum(df[A >= 1 & A < 2, "B"]))
I'm not sure what to replace "sum" with to get just counts instead of a sum. I know I can identify which rows in column B have a value by using
thing <- B==''
or make a table using
thing_table <- table(B=='')
However, I'm not sure how to search through column A, test if the value is between 2 other values, and then count the items in B that meet those criteria. Can anyone point me in the right direction?
Thanks!
First:
newdf <- na.omit(df)
This shrinks df down to only the rows that have data in them. Make sure the empty cells show up as NA before attempting this.
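A minimal sketch of that preparation step, assuming the blank cells in B were read in as empty strings. The data frame below simply rebuilds the example from the question. Keep in mind that na.omit drops a row if any of its columns is NA, which matters when the real data has many columns.

```r
# Rebuild the question's example; blank cells in B are assumed to be "".
df <- data.frame(
  A = c(5.4, 4.6, 2.4, 3.6, 0.6, 8.9, 4, 7.6, 4.7, 1.5),
  B = c("", "36_8365", "", "", "", "83_7433", "", "", "54_3874", "54_8364"),
  stringsAsFactors = FALSE
)

# Convert empty strings to NA so that na.omit() can drop those rows.
df$B[df$B == ""] <- NA

# Keep only the rows without any NA (here: the four rows where B has a value).
newdf <- na.omit(df)
nrow(newdf)
```

If the data comes from a file, passing na.strings = "" to read.csv should have the same effect at read time.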
Second:
Replace sum with length:
with(newdf, length(newdf[A >= 1 & A < 2, "B"]))
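The line above counts one bin at a time. As an alternative sketch (not part of the original answer), all bins can be counted in one pass with cut and table. This assumes the bins are the unit-wide intervals 0 to 10 from the question and that empty cells in B are either "" or NA. The names bins, has_value, counts and result are only illustrative.

```r
# Rebuild the question's example; blank cells in B are assumed to be "".
df <- data.frame(
  A = c(5.4, 4.6, 2.4, 3.6, 0.6, 8.9, 4, 7.6, 4.7, 1.5),
  B = c("", "36_8365", "", "", "", "83_7433", "", "", "54_3874", "54_8364"),
  stringsAsFactors = FALSE
)

# Assign each row to a bin [0,1), [1,2), ..., [9,10).
# right = FALSE closes the intervals on the left, matching 1<=A<2.
# Values of A outside 0..10 would become NA and be left out of the counts.
bins <- cut(df$A, breaks = 0:10, right = FALSE)

# TRUE for rows where B actually holds a value.
has_value <- !is.na(df$B) & df$B != ""

# table() on a factor keeps every level, so empty bins are reported as 0.
counts <- table(bins[has_value])

result <- data.frame(Class = names(counts), Number = as.vector(counts))
result
```

For the example data this reproduces the counts in the question's expected table: 1 for [1,2), 2 for [4,5), 1 for [8,9), and 0 everywhere else.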