Data frames in R: Calculating average of rows in a data frame while ignoring entries with '0' values
Let's say in the R environment, I have this data frame with n rows:
a b c classes
1 2 0 a
0 0 2 b
0 1 0 c
The result that I am looking for is, for each row, the number of non-zero values and the average of those values:
size_of_a = 2
average_of_a = 1.5
size_of_b = 1
average_of_b = 2
and the same for the other rows.
I have tried rowSums(dt[-c(4)] != 0) to find the non-zero elements, but I can't be sure that the 'classes' column will be the 4th column.
I would appreciate your help with acquiring these results. Thanks
You can do it like this:
# Generate some fake data
set.seed(1)
n = 10
k = 5
x = matrix(runif(n * k), n, k)
x[x < 0.5] = 0
# Get number of nonzero entries in each row
nonzeros = apply(x, 1, function(z) sum(z != 0))
# Take row sums and divide by number of non-zero entries
rowSums(x) / nonzeros
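One thing worth knowing about the division above: a row that contains only zeros gives 0 / 0, which R reports as NaN. The sketch below uses a small made-up matrix (not the data from the question) to show a vectorised count with rowSums() in place of apply(), plus one possible way to return NA for all-zero rows. Whether NA is the right answer for such rows is an assumption that depends on your use case.

```r
# Small matrix with one all-zero row (made-up data for illustration)
x <- matrix(c(1, 2, 0,
              0, 0, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)

# Count non-zero entries per row without apply()
nonzeros <- rowSums(x != 0)

# Mean of the non-zero entries; rows with no non-zero entry get NA
row_means <- ifelse(nonzeros > 0, rowSums(x) / nonzeros, NA_real_)

nonzeros
row_means
```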
Or, using the data.frame you provided, it would look like this:
# The data
x = structure(list(a = c(1L, 0L, 0L), b = c(2L, 0L, 1L), c = c(0L,
2L, 0L), classes = structure(1:3, .Label = c("a", "b", "c"), class = "factor")), .Names = c("a",
"b", "c", "classes"), class = "data.frame", row.names = c(NA,
-3L))
column = which(names(x) == "classes")
nonzeros = apply(x[-column], 1, function(z) sum(z != 0))
rowSums(x[-column]) / nonzeros
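If even the name of the label column might change, another option is to pick the value columns by type rather than by name or position. This is only a sketch and it assumes that every numeric column is a value column and that the label column is non-numeric; the names is_num, vals, size and average are made up for the example.

```r
# Same data as in the question, built directly
x <- data.frame(a = c(1, 0, 0), b = c(2, 0, 1), c = c(0, 2, 0),
                classes = c("a", "b", "c"))

# TRUE for each numeric column, FALSE for the label column
is_num <- vapply(x, is.numeric, logical(1))

# Keep only the numeric columns as a matrix
vals <- as.matrix(x[is_num])

size <- rowSums(vals != 0)        # number of non-zero entries per row
average <- rowSums(vals) / size   # mean of the non-zero entries per row

data.frame(classes = x$classes, size = size, average = average)
```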
First, I create the data frame.
df <- read.table(text = "a b c classes
1 2 0 a
0 0 2 b
0 1 0 c", header = TRUE)
Then, I replace zeros with NAs to make life easier, since functions often have an na.rm argument to ignore them.
df[df==0] <- NA
Finally, I bind together the count of non-zero elements, the mean values, and the class names into a data frame.
data.frame(classes = df[,4],
size = rowSums(df[, -4]>0, na.rm = TRUE),
mean = rowMeans(df[, -4], na.rm = TRUE))
which gives:
# classes size mean
# 1 a 2 1.5
# 2 b 1 2.0
# 3 c 1 1.0
If you can't be sure that classes is the 4th column, select it by name instead:
data.frame(classes = df[,"classes"],
size = rowSums(df[, names(df) != "classes"]>0, na.rm = TRUE),
mean = rowMeans(df[, names(df) != "classes"], na.rm = TRUE))
# classes size mean
# 1 a 2 1.5
# 2 b 1 2.0
# 3 c 1 1.0
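The steps above overwrite df, so the zeros are gone from the original data afterwards. If that matters, the same idea can be wrapped in a small function that works on a copy. This is a sketch only: nonzero_summary and its id_col argument are hypothetical names, and it treats any NA already present in the data the same way as a zero.

```r
# Hypothetical helper: summarise non-zero entries per row, skipping `id_col`
nonzero_summary <- function(df, id_col = "classes") {
  # Copy the value columns into a matrix; df itself is left untouched
  vals <- as.matrix(df[, names(df) != id_col, drop = FALSE])
  vals[vals == 0] <- NA
  data.frame(classes = df[[id_col]],
             size = rowSums(!is.na(vals)),        # count of non-zero entries
             mean = rowMeans(vals, na.rm = TRUE)) # mean of non-zero entries
}

df <- read.table(text = "a b c classes
1 2 0 a
0 0 2 b
0 1 0 c", header = TRUE)

res <- nonzero_summary(df)
res
```

A row made up entirely of zeros would come back with size 0 and a mean of NaN.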
Another syntax to create the data frame uses the tibble function (from the tibble package, re-exported by the dplyr library):
library(dplyr)
df <-
tibble(
a = c(1,0,0),
b = c(2,0,1),
c = c(0,2,0),
classes = c("a", "b", "c")
)
To count the elements in a row that are equal to zero, you can evaluate the whole row even when the column classes is not numeric:
rowSums( df == 0 )
Counting the elements different from zero needs more care: rowSums( df != 0 ) over the whole row also counts the non-numeric classes column, because a string such as "a" is never equal to 0. Restricting both sums to the numeric columns avoids this. Therefore, the average you are looking for is:
rowSums( df[ , 1:3] )/rowSums( df[ ,1:3] != 0 )
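A quick check on the question's data shows how the comparison treats the non-numeric column. The sketch uses a plain data.frame instead of a tibble so that it runs without extra packages, and selects the value columns by name; value_cols is a name made up for the example.

```r
df <- data.frame(a = c(1, 0, 0), b = c(2, 0, 1), c = c(0, 2, 0),
                 classes = c("a", "b", "c"), stringsAsFactors = FALSE)

# Zeros per row: the classes column never equals 0, so it adds nothing
zeros_per_row <- rowSums(df == 0)

# Non-zeros over the whole row: the classes column is counted every time
nonzero_whole_row <- rowSums(df != 0)

# Non-zeros over the value columns only, selected by name
value_cols <- names(df) != "classes"
nonzero_values <- rowSums(df[, value_cols] != 0)

zeros_per_row
nonzero_whole_row
nonzero_values
```

The whole-row count comes out one higher than the count over the value columns, which is why the average divides by the restricted count.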
Cheers!