Data frames in R: calculating row means while ignoring entries with the value '0'

Question

Let's say that in R I have this data frame with n rows:

a b c classes
1 2 0  a
0 0 2  b
0 1 0  c

The result I am looking for is the number of non-zero values in each row, together with the average of those values:

size_of_a = 2
average_of_a = 1.5

size_of_b = 1
average_of_b = 2

and the same for the other rows.

I have tried rowSums(dt[-c(4)]!=0) to count the non-zero elements, but I can't be sure that the classes column will always be the 4th column.

I would appreciate your help with getting these results. Thanks

Answer 1

You can do it like this:

# Generate some fake data
set.seed(1)
n = 10
k = 5
x = matrix(runif(n * k), n, k)
x[x < 0.5] = 0

# Get number of nonzero entries in each row
nonzeros = apply(x, 1, function(z) sum(z != 0))

# Take row sums and divide by number of non-zero entries
rowSums(x) / nonzeros

Or, using the data.frame you provided, it would look like this:

# The data
x = structure(list(a = c(1L, 0L, 0L), b = c(2L, 0L, 1L), c = c(0L,
    2L, 0L), classes = structure(1:3, .Label = c("a", "b", "c"), class = "factor")), .Names = c("a",
    "b", "c", "classes"), class = "data.frame", row.names = c(NA,
    -3L))

column = which(names(x) == "classes")
nonzeros = apply(x[-column], 1, function(z) sum(z != 0))
rowSums(x[-column]) / nonzeros
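
If the name of the label column is not known in advance either, one option is to select the numeric columns by type. The following is only a sketch under that assumption (every non-numeric column is treated as a label); the helper name nonzero_row_stats is hypothetical and does not come from the answer above.

```r
# Hypothetical helper (not from the original answers): summarise the
# non-zero entries of each row using only the numeric columns.
nonzero_row_stats <- function(d) {
  is_num <- vapply(d, is.numeric, logical(1))  # TRUE for numeric columns
  m <- as.matrix(d[is_num])                    # numeric values as a matrix
  size <- rowSums(m != 0)                      # count of non-zero entries per row
  data.frame(size = size, mean = rowSums(m) / size)
}

# Example data from the question
x <- data.frame(a = c(1, 0, 0), b = c(2, 0, 1), c = c(0, 2, 0),
                classes = c("a", "b", "c"), stringsAsFactors = FALSE)
res <- nonzero_row_stats(x)
print(res)
```

Note that a row containing only zeros gives 0 / 0, which R reports as NaN, so such rows may need separate handling.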

Answer 2

First, I create the data frame.

df <- read.table(text = "a b c classes
1 2 0  a
0 0 2  b
0 1 0  c", header = TRUE)

Then, I replace zeros with NAs to make life easier, since many functions have an na.rm argument to ignore them.

df[df==0] <- NA

Finally, I bind together the count of non-zero elements, the mean values, and the class names into a data frame.

data.frame(classes = df[, 4],
           size = rowSums(df[, -4] != 0, na.rm = TRUE),  # != 0 also counts negative values
           mean = rowMeans(df[, -4], na.rm = TRUE))

which gives

#   classes size mean
# 1       a    2  1.5
# 2       b    1  2.0
# 3       c    1  1.0

Edit: selecting the classes column by name instead of by position:

data.frame(classes = df[, "classes"],
           size = rowSums(df[, names(df) != "classes"] != 0, na.rm = TRUE),  # != 0 also counts negative values
           mean = rowMeans(df[, names(df) != "classes"], na.rm = TRUE))

#   classes size mean
# 1       a    2  1.5
# 2       b    1  2.0
# 3       c    1  1.0
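
To illustrate why turning zeros into NA helps, here is a minimal sketch on the values of row a from the question. The variable names row_a and row_a_na are made up for this illustration, and replace() is used so that the original vector is left unchanged.

```r
# Values of row "a" from the question
row_a <- c(1, 2, 0)

plain_mean <- mean(row_a)                    # zero is included: (1 + 2 + 0) / 3

row_a_na <- replace(row_a, row_a == 0, NA)   # copy with zeros turned into NA
nonzero_mean <- mean(row_a_na, na.rm = TRUE) # NA is skipped: (1 + 2) / 2
nonzero_size <- sum(!is.na(row_a_na))        # number of non-zero entries

print(c(plain_mean, nonzero_mean, nonzero_size))
```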

Answer 3

Another way to create the data frame is the tibble() function, which the dplyr library re-exports from the tibble package:

library(dplyr)
df <- 
  tibble(
  a = c(1,0,0), 
  b = c(2,0,1),
  c = c(0,2,0), 
  classes = c("a", "b", "c")
  )

To count the elements in a row that are equal to zero, you can evaluate the whole row even though the classes column is not numeric:

rowSums( df == 0 )

Conversely, the number of elements different from zero can be calculated through rowSums( df != 0 ), but here the classes column has to be left out, because its character values all compare as different from zero and would be counted too. Therefore, the average you are looking for is:

rowSums( df[ , 1:3] )/rowSums( df[ ,1:3] != 0 )
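
As a quick check of how the comparison treats the character column, the sketch below uses a plain data.frame as an assumed stand-in for the tibble and selects the value columns by name rather than with 1:3. The variable names are illustrative only.

```r
# Same data as above, built as a base data.frame for this illustration
df <- data.frame(a = c(1, 0, 0), b = c(2, 0, 1), c = c(0, 2, 0),
                 classes = c("a", "b", "c"), stringsAsFactors = FALSE)

zeros_all    <- rowSums(df == 0)   # "a" == 0 is FALSE, so classes adds nothing
nonzeros_all <- rowSums(df != 0)   # "a" != 0 is TRUE, so classes adds one per row

vals <- df[names(df) != "classes"]           # value columns only, chosen by name
nonzeros_vals <- rowSums(vals != 0)          # count of non-zero values per row
row_means <- rowSums(vals) / nonzeros_vals   # mean of the non-zero values

print(row_means)
```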

Cheers!

3 answers

Answer 1
0 2018-12-06 17:56:06

Answer 2
0 Accepted 2018-12-06 18:01:04

Answer 3
0 2018-12-06 18:18:10
