Data frames in R: Calculating average of rows in a data frame while ignoring entries with '0' values

Let's say that in R I have this data frame with n rows:

a b c classes
1 2 0  a
0 0 2  b
0 1 0  c

The result I am looking for is, for each row:
1. the number of non-zero values
2. the average of those non-zero values

size_of_a = 2
average_of_a = 1.5

size_of_b = 1
average_of_b = 2

...and the same for the other rows.

I have tried rowSums(dt[-c(4)] != 0) to count the non-zero elements, but I can't be sure that the classes column will always be the 4th column.

I would appreciate your help with getting these results. Thanks!

You can do it like this:

# Generate some fake data
set.seed(1)
n = 10
k = 5
x = matrix(runif(n * k), n, k)
x[x < 0.5] = 0

# Get number of nonzero entries in each row
nonzeros = apply(x, 1, function(z) sum(z != 0))

# Take row sums and divide by number of non-zero entries
rowSums(x) / nonzeros
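As a side note, the per-row apply() call is not strictly needed: comparing the whole matrix with 0 gives a logical matrix, and rowSums() counts the TRUE values in each row. The sketch below uses a small hand-made matrix (an assumption, chosen to mirror the question's numeric columns) so the results are easy to check by eye.

```r
# Small matrix mirroring the question's numeric columns (illustrative data)
m <- matrix(c(1, 2, 0,
              0, 0, 2,
              0, 1, 0), nrow = 3, byrow = TRUE)

# (m != 0) is a logical matrix; rowSums() counts the TRUE entries per row
nonzeros <- rowSums(m != 0)

# Zeros add nothing to the row sums, so this is the mean of the non-zero entries
row_means <- rowSums(m) / nonzeros

print(nonzeros)
print(row_means)
```

A row made entirely of zeros would give 0/0 here, i.e. NaN.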

Or, using the data.frame you provided, it would look like this:

# The data
x = structure(list(a = c(1L, 0L, 0L), b = c(2L, 0L, 1L), c = c(0L,
    2L, 0L), classes = structure(1:3, .Label = c("a", "b", "c"), class = "factor")), .Names = c("a",
    "b", "c", "classes"), class = "data.frame", row.names = c(NA,
    -3L))

column = which(names(x) == "classes")
nonzeros = apply(x[-column], 1, function(z) sum(z != 0))
rowSums(x[-column]) / nonzeros
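If the position, or even the name, of the label column might change, another option is to keep only the numeric columns. This is a sketch that assumes every numeric column should be included in the count and the average; the data frame is written out by hand from the question's example rather than with dput().

```r
# The question's data, written out by hand
x <- data.frame(a = c(1, 0, 0), b = c(2, 0, 1), c = c(0, 2, 0),
                classes = factor(c("a", "b", "c")))

# TRUE for numeric columns, FALSE for the factor column, wherever it sits
num_cols <- vapply(x, is.numeric, logical(1))
vals <- x[num_cols]

size <- rowSums(vals != 0)        # number of non-zero entries per row
avg  <- rowSums(vals) / size      # mean of the non-zero entries

result <- data.frame(classes = x$classes, size = size, mean = avg)
print(result)
```

Selecting by type avoids hard-coding either the column index or the name "classes", at the cost of silently including any other numeric column that happens to be present.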

First, I create the data frame:

df <- read.table(text = "a b c classes
1 2 0  a
0 0 2  b
0 1 0  c", header = TRUE)

Then, I replace zeros with NAs to make life easier, since many functions have an na.rm argument to ignore them:

df[df==0] <- NA

Finally, I bind together the number of non-zero elements, the mean values, and the class names into a data frame:

# After the NA replacement, every non-NA entry is a non-zero value (negative ones included)
data.frame(classes = df[, 4],
           size = rowSums(!is.na(df[, -4])),
           mean = rowMeans(df[, -4], na.rm = TRUE))

which gives:

#   classes size mean
# 1       a    2  1.5
# 2       b    1  2.0
# 3       c    1  1.0

Edit: to avoid relying on classes being the 4th column, select the columns by name instead:

data.frame(classes = df[, "classes"],
           size = rowSums(!is.na(df[, names(df) != "classes"])),
           mean = rowMeans(df[, names(df) != "classes"], na.rm = TRUE))

#   classes size mean
# 1       a    2  1.5
# 2       b    1  2.0
# 3       c    1  1.0
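One edge case worth checking with the NA approach: a row whose numeric entries are all zero ends up with no non-missing values, so size is 0 and rowMeans(..., na.rm = TRUE) returns NaN. The third row below is made up (an assumption, not part of the question's data) to show this.

```r
df <- data.frame(a = c(1, 0, 0), b = c(2, 0, 0), c = c(0, 2, 0),
                 classes = c("a", "b", "z"))
df[df == 0] <- NA   # zeros become NA; the text labels never compare equal to 0

vals <- df[, names(df) != "classes"]
out <- data.frame(classes = df[, "classes"],
                  size = rowSums(!is.na(vals)),
                  mean = rowMeans(vals, na.rm = TRUE))
print(out)          # third row: size 0, mean NaN
```

If a different value is preferred for such rows, it can be set afterwards, for example with out$mean[out$size == 0] <- NA.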

Another way to create the data frame is with the tibble() function, available through the dplyr library:

library(dplyr)
df <- 
  tibble(
  a = c(1,0,0), 
  b = c(2,0,1),
  c = c(0,2,0), 
  classes = c("a", "b", "c")
  )

To count the elements in each row that are equal to zero, you can compare the whole data frame at once, even though the classes column is not numeric:

rowSums( df == 0 )

Conversely, rowSums( df != 0 ) counts the elements that differ from zero. Over the whole row, however, the non-numeric classes column always counts as different from zero, so the comparison has to be restricted to the numeric columns. Therefore, the average you are looking for is:

rowSums(df[, 1:3]) / rowSums(df[, 1:3] != 0)
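The restriction to columns 1:3 matters: the classes column is compared as text, so "a" != 0 is TRUE and counting over the whole row adds one to every count. The sketch below uses a base R data.frame with the same values instead of a tibble, an assumption made only so it runs without dplyr.

```r
df <- data.frame(a = c(1, 0, 0), b = c(2, 0, 1), c = c(0, 2, 0),
                 classes = c("a", "b", "c"))

# Whole row: the classes column always counts as "different from zero"
all_cols <- rowSums(df != 0)

# Numeric columns only: the counts the question asks for
num_only <- rowSums(df[, 1:3] != 0)

avg <- rowSums(df[, 1:3]) / num_only
print(all_cols)
print(num_only)
print(avg)
```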

Cheers!

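Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.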

 