Impute variables within data.frame groups by a factor column

Question

I have a data.frame containing numeric columns, grouped by a factor column, and I want to impute the missing values within each group. Let me explain.

part   id   value
a      1     23.4
a      2     23.8
a      3     45.6
a      4     34.7
a      5     Na
b      1     45.2
b      2     34.6
b      3     Na
b      4     30.9
b      5     28.1

I'd like to impute the NA values with the mean of the part. So for part a, I'd like to impute the missing value for id 5 with the mean of ids 1-4 in part a, and likewise for part b: impute the missing id 3 with the mean of the other ids in part b, and so on.

I need to do this across many columns (imagine having many more value columns), so perhaps an apply with a function, etc.

Answer 1

Using the na.strings argument in read.table/read.csv, we can convert the missing values to real NA, so that the 'value' columns are read as 'numeric'. With dplyr, we can replace the NAs in multiple value columns with the mean of that column within each part.

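library(dplyr)
# Within each part, replace the NAs in every column whose name starts
# with 'value' by the mean of that column's non-missing entries.
# mutate_each()/funs() are deprecated; across() needs dplyr >= 1.0.0.
df1 %>%
    group_by(part) %>%
    mutate(across(starts_with('value'),
       ~ replace(.x, which(is.na(.x)), mean(.x, na.rm=TRUE))))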

Or a similar option with data.table:

library(data.table)
nm1 <- grep('value', names(df1))
setDT(df1)[, (nm1) := lapply(.SD,  function(x) replace(x,
     which(is.na(x)), mean(x, na.rm=TRUE))), by = part,.SDcols=nm1]
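
For comparison, the same per-part imputation can be sketched in base R with ave(), which applies a function within each level of a grouping variable. This is only an illustrative sketch and not part of the original answer: the data frame df, the extra column value2, and the helper impute_mean are made up here to show the "many value columns" case.

```r
# Hypothetical data: 'value' is taken from the question, 'value2' is invented
# to illustrate imputing several columns at once.
df <- data.frame(
  part   = rep(c("a", "b"), each = 5),
  id     = rep(1:5, times = 2),
  value  = c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1),
  value2 = c(1, NA, 3, 4, 5, NA, 20, 30, 40, 50)
)

# Hypothetical helper: replace the NAs in x with the mean of its non-missing entries
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Names of the columns to impute, selected by prefix
value_cols <- grep("^value", names(df), value = TRUE)

# ave() calls impute_mean separately on the rows of each part
df[value_cols] <- lapply(df[value_cols],
                         function(x) ave(x, df$part, FUN = impute_mean))
```

Because ave() returns a vector of the same length and order as its input, the result can be assigned straight back into the selected columns without any merging step.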

Data

df1 <- read.table(text="part   id   value
a      1     23.4
a      2     23.8
a      3     45.6
a      4     34.7
a      5     Na
b      1     45.2
b      2     34.6
b      3     Na
b      4     30.9
b      5     28.1", header=TRUE, na.strings="Na", stringsAsFactors=FALSE)
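
As a sanity check on any of the approaches, the fill values for this sample can be computed directly: the mean of the four observed values is 31.875 for part a and 34.7 for part b. A minimal sketch with tapply(), where the vectors part and value simply restate the sample data and group_means is an illustrative name:

```r
# The sample data restated as plain vectors
part  <- rep(c("a", "b"), each = 5)
value <- c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1)

# Mean of the non-missing values within each part
group_means <- tapply(value, part, mean, na.rm = TRUE)
print(group_means)
```

After imputation, row 5 (part a, id 5) should hold the part a mean and row 8 (part b, id 3) the part b mean.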
