
R code for substituting values from one column to the missing values in another column

I have a data set named one with four columns: D1, D2, D3 and D4. D1 is the id. D2 has seven levels (a, b, c, d, e, f, g). D3 has missing data, which I want to fill by matching conditions on columns D2 and D4. For the rows where D2 is one of four levels (a, c, d, e), I want to replace the missing values of D3 with the corresponding values from D4.

D1  D2  D3  D4
1   a   .   5
2   c   12  6
3   e   .   3
4   b   .   7
5   f   .   8
6   e   .   9
7   e   11  8
8   c   .   3
9   c   52  5
10  a   .   6
11  b   4   7
12  f   .   2
13  f   .   10
14  d   .   12
15  d   .   13
16  e   .   24
17  a   1   54
18  b   2   19
19  c   5   21

I have the following solution, but it is not working. Any suggestions or help? Thanks.

index <- with(one, D2 %in% c('a','c','d','e'))
one$D4[index] <- one$D3[index]
one

Assuming that you actually do have "." in the data, and that the data are read in as characters instead of numbers/NAs, the following solution should be easier to understand than the with() call:

d <- read.table(header=TRUE, stringsAsFactors=FALSE, text=
"D1  D2  D3  D4
1   a   .   5
2   c   12  6
3   e   .   3
4   b   .   7
5   f   .   8
6   e   .   9
7   e   11  8
8   c   .   3
9   c   52  5
10  a   .   6
11  b   4   7
12  f   .   2
13  f   .   10
14  d   .   12
15  d   .   13
16  e   .   24
17  a   1   54
18  b   2   19
19  c   5   21"
)

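# Rows where D2 is one of the chosen levels and D3 is missing (".")
indices <- d$D2 %in% c("a","c","d","e") & d$D3 == "."
# Fill those D3 entries from D4 (D3 stays a character column)
d$D3[ indices ] <- d$D4[ indices ]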
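
One thing to watch with this approach: D3 was read in as character, so it is still character after the fill, and the rows with other D2 levels (b, f) still contain ".". A minimal sketch, assuming you want D3 as a numeric column with NA for whatever remains missing (the data frame is rebuilt with data.frame() here only so the snippet runs on its own):

```r
# Same data as in the question, with "." marking missing D3 values
d <- data.frame(
  D1 = 1:19,
  D2 = c("a","c","e","b","f","e","e","c","c","a","b","f","f","d","d","e","a","b","c"),
  D3 = c(".","12",".",".",".",".","11",".","52",".","4",".",".",".",".",".","1","2","5"),
  D4 = c(5, 6, 3, 7, 8, 9, 8, 3, 5, 6, 7, 2, 10, 12, 13, 24, 54, 19, 21),
  stringsAsFactors = FALSE
)

# Fill D3 from D4 where D2 is a, c, d or e and D3 is "."
indices <- d$D2 %in% c("a","c","d","e") & d$D3 == "."
d$D3[indices] <- d$D4[indices]

# Turn the leftover "." entries into NA, then convert the column to numeric
d$D3[d$D3 == "."] <- NA
d$D3 <- as.numeric(d$D3)
```

Replacing "." with NA before calling as.numeric() avoids the coercion warning that as.numeric(".") would otherwise raise.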

And if you actually do have NAs instead of the "." characters, you can simply use is.na(d$D3) in place of d$D3 == "." when building the index vector.
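
As a rough sketch of that NA variant, assuming D3 is already numeric with NA for the missing entries (only the first six rows of the example data are used, to keep it short):

```r
# First six rows of the example data, with NA instead of "."
d <- data.frame(
  D1 = 1:6,
  D2 = c("a", "c", "e", "b", "f", "e"),
  D3 = c(NA, 12, NA, NA, NA, NA),
  D4 = c(5, 6, 3, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Rows where D2 is one of the chosen levels and D3 is NA
indices <- d$D2 %in% c("a", "c", "d", "e") & is.na(d$D3)

# Fill only those rows; rows with D2 of b or f keep their NA
d$D3[indices] <- d$D4[indices]
```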

Another way is to use na.strings when reading the table and then use ifelse. Slightly verbose but easy to understand!

d <- read.table(header=TRUE, stringsAsFactors=FALSE, na.strings=".", text=
"D1  D2  D3  D4
1   a   .   5
2   c   12  6
3   e   .   3
4   b   .   7
5   f   .   8
6   e   .   9
7   e   11  8
8   c   .   3
9   c   52  5
10  a   .   6
11  b   4   7
12  f   .   2
13  f   .   10
14  d   .   12
15  d   .   13
16  e   .   24
17  a   1   54
18  b   2   19
19  c   5   21"
)


# Take D4 where D3 is NA and D2 is a, c, d or e; otherwise keep D3
d$D3 <- ifelse(is.na(d$D3) & (d$D2 == 'a' | d$D2 == 'c' | d$D2 == 'd' | d$D2 == 'e'), d$D4, d$D3)
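
If the chain of == comparisons gets long, the same condition can probably be written with %in%. A small sketch on the first six rows of the example data; fill_levels is just an illustrative name, not something from the original code:

```r
# First six rows of the example data, with NA for the missing D3 values
d <- data.frame(
  D1 = 1:6,
  D2 = c("a", "c", "e", "b", "f", "e"),
  D3 = c(NA, 12, NA, NA, NA, NA),
  D4 = c(5, 6, 3, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Levels of D2 for which a missing D3 should be taken from D4 (hypothetical name)
fill_levels <- c("a", "c", "d", "e")

# Same logic as the ifelse() with the OR chain, using %in% for the level test
d$D3 <- ifelse(is.na(d$D3) & d$D2 %in% fill_levels, d$D4, d$D3)
```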
