R code for substituting values from one column to the missing values in another column
I have a data set named one with four columns: D1, D2, D3 and D4. D1 is the id. D2 has seven levels (a, b, c, d, e, f, g). D3 has missing data, which I want to fill by matching conditions from columns D2 and D4. I am selecting values from column D4 corresponding to four levels (a, c, d, e) of column D2 and then replacing the missing values of column D3 with those from D4.
D1 D2 D3 D4
1 a . 5
2 c 12 6
3 e . 3
4 b . 7
5 f . 8
6 e . 9
7 e 11 8
8 c . 3
9 c 52 5
10 a . 6
11 b 4 7
12 f . 2
13 f . 10
14 d . 12
15 d . 13
16 e . 24
17 a 1 54
18 b 2 19
19 c 5 21
I have the following solution, but it is not working. Any suggestions or help? Thanks.
index <- with(one, D2 %in% c('a','c','d','e'))
one$D4[index] <- one$D3[index]
one
Assuming that you actually do have "." in the data, and that the data are read in as characters instead of numbers/NAs, the following solution should be easier to understand than the with() call:
d <- read.table(header=T, stringsAsFactors=F, text=
"D1 D2 D3 D4
1 a . 5
2 c 12 6
3 e . 3
4 b . 7
5 f . 8
6 e . 9
7 e 11 8
8 c . 3
9 c 52 5
10 a . 6
11 b 4 7
12 f . 2
13 f . 10
14 d . 12
15 d . 13
16 e . 24
17 a 1 54
18 b 2 19
19 c 5 21"
)
indices <- d$D2 %in% c("a","c","d","e") & d$D3 == "."
d$D3[ indices ] <- d$D4[ indices ]
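One thing to keep in mind with this approach: because D3 was read in as character, it stays character after the replacement, and any "." left in rows outside the four levels is still a string. A minimal sketch of one way to finish the job, assuming you want D3 as a numeric column with NA for the remaining gaps (the three-row toy data here is just a subset of the rows above):

```r
# Toy subset of the data; D3 is read as character because of the "."
d <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"D1 D2 D3 D4
1 a . 5
2 c 12 6
4 b . 7")

# Same replacement as above: fill "." in D3 from D4 for the chosen levels
indices <- d$D2 %in% c("a", "c", "d", "e") & d$D3 == "."
d$D3[indices] <- d$D4[indices]

# Turn the leftover "." into NA first, then convert the column to numeric
d$D3[d$D3 == "."] <- NA
d$D3 <- as.numeric(d$D3)
d
```

Setting the remaining "." to NA before calling as.numeric() avoids the coercion warning you would otherwise get.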
And if you actually do have NAs instead of the "." characters, you can simply use is.na(d$D3) in place of d$D3 == "." when building the index vector.
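As a rough sketch of that NA variant, assuming the "." values were turned into NA when the data were read (here via na.strings, on a five-row subset of the data above):

```r
# Toy subset of the data, with "." read in as NA
d <- read.table(header = TRUE, stringsAsFactors = FALSE, na.strings = ".", text =
"D1 D2 D3 D4
1 a . 5
2 c 12 6
4 b . 7
14 d . 12
17 a 1 54")

# TRUE only where D2 is one of the chosen levels AND D3 is missing
indices <- d$D2 %in% c("a", "c", "d", "e") & is.na(d$D3)

# Fill only those rows of D3 from D4; everything else is left untouched
d$D3[indices] <- d$D4[indices]
d
```

Row 4 (level b) keeps its NA because b is not one of the four selected levels, and rows that already had a D3 value are not overwritten.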
Another way is to use na.strings when reading the table and then use ifelse. Slightly verbose but easy to understand!
d <- read.table(header=T, stringsAsFactors=F, na.strings=".", text=
"D1 D2 D3 D4
1 a . 5
2 c 12 6
3 e . 3
4 b . 7
5 f . 8
6 e . 9
7 e 11 8
8 c . 3
9 c 52 5
10 a . 6
11 b 4 7
12 f . 2
13 f . 10
14 d . 12
15 d . 13
16 e . 24
17 a 1 54
18 b 2 19
19 c 5 21"
)
d$D3 <- ifelse(is.na(d$D3) & (d$D2 == 'a' | d$D2 == 'c' | d$D2 == 'd' | d$D2 == 'e'), d$D4, d$D3)
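If the chain of == comparisons feels too long, the same ifelse call can probably be shortened with %in%, as in the first answer. A small sketch on a subset of the data, assuming the same na.strings setup:

```r
# Toy subset of the data, with "." read in as NA
d <- read.table(header = TRUE, stringsAsFactors = FALSE, na.strings = ".", text =
"D1 D2 D3 D4
3 e . 3
5 f . 8
9 c 52 5
15 d . 13")

# Where D3 is missing and D2 is a, c, d or e, take D4; otherwise keep D3
d$D3 <- ifelse(is.na(d$D3) & d$D2 %in% c("a", "c", "d", "e"), d$D4, d$D3)
d
```

This should behave the same as the longer version; %in% just replaces the repeated d$D2 == ... tests.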