Replacing NA using paste() in R

My data frame has 9 columns and 3198 rows. In one of the columns, 'NA' is repeated every 82 rows. I am trying to replace the missing values with values from other columns in the same data frame. Let me illustrate with a small dataset:

df <- data.frame(a = LETTERS[1:6], b = rep(seq(1:3), 2))

df$b[1] <- 'NA'
df$b[4] <- 'NA'

> df
  a  b
1 A NA
2 B  2
3 C  3
4 D NA
5 E  2
6 F  3

for (i in 1: nrow(df)){
        if ('NA' %in% df$b[i]){

                df$b[i] <- paste("box", df$a[i])
        }
}

> df
  a     b
1 A box A
2 B     2
3 C     3
4 D box D
5 E     2
6 F     3

The code works on this small dataset. I am doing exactly the same thing in my larger dataset, but for some reason the missing values are still not getting replaced. Any idea what might be going on? This is probably an odd question, given that my code works on the sample dataset here and I cannot post the actual dataset for your review. The following might prove helpful:

> str(dataset)
'data.frame':        3198 obs. of  8 variables:
$ Local Identifier         : chr  "NEZ0100" "NEZ0100-1" "NEZ0100-2" "NEZ0100-3" ...
$ Local System             : chr  "Freezerworks" "Freezerworks" "Freezerworks" "Freezerworks" ...
$ Parent ID                : chr  "NEZ0100" "NEZ0100" "NEZ0100" "NEZ0100" ...
$ Storage Type             : chr  "Box-9X9" "BoxPos" "BoxPos" "BoxPos" ...
$ Storage Label            : chr  NA "A1" "A2" "A3" ...
$ Capacity                 : int  81 1 1 1 1 1 1 1 1 1 ...
$ Movable                  : chr  "Y" "N" "N" "N" ...
$ Storage Unit Order Number: int  0 1 2 3 4 5 6 7 8 9 ...

The problem occurs in $ Storage Label. Please let me know if you need any additional info. Thanks.

R uses NA for missing values. Note that NA is a special value, different from the character value "NA". Since your str(dataset) shows that the NA value there is not in quotes, we know it's R's special NA value rather than a string. So your example should really look more like this:

df <- data.frame(a = LETTERS[1:6], b = rep(seq(1:3), 2))
df$b[1] <- NA
df$b[4] <- NA
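
To see why a test against the string 'NA' never fires on a real missing value, here is a minimal sketch. The vector x and the counter names are made up for illustration; only base R functions are used.

```r
# A real missing value next to the literal string "NA"
x <- c(NA, "A1", "NA")

is.na(x)          # TRUE FALSE FALSE: only the real NA is flagged
x == "NA"         # NA FALSE TRUE: comparing a real NA gives NA, not TRUE
"NA" %in% x[1]    # FALSE: the string "NA" does not match a real NA
"NA" %in% x[3]    # TRUE: the string "NA" matches itself

# Counting each kind helps diagnose which one a column actually holds
n_missing <- sum(is.na(x))                 # real NA values
n_string  <- sum(x == "NA", na.rm = TRUE)  # literal "NA" strings
```

If n_missing is positive and n_string is zero for the problem column, the column holds real NA values, so a condition such as 'NA' %in% df$b[i] will always be FALSE for those rows.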

You test for NA using is.na() rather than comparing against the string 'NA' (whether with == or %in%). Also, we don't need a loop here to replace the NA values; we can just do

df$b[is.na(df$b)] <- paste("box", df$a[is.na(df$b)])
df

which gives us

  a     b
1 A box A
2 B     2
3 C     3
4 D box D
5 E     2
6 F     3

Note that using paste() here converts that column from numeric to character, but your actual "Storage Label" column is already character, so that won't change anything.
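
As a rough sketch of how the same idea might carry over to a data frame shaped like the one in str(dataset): the rows below are a made-up stand-in (including the "NEZ0101" value), and taking the replacement text from "Parent ID" is an assumption, since the question does not say which column should supply it. Column names containing spaces are accessed with [[ ]].

```r
# Hypothetical stand-in for the real data; column names copied from str(dataset)
dataset <- data.frame(
  `Parent ID`     = c("NEZ0100", "NEZ0100", "NEZ0100", "NEZ0101"),
  `Storage Type`  = c("Box-9X9", "BoxPos", "BoxPos", "Box-9X9"),
  `Storage Label` = c(NA, "A1", "A2", NA),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE   # keep the columns as character
)

# Compute the logical index once and reuse it on both sides
is_missing <- is.na(dataset[["Storage Label"]])

# Assumption: the replacement text is built from "Parent ID"
dataset[["Storage Label"]][is_missing] <-
  paste("box", dataset[["Parent ID"]][is_missing])

dataset[["Storage Label"]]
```

Storing the index in is_missing before assigning avoids evaluating is.na() twice and makes it easy to check how many rows will be touched with sum(is_missing).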
