
Fill in empty values with the last non-empty value in a vector

I would like to fill in the missing values (not NA, just ''!) in a vector with the value before it. For example, if I have a vector defined as

vec <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks', 
         'American Hustle', '')

my output vector would be

'Titanic', 'Titanic', 'Titanic', 'Titanic', 'Donnie Darko', 'Donnie Darko', 
'Donnie Darko', 'Twin Peaks', 'American Hustle', 'American Hustle'

How can I achieve this?

Here is a two-liner with nzchar and subsetting that should be quite efficient.

# get logical vector of elements with non-empty character elements
notMissings <- nzchar(movies)
# fill in missing values
movies[notMissings][cumsum(notMissings)]
 [1] "Titanic"         "Titanic"         "Titanic"         "Titanic"        
 [5] "Donnie Darko"    "Donnie Darko"    "Donnie Darko"    "Twin Peaks"     
 [9] "American Hustle" "American Hustle"
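
To see why the indexing works, it can help to look at the intermediate values. The sketch below repeats the example data so it runs on its own; `idx` and `filled` are names introduced here for illustration only.

```r
movies <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks',
            'American Hustle', '')

# TRUE where the element is a non-empty string
notMissings <- nzchar(movies)

# running count of non-empty elements seen so far: each position ends up
# holding the index of the most recent non-empty value
idx <- cumsum(notMissings)
idx
# [1] 1 1 1 1 2 2 2 3 4 4

# the non-empty values, in their original order
movies[notMissings]
# [1] "Titanic"         "Donnie Darko"    "Twin Peaks"      "American Hustle"

# indexing the non-empty values by idx repeats each one over its gap
filled <- movies[notMissings][idx]
```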

Here is a second method using rle.

# get run length encodings
temp <- rle(movies)
# get missing values    
missings <- nchar(temp$values) == 0
# fill in missing values
temp$values[missings] <- temp$values[which(missings) - 1]

# expand
inverse.rle(temp)
 [1] "Titanic"         "Titanic"         "Titanic"         "Titanic"        
 [5] "Donnie Darko"    "Donnie Darko"    "Donnie Darko"    "Twin Peaks"     
 [9] "American Hustle" "American Hustle"

Note that this second method does not handle a vector whose first element is the empty string '': depending on the input, it throws an error or silently recycles the wrong values.
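
If the vector may start with '', one possible guard is sketched below. It uses the cumsum indexing idea rather than rle. `fill_forward` is a hypothetical helper name, and leaving leading empty strings untouched is an assumed choice, not something stated in the answer.

```r
# Hypothetical helper: forward-fill empty strings, leaving any leading
# empty strings as they are (there is no earlier value to copy)
fill_forward <- function(x) {
  keep <- nzchar(x)      # TRUE for non-empty elements
  idx  <- cumsum(keep)   # 0 for positions before the first non-empty value
  pos  <- idx > 0        # positions that have a previous non-empty value
  out  <- x
  out[pos] <- x[keep][idx[pos]]
  out
}

fill_forward(c('', 'Titanic', '', 'Donnie Darko', ''))
# [1] ""             "Titanic"      "Titanic"      "Donnie Darko" "Donnie Darko"
```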

Data

movies <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks',
            'American Hustle', '')

Using Reduce in base R, where vec is your vector:

Reduce(function(x,y) ifelse(y=="", x, y), vec, accumulate=TRUE)

#  [1] "Titanic"         "Titanic"         "Titanic"         "Titanic"
#  [5] "Donnie Darko"    "Donnie Darko"    "Donnie Darko"    "Twin Peaks"
#  [9] "American Hustle" "American Hustle"
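
With accumulate = TRUE, Reduce returns every intermediate result instead of only the last one, which is what carries the previous value forward. A minimal sketch on a shorter vector follows; `carry_forward` and `small` are names made up for this example, and a plain `if` stands in for `ifelse` because each call compares single strings.

```r
small <- c('a', '', '', 'b', '')

# prev is the value carried so far, cur is the next element of the vector
carry_forward <- function(prev, cur) if (cur == "") prev else cur

res <- Reduce(carry_forward, small, accumulate = TRUE)
res
# [1] "a" "a" "a" "b" "b"
```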

Or we can use na.locf from zoo:

library(zoo)
vec <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks', 'American Hustle', '')
vec[which(vec == "")] <- NA
na.locf(vec)

#  [1] "Titanic"         "Titanic"         "Titanic"         "Titanic"
#  [5] "Donnie Darko"    "Donnie Darko"    "Donnie Darko"    "Twin Peaks"
#  [9] "American Hustle" "American Hustle"

We can also use tapply, grouping each non-empty value with the empty strings that follow it:

unlist(tapply(movies, cumsum(movies !=""), FUN = 
      function(x) rep(x[1], length(x))), use.names = FALSE)
#[1] "Titanic"         "Titanic"         "Titanic"         "Titanic"         "Donnie Darko"    "Donnie Darko"    "Donnie Darko"    "Twin Peaks"     
#[9] "American Hustle" "American Hustle"
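
The grouping step can be inspected on its own. In this sketch `grp` is a name introduced for illustration: it numbers each run that starts at a non-empty value, so the first element of every group is the value to repeat.

```r
movies <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks',
            'American Hustle', '')

# group id increases by one at every non-empty element
grp <- cumsum(movies != "")
grp
# [1] 1 1 1 1 2 2 2 3 4 4

# first element of each group, i.e. the value that fills the group
firsts <- vapply(split(movies, grp), function(x) x[1], character(1))
```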

Data

movies <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks', 
          'American Hustle', '')
