Fill in empty values with the last non-empty value in a vector
I would like to fill in the missing values (not NA, just ''!) in a vector with the value before it.
For example, if I have a vector defined as
vec <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks',
'American Hustle', '')
my output vector would be
'Titanic', 'Titanic', 'Titanic', 'Titanic', 'Donnie Darko', 'Donnie Darko',
'Donnie Darko', 'Twin Peaks', 'American Hustle', 'American Hustle'
How can I achieve this?
Here is a two-liner using nzchar and subsetting that should be quite efficient.
# get logical vector of elements with non-empty character elements
notMissings <- nzchar(movies)
# fill in missing values
movies[notMissings][cumsum(notMissings)]
[1] "Titanic" "Titanic" "Titanic" "Titanic"
[5] "Donnie Darko" "Donnie Darko" "Donnie Darko" "Twin Peaks"
[9] "American Hustle" "American Hustle"
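To see why the indexing works: cumsum(notMissings) counts, at each position, how many non-empty strings have been seen so far, and that count is exactly the position of the most recent non-empty value inside movies[notMissings]. The sketch below wraps the idea in a helper called fill_forward; the name and the padding step are assumptions added for illustration, not part of the answer above. The padding is there because a leading '' produces an index of 0, which R would otherwise silently drop from the result.

```r
# Hypothetical helper (name and padding step are illustrative assumptions).
fill_forward <- function(x) {
  keep <- nzchar(x)     # TRUE where the string is non-empty
  idx  <- cumsum(keep)  # position of the last non-empty value seen so far
  # Leading empty strings have idx == 0; prepend "" and shift by one so they
  # stay in the output as "" instead of being dropped by zero-indexing.
  c("", x[keep])[idx + 1]
}

example_movies <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks',
                    'American Hustle', '')

cumsum(nzchar(example_movies))
# [1] 1 1 1 1 2 2 2 3 4 4

filled  <- fill_forward(example_movies)
leading <- fill_forward(c('', 'Titanic', ''))  # leading gap is kept as ''
```

With a vector that starts with a non-empty string, fill_forward should give the same result as the two-liner.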
Here is a second method using rle.
# get run length encodings
temp <- rle(movies)
# get missing values
missings <- nchar(temp$values) == 0
# fill in missing values
temp$values[missings] <- temp$values[which(missings) - 1]
# expand
inverse.rle(temp)
[1] "Titanic" "Titanic" "Titanic" "Titanic"
[5] "Donnie Darko" "Donnie Darko" "Donnie Darko" "Twin Peaks"
[9] "American Hustle" "American Hustle"
Note that this second method fails if the first element is the empty string, '': there is no preceding run to copy from, so which(missings) - 1 contains 0 and the replacement comes up short.
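One possible way to make the rle approach tolerate a leading empty string is to leave the first run alone. This is only a sketch under that assumption (a non-empty input vector whose leading gap should stay empty); the variable names are illustrative. Because rle merges consecutive equal values, every empty run other than the first is guaranteed to have a non-empty run right before it.

```r
# Illustrative input with a leading empty string (not from the question).
with_leading_gap <- c('', 'Titanic', '', '', 'Donnie Darko', '')

runs <- rle(with_leading_gap)
missings <- !nzchar(runs$values)

# A leading empty run has no previous value to copy, so skip it.
missings[1] <- FALSE

# Each remaining empty run takes the value of the run just before it.
runs$values[missings] <- runs$values[which(missings) - 1]

rle_filled <- inverse.rle(runs)
```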
data
movies <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks',
'American Hustle', '')
Using Reduce in base R, where vec is your vector:
Reduce(function(x,y) ifelse(y=="", x, y), vec, accumulate=TRUE)
#[1] "Titanic"         "Titanic"         "Titanic"         "Titanic"         "Donnie Darko"
#[6] "Donnie Darko"    "Donnie Darko"    "Twin Peaks"      "American Hustle" "American Hustle"
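With accumulate = TRUE, Reduce calls the function once per element, passing the running result as the first argument and the next element as the second, and keeps every intermediate result. The sketch below spells that out with a named function; carry_forward and short_vec are illustrative names, and a plain if/else is used instead of ifelse because each call only ever sees single strings.

```r
# Illustrative input (shorter than the vector in the question).
short_vec <- c('Titanic', '', '', 'Donnie Darko', '')

# previous: the value carried so far; current: the next element of the vector.
carry_forward <- function(previous, current) {
  if (nzchar(current)) current else previous
}

reduce_filled <- Reduce(carry_forward, short_vec, accumulate = TRUE)
```

Note that the function is called element by element, so on long vectors this is likely to be slower than the vectorised cumsum approach.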
Or we can use na.locf from zoo:
library(zoo)
vec <- c('Titanic', '', '', '', 'Donnie Darko', '', '', 'Twin Peaks', 'American Hustle', '')
vec[which(vec == "")] <- NA
na.locf(vec)
# [1] "Titanic" "Titanic" "Titanic" "Titanic" "Donnie Darko" "Donnie Darko"
# [7] "Donnie Darko" "Twin Peaks" "American Hustle" "American Hustle"
We can also use tapply, grouping on the running count of non-empty values:
unlist(tapply(movies, cumsum(movies !=""), FUN =
function(x) rep(x[1], length(x))), use.names = FALSE)
#[1] "Titanic" "Titanic" "Titanic" "Titanic" "Donnie Darko" "Donnie Darko" "Donnie Darko" "Twin Peaks"
#[9] "American Hustle" "American Hustle"