I have a vector of path steps, and there is one particular step whose consecutive repetitions I want to eliminate.
For example,
my_vec = "A > A > X > B > X > X > X > C > C"
Now if 'X' repeats, then I want to eliminate all repetitions of X besides the first one, while preserving the order of the rest of the elements, such that my desired outcome is:
my_vec = "A > A > X > B > X > C > C"
Here the repeated X's in the middle are collapsed into a single X.
I tried this with a for-loop and an if-else: if the previous element of the vector also contains 'X', replace the current element with NA, and remove the NA items afterwards. This approach does not produce the desired outcome.
I looked here and here, but those answers just keep the unique elements, whereas I want to do this for one particular element only.
Here's my code:
my_vec <- unlist(str_split(my_vec, '>') )
for (i in length(my_vec)){
if (grepl('X', my_vec[i]) & grepl('X', my_vec[i-1])) {
steps[i] <- NA
} else {
next()
}}
my_new_vec <- str_c(steps, collapse = '>')
However, the output is exactly the same as input and nothing is changed into NA.
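A likely reason nothing changes is that `for (i in length(my_vec))` visits only the last index (9), not 1 through 9, and the last element is not an X. The loop also assigns into `steps`, which the code shown never creates. The following is a minimal sketch of a corrected loop, assuming the steps are always separated by exactly " > "; the `keep` vector is a name introduced here for illustration.

```r
my_vec <- "A > A > X > B > X > X > X > C > C"

# Split on the literal separator so the steps carry no stray spaces
steps <- strsplit(my_vec, " > ", fixed = TRUE)[[1]]

# keep[i] is set to FALSE when step i is an X directly preceded by an X
keep <- rep(TRUE, length(steps))
for (i in seq_along(steps)[-1]) {   # start at 2 so steps[i - 1] exists
  if (steps[i] == "X" && steps[i - 1] == "X") {
    keep[i] <- FALSE
  }
}

my_new_vec <- paste(steps[keep], collapse = " > ")
my_new_vec
## [1] "A > A > X > B > X > C > C"
```

Comparing against the unmodified `steps` rather than overwriting elements with NA means a run of three or more X's is still reduced to one.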
1) gsub Replace each run of X's, where every X may be followed by spaces and greater-than characters, with the last match in that run. This also works if the run is at the end of the string. If we knew that the run was not at the end, as in the example in the question, then we could simplify the first argument to "(X > )*"
gsub("(X[> ]*)*", "\\1", my_vec)
## [1] "A > A > X > B > X > C > C"
2) strsplit/rle If you prefer to use strsplit as in the code in the question, try it in conjunction with rle. First we perform the strsplit, producing ss, and then apply rle to get r. Now for each run of " X " change its length to 1 and invert the runs back, giving the deduplicated version of ss as s. Finally convert to a string and remove leading and trailing whitespace.
ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]
r <- rle(ss)
r$lengths[r$values == " X "] <- 1
s <- inverse.rle(r)
trimws(paste(s, collapse = ">"))
## [1] "A > A > X > B > X > C > C"
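The same rle idea can be wrapped in a small reusable function. This is only a sketch: `collapse_runs` and its `target` and `sep` parameters are hypothetical names, not part of the answer above, and it assumes the steps are separated by exactly " > " so that no padding or trimming is needed.

```r
# Hypothetical helper: collapse consecutive repeats of `target` only,
# leaving repeats of every other step untouched.
collapse_runs <- function(path, target = "X", sep = " > ") {
  steps <- strsplit(path, sep, fixed = TRUE)[[1]]
  r <- rle(steps)                        # run lengths and run values
  r$lengths[r$values == target] <- 1L    # shrink each run of target to 1
  paste(inverse.rle(r), collapse = sep)  # rebuild the vector and the string
}

collapse_runs("A > A > X > B > X > X > X > C > C")
## [1] "A > A > X > B > X > C > C"
collapse_runs("A > X > X")
## [1] "A > X"
```

Because the split uses the full separator, a run of the target at the very end of the string is handled the same way as one in the middle.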
(2a) Another approach also using strsplit is the following. The first and last lines of code here are the same as the first and last lines of code in (2).
ss <- strsplit(paste0(" ", my_vec, " "), ">")[[1]]
s <- ss[!c(FALSE, ss[-1] == ss[-length(ss)] & ss[-1] == " X ")]
trimws(paste(s, collapse = ">"))
## [1] "A > A > X > B > X > C > C"
UPDATE: Handle case where sequence is at the end and add (2) and (2a).
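We can use gsub with a Perl-compatible regular expression. The \\K resets the start of the match, so the first "X > " of a run is kept and only the repeats that follow it are removed: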
gsub("(?:X > )\\K(X > )\\1*", "", my_vec, perl = TRUE)
#[1] "A > A > X > B > X > C > C"
A solution without regular expressions. my_vec4 is the final output.
# Create example string
my_vec <- "A > A > X > B > X > X > X > C > C"
library(dplyr)
# Split my_vec by " > "
my_vec2 <- strsplit(my_vec, split = " > ")[[1]]
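# TRUE where an element equals the previous one and is "X"
# (default = "" avoids an NA in the first comparison when the vector starts with "X")
X_logi <- my_vec2 == dplyr::lag(my_vec2, default = "") & my_vec2 %in% "X"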
# Subset my_vec2 if X_logi is false
my_vec3 <- my_vec2[!X_logi]
# Concatenate my_vec3
my_vec4 <- paste(my_vec3, collapse = " > ")
let str = "A > A > X > B > X > X > X > C > C";
let result = str.replace(/(\s*X >)+/g, " X >");
console.log(result); // A > A > X > B > X > C > C
Translated to R this would be: gsub("(\\s*X >)+", " X >", my_vec) – G. Grothendieck