
Moving 10th character in a string to 4th from the end within dataframe in R

I have strings within a data frame (class chr), but for simplicity I'll just describe 1 string.

x <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N")

I want to re-order a bunch of these strings (in column seq) the same way, moving the 10th character ("J") to a new position 4th from the end (where "K" is now). In this case that just swaps "J" and "K". I'm guessing it would look something like

mutate(seq_reordered = str_replace(seq, "pattern", "replacement")) %>%

or maybe

mutate(seq_reordered = sub("pattern", "replacement", seq)) %>%

but the regex conditions confuse me, and it's not obvious to me how this works.

As it is a vector of length 14, we can rearrange it by indexing, concatenating the pieces in the new order:

x1 <- c(x[1:9], x[11], x[10], x[12:length(x)])

Or build the index vector first and subset once:

x1 <- x[c(1:9, 11:10, 12:length(x))]
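The hard-coded indices above only work for this one vector. As a rough sketch, the same idea can be wrapped in a small helper that works for any length. The name move_element and its arguments are made up for illustration. Note that it moves the element rather than swapping two elements; for the length-14 example the two give the same result.

```r
# Hypothetical helper (not from the original answers): move the element at
# index `from` so that it sits at index `to` in the result.
move_element <- function(x, from, to) {
  stopifnot(from >= 1, from <= length(x), to >= 1, to <= length(x))
  rest <- x[-from]                       # drop the element being moved
  append(rest, x[from], after = to - 1)  # re-insert it at position `to`
}

x <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N")
x1 <- move_element(x, from = 10, to = length(x) - 3)
x1
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "K" "J" "L" "M" "N"
```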

Define the permutation ix and then apply it:

ix <- replace(seq_along(x), c(10, 11), c(11, 10))
x[ix]
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "K" "J" "L" "M" "N"

The above is particularly convenient if you have a data frame and need to apply the permutation to all or some of its columns, since the rows can be reordered in one step.

DF <- DF[ix, ]

or to apply it to just the jy columns:

DF[jy] <- DF[ix, jy]
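To make the two data frame forms concrete, here is a small sketch. The data frame DF, its columns, and the column selection jy are invented for illustration, assuming one row per position:

```r
# Toy data frame with one row per position (assumed layout)
DF <- data.frame(pos = 1:14,
                 letter = LETTERS[1:14],
                 score = (1:14) * 10,
                 stringsAsFactors = FALSE)

# Permutation that exchanges rows 10 and 11
ix <- replace(seq_len(nrow(DF)), c(10, 11), c(11, 10))

# Permute only the chosen columns; `pos` keeps its original order
jy <- c("letter", "score")
DF[jy] <- DF[ix, jy]
DF[9:12, ]
```

Rows 10 and 11 of letter and score are exchanged, while pos still runs from 1 to 14.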

Although not as convenient for mass application, another approach is to use replace directly on x:

replace(x, c(10, 11), x[c(11, 10)])

Classic swapping problem?

temp <- x[10]
x[10] <- x[length(x) - 3] 
x[length(x) - 3] <- temp
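If the column actually holds whole strings such as "ABCDEFGHIJKLMN" rather than vectors of single letters (an assumption, since the question only shows a vector), a regex with capture groups is one possible way to do it with sub. The data frame df below is made up for illustration. This moves the 10th character to 4th from the end, which for longer strings is not the same as swapping two characters.

```r
# Made-up example column of whole strings
df <- data.frame(seq = c("ABCDEFGHIJKLMN", "abcdefghijklmnopq"),
                 stringsAsFactors = FALSE)

# Capture groups: 1 = first 9 characters, 2 = the 10th character,
# 3 = everything in between, 4 = the last 3 characters.
# Emitting them in the order 1-3-2-4 places the 10th character 4th from the end.
# Strings shorter than 13 characters do not match and are returned unchanged.
df$seq_reordered <- sub("^(.{9})(.)(.*)(.{3})$", "\\1\\3\\2\\4", df$seq)
df$seq_reordered
## [1] "ABCDEFGHIKJLMN"    "abcdefghiklmnjopq"
```

Inside a dplyr pipeline, the same call could go in mutate(seq_reordered = sub(...)).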

There are a couple of ways to approach the problem.

The first option is easier programming-wise: if you're able to split the string into multiple columns of a data frame, you can use tidyverse tools (tidyr and dplyr) to turn the data frame into a long format and then swap the position indices:

library(tidyverse)

# Generate data
set.seed(123456)
sequence_tibble1 <- tibble(c1 = sample(letters, 10), c2 = sample(letters, 10),
                    c3 = sample(letters, 10), c4 = sample(letters, 10),
                    c5 = sample(letters, 10), c6 = sample(letters, 10), 
                    c7 = sample(letters, 10), c8 = sample(letters, 10))

# Turn data frame long & turn the position variable numeric
sequence_tibble1 <- sequence_tibble1 %>%
  gather(key = 'position', value = 'character') %>%
  mutate(position = str_remove(position, 'c') %>% as.numeric())

# Create updated position2 variable that has the new positions you want
sequence_tibble1 <- sequence_tibble1 %>%
  mutate(position2 = case_when(
    position == 2 ~ 8,
    position == 8 ~ 2,
    TRUE ~ position
  ))
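The snippet above stops once position2 has been computed. One way to finish the round trip back to wide format is sketched below. It assumes each row has an id column (not present in the example above) so the rows can be rebuilt, and it uses pivot_longer and pivot_wider from tidyr in place of gather. The tibble wide and its values are made up.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one row per sequence, one column per position
wide <- tibble(id = 1:2,
               c1 = c("a", "x"), c2 = c("b", "y"), c3 = c("c", "z"))

swapped <- wide %>%
  pivot_longer(-id, names_to = "position", values_to = "character") %>%
  mutate(position = as.numeric(sub("c", "", position)),
         # Swap positions 1 and 3; leave everything else in place
         position2 = case_when(position == 1 ~ 3,
                               position == 3 ~ 1,
                               TRUE ~ position)) %>%
  select(id, position2, character) %>%
  arrange(id, position2) %>%
  # Rebuild the wide layout using the new positions as column names
  pivot_wider(names_from = position2, values_from = character,
              names_prefix = "c")
```

After this, column c1 of swapped holds the values that were in c3 of wide and vice versa.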

The second option may be a little closer to what you're after. It relies on slightly more advanced functional programming with purrr, but it should still be fairly clear what's going on:


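# Build a list in which each element is a character vector of length 10
sequence <- list(sample(letters, 10), sample(letters, 10),
                 sample(letters, 10), sample(letters, 10))

# Store it as a list column
sequence_tibble2 <- tibble(sequence)

# Swap the 6th and 10th elements of a length-10 vector and return the result
swap_positions <- function(x) {
  c(x[1:5], x[10], x[7:9], x[6])
}

# Apply the swap to every element of the list column
sequence_tibble2 <- sequence_tibble2 %>%
  mutate(sequence2 = purrr::map(sequence, swap_positions))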


 