
Removing all characters in a variable after a specific character in R

I have a dataset df1 like so:

snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01, 6.315047e-01)

df1 <- data.frame(snp, p.value)

I want to remove the underscore (_) and the letters after it (which represent the allele) from the snp column of df1, and store the result in a new data frame, df2.

I tried this code:

df2 <- df1[,c("snp", "allele"):=tstrsplit(`snp`, "_", fixed = TRUE)]

However, this also changes df1 itself. Is there a way to do this without modifying df1?
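
The reason df1 changes is that data.table's := adds or overwrites columns by reference, and df2 <- df1[...] only binds a second name to that same object instead of making a copy. A minimal sketch of this behaviour on a toy table (dt and alias are illustrative names, not from the question):

```r
library(data.table)

dt <- data.table(x = 1:3)
alias <- dt            # plain assignment: no copy, both names refer to one table
alias[, y := x * 2L]   # := adds column y by reference

print(names(dt))       # dt has gained column y as well
```

This is why the answers below either avoid := entirely or call copy() first.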

This is my best guess as to what you want:

library(tidyr)
# separate() returns a new data frame; df1 is left unchanged
df2 <- separate(df1, snp, into = c("snp", "allele"), sep = "_")
df2
#         snp allele   p.value
# 1 rs7513574      T 0.2635489
# 2 rs1627238      A 0.9836280
# 3 rs1171278      C 0.6315047
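
For comparison, a base-R sketch of the same split with no packages. It assumes every snp value contains exactly one underscore; otherwise strsplit() returns pieces of different lengths and the rbind step would not line up.

```r
snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01, 6.315047e-01)
df1 <- data.frame(snp, p.value)

# Split each ID at the underscore, then stack the pieces into a 2-column matrix
parts <- do.call(rbind, strsplit(as.character(df1$snp), "_", fixed = TRUE))

# Build a new data frame from the pieces; df1 itself is not modified
df2 <- data.frame(snp = parts[, 1], allele = parts[, 2],
                  p.value = df1$p.value, stringsAsFactors = FALSE)
df2
```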
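
Alternatively, with dplyr and stringr:

library(dplyr)
library(stringr)
# Drop the underscore and the allele letter(s) at the end of each ID
df2 <- df1 %>%
    mutate(snp = str_remove(snp, "_[[:alpha:]]+$"))
df2
#         snp   p.value
# 1 rs7513574 0.2635489
# 2 rs1627238 0.9836280
# 3 rs1171278 0.6315047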

Try:

library(dplyr)
# "_." matches the underscore plus the single character after it
df2 <- df1 %>% mutate(snp = gsub("_.", "", snp))
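
Note that the pattern "_." removes the underscore and exactly one following character, which is enough for the single-letter alleles in df1. If some IDs carried longer alleles (the "rs123_AT" value below is a made-up illustration, not from the question), anchoring the match with "_.*$" drops everything after the underscore instead. A base-R sketch comparing the two:

```r
ids <- c("rs7513574_T", "rs123_AT")   # second ID is hypothetical, for illustration

one_char <- gsub("_.", "", ids)       # removes "_" plus one character only
to_end   <- sub("_.*$", "", ids)      # removes "_" and everything after it

print(one_char)  # "rs7513574" "rs123T"
print(to_end)    # "rs7513574" "rs123"
```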

Consider making a copy of the dataset and running tstrsplit on the copy, so that the original data is not changed:

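library(data.table)
df2 <- copy(df1)  # deep copy, so later changes to df2 do not reach df1
# setDT() converts df2 to a data.table in place; := then overwrites snp and adds allele
setDT(df2)[, c("snp", "allele") := tstrsplit(snp, "_", fixed = TRUE)]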
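
A quick check, sketched under the assumption that df1 is built as in the question, that the copy-based approach leaves df1 as it was while df2 gains the allele column:

```r
library(data.table)

snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01, 6.315047e-01)
df1 <- data.frame(snp, p.value, stringsAsFactors = FALSE)

df2 <- copy(df1)
setDT(df2)[, c("snp", "allele") := tstrsplit(snp, "_", fixed = TRUE)]

print(names(df1))   # unchanged: "snp" "p.value"
print(class(df1))   # still a plain data.frame
print(df2)          # snp without the suffix, plus an allele column
```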
