Replace partial values from a column in an R dataframe

I have a data frame, for example:

        chr   start2     end2 value
88  chrom16 56063633 56063634 0.238
78  chrom12 83039622 83039623 0.429
50  chrom12 73209081 73209082 0.313
68  chrom12 75138610 75138611 0.679
45  chrom12 67566601 67566602 0.859
120 chrom16 57694245 57694246 0.438

I would like to change part of the values in a column. In this data frame, I would like to change "chrom" to "chr" in the chr column.

The output should look like this:

     chr   start2     end2 value
88  chr16 56063633 56063634 0.238
78  chr12 83039622 83039623 0.429
50  chr12 73209081 73209082 0.313
68  chr12 75138610 75138611 0.679
45  chr12 67566601 67566602 0.859
120 chr16 57694245 57694246 0.438

If the pattern is as simple as in your example, you can just replace the "om" in "chrom" with an empty string.

df <- read.table(text = " chr   start2     end2 value
                              88  chrom16 56063633 56063634 0.238
                              78  chrom12 83039622 83039623 0.429
                              50  chrom12 73209081 73209082 0.313
                              68  chrom12 75138610 75138611 0.679
                              45  chrom12 67566601 67566602 0.859
                              120 chrom16 57694245 57694246 0.438", header = TRUE)

df$chr <- sub("om", "", df$chr)

df
#          chr   start2     end2 value
#    88  chr16 56063633 56063634 0.238
#    78  chr12 83039622 83039623 0.429
#    50  chr12 73209081 73209082 0.313
#    68  chr12 75138610 75138611 0.679
#    45  chr12 67566601 67566602 0.859
#    120 chr16 57694245 57694246 0.438
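A minimal sketch of why anchoring the pattern can matter, assuming the column might also hold values the question's data does not show (the vector x and its values are hypothetical): sub("om", "", ...) removes the first "om" wherever it appears, while "^chrom" only rewrites a leading "chrom".

```r
# Hypothetical values: "chromX" has no digits, "random1" contains "om" but not the prefix
x <- c("chrom16", "chromX", "random1")

# Unanchored: removes the first "om" anywhere in the string
sub("om", "", x)
# "chr16" "chrX" "rand1"

# Anchored with ^: only a leading "chrom" is replaced
sub("^chrom", "chr", x)
# "chr16" "chrX" "random1"
```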

We can use sub to capture the first 3 characters as a group, match (and thereby drop) the next 2 characters, then capture the digits, and replace the whole match with the backreferences \\1 and \\2 for the two captured groups. Here df1 is the question's data frame.

df1$chr <- sub("(.{3}).{2}(\\d+)", "\\1\\2", df1$chr)
df1$chr
#[1] "chr16" "chr12" "chr12" "chr12" "chr12" "chr16"
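The answer above assumes df1 already holds the question's data. A self-contained sketch that rebuilds it from the values in the question (row names omitted for brevity) and also shows that a value without digits, such as a hypothetical "chromX", is returned unchanged because the pattern does not match:

```r
# Rebuild the question's data as df1
df1 <- data.frame(
  chr    = c("chrom16", "chrom12", "chrom12", "chrom12", "chrom12", "chrom16"),
  start2 = c(56063633, 83039622, 73209081, 75138610, 67566601, 57694245),
  end2   = c(56063634, 83039623, 73209082, 75138611, 67566602, 57694246),
  value  = c(0.238, 0.429, 0.313, 0.679, 0.859, 0.438),
  stringsAsFactors = FALSE
)

# Group 1 = first three characters; the next two are matched and dropped;
# group 2 = the run of digits. "\\1\\2" glues the two groups back together.
df1$chr <- sub("(.{3}).{2}(\\d+)", "\\1\\2", df1$chr)
df1$chr
# "chr16" "chr12" "chr12" "chr12" "chr12" "chr16"

# No digits means no match, so sub() leaves the string as it is
sub("(.{3}).{2}(\\d+)", "\\1\\2", "chromX")
# "chromX"
```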

Or we can use a lookahead:

sub(".{2}(?=\\d)", "", df1$chr, perl = TRUE)
#[1] "chr16" "chr12" "chr12" "chr12" "chr12" "chr16"

This also works if the text before the digits changes, because it simply drops the two characters immediately preceding the first digit.
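A short sketch of the lookahead on hypothetical values with different digit counts: (?=\\d) checks that a digit follows without consuming it, so the length of the number does not matter. The last call shows the flip side: since the pattern only means "two characters before the first digit", running it on an already converted value such as "chr16" strips "hr" as well.

```r
x <- c("chrom1", "chrom16", "chrom123")

# Remove the two characters right before the first digit (perl = TRUE enables lookarounds)
sub(".{2}(?=\\d)", "", x, perl = TRUE)
# "chr1" "chr16" "chr123"

# Not idempotent: applying it to an already fixed value removes "hr"
sub(".{2}(?=\\d)", "", "chr16", perl = TRUE)
# "c16"
```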

Another, faster option is substr, which works on character positions:

df1$chr <- with(df1, paste0(substr(chr, 1, 3), substr(chr, 6,7)))
df1$chr
#[1] "chr16" "chr12" "chr12" "chr12" "chr12" "chr16"
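The substr() call above hard-codes positions 6 to 7, so it assumes at most two digits follow "chrom". A sketch with hypothetical values of varying length, comparing it with substring(x, 6), which runs to the end of each string:

```r
x <- c("chrom1", "chrom16", "chrom123")

# Fixed end position: anything after character 7 is lost
paste0(substr(x, 1, 3), substr(x, 6, 7))
# "chr1" "chr16" "chr12"

# Open-ended: substring() keeps everything from position 6 onward
paste0(substr(x, 1, 3), substring(x, 6))
# "chr1" "chr16" "chr123"
```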

Two possible ways:

data <- read.table(text = 'chr     start2   end2     value
                           chrom16 56063633 56063634 0.238
                           chrom12 83039622 83039623 0.429
                           chrom12 73209081 73209082 0.313
                           chrom12 75138610 75138611 0.679
                           chrom12 67566601 67566602 0.859
                           chrom16 57694245 57694246 0.438', 
                   stringsAsFactors = FALSE, 
                   header = TRUE)

# stringr package + base R for assignment
library(stringr)
data['chr'] <- str_replace(data[['chr']], "chrom", "chr")

data
#   chr   start2     end2 value
# 1 chr16 56063633 56063634 0.238
# 2 chr12 83039622 83039623 0.429
# 3 chr12 73209081 73209082 0.313
# 4 chr12 75138610 75138611 0.679
# 5 chr12 67566601 67566602 0.859
# 6 chr16 57694245 57694246 0.438

# with stringr and dplyr packages
library(dplyr)
data <- 
  data %>% 
  mutate(chr = str_replace(chr, "chrom", "chr"))

data
#   chr   start2     end2 value
# 1 chr16 56063633 56063634 0.238
# 2 chr12 83039622 83039623 0.429
# 3 chr12 73209081 73209082 0.313
# 4 chr12 75138610 75138611 0.679
# 5 chr12 67566601 67566602 0.859
# 6 chr16 57694245 57694246 0.438
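If you would rather not load stringr or dplyr, a base R sketch of the same replacement, using sub() with fixed = TRUE for a literal match and transform() for a mutate()-like column update (the two-row data frame is a hypothetical stand-in for the question's data):

```r
# Hypothetical two-row version of the question's data
data <- data.frame(chr   = c("chrom16", "chrom12"),
                   value = c(0.238, 0.429),
                   stringsAsFactors = FALSE)

# fixed = TRUE treats "chrom" as a literal string, not a regular expression;
# transform() evaluates chr = ... using the data frame's own columns
data <- transform(data, chr = sub("chrom", "chr", chr, fixed = TRUE))
data$chr
# "chr16" "chr12"
```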


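Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.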
