Replace partial values from a column in an R dataframe
I have a dataframe; an example follows.
chr start2 end2 value
88 chrom16 56063633 56063634 0.238
78 chrom12 83039622 83039623 0.429
50 chrom12 73209081 73209082 0.313
68 chrom12 75138610 75138611 0.679
45 chrom12 67566601 67566602 0.859
120 chrom16 57694245 57694246 0.438
I would like to change part of the values in a column. In this dataframe, I would like to change "chrom" to "chr" in the chr column (the second column as printed).
The output should look like this:
chr start2 end2 value
88 chr16 56063633 56063634 0.238
78 chr12 83039622 83039623 0.429
50 chr12 73209081 73209082 0.313
68 chr12 75138610 75138611 0.679
45 chr12 67566601 67566602 0.859
120 chr16 57694245 57694246 0.438
If the pattern is as simple as in your example, you can just replace the "om" in "chrom" with an empty string.
df <- read.table(text = " chr start2 end2 value
88 chrom16 56063633 56063634 0.238
78 chrom12 83039622 83039623 0.429
50 chrom12 73209081 73209082 0.313
68 chrom12 75138610 75138611 0.679
45 chrom12 67566601 67566602 0.859
120 chrom16 57694245 57694246 0.438", header = TRUE)
df$chr <- sub("om", "", df$chr)
df
# chr start2 end2 value
# 88 chr16 56063633 56063634 0.238
# 78 chr12 83039622 83039623 0.429
# 50 chr12 73209081 73209082 0.313
# 68 chr12 75138610 75138611 0.679
# 45 chr12 67566601 67566602 0.859
# 120 chr16 57694245 57694246 0.438
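If "om" could also appear somewhere other than the "chrom" prefix, anchoring the pattern to the start of the string may be safer. This is a minimal sketch, assuming the values that need fixing always begin with "chrom". The vector x is made up for illustration and is not part of the question's data.

```r
# Hypothetical values; the last one contains "om" but has no "chrom" prefix.
x <- c("chrom16", "chrom12", "random_7")

# Unanchored: removes the first "om" wherever it occurs.
unanchored <- sub("om", "", x)

# Anchored: only rewrites a leading "chrom", leaving other values untouched.
anchored <- sub("^chrom", "chr", x)

unanchored
anchored
```

With the unanchored pattern the third value loses its "om" as well, while the anchored pattern leaves it alone.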
We can use sub to capture the first 3 characters as a group, match the next 2 characters, and then capture the digits, replacing the match with the backreferences (\\1 and \\2) for the captured groups.
df1$chr <- sub("(.{3}).{2}(\\d+)", "\\1\\2", df1$chr)
df1$chr
#[1] "chr16" "chr12" "chr12" "chr12" "chr12" "chr16"
Or we can use a lookahead:
sub(".{2}(?=\\d)", "", df1$chr, perl = TRUE)
#[1] "chr16" "chr12" "chr12" "chr12" "chr12" "chr16"
This should also work if the characters in the string change, since both patterns depend on position and on the digits rather than on the literal text "chrom".
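To illustrate that point, here is a small self-contained sketch that applies both patterns to a value with a different prefix. The vector x is a made-up example, not data from the question.

```r
# Hypothetical input: one value from the question and one with a different 5-letter prefix.
x <- c("chrom16", "abcde12")

# Keep the first 3 characters, drop the next 2, keep the digits.
by_groups <- sub("(.{3}).{2}(\\d+)", "\\1\\2", x)

# Drop the 2 characters that sit directly before the first digit.
by_lookahead <- sub(".{2}(?=\\d)", "", x, perl = TRUE)

by_groups
by_lookahead
```

Both calls give the same result here because neither pattern mentions "chrom" explicitly.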
Or another faster option is substr, which works on character positions. Note that substr(chr, 6, 7) keeps at most two characters after the prefix.
df1$chr <- with(df1, paste0(substr(chr, 1, 3), substr(chr, 6,7)))
df1$chr
#[1] "chr16" "chr12" "chr12" "chr12" "chr12" "chr16"
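If the part after the prefix is not always two characters long (for example single-digit chromosomes or "X"), the fixed end position 7 would cut longer values short. The following is a hedged sketch of a position-based variant that uses substring, whose end position defaults to the end of the string. The input vector x is hypothetical.

```r
# Hypothetical values with suffixes of different lengths.
x <- c("chrom1", "chrom16", "chromX")

# Characters 1-3 ("chr") plus everything from position 6 onward.
shortened <- paste0(substr(x, 1, 3), substring(x, 6))
shortened
```

Like the substr version above, this still assumes that every value starts with exactly five prefix characters.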
Two possible ways:
data <- read.table(text = 'chr start2 end2 value
chrom16 56063633 56063634 0.238
chrom12 83039622 83039623 0.429
chrom12 73209081 73209082 0.313
chrom12 75138610 75138611 0.679
chrom12 67566601 67566602 0.859
chrom16 57694245 57694246 0.438',
stringsAsFactors = FALSE,
header = TRUE)
# stringr package + base R for assignment
library(stringr)
data['chr'] <- str_replace(data[['chr']], "chrom", "chr")
data
# chr start2 end2 value
# 1 chr16 56063633 56063634 0.238
# 2 chr12 83039622 83039623 0.429
# 3 chr12 73209081 73209082 0.313
# 4 chr12 75138610 75138611 0.679
# 5 chr12 67566601 67566602 0.859
# 6 chr16 57694245 57694246 0.438
# with stringr and dplyr packages
library(dplyr)
data <-
data %>%
mutate(chr = str_replace(chr, "chrom", "chr"))
data
# chr start2 end2 value
# 1 chr16 56063633 56063634 0.238
# 2 chr12 83039622 83039623 0.429
# 3 chr12 73209081 73209082 0.313
# 4 chr12 75138610 75138611 0.679
# 5 chr12 67566601 67566602 0.859
# 6 chr16 57694245 57694246 0.438