I would like to make a new column that contains the string following the last ; symbol in the column ID. I know how to do this using awk, but not in R.
> head(Mapped2)
IsomiR ID
1 TCCCGGGTGGTCTAGTGGTTAGGATTCGGCGCT URS0000635088;tRNA-Glu-CTC-2-1
2 TCCCGGGTGGTCTAGTGGTTAGGATTCGGCGCT URS000011CFE8;misc_RNA
3 TCCCGGGTGGTCTAGTGGTTAGGATTCGGCGCT URS00006A26A3;Homo;sapiens;tRNA
4 TTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCAGATCCCCGAATCCGGA URS00008D20CE;Homo;sapiens;large;subunit;rRNA
5 TTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCAGATCCCCGAATCCGGA URS00008C7E99;Homo;sapiens;large;subunit;rRNA
6 TTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCAGATCCCCGAATCCGGA URS000075EC78;Homo;sapiens;RNA,;28S;ribosomal;5;(RNA28S5),;rRNA.
How about a pattern that matches non-; characters between a ; and the end of the string, like this:
s <- "6TTGCCCTCGGCCGATCGAAAGGGAGTCGGGTTCAGATCCCCGAATCCGGAURS000075EC78;Homo;sapiens;RNA,;28S;ribosomal;5;(RNA28S5),;rRNA."
gsub(".*;([^;]+)$", "\\1", s)
# [1] "rRNA."
Working example:
d <- structure(list(ID = structure(c(2L, 1L, 3L, 6L, 5L, 4L), .Label = c("URS000011CFE8;misc_RNA", "URS0000635088;tRNA-Glu-CTC-2-1", "URS00006A26A3;Homo;sapiens;tRNA", "URS000075EC78;Homo;sapiens;RNA,;28S;ribosomal;5;(RNA28S5),;rRNA.", "URS00008C7E99;Homo;sapiens;large;subunit;rRNA", "URS00008D20CE;Homo;sapiens;large;subunit;rRNA"), class = "factor")), .Names = "ID", class = "data.frame", row.names = c(NA, -6L))
d$newcol <- gsub(".*;([^;]+)$", "\\1", d$ID)
d
# ID newcol
# 1 URS0000635088;tRNA-Glu-CTC-2-1 tRNA-Glu-CTC-2-1
# 2 URS000011CFE8;misc_RNA misc_RNA
# 3 URS00006A26A3;Homo;sapiens;tRNA tRNA
# 4 URS00008D20CE;Homo;sapiens;large;subunit;rRNA rRNA
# 5 URS00008C7E99;Homo;sapiens;large;subunit;rRNA rRNA
# 6 URS000075EC78;Homo;sapiens;RNA,;28S;ribosomal;5;(RNA28S5),;rRNA. rRNA.
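One detail worth checking with the anchored pattern above is what happens to an ID that contains no ; at all: the pattern does not match, so the string comes back unchanged. The following is a minimal sketch, assuming a small hypothetical vector `ids` (not part of the original data) and a made-up column of `NA` for unmatched rows:

```r
# Hypothetical sample vector; the last element has no ';' at all.
ids <- c("URS0000635088;tRNA-Glu-CTC-2-1",
         "URS00006A26A3;Homo;sapiens;tRNA",
         "no_semicolon_here")

# sub() is enough here: the pattern is anchored with $ and .* is greedy,
# so it can match at most once per string.
last_field <- sub(".*;([^;]+)$", "\\1", ids)
print(last_field)

# Elements without a ';' are returned unchanged by sub(), so flag them
# explicitly if they should be treated as missing instead.
has_sep <- grepl(";", ids, fixed = TRUE)
last_field_or_na <- ifelse(has_sep, last_field, NA_character_)
print(last_field_or_na)
```

Whether unmatched IDs should stay as they are or become `NA` depends on the data; the second step is only one possible choice.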
If you want everything after the last occurrence of ;, you can rely on a greedy quantifier: .* matches as much as possible, so .*; consumes everything up to and including the last ;. Replacing that match with an empty string leaves only what follows, e.g.
sub(".*;" , "", Mapped2$ID)
# [1] "tRNA-Glu-CTC-2-1" "misc_RNA" "tRNA" "rRNA" "rRNA" "rRNA."
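To see why greediness matters here, it can help to compare the greedy pattern with a lazy one on a single string. This is a small sketch using one ID from the question; the variable names are made up for illustration, and `perl = TRUE` is used for the lazy quantifier so the behaviour does not depend on the default regex engine:

```r
id <- "URS00006A26A3;Homo;sapiens;tRNA"

# Greedy: .* runs as far as it can, so the match ends at the LAST ';'.
greedy <- sub(".*;", "", id)
print(greedy)   # "tRNA"

# Lazy: .*? stops as early as it can, so the match ends at the FIRST ';'.
lazy <- sub(".*?;", "", id, perl = TRUE)
print(lazy)     # "Homo;sapiens;tRNA"
```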
Given that grep uses regexes, here's a regex that works for me: /;([^\\;]*)\\n/g
See this regex demo for the implementation.
I don't know R, unfortunately, but hopefully that can get you started using grep to that end.
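For readers who want to try the same idea in R, here is a rough sketch. It assumes each ID is a separate element of a character vector, so the end-of-string anchor $ takes the place of the newline in the pattern above; the vector `ids` is a hypothetical stand-in for `Mapped2$ID`. Instead of substituting, it extracts the trailing run of non-; characters with `regexpr()` and `regmatches()`:

```r
# Hypothetical stand-in for Mapped2$ID.
ids <- c("URS0000635088;tRNA-Glu-CTC-2-1",
         "URS000011CFE8;misc_RNA",
         "URS00006A26A3;Homo;sapiens;tRNA")

# Find the run of non-';' characters that reaches the end of each string.
m <- regexpr("[^;]+$", ids)

# Extract the matched text. Note that regmatches() drops elements with no
# match, so the result can be shorter than the input if an ID ends in ';'.
last_field <- regmatches(ids, m)
print(last_field)
```

Because of that dropping behaviour, the `sub()` approaches shown earlier in the thread are the safer choice when the result has to line up with the rows of a data frame.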