I have a database that contains 16 columns. The 16th column holds text like this:
ASN_MAF=0.09;DOMAINS=Pfam_domain:PF00168,Prints_domain:.
I want to extract PF00168, that is, the substring between Pfam_domain: and the comma that follows it. All rows follow this pattern.
I tried this query, but it doesn't work:
res = sqldf("
SELECT SUBSTRING(v16, CHARINDEX("Pfam_domain:",v16)+1, 10000), CHARINDEX(",",v16)-1 )
FROM GeminiTable_germ
")
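Try:
SELECT SUBSTRING(v16, CHARINDEX('Pfam_domain:', v16) + 12,
       CHARINDEX(',', v16) - (CHARINDEX('Pfam_domain:', v16) + 12))
FROM GeminiTable_germ
Note that I changed the " to ' inside the SQL statement. That distinction matters: SQL string literals take single quotes, and an unescaped " inside the query also ends the R string passed to sqldf early.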
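sqldf uses SQLite by default, and SQLite has no CHARINDEX function; instr plays that role there. The same idea can be sketched in that dialect as follows. This is only an illustration: the second sample row and the aliases rest and pfam are made up here, and it assumes every row contains Pfam_domain: followed somewhere by a comma. It cuts the string at the marker first, so a comma that appears before Pfam_domain: does not interfere.

```r
library(sqldf)

# Sample data; the second row is hypothetical and puts a comma before the marker.
GeminiTable_germ <- data.frame(v16 = c(
  "ASN_MAF=0.09;DOMAINS=Pfam_domain:PF00168,Prints_domain:.",
  "DOMAINS=Prints_domain:PR00360,Pfam_domain:PF00041,Other:x"
))

res <- sqldf("
  select substr(rest, 1, instr(rest, ',') - 1) as pfam
  from (
    -- keep everything after the 'Pfam_domain:' marker
    select substr(v16, instr(v16, 'Pfam_domain:') + length('Pfam_domain:')) as rest
    from GeminiTable_germ
  )
")
print(res)
```

With these two rows the pfam column should contain PF00168 and PF00041.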
Assuming we want the string between the first colon and the first comma, instr(v16, ':')+1 is the character position just after the colon, and the string we want has length instr(v16, ',') - instr(v16, ':') - 1. So use substr with those as its 2nd and 3rd arguments:
library(sqldf)
GeminiTable_germ <- data.frame(v16 =
"ASN_MAF=0.09;DOMAINS=Pfam_domain:PF00168,Prints_domain:.")
sqldf("select substr(v16, instr(v16, ':')+1, instr(v16, ',')-instr(v16, ':')-1) v16new
from GeminiTable_germ")
giving:
   v16new
1 PF00168
We could alternately break it up and write it like this instead:
field <- function(x, from, to) {
  # SQL for the position just after the first `from`
  from_pos <- sprintf("instr(%s, '%s')+1", x, from)
  # from_pos is pasted in as text ending in "+1", so "-2" nets to instr(to) - instr(from) - 1
  to_pos <- sprintf("instr(%s, '%s')-%s-2", x, to, from_pos)
  sprintf("substr(%s, %s, %s)", x, from_pos, to_pos)
}
field('v16', ':', ',') # view generated code
fn$sqldf("select `field('v16', ':', ',')` v16new from GeminiTable_germ")
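The -2 in to_pos can look like an off-by-one error, so here is a small check of the arithmetic in plain R. It repeats the helper so the snippet runs on its own; the variable names expr, s, colon, comma and len are introduced here only for illustration. Because from_pos is substituted as text ending in +1, the generated length is instr(',') - instr(':') + 1 - 2, which equals instr(',') - instr(':') - 1.

```r
# Same helper as above, repeated so this snippet is self-contained.
field <- function(x, from, to) {
  from_pos <- sprintf("instr(%s, '%s')+1", x, from)
  to_pos <- sprintf("instr(%s, '%s')-%s-2", x, to, from_pos)
  sprintf("substr(%s, %s, %s)", x, from_pos, to_pos)
}

expr <- field("v16", ":", ",")
print(expr)

# Reproduce the same arithmetic in base R on the sample string.
s <- "ASN_MAF=0.09;DOMAINS=Pfam_domain:PF00168,Prints_domain:."
colon <- as.integer(regexpr(":", s, fixed = TRUE))  # position of first colon
comma <- as.integer(regexpr(",", s, fixed = TRUE))  # position of first comma
len <- comma - colon + 1 - 2                        # length as generated in SQL
print(substr(s, colon + 1, colon + len))
```

The last line should print PF00168, matching the sqldf result.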