I am fairly new to R so please go easy on me if this is a stupid question.
I have a dataframe called foo
:
< head(foo)
Old.Clone.Name New.Clone.Name File
1 A Aa A_mask_MF_final_IS2_SAEE7-1_02.nrrd
2 B Bb B_mask_MF_final_IS2ViaIS2h_SADQ15-1_02.nrrd
3 C Cc C_mask_MF_final_IS2ViaIS2h_SAEC16-1_02.nrrd
4 D Dd D_mask_MF_final_IS2ViaIS2h_SAEJ6-1_02.nrrd
5 E Ee F_mask_MF_final_IS2_SAED9-1_02.nrrd
6 F Ff F_mask_MF_final_IS2ViaIS2h_SAGP3-1_02.nrrd
I want to extract codes from the File
column that match the regular expression (S[AZ]{3}[0-9]{1,2}-[0-9]_02)
, to give me:
SAEE7-1_02
SADQ15-1_02
SAEC16-1_02
SAEJ6-1_02
SAED9-1_02
SAGP3-1_02
I then want to use these codes to search another directory for other files that contain the same code.
I fail, however, at the first hurdle and cannot extract the codes from that column of the data frame.
I have tried:
library('stringr')
str_extract(foo[3],regex("(S[A-Z]{3}[0-9]{1,2}-[0-9]_02)", ignore_case = TRUE))
but this just returns [1] NA
.
Am I simply missing something obvious? I look forward to cracking this with a bit of help from the community.
Hello if you are reading the data as a table file then foo[3]
is a list and str_extract does not accept lists, only strings, then you should use lapply to extract the match of every element.
lapply(foo[3], function(x) str_extract(x, "[sS][a-zA-Z]{3}[0-9]{1,2}-[0-9]_02"))
Result:
[1] "SAEE7-1_02" "SADQ15-1_02" "SAEC16-1_02" "SAEJ6-1_02" "SAED9-1_02"
[6] "SAGP3-1_02"
str_extract(foo[3],"(?i)S[A-Z]{3}[0-9]{1,2}-[0-9]_02")
seems to work. Somehow, my R gave me
"Error in check_pattern(pattern, string) : could not find function "regex""
when using your original expression.
The following code will repeat what you asked (just copy and paste to your R console ):
library(stringr)
foo = scan(what='')
Old.Clone.Name New.Clone.Name File
A Aa A_mask_MF_final_IS2_SAEE7-1_02.nrrd
B Bb B_mask_MF_final_IS2ViaIS2h_SADQ15-1_02.nrrd
C Cc C_mask_MF_final_IS2ViaIS2h_SAEC16-1_02.nrrd
D Dd D_mask_MF_final_IS2ViaIS2h_SAEJ6-1_02.nrrd
E Ee F_mask_MF_final_IS2_SAED9-1_02.nrrd
F Ff F_mask_MF_final_IS2ViaIS2h_SAGP3-1_02.nrrd
foo = matrix(foo,ncol=3,byrow=T)
colnames(foo)=foo[1,]
foo = foo[-1,]
foo
str_extract(foo[,3],regex("(S[A-Z]{3}[0-9]{1,2}-[0-9]_02)", ignore_case = T))
The reason you get NULL is hidden: R stores entries by column, hence foo[3]
is the 3rd row and 1st column of foo matrix/data frame. To quote the third column, you may need to use foo[,3]
. or foo<-data.frame(foo); foo[[3]]
foo<-data.frame(foo); foo[[3]]
.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.