Extract string between prefix and suffix

Question

I have these columns:

                 text.NANA text.22 text.32
1    Female RNDM_MXN95.tif      No      NA
12     Male RNDM_QOS38.tif      No      NA
13  Female  RNDM_WQW90.tif      No      NA
14    Male  RNDM_BKD94.tif      No      NA
15    Male  RNDM_LGD67.tif      No      NA
16   Female RNDM_AFP45.tif      No      NA

I want to create a column that contains only the barcode, which starts with RNDM_ and ends with .tif, but without the .tif extension. The tricky part is getting rid of the gender information that is in the same column. The number of spaces between the gender and RNDM_ varies:

                 text.NANA text.22 text.32    BARCODE
1    Female RNDM_MXN95.tif      No      NA RNDM_MXN95
12     Male RNDM_QOS38.tif      No      NA RNDM_QOS38
13  Female  RNDM_WQW90.tif      No      NA RNDM_WQW90
14    Male  RNDM_BKD94.tif      No      NA RNDM_BKD94
15    Male  RNDM_LGD67.tif      No      NA RNDM_LGD67
16   Female RNDM_AFP45.tif      No      NA RNDM_AFP45

I made a very poor attempt with this, but it didn't work:

dfrm$BARCODE <- regexpr("RNDM_", dfrm$text.NANA)
# [1] 8 6 9 7 7 8 9 9 8 8 9 9 6 6 7 8 9 8
# attr(,"match.length")
# [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
# attr(,"useBytes")
# [1] TRUE

Please help. Thanks!

Answer 1

So you just want to remove the file extension? Use file_path_sans_ext() from the {tools} package, which ships with R:

dfrm$BARCODE = tools::file_path_sans_ext(dfrm$text.NANA)
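To see what this call does on values shaped like the ones in the question, here is a minimal sketch. The vector x is a hypothetical stand-in for dfrm$text.NANA, with a few values copied from the question:

```r
# Hypothetical sample values, copied from the question's text.NANA column.
x <- c("Female RNDM_MXN95.tif", "Male RNDM_QOS38.tif", "Female  RNDM_WQW90.tif")

# file_path_sans_ext() lives in the 'tools' package, which ships with R,
# so it can be called with the tools:: prefix without attaching the package.
no_ext <- tools::file_path_sans_ext(x)

# Only the trailing ".tif" is removed; the gender prefix is still there.
print(no_ext)
```

The extension is stripped, but anything in front of the barcode stays in place.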

If there's more stuff in front, as with the gender here, you can use the following regular expression to extract just the barcode, i.e. the part starting at RNDM_ and stopping before .tif:

dfrm$BARCODE = stringr::str_match(dfrm$text.NANA, '(RNDM_.*)\\.tif')[, 2]

Note that I'm using the {stringr} package here because the base R functions for extracting regex matches (regexpr() combined with regmatches()) are cumbersome to use.
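For comparison, here is a sketch of how the same extraction could look in base R, in case installing {stringr} is not an option. The vector x is again a hypothetical stand-in for dfrm$text.NANA, and the patterns assume every value contains exactly one RNDM_ barcode followed by .tif:

```r
# Hypothetical sample values, copied from the question's text.NANA column.
x <- c("Female RNDM_MXN95.tif", "Male RNDM_QOS38.tif", "Female  RNDM_WQW90.tif")

# Option 1: sub() with a capture group. The whole string is replaced by
# the captured barcode. Strings that do not match are returned unchanged.
barcode_sub <- sub("^.*(RNDM_.*)\\.tif$", "\\1", x)

# Option 2: regexpr() finds the match positions (this is what the attempt
# in the question returned), and regmatches() turns them into text.
# Elements without a match are dropped, so the result can be shorter than x.
barcode_rm <- regmatches(x, regexpr("RNDM_[^.]+", x))

print(barcode_sub)
print(barcode_rm)
```

Because regmatches() silently drops non-matching elements, the sub() variant is probably the safer of the two when assigning straight into a data frame column.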

I strongly recommend against using strsplit here because it's underspecified: from reading the code it's absolutely not clear what the purpose of that code is. Write code that is self-explanatory, not code that requires explanation in a comment.

Answer 2

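You can do this easily with sapply() and strsplit():

sapply(strsplit(dfrm$text.NANA, "_"), "[", 1)

Edit: the line above returns the part before the underscore (for example "Female RNDM"), not the barcode. Split on runs of spaces or dots instead and take the second piece:

dfrm$BARCODE <- sapply(strsplit(as.character(dfrm$text.NANA), "[ .]+"), "[", 2)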
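The split-and-index approach can be sketched on a few sample values. The vectors x and padded are hypothetical stand-ins for the column in the question, and the use of trimws() is my own addition to cover values with leading whitespace:

```r
# Hypothetical sample values, copied from the question's text.NANA column.
x <- c("Female RNDM_MXN95.tif", "Male  RNDM_BKD94.tif")

# Split on runs of spaces or dots: each string becomes
# gender, barcode, extension.
pieces <- strsplit(x, "[ .]+")

# "[" is the subsetting function, so this picks the 2nd piece of each split.
barcode <- sapply(pieces, "[", 2)
print(barcode)

# Caveat: a leading separator produces an empty first piece, which shifts
# the index, so position 2 becomes the gender instead of the barcode.
padded <- "  Female RNDM_AFP45.tif"
shifted <- sapply(strsplit(padded, "[ .]+"), "[", 2)

# Trimming surrounding whitespace first restores the expected position.
fixed <- sapply(strsplit(trimws(padded), "[ .]+"), "[", 2)
print(c(shifted, fixed))
```

The result depends on the barcode always being the second piece, so any extra space-separated word in front would also shift the index.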

2 answers

solution1
2 ACCEPTED 2018-03-22 16:23:16

solution2
0 2018-03-22 16:17:55
