Extracting parts of character string with gsub

I am quite new to R and am working with a script written by my supervisor and me. Unfortunately, I am unable to reuse one gsub() call for the names of my samples. In the previous version the sample names looked like this (Anterior and Posterior varied throughout the df):

"1: Anterior LN_60_026.fcs"   

and was taken apart using

cell.counts$EH_ID <- gsub("\\d+: (Anterior|Posterior) LN_(\\d{2})_\\d{3}.fcs", "LM02\\2", cell.counts$Sample)
cell.counts$Position <- gsub("\\d+: (Anterior|Posterior) LN_(\\d{2})_\\d{3}.fcs", "\\1", cell.counts$Sample)

Now I am faced with a similar problem which requires some minor adjustment. Because I don't know how gsub() syntax works I am stuck with:

"1: mLN_681_030.fcs"     

for which mLN and spleen vary throughout the df and the code that I tried to adapt doesn't work anymore:

cells$Mouse_ID <- gsub("\\d+: (mLN|spleen)(_\\d{2})_\\d{3}_\\.fcs", "AA_0\\2", cells$Sample)
cells$tissue <- gsub("\\d+: (mLN|spleen)_(\\d{3})_\\d{3}.fcs", "\\1", cells$Sample)

I should add that the "tissue" separation works; it's the sample number extraction that doesn't. If anyone could explain what I am doing wrong and what the characters in this code do specifically, I'd be very grateful. PS: Yes, I have used ?gsub, but I find the help files in R quite beginner-unfriendly and didn't understand much.

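In your mouse ID line, the second capture group expects exactly 2 digits (\\d{2}), but your sample number has 3 (681). The pattern also contains a stray underscore before \\.fcs that does not occur in the file name. Because the pattern never matches, gsub() has nothing to replace and returns the input unchanged.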
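To see the failure mode directly, here is a minimal sketch using the sample string from the question. The variable names `x` and `broken` are only for illustration. grepl() reports whether a pattern matches at all, which is a quick way to check a regex before using it in gsub().

```r
x <- "1: mLN_681_030.fcs"

# The pattern from the question: \\d{2} wants two digits, and there is
# an extra "_" in front of "\\.fcs".
broken <- "\\d+: (mLN|spleen)(_\\d{2})_\\d{3}_\\.fcs"

grepl(broken, x)
# [1] FALSE

# No match means nothing is replaced, so the original string comes back.
gsub(broken, "AA_0\\2", x)
# [1] "1: mLN_681_030.fcs"
```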

Also, in the second regex you have not escaped the . before fcs. It still works because an unescaped . matches any single character, including a literal dot, but it should be \\. as below.
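A small sketch of why the escape matters. The malformed name `x_bad` is made up for illustration and does not come from the question; `loose` and `strict` are just labels for the two versions of the pattern.

```r
x_ok  <- "1: mLN_681_030.fcs"
x_bad <- "1: mLN_681_030Xfcs"   # hypothetical malformed name, no dot

loose  <- "\\d+: (mLN|spleen)_(\\d{3})_\\d{3}.fcs"    # "." = any character
strict <- "\\d+: (mLN|spleen)_(\\d{3})_\\d{3}\\.fcs"  # "\\." = literal dot

grepl(loose, x_ok)
# [1] TRUE
grepl(loose, x_bad)    # the unescaped dot happily matches "X"
# [1] TRUE
grepl(strict, x_bad)   # the escaped dot does not
# [1] FALSE
```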

x <- "1: mLN_681_030.fcs"

# Mouse ID: group 2 is "_681" (the underscore is inside the parentheses)
gsub("\\d+: (mLN|spleen)(_\\d{3})_\\d{3}\\.fcs", "AA_0\\2", x)
# [1] "AA_0_681"

# Tissue: group 1 is "mLN" or "spleen"
gsub("\\d+: (mLN|spleen)_(\\d{3})_\\d{3}\\.fcs", "\\1", x)
# [1] "mLN"
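Since you asked what the individual characters do, here is a commented sketch that applies one pattern to a whole column. The two-row data frame is made up for illustration. I am also assuming you might want "AA_0681" rather than "AA_0_681", so the underscore sits outside the second capture group here; keep it inside the parentheses if you want the underscore in the ID. The ^ and $ anchors and the use of sub() instead of gsub() are my own choices, not something the original script requires.

```r
# Hypothetical sample names in the same format as in the question
cells <- data.frame(
  Sample = c("1: mLN_681_030.fcs", "2: spleen_682_031.fcs"),
  stringsAsFactors = FALSE
)

# Pattern, piece by piece:
#   ^             start of the string
#   \\d+          one or more digits          ("1")
#   ": "          a literal colon and space
#   (mLN|spleen)  capture group 1: either "mLN" or "spleen"
#   _             a literal underscore, outside any group
#   (\\d{3})      capture group 2: exactly three digits ("681")
#   _\\d{3}       underscore plus three digits ("_030"), not captured
#   \\.fcs        a literal dot followed by "fcs"
#   $             end of the string
pattern <- "^\\d+: (mLN|spleen)_(\\d{3})_\\d{3}\\.fcs$"

# In the replacement, \\1 and \\2 insert whatever groups 1 and 2 captured.
# sub() replaces the first match only, which is enough because the
# anchored pattern covers the whole string.
cells$Mouse_ID <- sub(pattern, "AA_0\\2", cells$Sample)
cells$tissue   <- sub(pattern, "\\1", cells$Sample)

cells$Mouse_ID
# [1] "AA_0681" "AA_0682"
cells$tissue
# [1] "mLN"    "spleen"
```

Using the same pattern for both columns means there is only one regex to keep in sync if the file naming changes again.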
