
How to extract file paths in the folder with partial match

I have a data frame called mydf. I want to match each value in the FAM column against the part of each file name before the first _ in the directory /mypath/, and replace the values in the FAM column with the full paths of the matching files.

Files in /mypath:

1_reca.44.bam
12_reca.xx.4.bam
AMA_xtt.33.bam 
SMA_xtt.33.bam 

mydf

sn      FAM            PAT
1       1              all 
2       12             non
3       AMA            non

Desired result

sn      FAM                   PAT
1       mypath/1_reca.44.bam      all 
2       mypath/12_reca.xx.4.bam   non
3       mypath/AMA_xtt.33.bam     non

We can use match, paste, and sub. sub removes everything from the first _ to the end of each element of 'files', and match looks up each value of the 'FAM' column among those prefixes. The resulting numeric index selects the corresponding elements of 'files', which are then pasted together with 'mypath' to create the updated 'FAM' column.

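# Assumes 'files' holds the bare file names, e.g. from list.files('/mypath')
# Drop everything from the first '_' onward, then look up each FAM value
idx <- match(mydf$FAM, sub('_.*', '', files))
mydf$FAM <- paste('mypath', files[idx], sep = '/')
mydf$FAM
#[1] "mypath/1_reca.44.bam"    "mypath/12_reca.xx.4.bam"
#[3] "mypath/AMA_xtt.33.bam"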
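
For a version that can be run end to end, here is a minimal sketch under a few assumptions: the example files are recreated as empty files in a temporary directory (the `mypath` variable here is hypothetical and stands in for the real `/mypath`), and `list.files(full.names = TRUE)` is used so that the directory is already part of each path and no `paste` is needed. It is one possible variant of the same match/sub idea, not the only way to do it.

```r
# Hypothetical setup: recreate the example files in a temporary directory
mypath <- file.path(tempdir(), "mypath")
dir.create(mypath, showWarnings = FALSE)
invisible(file.create(file.path(mypath, c("1_reca.44.bam", "12_reca.xx.4.bam",
                                          "AMA_xtt.33.bam", "SMA_xtt.33.bam"))))

# Example data frame from the question
mydf <- data.frame(sn  = 1:3,
                   FAM = c("1", "12", "AMA"),
                   PAT = c("all", "non", "non"),
                   stringsAsFactors = FALSE)

# full.names = TRUE returns paths that already include the directory
paths <- list.files(mypath, pattern = "\\.bam$", full.names = TRUE)

# Key for each file: the part of the file name before the first '_'
# (basename() is used so underscores in the directory path cannot interfere)
keys <- sub("_.*", "", basename(paths))

# match() compares whole keys; FAM values without a file become NA
mydf$FAM <- paths[match(mydf$FAM, keys)]
mydf
```

Because `match` compares whole keys, the FAM value 1 cannot pick up 12_reca.xx.4.bam, which a prefix search with `grep` could. A FAM value with no corresponding file ends up as `NA`, which can be checked afterwards with `is.na(mydf$FAM)`.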


 