looping through a column in R and extracting characters

Question

I have a data frame in which one column holds protein IDs mixed in with a lot of other text, as in the mock example below. The ID I want is always the 4th through 9th characters, so I want to loop through the column, extract those characters, and export them to another CSV file. The column is also full of NAs, which I don't want. I'm struggling to write a loop in R that slices out exactly those characters every time, does nothing when it hits an NA, and stops when it finds a blank, since that marks the end of the list.

Mock example of the column:

Prot Id's
sp|IDIDID|PSKSJ_45HELI^sp|IDIDID|FRUEHFJ^HSLHFHG#%$^9y7hiuahl
sp|IDIDID|PSKSJ_45HELI^spuegfuehfw3|IDIDID|FRUEHFJ^HDGFLFHEHFN
NA
NA
sp|IDIDID|PSKSJ_45HELIWUEU^#H63hHU6e^sp|IDIDID|FRUEHFJ^HFGHG:WHFUWH^hfue
NA
sp|IDIDID|PSKSJ_45HELI^spJFBEFBUEBFE|IDIDID|FRUEHFJ^
NA
NA

The part that says IDIDID is what I want to get. Any help would be greatly appreciated.
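A literal version of the loop described above might look like the following minimal R sketch. It rebuilds the mock column in a data frame (the names prot and Prot_Id are hypothetical) and assumes the blank that ends the list is read in as an empty string ""; that trailing blank is added here for illustration and is not part of the mock data.

```r
# Mock column from the question; data frame and column names are hypothetical.
# stringsAsFactors = FALSE keeps the values as character (older R defaulted to factors).
prot <- data.frame(
  Prot_Id = c(
    "sp|IDIDID|PSKSJ_45HELI^sp|IDIDID|FRUEHFJ^HSLHFHG#%$^9y7hiuahl",
    "sp|IDIDID|PSKSJ_45HELI^spuegfuehfw3|IDIDID|FRUEHFJ^HDGFLFHEHFN",
    NA, NA,
    "sp|IDIDID|PSKSJ_45HELIWUEU^#H63hHU6e^sp|IDIDID|FRUEHFJ^HFGHG:WHFUWH^hfue",
    NA,
    "sp|IDIDID|PSKSJ_45HELI^spJFBEFBUEBFE|IDIDID|FRUEHFJ^",
    NA, NA,
    ""  # hypothetical blank cell marking the end of the list
  ),
  stringsAsFactors = FALSE
)

ids <- character(0)
for (value in prot$Prot_Id) {
  if (is.na(value)) next   # skip NAs
  if (value == "") break   # a blank marks the end of the list
  ids <- c(ids, substr(value, 4, 9))  # characters 4 through 9
}
ids
```

Growing ids inside the loop is fine for a column this size, but it is the slow way to do it in R; a vectorized call does the same work in one step.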

Answer 1

Use the substr function to extract the range you want. substr is vectorized, so it works on the whole column at once and no explicit loop is needed:

x <- c("sp|456879|sequence1", "sp|121212|sequence2", NA)
d <- data.frame(Prot_Id = x)
# Keep only the non-NA entries, then take characters 4 through 9
substr(d$Prot_Id[!is.na(d$Prot_Id)], 4, 9)

Output:

[1] "456879" "121212"
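To cover the rest of the question (stopping at the first blank and writing the IDs to another CSV file), the same substr call could be combined with a few base R steps, as in this sketch. It assumes the blank cell arrives as an empty string; the extra "" and "sp|999999|after_blank" values and the tempfile() output path are hypothetical, for illustration only.

```r
# Example column; the "" and "sp|999999|after_blank" entries are hypothetical
x <- c("sp|456879|sequence1", "sp|121212|sequence2", NA, "", "sp|999999|after_blank")
d <- data.frame(Prot_Id = x, stringsAsFactors = FALSE)

col <- d$Prot_Id
# Index of the first blank (""); NA entries count as non-blank here
first_blank <- match(TRUE, !is.na(col) & col == "")
# Drop the blank and everything after it, if a blank exists
if (!is.na(first_blank)) col <- col[seq_len(first_blank - 1)]

# Remove NAs, then take characters 4 through 9
ids <- substr(col[!is.na(col)], 4, 9)

out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.csv(data.frame(Prot_Id = ids), out_file, row.names = FALSE)
```

Here ids should be "456879" "121212"; the value after the blank is never reached. In practice out_file would be a real path. If the ID length could vary, sub("^[^|]*\\|([^|]*)\\|.*", "\\1", col) would instead return the text between the first two pipes; that is a swapped-in regex approach, not the substr method used in this answer.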

1 answer

solution1
3 ACCEPTED 2015-09-07 02:50:07