This is my first time posting here and I am very new to coding in general, so apologies in advance.
Background information: I am trying to line up peptides with HLA sequences, so that the protein sequence of the corresponding HLA allele is added to my 'proteins' file. I know which combinations of peptide and HLA allele can occur, but I have to use a positional matrix to find each HLA allele's exact protein sequence.
I have a .csv file called 'proteins', which looks like below.
Peptide,HLA,Binding
KLEDLERDL,HLA-A*02:01,Positive-Low
EVMPVSMAK,HLA-A*03:01,Positive-Intermediate
EVMPVSMAK,HLA-A*11:01,Positive-High
KTFPPTEPK,HLA-A*03:01,Positive-Intermediate
KTFPPTEPK,HLA-A*11:01,Positive-Intermediate
ATFSVPMEK,HLA-A*03:01,Positive-Intermediate
ATFSVPMEK,HLA-A*11:01,Positive-High
I also have a positional matrix .tsv file, which looks like below. I'm only showing the first allele in the file, which is A*01:01.
allele P-25 P-24 P-23 P-22 P-21 P-20 P-19 P-18 P-17 P-16 P-15 P-14 P-13 P-12 P-11 P-10 P-9 P-8 P-7 P-6
P-5 P-4 P-3 P-2 P-1 P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P32 P33 P34 P35
P36 P37 P38 P39 P40 P41 P42 P43 P44 P45 P46 P47 P48 P49 P50 P51 P52 P53 P54 P55 P56 P57 P58 P59 P60 P61 P62 P63 P64 P65 P66 P67 P68 P69 P70 P71 P72 P73 P74 P75 P76
P77 P78 P79 P80 P81 P82 P83 P84 P85 P86 P87 P88 P89 P90 P91 P92 P93 P94 P95 P96 P97 P98 P99 P100 P101 P102 P103 P104 P105 P106 P107 P108 P109 P110 P111 P112 P113 P114 P115 P116 P117
P118 P119 P120 P121 P122 P123 P124 P125 P126 P127 P128 P129 P130 P131 P132 P133 P134 P135 P136 P137 P138 P139 P140 P141 P142 P143 P144 P145 P146 P147 P148 P149 P150 P151 P152 P153 P154 P155 P156 P157 P158
P159 P160 P161 P162 P163 P164 P165 P166 P167 P168 P169 P170 P171 P172 P173 P174 P175 P176 P177 P178 P179 P180 P181 P182 P183 P184 P185 P186 P187 P188 P189 P190 P191 P192 P193 P194 P195 P196 P197 P198 P199
P200 P201 P202 P203 P204 P205 P206 P207 P208 P209 P210 P211 P212 P213 P214 P215 P216 P217 P218 P219 P220 P221 P222 P223 P224 P225 P226 P227 P228 P229 P230 P231 P232 P233 P234 P235 P236 P237 P238 P239 P240
P241 P242 P243 P244 P245 P246 P247 P248 P249 P250 P251 P252 P253 P254 P255 P256 P257 P258 P259 P260 P261 P262 P263 P264 P265 P266 P267 P268 P269 P270 P271 P272 P273 P274 P275 P276 P277 P278 P279 P280 P281
P282 P283 P284 P285 P286 P287 P288 P289 P290 P291 P292 P293 P294 P295 P296 P297 P298 P299 P300 P301 P302 P303 P304 P305 P306 P307 P308 P309 P310 P311 P312 P313 P314 P315 P316 P317 P318 P319 P320 P321 P322
P323 P324 P325 P326 P327 P328 P329 P330 P331 P332 P333 P334 P335 P336 P337 P338 P339 P340 P341 P342 P343 P344 P345 P346 P347 P348 P349 P350 P351 P352 P353 P354 P355 P356 P357 P358 P359 P360 P361
A*01:01 M A V M A P R T L L L L L S G A L A L .
. T Q T W A G S H S M R Y F F T S V S R
P G R G E P R F I A V G Y V D D T Q F V
R F D S D A A S Q K M E P R A P W I E Q
E G P E Y W D Q E T R N M K A H S Q T D
R A N L G T L R G Y Y N Q S E D G S H T
I Q I M Y G C D V G P D G R F L R G Y .
R Q D A Y D G K D Y . I A L N E D L R S
W T A A D M A A Q I T K R K W E A V H A
A E . . . . . . . . . . . . . . Q R R V
Y L E G R C V D G L R R Y L E N . . . G
K E T L Q R T D P P K T H M T H H P I S
D H E A T L R C W A L G F Y P A E I T L
T W Q R D G E D . Q T Q D T E L V E T R
P A G D G T F Q K W A A V V V P S G E E
Q R Y T C H V Q H E G L P K P L T L R W
E L S S Q P T I P I V G I I A G L V L L
G A V I T G A V V A A V M W R R K S S D
R K G G S Y T Q A A S S D S A Q G S D V
S L T A C K V
My attempt so far:
Concatenate all the character values per row from the positional matrix file (the ones starting with P) into one string like so
ABCDEGGH***IHJLMNOP
and then match it according to the HLA position in 'proteins' file.
Create a new column in 'proteins' file, called 'HLA amino acid sequence', where the concatenated string value is added.
Problem:
I figured out how to concatenate string values together, but not according to what I need. Code below:
positional_matrix <- A_AA_mat_pos
concatenated_amino_acid <- c(positional_matrix, sep = "")
do.call(paste, positional_matrix)
head(concatenated_amino_acid)
Looking at the head of this concatenated list, it pastes all the values column-wise into one list, whereas I want the values in each row to be concatenated instead.
Sounds like you can use a for loop to cycle through each row:
concat_rows <- character(nrow(positional_matrix)) # init result vector, one string per row
for (i in seq_len(nrow(positional_matrix))) {
  concat_rows[i] <- paste(positional_matrix[i, 1],
                          positional_matrix[i, 2],
                          # ..., (one argument per remaining column)
                          sep = '')
}
I will say that when I was looking at this, I had to list each column out in the paste command. When I tried to use something more scalable, like [1:ncol(..)] or something similar, it converted everything into numbers. Maybe someone else can shed some light on that.
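One likely cause of the numbers is that the columns were read in as factors (the default before R 4.0): calling paste on a whole data frame row then pastes the integer factor codes rather than the letters. A minimal sketch of a scalable alternative follows, assuming a toy three-position matrix with made-up values and the column name `allele` from the question's header. It converts every position column to character and passes the columns to paste0 as arguments, which gives one string per row. This is also why the do.call(paste, ...) call in the question is close to working: it only needs an empty separator, the allele column left out, and the result assigned.

```r
# Toy stand-in for the positional matrix (hypothetical data, 3 position columns).
positional_matrix <- data.frame(
  allele = c("A*01:01", "A*02:01"),
  `P-1`  = c("M", "M"),
  P0     = c(".", "A"),
  P1     = c("V", "V"),
  check.names = FALSE,       # keep names such as "P-1" unchanged
  stringsAsFactors = TRUE    # mimic older R, where text columns became factors
)

# Select every column except the allele label.
pos_cols <- setdiff(names(positional_matrix), "allele")

# Convert factors to plain character so the letters, not the integer codes, are pasted.
# unname() stops column names from being treated as named arguments of paste0.
pos_chars <- unname(lapply(positional_matrix[pos_cols], as.character))

# paste0 is vectorised across its arguments, so passing the columns as
# arguments yields one concatenated string per row.
concat_rows <- do.call(paste0, pos_chars)
names(concat_rows) <- as.character(positional_matrix$allele)
print(concat_rows)
```

Gap characters such as "." are kept as they are in this sketch; whether to strip them depends on how the sequences will be used.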
Welcome to Stack Overflow. I personally would use a "tidy" approach.
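library(tidyverse) # or library(tidyr)
positional_data <- positional_data %>%
  # adjust the end of the range to the last position column in your file
  unite("concatenated", `P-25`:`P333`, sep = "")
That makes a new column called concatenated. Note that unite() joins values with "_" unless sep is given, so sep = "" is needed to get one continuous string, and by default it drops the columns it combined.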
For what it's worth, I searched Google for "tidy r concatenate across columns", which turned up this link comparing several methods.
For a fuller solution, please look at the suggestions for writing a reproducible example, particularly how to share data or make a minimal example.
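To finish the task described in the question, the concatenated sequences still have to be attached to the 'proteins' table. A rough base R sketch of that step follows. The two small data frames and their sequences are made up for illustration, `allele_seqs` is a hypothetical name for the matrix after concatenation, and the sketch assumes the only difference between the two allele labels is the "HLA-" prefix seen in the question.

```r
# Hypothetical miniature versions of the two tables from the question.
proteins <- data.frame(
  Peptide = c("KLEDLERDL", "EVMPVSMAK", "EVMPVSMAK"),
  HLA     = c("HLA-A*02:01", "HLA-A*03:01", "HLA-A*11:01"),
  Binding = c("Positive-Low", "Positive-Intermediate", "Positive-High"),
  stringsAsFactors = FALSE
)
allele_seqs <- data.frame(
  allele       = c("A*02:01", "A*03:01"),
  concatenated = c("MAV.MAP", "MAVMAPR"),   # made-up sequences
  stringsAsFactors = FALSE
)

# Assumption: the matrix names alleles without the "HLA-" prefix.
key <- sub("^HLA-", "", proteins$HLA)

# match() gives the row of allele_seqs for each key, or NA when the allele is absent.
idx <- match(key, allele_seqs$allele)
proteins[["HLA amino acid sequence"]] <- allele_seqs$concatenated[idx]
print(proteins)
```

Alleles that are missing from the matrix end up as NA in the new column, which makes unmatched rows easy to spot before writing the file back out.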