How to concatenate select character elements in one row into one new element?

This is my first time posting here and I am very new to coding in general, so apologies in advance.

Background information: I am trying to align peptides with HLA sequences, so that the protein sequence of the corresponding HLA allele is added to my 'proteins' file. I know which combinations of peptide and HLA allele can occur, but I have to use a positional matrix to find each HLA allele's exact protein sequence.

I have a .csv file called 'proteins', which looks like below.

Peptide,HLA,Binding
KLEDLERDL,HLA-A*02:01,Positive-Low
EVMPVSMAK,HLA-A*03:01,Positive-Intermediate
EVMPVSMAK,HLA-A*11:01,Positive-High
KTFPPTEPK,HLA-A*03:01,Positive-Intermediate
KTFPPTEPK,HLA-A*11:01,Positive-Intermediate
ATFSVPMEK,HLA-A*03:01,Positive-Intermediate
ATFSVPMEK,HLA-A*11:01,Positive-High

I also have a positional matrix .tsv file, which looks like below. I'm only showing the first allele in the file, which is A*01:01.

allele  P-25    P-24    P-23    P-22    P-21    P-20    P-19    P-18    P-17    P-16    P-15    P-14    P-13    P-12    P-11    P-10    P-9     P-8     P-7     P-6
     P-5     P-4     P-3     P-2     P-1     P0      P1      P2      P3      P4      P5      P6      P7      P8      P9      P10     P11     P12     P13     P14     P15     P16     P17     P18     P19     P20     P21     P22     P23     P24     P25     P26     P27     P28     P29     P30     P31     P32     P33     P34     P35
     P36     P37     P38     P39     P40     P41     P42     P43     P44     P45     P46     P47     P48     P49     P50     P51     P52     P53     P54     P55     P56     P57     P58     P59     P60     P61     P62     P63     P64     P65     P66     P67     P68     P69     P70     P71     P72     P73     P74     P75     P76
     P77     P78     P79     P80     P81     P82     P83     P84     P85     P86     P87     P88     P89     P90     P91     P92     P93     P94     P95     P96     P97     P98     P99     P100    P101    P102    P103    P104    P105    P106    P107    P108    P109    P110    P111    P112    P113    P114    P115    P116    P117
    P118    P119    P120    P121    P122    P123    P124    P125    P126    P127    P128    P129    P130    P131    P132    P133    P134    P135    P136    P137    P138    P139    P140    P141    P142    P143    P144    P145    P146    P147    P148    P149    P150    P151    P152    P153    P154    P155    P156    P157    P158
    P159    P160    P161    P162    P163    P164    P165    P166    P167    P168    P169    P170    P171    P172    P173    P174    P175    P176    P177    P178    P179    P180    P181    P182    P183    P184    P185    P186    P187    P188    P189    P190    P191    P192    P193    P194    P195    P196    P197    P198    P199
    P200    P201    P202    P203    P204    P205    P206    P207    P208    P209    P210    P211    P212    P213    P214    P215    P216    P217    P218    P219    P220    P221    P222    P223    P224    P225    P226    P227    P228    P229    P230    P231    P232    P233    P234    P235    P236    P237    P238    P239    P240
    P241    P242    P243    P244    P245    P246    P247    P248    P249    P250    P251    P252    P253    P254    P255    P256    P257    P258    P259    P260    P261    P262    P263    P264    P265    P266    P267    P268    P269    P270    P271    P272    P273    P274    P275    P276    P277    P278    P279    P280    P281
    P282    P283    P284    P285    P286    P287    P288    P289    P290    P291    P292    P293    P294    P295    P296    P297    P298    P299    P300    P301    P302    P303    P304    P305    P306    P307    P308    P309    P310    P311    P312    P313    P314    P315    P316    P317    P318    P319    P320    P321    P322
    P323    P324    P325    P326    P327    P328    P329    P330    P331    P332    P333    P334    P335    P336    P337    P338    P339    P340    P341    P342    P343    P344    P345    P346    P347    P348    P349    P350    P351    P352    P353    P354    P355    P356    P357    P358    P359    P360    P361
A*01:01 M       A       V       M       A       P       R       T       L       L       L       L       L       S       G       A       L       A       L       .
       .       T       Q       T       W       A       G       S       H       S       M       R       Y       F       F       T       S       V       S       R
       P       G       R       G       E       P       R       F       I       A       V       G       Y       V       D       D       T       Q       F       V
       R       F       D       S       D       A       A       S       Q       K       M       E       P       R       A       P       W       I       E       Q
       E       G       P       E       Y       W       D       Q       E       T       R       N       M       K       A       H       S       Q       T       D
       R       A       N       L       G       T       L       R       G       Y       Y       N       Q       S       E       D       G       S       H       T
       I       Q       I       M       Y       G       C       D       V       G       P       D       G       R       F       L       R       G       Y       .
       R       Q       D       A       Y       D       G       K       D       Y       .       I       A       L       N       E       D       L       R       S
       W       T       A       A       D       M       A       A       Q       I       T       K       R       K       W       E       A       V       H       A
       A       E       .       .       .       .       .       .       .       .       .       .       .       .       .       .       Q       R       R       V
       Y       L       E       G       R       C       V       D       G       L       R       R       Y       L       E       N       .       .       .       G
       K       E       T       L       Q       R       T       D       P       P       K       T       H       M       T       H       H       P       I       S
       D       H       E       A       T       L       R       C       W       A       L       G       F       Y       P       A       E       I       T       L
       T       W       Q       R       D       G       E       D       .       Q       T       Q       D       T       E       L       V       E       T       R
       P       A       G       D       G       T       F       Q       K       W       A       A       V       V       V       P       S       G       E       E
       Q       R       Y       T       C       H       V       Q       H       E       G       L       P       K       P       L       T       L       R       W
       E       L       S       S       Q       P       T       I       P       I       V       G       I       I       A       G       L       V       L       L
       G       A       V       I       T       G       A       V       V       A       A       V       M       W       R       R       K       S       S       D
       R       K       G       G       S       Y       T       Q       A       A       S       S       D       S       A       Q       G       S       D       V
       S       L       T       A       C       K       V

My attempt so far:

Concatenate all the character values per row from the positional matrix file (the ones starting with P) into one string like so

ABCDEGGH***IHJLMNOP

and then match each string to the allele in the HLA column of the 'proteins' file.

Create a new column in 'proteins' file, called 'HLA amino acid sequence', where the concatenated string value is added.

Problem:

I figured out how to concatenate string values together, but not according to what I need. Code below:

positional_matrix <- A_AA_mat_pos


concatenated_amino_acid <- c(positional_matrix, sep = "")
do.call(paste, positional_matrix)

head(concatenated_amino_acid) 

Looking at the head of this concatenated list, it pastes all the values column-wise into one list, whereas I want each row to be concatenated instead.

Sounds like you can use a for loop to cycle through each row:

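concat_rows <- character(nrow(positional_matrix))  # pre-allocate one empty string per row

for (i in seq_len(nrow(positional_matrix))) {
        # `...` stands for the remaining columns, which have to be listed out by hand
        concat_rows[i] <- paste(positional_matrix[i, 1],
                                positional_matrix[i, 2], 
                                ...,
                                sep = '')
}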

I will say that when I tried this, I had to list out each column in the paste command. When I tried something more scalable, like [1:ncol(..)] or something similar, it converted everything into numbers. Maybe someone else can shed some light on that.
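One likely explanation for the numbers, assuming the matrix was read with strings converted to factors (the default before R 4.0): pasting a one-row data frame coerces each factor column to its integer code. Below is a minimal base R sketch of two ways around that. The toy data frame and the names pos_cols, seq_docall and seq_apply are made up for illustration; only the allele column name comes from the question.

```r
# Toy stand-in for the positional matrix (hypothetical values: 2 alleles x 4 positions).
positional_matrix <- data.frame(
  allele = c("A*01:01", "A*02:01"),
  `P-1`  = c("M", "M"),
  P0     = c("A", "A"),
  P1     = c(".", "V"),
  P2     = c("T", "."),
  check.names = FALSE,      # keep "P-1" as a literal column name
  stringsAsFactors = TRUE   # assumption: mimic R < 4.0, where strings become factors
)

# Every column except the allele label holds one amino acid position.
pos_cols <- setdiff(names(positional_matrix), "allele")

# The pitfall: a one-row data frame is a list of factors, and paste()
# turns each factor into its integer code, so this yields digits.
codes <- paste(positional_matrix[1, pos_cols], collapse = "")

# Option 1: pass each column to paste() as its own argument.
# paste() is vectorised, so element i of the result is row i joined up.
seq_docall <- do.call(paste, c(positional_matrix[pos_cols], sep = ""))

# Option 2: convert to a character matrix first, then collapse each row.
seq_apply <- unname(apply(as.matrix(positional_matrix[pos_cols]), 1,
                          paste, collapse = ""))

print(seq_docall)
```

Option 1 is close to the do.call(paste, ...) attempt in the question: the difference is that sep = "" has to be part of the argument list handed to do.call, and the allele column is left out.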

Welcome to Stack Overflow. I personally would use a "tidy" approach.

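library(tidyverse) # or library(tidyr)

positional_data <- positional_data %>%
  # sep = "" joins the letters directly; unite() defaults to sep = "_".
  # P361 is the last position column in the header shown in the question.
  unite("concatenated", `P-25`:`P361`, sep = "")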

That makes a new column called concatenated, which replaces the columns it was built from (pass remove = FALSE to keep them).

For what it's worth, I searched Google for "tidy r concatenate across columns", which turned up this link comparing several methods.

For a fuller solution, please look at the suggestions for creating a reproducible example, particularly how to share data or make a minimal example.
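The remaining step from the question, adding the sequence to the 'proteins' file, could look roughly like the base R sketch below. It assumes the only difference between the two files' allele labels is the "HLA-" prefix. The allele_seqs data frame and its sequences are placeholders standing in for the concatenated matrix, not real HLA sequences.

```r
# First rows of the 'proteins' file from the question.
proteins <- data.frame(
  Peptide = c("KLEDLERDL", "EVMPVSMAK"),
  HLA     = c("HLA-A*02:01", "HLA-A*03:01"),
  Binding = c("Positive-Low", "Positive-Intermediate"),
  stringsAsFactors = FALSE
)

# Hypothetical result of the concatenation step: one string per allele.
allele_seqs <- data.frame(
  allele       = c("A*01:01", "A*02:01"),
  concatenated = c("MAVM", "MAVL"),   # placeholder strings
  stringsAsFactors = FALSE
)

# Strip the "HLA-" prefix so the labels line up with the matrix's allele column.
key <- sub("^HLA-", "", proteins$HLA)

# match() returns, for each protein row, the row index of its allele in
# allele_seqs, or NA when the allele is missing from the matrix.
idx <- match(key, allele_seqs$allele)
proteins[["HLA amino acid sequence"]] <- allele_seqs$concatenated[idx]

print(proteins)
```

Alleles that are absent from the matrix end up as NA in the new column, which makes gaps easy to spot before writing the file back out.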
