
How to concatenate select character elements in one row into one new element?

This is my first time posting here, and I am very new to coding in general, so apologies in advance.

Background information: I am trying to match peptides to HLA sequences so that the corresponding HLA allele protein sequence is added to my 'proteins' file. I know which peptide and HLA allele combinations can occur, but I have to use a positional matrix to find each HLA allele's exact protein sequence.

I have a .csv file called 'proteins', which looks like this:

Peptide,HLA,Binding
KLEDLERDL,HLA-A*02:01,Positive-Low
EVMPVSMAK,HLA-A*03:01,Positive-Intermediate
EVMPVSMAK,HLA-A*11:01,Positive-High
KTFPPTEPK,HLA-A*03:01,Positive-Intermediate
KTFPPTEPK,HLA-A*11:01,Positive-Intermediate
ATFSVPMEK,HLA-A*03:01,Positive-Intermediate
ATFSVPMEK,HLA-A*11:01,Positive-High

I also have a positional matrix .tsv file, which looks like the following. I am only showing the first allele in the file, A*01:01.

allele  P-25    P-24    P-23    P-22    P-21    P-20    P-19    P-18    P-17    P-16    P-15    P-14    P-13    P-12    P-11    P-10    P-9     P-8     P-7     P-6
     P-5     P-4     P-3     P-2     P-1     P0      P1      P2      P3      P4      P5      P6      P7      P8      P9      P10     P11     P12     P13     P14     P15     P16     P17     P18     P19     P20     P21     P22     P23     P24     P25     P26     P27     P28     P29     P30     P31     P32     P33     P34     P35
     P36     P37     P38     P39     P40     P41     P42     P43     P44     P45     P46     P47     P48     P49     P50     P51     P52     P53     P54     P55     P56     P57     P58     P59     P60     P61     P62     P63     P64     P65     P66     P67     P68     P69     P70     P71     P72     P73     P74     P75     P76
     P77     P78     P79     P80     P81     P82     P83     P84     P85     P86     P87     P88     P89     P90     P91     P92     P93     P94     P95     P96     P97     P98     P99     P100    P101    P102    P103    P104    P105    P106    P107    P108    P109    P110    P111    P112    P113    P114    P115    P116    P117
    P118    P119    P120    P121    P122    P123    P124    P125    P126    P127    P128    P129    P130    P131    P132    P133    P134    P135    P136    P137    P138    P139    P140    P141    P142    P143    P144    P145    P146    P147    P148    P149    P150    P151    P152    P153    P154    P155    P156    P157    P158
    P159    P160    P161    P162    P163    P164    P165    P166    P167    P168    P169    P170    P171    P172    P173    P174    P175    P176    P177    P178    P179    P180    P181    P182    P183    P184    P185    P186    P187    P188    P189    P190    P191    P192    P193    P194    P195    P196    P197    P198    P199
    P200    P201    P202    P203    P204    P205    P206    P207    P208    P209    P210    P211    P212    P213    P214    P215    P216    P217    P218    P219    P220    P221    P222    P223    P224    P225    P226    P227    P228    P229    P230    P231    P232    P233    P234    P235    P236    P237    P238    P239    P240
    P241    P242    P243    P244    P245    P246    P247    P248    P249    P250    P251    P252    P253    P254    P255    P256    P257    P258    P259    P260    P261    P262    P263    P264    P265    P266    P267    P268    P269    P270    P271    P272    P273    P274    P275    P276    P277    P278    P279    P280    P281
    P282    P283    P284    P285    P286    P287    P288    P289    P290    P291    P292    P293    P294    P295    P296    P297    P298    P299    P300    P301    P302    P303    P304    P305    P306    P307    P308    P309    P310    P311    P312    P313    P314    P315    P316    P317    P318    P319    P320    P321    P322
    P323    P324    P325    P326    P327    P328    P329    P330    P331    P332    P333    P334    P335    P336    P337    P338    P339    P340    P341    P342    P343    P344    P345    P346    P347    P348    P349    P350    P351    P352    P353    P354    P355    P356    P357    P358    P359    P360    P361
A*01:01 M       A       V       M       A       P       R       T       L       L       L       L       L       S       G       A       L       A       L       .
       .       T       Q       T       W       A       G       S       H       S       M       R       Y       F       F       T       S       V       S       R
       P       G       R       G       E       P       R       F       I       A       V       G       Y       V       D       D       T       Q       F       V
       R       F       D       S       D       A       A       S       Q       K       M       E       P       R       A       P       W       I       E       Q
       E       G       P       E       Y       W       D       Q       E       T       R       N       M       K       A       H       S       Q       T       D
       R       A       N       L       G       T       L       R       G       Y       Y       N       Q       S       E       D       G       S       H       T
       I       Q       I       M       Y       G       C       D       V       G       P       D       G       R       F       L       R       G       Y       .
       R       Q       D       A       Y       D       G       K       D       Y       .       I       A       L       N       E       D       L       R       S
       W       T       A       A       D       M       A       A       Q       I       T       K       R       K       W       E       A       V       H       A
       A       E       .       .       .       .       .       .       .       .       .       .       .       .       .       .       Q       R       R       V
       Y       L       E       G       R       C       V       D       G       L       R       R       Y       L       E       N       .       .       .       G
       K       E       T       L       Q       R       T       D       P       P       K       T       H       M       T       H       H       P       I       S
       D       H       E       A       T       L       R       C       W       A       L       G       F       Y       P       A       E       I       T       L
       T       W       Q       R       D       G       E       D       .       Q       T       Q       D       T       E       L       V       E       T       R
       P       A       G       D       G       T       F       Q       K       W       A       A       V       V       V       P       S       G       E       E
       Q       R       Y       T       C       H       V       Q       H       E       G       L       P       K       P       L       T       L       R       W
       E       L       S       S       Q       P       T       I       P       I       V       G       I       I       A       G       L       V       L       L
       G       A       V       I       T       G       A       V       V       A       A       V       M       W       R       R       K       S       S       D
       R       K       G       G       S       Y       T       Q       A       A       S       S       D       S       A       Q       G       S       D       V
       S       L       T       A       C       K       V

My attempt so far:

Concatenate all the character values in each row of the positional matrix file (the columns whose names start with P) into one string, like so:

ABCDEGGH***IHJLMNOP

and then match it to the corresponding HLA entry in the 'proteins' file.

Create a new column in the 'proteins' file, called 'HLA amino acid sequence', that holds the concatenated string.
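The matching step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `allele_seqs` is a hypothetical lookup table holding one already-concatenated sequence per allele, and the sequences in it are toy values, not real HLA sequences. The sketch also assumes the only difference between the two allele spellings is the "HLA-" prefix, since the 'proteins' file writes `HLA-A*02:01` while the matrix writes `A*01:01`.

```r
# Toy stand-in for the 'proteins' file (two rows copied from the question).
proteins <- data.frame(
  Peptide = c("KLEDLERDL", "EVMPVSMAK"),
  HLA     = c("HLA-A*02:01", "HLA-A*03:01"),
  Binding = c("Positive-Low", "Positive-Intermediate"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: one concatenated sequence per allele (toy values).
allele_seqs <- data.frame(
  allele   = c("A*02:01", "A*03:01"),
  sequence = c("MAVMAP", "MAVMAQ"),
  stringsAsFactors = FALSE
)

# Strip the assumed "HLA-" prefix so both tables spell the allele the same way.
allele_key <- sub("^HLA-", "", proteins$HLA)

# match() gives, for each protein row, the row of its allele (NA if not found).
idx <- match(allele_key, allele_seqs$allele)
proteins[["HLA amino acid sequence"]] <- allele_seqs$sequence[idx]
```

Alleles that are missing from the lookup table end up as `NA` in the new column, which makes unmatched rows easy to spot.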

Problem:

I figured out how to concatenate string values together, but not in the way I need. Code below:

positional_matrix <- A_AA_mat_pos


concatenated_amino_acid <- c(positional_matrix, sep = "")
do.call(paste, positional_matrix)

head(concatenated_amino_acid) 

Looking at the head of this concatenated list, it pastes all the values column-wise into one list, whereas I want each row to be concatenated instead.
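One possible reading of the attempt above: `c(positional_matrix, sep = "")` only appends an extra list element named `sep` to the columns, and the result of `do.call(paste, ...)` is never assigned, so `head()` shows the original columns. A minimal base R sketch of a row-wise version follows. The toy matrix stands in for `A_AA_mat_pos`, and its values and column subset are illustrative assumptions.

```r
# Toy matrix standing in for A_AA_mat_pos (illustrative values only).
positional_matrix <- data.frame(
  allele = c("A*01:01", "A*02:01"),
  `P-2`  = c("M", "M"),
  `P-1`  = c("A", "A"),
  P0     = c(".", "V"),
  P1     = c("G", "G"),
  check.names = FALSE,        # keep names such as "P-2" unchanged
  stringsAsFactors = FALSE
)

# Drop the allele column so only the position columns are pasted.
position_cols <- positional_matrix[, -1, drop = FALSE]

# do.call() passes each column as one argument to paste0(), which pastes
# element-wise, so the result holds one string per row.
concatenated_amino_acid <- do.call(paste0, position_cols)
names(concatenated_amino_acid) <- positional_matrix$allele
```

Here `paste0()` is used instead of `paste()` because `paste()` separates the pieces with a space by default.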

It sounds like you can use a for loop to cycle through each row:

concat_rows <- character(nrow(positional_matrix))  # one empty string per row

for (i in seq_len(nrow(positional_matrix))) {
        # list every column to include; "..." stands for the remaining columns
        concat_rows[i] <- paste(positional_matrix[i, 1],
                                positional_matrix[i, 2], 
                                ...,
                                sep = '')
}

I will say that when I tried this, I had to list each column out in the paste command. When I tried something more scalable, like [1:ncol(..)] or similar, it converted everything into numbers. Maybe someone else can shed some light on that.
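One possible explanation for the numbers, offered as an assumption because the data-loading step is not shown: if the columns were read in as factors, a one-row slice of the data frame is still a list of factors, and `paste()` then sees their integer codes rather than their labels. The sketch below reproduces that situation with a toy matrix and shows a scalable alternative that converts to a character matrix first.

```r
# Toy matrix with factor columns, as read.delim(..., stringsAsFactors = TRUE)
# would produce (illustrative values only).
positional_matrix <- data.frame(
  allele = c("A*01:01", "A*02:01"),
  P1     = c("M", "M"),
  P2     = c("A", "S"),
  P3     = c("V", "V"),
  stringsAsFactors = TRUE
)

# Pitfall: the row slice is a list of factors, so paste() uses integer codes.
codes <- paste(positional_matrix[1, 2:4], collapse = "")

# as.matrix() turns the factors into their labels; apply() with MARGIN = 1
# then pastes each row, however many columns there are.
aa <- as.matrix(positional_matrix[, -1])
concat_rows <- apply(aa, 1, paste, collapse = "")
```

Reading the file with `stringsAsFactors = FALSE` (the default from R 4.0.0 onward) is another way to avoid the factor codes.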

Welcome to Stack Overflow. I personally would use a "tidy" approach.

library(tidyverse) # or library(tidyr)

# sep = "" is needed because unite() joins values with "_" by default
positional_data <- positional_data %>%
  unite("concatenated", `P-25`:`P333`, sep = "")

That makes a new column called concatenated.
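To see what `unite()` does on a small scale, here is a sketch with a toy data frame. The values and the three position columns are illustrative assumptions, not the real matrix. Note that `unite()` joins values with "_" unless `sep` is set, and that it drops the input columns unless `remove = FALSE` is given.

```r
library(tidyr)

# Toy stand-in for the positional matrix (illustrative values only).
positional_data <- data.frame(
  allele = c("A*01:01", "A*02:01"),
  `P-1`  = c("M", "M"),
  P0     = c(".", "A"),
  P1     = c("G", "S"),
  check.names = FALSE,        # keep the name "P-1" unchanged
  stringsAsFactors = FALSE
)

# Paste the columns from P-1 through P1 into one new column, with no separator.
# remove = FALSE keeps the original position columns next to the new one.
united <- unite(positional_data, "concatenated", `P-1`:P1,
                sep = "", remove = FALSE)
```

The backticks around `P-1` are required because the name contains a hyphen, which R would otherwise parse as subtraction.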

For what it's worth, I searched Google for "tidy r concatenate across columns", which turned up this link comparing methods.

For a fuller solution, please look at the suggestions for writing a reproducible example, particularly how to share data or make a minimal example.
