How to separate IDs into different rows using R

I am using R. I have a column in a data frame. Here is an example of part of the column:

|NEW.ID|
|------|
|P02538 [551-559]; P04259 [551-559]|
|A0A0B4J2F2 1xPhospho [T473]|
|Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]|
|A0A1B0GX95 2xPhospho [S24; S26]|

I want to split the rows that contain two accession code IDs. Although the IDs are separated by ';', I need to take into account that some IDs contain a ';' themselves, such as the third row in the column above. The only way I can see to identify the split point is a condition such as: if there is a '];' followed by a letter, split the row. However, I don't know how to go about this.

So in the example column above, I want to achieve:

|NEW.ID|
|------|
|P02538 [551-559]|
|P04259 [551-559]|
|A0A0B4J2F2 1xPhospho [T473]|
|Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]|
|A0A1B0GX95 2xPhospho [S24; S26]|

So the original first row is split into two. Any help would be much appreciated, and please say if further clarification is required (I am still relatively new to Stack Overflow).

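We may use separate_rows with regex lookarounds, i.e. split at the ';' followed by a space, but only where it comes directly after a closing bracket ( ] ) and directly before an upper-case letter. The lookbehind (?<=\\]) and the lookahead (?=[A-Z]) only check the surrounding characters, so the bracket and the letter are kept in the output and just the '; ' is removed.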

library(tidyr)
separate_rows(df1, NEW.ID, sep = "(?<=\\]); (?=[A-Z])")

-output

# A tibble: 5 × 1
  NEW.ID                                     
  <chr>                                      
1 P02538 [551-559]                           
2 P04259 [551-559]                           
3 A0A0B4J2F2 1xPhospho [T473]                
4 Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]
5 A0A1B0GX95 2xPhospho [S24; S26]          
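
To see why only the first row is split, it can help to test the same pattern against each row. The following is a minimal sketch in base R using grepl with perl = TRUE (needed for lookarounds); the vector name ids and the variable has_split are illustrative and not part of the original answer.

```r
# Example values copied from the question (ids is a hypothetical name)
ids <- c("P02538 [551-559]; P04259 [551-559]",
         "A0A0B4J2F2 1xPhospho [T473]",
         "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]",
         "A0A1B0GX95 2xPhospho [S24; S26]")

# Same pattern as in separate_rows: '; ' preceded by ']' and followed by A-Z
pattern <- "(?<=\\]); (?=[A-Z])"

# perl = TRUE enables lookbehind/lookahead in base R regex functions
has_split <- grepl(pattern, ids, perl = TRUE)
print(has_split)
```

In the third row the '; ' is followed by a digit (1xPhospho), and in the fourth row it is not preceded by ']', so neither row matches.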

data

df1 <- structure(list(NEW.ID = c("P02538 [551-559]; P04259 [551-559]", 
"A0A0B4J2F2 1xPhospho [T473]", "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]", 
"A0A1B0GX95 2xPhospho [S24; S26]")), class = "data.frame", row.names = c(NA, -4L))
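
If tidyr is not available, roughly the same result can be obtained in base R. This is only a sketch, not part of the original answer: it assumes the same pattern as above, uses strsplit with perl = TRUE, and repeats each original row once per piece so that any other columns would be carried along. The names parts, idx and out are illustrative.

```r
# Same example data as above, rebuilt so the block is self-contained
df1 <- data.frame(
  NEW.ID = c("P02538 [551-559]; P04259 [551-559]",
             "A0A0B4J2F2 1xPhospho [T473]",
             "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]",
             "A0A1B0GX95 2xPhospho [S24; S26]"),
  stringsAsFactors = FALSE
)

# Split each value at '; ' only when preceded by ']' and followed by A-Z
parts <- strsplit(df1$NEW.ID, "(?<=\\]); (?=[A-Z])", perl = TRUE)

# Repeat each row index once per resulting piece, then fill in the pieces
idx <- rep(seq_len(nrow(df1)), lengths(parts))
out <- df1[idx, , drop = FALSE]
out$NEW.ID <- unlist(parts)
rownames(out) <- NULL

print(out)
```

With the example data this yields five rows, matching the separate_rows output shown earlier.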
