How to separate IDs into different rows using R

Question

I am using R. I have a column in a data frame. Here is an example of part of the column:

|NEW.ID|
|------|
|P02538 [551-559]; P04259 [551-559]|
|A0A0B4J2F2 1xPhospho [T473]|
|Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]|
|A0A1B0GX95 2xPhospho [S24; S26]|

I want to separate the rows where there are two accession code IDs. Although the IDs are separated by ';', I need to take into account that some IDs may contain a ';' themselves, such as the third row in the column above. The only way I can see to tell the separators apart is a condition stating that if '];' is followed by a letter, the row is split. However, I don't know how to go about this.

So in the example column above, I want to achieve:

|NEW.ID|
|------|
|P02538 [551-559]|
|P04259 [551-559]|
|A0A0B4J2F2 1xPhospho [T473]|
|Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]|
|A0A1B0GX95 2xPhospho [S24; S26]|

So the original first row is split into two. Any help would be much appreciated, and please say if further clarification is required (I am still relatively new to Stack Overflow).

Answer 1

We may use separate_rows with a regex lookaround, i.e. split at the ; followed by a space, but only where that ; comes right after a closing bracket ( ] ) and the space comes right before an upper case letter.

library(tidyr)
separate_rows(df1, NEW.ID, sep = "(?<=\\]); (?=[A-Z])")

-output

# A tibble: 5 × 1
  NEW.ID                                     
  <chr>                                      
1 P02538 [551-559]                           
2 P04259 [551-559]                           
3 A0A0B4J2F2 1xPhospho [T473]                
4 Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]
5 A0A1B0GX95 2xPhospho [S24; S26]
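
As a quick sanity check of the pattern, the sketch below tests each example string with base R's grepl. The vector name ids and the variable has_split_point are only illustrative; the strings are copied from the question.

```r
# Example strings copied from the question's NEW.ID column.
ids <- c(
  "P02538 [551-559]; P04259 [551-559]",
  "A0A0B4J2F2 1xPhospho [T473]",
  "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]",
  "A0A1B0GX95 2xPhospho [S24; S26]"
)

# Same pattern as in the separate_rows() call above:
# "; " preceded by "]" and followed by an upper case letter.
pattern <- "(?<=\\]); (?=[A-Z])"

# perl = TRUE is required here because base R's default regex
# engine does not support lookbehind or lookahead.
has_split_point <- grepl(pattern, ids, perl = TRUE)
print(has_split_point)
```

Only the first string should contain a split point: in the third string the "; " is followed by a digit, and in the fourth it is not preceded by a closing bracket.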

data

df1 <- structure(list(NEW.ID = c("P02538 [551-559]; P04259 [551-559]", 
"A0A0B4J2F2 1xPhospho [T473]", "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]", 
"A0A1B0GX95 2xPhospho [S24; S26]")), class = "data.frame", 
row.names = c(NA, 
-4L))
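
If tidyr is not available, roughly the same result can be sketched in base R with strsplit. This is an alternative to the separate_rows approach, not part of the original answer, and the names pieces and result are illustrative only.

```r
# Rebuild the example data so the snippet runs on its own.
df1 <- data.frame(
  NEW.ID = c(
    "P02538 [551-559]; P04259 [551-559]",
    "A0A0B4J2F2 1xPhospho [T473]",
    "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]",
    "A0A1B0GX95 2xPhospho [S24; S26]"
  ),
  stringsAsFactors = FALSE
)

# Split each string at "; " only when it follows "]" and precedes
# an upper case letter; perl = TRUE enables the lookarounds.
pieces <- strsplit(df1$NEW.ID, "(?<=\\]); (?=[A-Z])", perl = TRUE)

# Flatten the list of pieces into a one-column data frame.
result <- data.frame(NEW.ID = unlist(pieces), stringsAsFactors = FALSE)
print(result)
```

This sketch assumes NEW.ID is the only column. With additional columns, the other values would need to be repeated per piece, for example by indexing the rows with rep(seq_len(nrow(df1)), lengths(pieces)).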
