How to split the column of a matrix into two columns?
I have a matrix in which one of the columns holds IDs separated by ",". I just want to split that column into two columns so that each new column holds only one part of the ID. What is the easiest way to do it?
My matrix is:
> L
a u
[1,] "10" "mature,MIMAT0000062"
[2,] "20" "stemloop"
[3,] "40" "mature,MIMAT0000062"
and the expected output is:
> k
a u v
[1,] "10" "mature" "MIMAT0000062"
[2,] "20" "stemloop" "NA"
[3,] "40" "mature" "MIMAT0000062"
>
Edit:
Now I have to split this matrix into two matrices based on the column with "NA" values: one containing only the rows with "NA" and the other containing only the rows without "NA".
Input:
>k
a u v
[1,] "10" "mature" "MIMAT0000062"
[2,] "20" "stemloop" "NA"
[3,] "40" "mature_2" "MIMAT0000043"
The output should look like:
>k1
a u v
[1,] "10" "mature" "MIMAT0000062"
[2,] "40" "mature_2" "MIMAT0000043"
>k2
a u v
[1,] "20" "stemloop" "NA"
I have a function called cSplit that is quite fast and deals with these types of problems very easily.
Here are a few examples of the function in use, along with some different cases to consider:
Your existing sample data:
library(splitstackshape) # cSplit is available in the splitstackshape package
M1 <- cbind(a = c(10,20,40),
u = c("mature,MIMAT0000062",
"stemloop", "mature,MIMAT0000062"))
cSplit(data.frame(M1), "u", ",")
# a u_1 u_2
# 1: 10 mature MIMAT0000062
# 2: 20 stemloop NA
# 3: 40 mature MIMAT0000062
One "u" value with a comma at the start:
M2 <- cbind(a = c(10,20,40),
u = c(",MIMAT0000062",
"stemloop", "mature,MIMAT0000062"))
cSplit(data.frame(M2), "u", ",")
# a u_1 u_2
# 1: 10 MIMAT0000062
# 2: 20 stemloop NA
# 3: 40 mature MIMAT0000062
One "u" value that splits into 3 columns:
M3 <- cbind(a = c(10,20,40),
u = c("mature,MIMAT0000062",
"stemloop,,something", "mature,MIMAT0000062"))
cSplit(data.frame(M3), "u", ",")
# a u_1 u_2 u_3
# 1: 10 mature MIMAT0000062 NA
# 2: 20 stemloop something
# 3: 40 mature MIMAT0000062 NA
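This works only when every value contains exactly one comma (it fails on a row such as "stemloop", which has none):
# byrow = TRUE keeps the two pieces of each ID on the same row;
# L[, "u"] works whether L is a matrix or a data.frame
sep_cols = matrix(unlist(strsplit(as.character(L[, "u"]), ",")), ncol = 2, byrow = TRUE)
new_L = cbind(L, sep_cols)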
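For rows with no comma, like "stemloop" in the question, the pieces can be padded with NA before binding. The following is only a sketch in base R, assuming the matrix L from the question; the names parts, width and padded are made up for illustration.

```r
# Sample matrix from the question
L <- cbind(a = c("10", "20", "40"),
           u = c("mature,MIMAT0000062", "stemloop", "mature,MIMAT0000062"))

# Split each ID on the literal comma; the result is a list of character vectors
parts <- strsplit(L[, "u"], ",", fixed = TRUE)

# Pad every vector with NA up to the longest one, then stack them as rows
width <- max(lengths(parts))
padded <- do.call(rbind, lapply(parts, `length<-`, width))

# Bind the untouched column back on; the names assume at most two pieces per ID
k <- cbind(L[, "a", drop = FALSE], padded)
colnames(k) <- c("a", "u", "v")
k
```

Note that the missing value here is a real NA, not the string "NA" shown in the expected output.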
A different way:
a <- c(10,20,40)
u <- c("mature,MIMAT0000062", "stemloop", "mature,MIMAT0000062")
L <- data.frame(a,u) #better use a data.frame
v <- strsplit(as.character(L$u), ",")
L$u <- sapply(v, `[`, 1)
L$v <- sapply(v, `[`, 2)
> L
# a u v
#1 10 mature MIMAT0000062
#2 20 stemloop <NA>
#3 40 mature MIMAT0000062
A two-liner:
L$v =sapply(strsplit(as.character(L$u),","), "[", 2)
L$u =sapply(strsplit(as.character(L$u),","), "[", 1)
#L
# a u v
#1 10 mature MIMAT0000062
#2 20 stemloop <NA>
#3 40 mature MIMAT0000062
Another alternative using reshape2::colsplit, as joran suggested:
library(reshape2)
k = cbind(a =L$a,colsplit(L$u,",",c("u","v")))
#k
# a u v
#1 10 mature MIMAT0000062
#2 20 stemloop
#3 40 mature MIMAT0000062
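For the edit in the question (splitting k into rows with and without "NA"), logical row indexing may be enough. This is a minimal sketch, assuming k is a character matrix with a column named v; it treats both a real NA and the literal string "NA" as missing, since the question prints the value in quotes. The name missing_v is made up for illustration.

```r
# Input matrix from the edit; the missing value is a real NA here
k <- cbind(a = c("10", "20", "40"),
           u = c("mature", "stemloop", "mature_2"),
           v = c("MIMAT0000062", NA, "MIMAT0000043"))

# TRUE for rows whose v is missing, either as NA or as the string "NA"
missing_v <- is.na(k[, "v"]) | k[, "v"] == "NA"

# drop = FALSE keeps the result a matrix even when only one row is selected
k1 <- k[!missing_v, , drop = FALSE]  # rows without NA
k2 <- k[missing_v, , drop = FALSE]   # rows with NA
```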