Split word in column in R
I have a data frame with multiple columns in R. I want to split the "age" column into two columns, each holding one character.
fas value age colony
1: C12:0 0.002221915 LO 7_13
2: C13:0 0.000770179 LO 7_13
3: C14:0 0.004525352 LO 7_13
4: C15:0 0.000738928 LO 7_13
5: C16:1a 0.002964627 LO 7_13
Desired output:
fas value size age colony
1: C12:0 0.002221915 L O 7_13
2: C13:0 0.000770179 L O 7_13
3: C14:0 0.004525352 L O 7_13
4: C15:0 0.000738928 L O 7_13
5: C16:1a 0.002964627 L O 7_13
I tried:
data_frame<-str_split_fixed(df$age, "", 2)
With base R:
df$size <- substr(df$age,1,1)
df$age <- substr(df$age,2,2)
And to get the result in the column order you specified:
df[,c("fas","value","size","age","colony")]
fas value size age colony
1 C12:0 0.002221915 L O 7_13
2 C13:0 0.000770179 L O 7_13
3 C14:0 0.004525352 L O 7_13
4 C15:0 0.000738928 L O 7_13
5 C16:1a 0.002964627 L O 7_13
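The substr() approach silently assumes every value in age is exactly two characters long. A minimal sketch that checks this first, using a small toy data frame (its contents are hypothetical stand-ins for the question's data):

```r
# Toy data frame standing in for the question's df (hypothetical subset).
df <- data.frame(fas = c("C12:0", "C13:0", "C14:0"),
                 age = c("LO", "LO", "LO"),
                 stringsAsFactors = FALSE)

# Stop early if any code is not exactly two characters long.
stopifnot(all(nchar(df$age) == 2))

df$size <- substr(df$age, 1, 1)  # first character
df$age  <- substr(df$age, 2, 2)  # second character; overwrite age last

# Reorder so that size comes before age.
df <- df[, c("fas", "size", "age")]
```

Creating size before overwriting age matters here, because both calls read from the original two-character value.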
Since your data appears to be a data.table, I'll infer that the package is loaded. However, strcapture is base R and will return a data.frame with the two columns (names and classes are taken from the third argument, proto=).
strcapture("(.)(.)", DT$age, list(size="", age=""))
# size age
# 1 L O
# 2 L O
# 3 L O
# 4 L O
# 5 L O
library(data.table)
DT[, c("size", "age") := strcapture("(.)(.)", age, list(size="", age="")) ]
DT
# fas value age colony size
# <char> <num> <char> <char> <char>
# 1: C12:0 0.002221915 O 7_13 L
# 2: C13:0 0.000770179 O 7_13 L
# 3: C14:0 0.004525352 O 7_13 L
# 4: C15:0 0.000738928 O 7_13 L
# 5: C16:1a 0.002964627 O 7_13 L
You may choose to be more defensive in the pattern, shifting to "^(.)(.)$", which will not match anything outside of our 2-char expectation.
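To illustrate the difference between the two patterns, here is a small sketch on a made-up vector (the values "LOX" and "L" are hypothetical and do not appear in the question's data). Rows that fail to match come back as NA in strcapture's result:

```r
# Hypothetical inputs: one valid 2-char code, one too long, one too short.
ages <- c("LO", "LOX", "L")

# Unanchored: matches the first two characters wherever it can.
loose <- strcapture("(.)(.)", ages, list(size = "", age = ""))

# Anchored: only matches strings that are exactly two characters long.
strict <- strcapture("^(.)(.)$", ages, list(size = "", age = ""))

print(loose)
print(strict)
```

With the unanchored pattern, "LOX" is quietly split into "L" and "O"; with the anchored pattern it yields NA in both columns, which makes unexpected values easier to spot.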
Data
DT <- data.table::fread(text="
fas value age colony
C12:0 0.002221915 LO 7_13
C13:0 0.000770179 LO 7_13
C14:0 0.004525352 LO 7_13
C15:0 0.000738928 LO 7_13
C16:1a 0.002964627 LO 7_13")
You can use sub and backreferences:
# Create size first, while age still holds the original two characters
df$size <- sub("^(\\w)(\\w)$", "\\1", df$age)
df$age <- sub("^(\\w)(\\w)$", "\\2", df$age)
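When both new columns are derived from the same source column, the order of the assignments matters: once age has been overwritten with a single character, a second sub() call on it no longer matches the two-character pattern. A minimal sketch on a toy data frame (toy and pat are hypothetical names used only for illustration):

```r
# Toy data frame with the same two-character codes as in the question.
toy <- data.frame(age = c("LO", "LO"), stringsAsFactors = FALSE)

# Two capture groups: first character and second character.
pat <- "^(\\w)(\\w)$"

toy$size <- sub(pat, "\\1", toy$age)  # read the original value first
toy$age  <- sub(pat, "\\2", toy$age)  # overwrite age last

print(toy)
```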
The tidyverse solution uses tidyr::separate():
library("tidyr")
tbl <- read.table(header = TRUE, text = "
fas value age colony
C12:0 0.002221915 LO 7_13
C13:0 0.000770179 LO 7_13
C14:0 0.004525352 LO 7_13
C15:0 0.000738928 LO 7_13
C16:1a 0.002964627 LO 7_13")
separate(tbl, age, c("size", "age"), sep = 1)
#> fas value size age colony
#> 1 C12:0 0.002221915 L O 7_13
#> 2 C13:0 0.000770179 L O 7_13
#> 3 C14:0 0.004525352 L O 7_13
#> 4 C15:0 0.000738928 L O 7_13
#> 5 C16:1a 0.002964627 L O 7_13
Created on 2021-02-21 by the reprex package (v1.0.0)
To split the column age (which contains the 2 characters "LO") you can:
remove the last character with gsub('.{1}$', '', df$age), which gives "L";
remove the first character with sub('.', '', df$age), which gives "O".
library(dplyr)

df <- df %>%
  mutate(size = sub('.$', '', age),  # remove last character
         age = sub('^.', '', age))   # remove first character