Split words in a column in R

Question

I have a data frame with multiple columns in R. I want to split the "age" column into two columns, each holding one character.

         fas       value age colony
   1:  C12:0 0.002221915  LO   7_13
   2:  C13:0 0.000770179  LO   7_13
   3:  C14:0 0.004525352  LO   7_13
   4:  C15:0 0.000738928  LO   7_13
   5: C16:1a 0.002964627  LO   7_13

Desired output:

         fas       value size age colony
   1:  C12:0 0.002221915    L   O   7_13
   2:  C13:0 0.000770179    L   O   7_13
   3:  C14:0 0.004525352    L   O   7_13
   4:  C15:0 0.000738928    L   O   7_13
   5: C16:1a 0.002964627    L   O   7_13

I tried:

data_frame<-str_split_fixed(df$age, "", 2)

Answer 1

With base R:

df$size <- substr(df$age,1,1)
df$age  <- substr(df$age,2,2)

To set the column order explicitly, index by name (swap "age" and "size" in the vector to match the order shown in the question):

df[,c("fas","value","age","size","colony")]
     fas       value age size colony
1  C12:0 0.002221915   O    L   7_13
2  C13:0 0.000770179   O    L   7_13
3  C14:0 0.004525352   O    L   7_13
4  C15:0 0.000738928   O    L   7_13
5 C16:1a 0.002964627   O    L   7_13
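
For reference, the same approach can be run end to end as a minimal sketch. The data frame below is rebuilt by hand from the first three rows of the question, and the final column order follows the desired output in the question rather than the order printed above.

```r
# Sample data rebuilt by hand from the question (first three rows only)
df <- data.frame(
  fas    = c("C12:0", "C13:0", "C14:0"),
  value  = c(0.002221915, 0.000770179, 0.004525352),
  age    = c("LO", "LO", "LO"),
  colony = c("7_13", "7_13", "7_13"),
  stringsAsFactors = FALSE
)

# Take the first character before `age` is overwritten with the second one
df$size <- substr(df$age, 1, 1)
df$age  <- substr(df$age, 2, 2)

# Reorder so that `size` comes before `age`, as in the desired output
df <- df[, c("fas", "value", "size", "age", "colony")]
print(df)
```

The order of the two assignments matters: `size` has to be extracted while `age` still holds both characters.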

Answer 2

Since your data appears to be a data.table, I'll infer that the package is loaded. However, strcapture is base R and returns a data.frame with the two columns (names and classes are taken from the third argument, proto=).

strcapture("(.)(.)", DT$age, list(size="", age=""))
#   size age
# 1    L   O
# 2    L   O
# 3    L   O
# 4    L   O
# 5    L   O

library(data.table)
DT[, c("size", "age") := strcapture("(.)(.)", age, list(size="", age="")) ]
DT
#       fas       value    age colony   size
#    <char>       <num> <char> <char> <char>
# 1:  C12:0 0.002221915      O   7_13      L
# 2:  C13:0 0.000770179      O   7_13      L
# 3:  C14:0 0.004525352      O   7_13      L
# 4:  C15:0 0.000738928      O   7_13      L
# 5: C16:1a 0.002964627      O   7_13      L

You may choose to be more defensive in the pattern and switch to "^(.)(.)$", which will not match anything other than the expected 2-character strings.
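
A small sketch of the difference between the two patterns. The input vector here is made up for illustration ("LOX" and "L" do not appear in the question); strcapture fills non-matching elements with NA.

```r
# Hypothetical inputs: one well-formed code, one too long, one too short
codes <- c("LO", "LOX", "L")
proto <- list(size = "", age = "")

# Unanchored: "LOX" still matches on its first two characters
loose <- strcapture("(.)(.)", codes, proto)

# Anchored: only strings of exactly two characters match
strict <- strcapture("^(.)(.)$", codes, proto)

print(loose)
print(strict)
```

With the anchored pattern, unexpected values surface as NA instead of being silently truncated, which makes them easier to spot afterwards.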

Data

DT <- data.table::fread(text="
   fas       value age colony
 C12:0 0.002221915  LO   7_13
 C13:0 0.000770179  LO   7_13
 C14:0 0.004525352  LO   7_13
 C15:0 0.000738928  LO   7_13
C16:1a 0.002964627  LO   7_13")

Answer 3

You can use sub with backreferences:

# Create size first, while age still holds both characters
df$size <- sub("(^\\w)(\\w$)", "\\1", df$age)
df$age <- sub("(^\\w)(\\w$)", "\\2", df$age)
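
Note that sub returns its input unchanged when the pattern does not match, so both new values should be derived from the original two-character string before it is overwritten. A minimal sketch with a plain character vector ("SY" is a made-up code, not from the question):

```r
age <- c("LO", "SY")          # "SY" is hypothetical
pattern <- "(^\\w)(\\w$)"

# Both pieces are extracted from the original two-character values
size    <- sub(pattern, "\\1", age)
new_age <- sub(pattern, "\\2", age)

# If the column is overwritten first, the second call no longer matches:
# a one-character string cannot satisfy a two-character pattern
overwritten <- sub(pattern, "\\1", age)
stale       <- sub(pattern, "\\2", overwritten)

print(data.frame(size, new_age, stale))
```

Here `stale` ends up identical to `size`, which is the symptom to look for if the two assignments are made in the wrong order.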

Answer 4

The tidyverse solution uses tidyr::separate():

library("tidyr")

tbl <- read.table(header = TRUE, text = "
   fas       value age colony
 C12:0 0.002221915  LO   7_13
 C13:0 0.000770179  LO   7_13
 C14:0 0.004525352  LO   7_13
 C15:0 0.000738928  LO   7_13
C16:1a 0.002964627  LO   7_13")

separate(tbl, age, c("size", "age"), 1)
#>      fas       value size age colony
#> 1  C12:0 0.002221915   L    O   7_13
#> 2  C13:0 0.000770179   L    O   7_13
#> 3  C14:0 0.004525352   L    O   7_13
#> 4  C15:0 0.000738928   L    O   7_13
#> 5 C16:1a 0.002964627   L    O   7_13

Created on 2021-02-21 by the reprex package (v1.0.0)

Answer 5

To split the column age (which contains the 2 characters "LO") you can:

remove the last character with gsub('.{1}$', '', df$age), which gives "L"
remove the first character with sub('.', '', df$age), which gives "O"

library(dplyr)  # provides %>% and mutate()

df %>%
  mutate(size = gsub('.{1}$', '', age), # remove last character
         age = sub('.', '', age))       # remove first character

5 answers

Answer 1: score 10, accepted, 2021-02-21 16:23:52
Answer 2: score 7, 2021-02-21 16:21:34
Answer 3: score 7, 2021-02-21 16:30:37
Answer 4: score 4, 2021-02-21 16:30:43
Answer 5: score 1, 2021-02-21 18:37:12
