Split word in column in R

I have a data frame with multiple columns in R. I want to split the "age" column into two columns, each holding one character of the original string.

         fas       value age colony
   1:  C12:0 0.002221915  LO   7_13
   2:  C13:0 0.000770179  LO   7_13
   3:  C14:0 0.004525352  LO   7_13
   4:  C15:0 0.000738928  LO   7_13
   5: C16:1a 0.002964627  LO   7_13

Desired output:

         fas       value size age colony
   1:  C12:0 0.002221915    L   O   7_13
   2:  C13:0 0.000770179    L   O   7_13
   3:  C14:0 0.004525352    L   O   7_13
   4:  C15:0 0.000738928    L   O   7_13
   5: C16:1a 0.002964627    L   O   7_13

I tried:

library(stringr)
data_frame <- str_split_fixed(df$age, "", 2)
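str_split_fixed() returns a character matrix rather than modifying df, so its columns still have to be assigned back. For comparison, a minimal base-R sketch of the same idea, assuming a plain data.frame built from the first two rows of the question's data and assuming every age value is exactly two characters long (the names chars and df are illustrative):

```r
# Small data.frame copied from the first two rows of the question
df <- data.frame(
  fas    = c("C12:0", "C13:0"),
  value  = c(0.002221915, 0.000770179),
  age    = c("LO", "LO"),
  colony = c("7_13", "7_13"),
  stringsAsFactors = FALSE
)

# strsplit(x, "") breaks each string into single characters;
# rbind-ing the pieces gives one matrix row per input string
chars <- do.call(rbind, strsplit(df$age, ""))

# Assign the matrix columns back to the data.frame
df$size <- chars[, 1]   # first character, e.g. "L"
df$age  <- chars[, 2]   # second character, e.g. "O"
df <- df[, c("fas", "value", "size", "age", "colony")]
df
```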

With base R:

df$size <- substr(df$age,1,1)
df$age  <- substr(df$age,2,2)
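substr() with fixed positions assumes age is always exactly two characters. If the part after the size letter could ever be longer (an assumption; the question only shows "LO"), substring() with no end argument keeps everything from the second character on. A small sketch with one hypothetical longer code:

```r
# "LO" is from the question; "MOX" is a hypothetical longer code
codes <- c("LO", "MOX")

size <- substr(codes, 1, 1)   # first character only
age  <- substring(codes, 2)   # from character 2 to the end of each string
size
age
```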

And to get the result in the column order you specified:

df[, c("fas", "value", "size", "age", "colony")]
     fas       value size age colony
1  C12:0 0.002221915    L   O   7_13
2  C13:0 0.000770179    L   O   7_13
3  C14:0 0.004525352    L   O   7_13
4  C15:0 0.000738928    L   O   7_13
5 C16:1a 0.002964627    L   O   7_13

Since your data appears to be a data.table, I'll infer that package is loaded. However, strcapture is base R and returns a data.frame with the two columns (names and classes are taken from the third argument, proto=).

strcapture("(.)(.)", DT$age, list(size="", age=""))
#   size age
# 1    L   O
# 2    L   O
# 3    L   O
# 4    L   O
# 5    L   O
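Because strcapture() converts each capture to the class of the matching proto element, it can also return typed columns. A small sketch with hypothetical codes "L1" and "S2" (not in the question's data), where the second capture becomes an integer because its proto value is 0L:

```r
# Hypothetical codes: a size letter followed by a single digit
codes <- c("L1", "S2")

res <- strcapture("^(.)(\\d)$", codes,
                  proto = list(size = "", n = 0L))  # "" -> character, 0L -> integer
str(res)
```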

library(data.table)
DT[, c("size", "age") := strcapture("(.)(.)", age, list(size="", age="")) ]
DT
#       fas       value    age colony   size
#    <char>       <num> <char> <char> <char>
# 1:  C12:0 0.002221915      O   7_13      L
# 2:  C13:0 0.000770179      O   7_13      L
# 3:  C14:0 0.004525352      O   7_13      L
# 4:  C15:0 0.000738928      O   7_13      L
# 5: C16:1a 0.002964627      O   7_13      L
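If data.table is not in use, the same strcapture() call can fill both columns of a plain data.frame through [<-, which overwrites age in place and appends the new size column at the end. A sketch assuming a data.frame with two rows copied from the question:

```r
df <- data.frame(
  fas    = c("C12:0", "C13:0"),
  value  = c(0.002221915, 0.000770179),
  age    = c("LO", "LO"),
  colony = c("7_13", "7_13"),
  stringsAsFactors = FALSE
)

# strcapture() returns a two-column data.frame (size, age);
# its columns are assigned positionally to df[c("size", "age")]
df[c("size", "age")] <- strcapture("(.)(.)", df$age, list(size = "", age = ""))
df
```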

You may choose to be more defensive in the pattern by switching to "^(.)(.)$", which matches only values that are exactly two characters long.
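With the anchored pattern, strcapture() returns NA for any value that is not exactly two characters, instead of silently capturing part of it. A quick check using the hypothetical values "LOX" and "L" (not from the question):

```r
ages <- c("LO", "LOX", "L")   # "LOX" and "L" are hypothetical malformed values

loose  <- strcapture("(.)(.)",   ages, list(size = "", age = ""))
strict <- strcapture("^(.)(.)$", ages, list(size = "", age = ""))

loose   # "LOX" still yields L/O from its first two characters; "L" is NA
strict  # only "LO" is captured; the other two rows are NA
```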


Data

DT <- data.table::fread(text="
   fas       value age colony
 C12:0 0.002221915  LO   7_13
 C13:0 0.000770179  LO   7_13
 C14:0 0.004525352  LO   7_13
 C15:0 0.000738928  LO   7_13
C16:1a 0.002964627  LO   7_13")

You can use sub with backreferences:

df$size <- sub("^(\\w)(\\w)$", "\\1", df$age)  # first character -> size (must run before age is overwritten)
df$age  <- sub("^(\\w)(\\w)$", "\\2", df$age)  # second character -> age
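sub() returns any element its pattern does not match unchanged, which is why size has to be extracted before age is overwritten: once age holds a single character, the two-character pattern no longer matches. A minimal illustration on one value:

```r
pat <- "^(\\w)(\\w)$"

age <- "LO"
size_ok <- sub(pat, "\\1", age)                       # "L": taken from the original two-character value

age_overwritten <- sub(pat, "\\2", age)               # "O"
size_wrong      <- sub(pat, "\\1", age_overwritten)   # "O": no match, so the input is returned as-is
```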

A tidyverse solution uses tidyr::separate():

library("tidyr")

tbl <- read.table(header = TRUE, text = "
   fas       value age colony
 C12:0 0.002221915  LO   7_13
 C13:0 0.000770179  LO   7_13
 C14:0 0.004525352  LO   7_13
 C15:0 0.000738928  LO   7_13
C16:1a 0.002964627  LO   7_13")

separate(tbl, age, into = c("size", "age"), sep = 1)
#>      fas       value size age colony
#> 1  C12:0 0.002221915    L   O   7_13
#> 2  C13:0 0.000770179    L   O   7_13
#> 3  C14:0 0.004525352    L   O   7_13
#> 4  C15:0 0.000738928    L   O   7_13
#> 5 C16:1a 0.002964627    L   O   7_13

Created on 2021-02-21 by the reprex package (v1.0.0)

To split the column age (which contains the 2 characters "LO"), you can:

  1. remove the last character with gsub('.{1}$', '', df$age), which gives "L";

  2. remove the first character with sub('.', '', df$age), which gives "O".

library(dplyr)

df %>% 
  mutate(size = gsub('.{1}$', '', age), # remove last character  -> "L"
         age  = sub('.', '', age))      # remove first character -> "O"
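The two regexes split the value cleanly only when age has exactly two characters. On a longer value (the "LOX" below is hypothetical), removing the last character and removing the first character leave overlapping pieces:

```r
x <- c("LO", "LOX")   # "LOX" is a hypothetical longer code

without_last  <- gsub('.{1}$', '', x)  # drop the last character
without_first <- sub('.', '', x)       # drop the first character

without_last    # "L"  "LO"
without_first   # "O"  "OX"
```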


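Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM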