How to split a column into two columns by extracting?
I would like to split a column into two, extracting the numbers and keeping them alone in one column.
df <- data.frame(V1 = c("[1] Strongly disagree", "[2] Somewhat disagree", "[3] Neither", "[4] Somewhat agree", "[5] Strongly agree"))
V1
[1] Strongly disagree
[2] Somewhat disagree
[3] Neither
[4] Somewhat agree
[5] Strongly agree
I tried using the separate function from tidyr:
tidyr::separate(df, V1, into = c("Value", "Label"), sep = "] ")
Value Label
[1 Strongly disagree
[2 Somewhat disagree
[3 Neither
[4 Somewhat agree
[5 Strongly agree
I might be able to remove the [ with another function, but I was wondering whether this can be fixed in one step, or whether another function does the job.
This is what I am trying to get in the end:
Label Value
Strongly disagree 1
Somewhat disagree 2
Neither 3
Somewhat agree 4
Strongly agree 5
If you are more into base R, here is a base R solution:
df <- data.frame(V1 = c("[1] Strongly disagree", "[2] Somewhat disagree", "[3] Neither", "[4] Somewhat agree", "[5] Strongly agree"))
df$value = as.numeric(regmatches(df$V1, regexpr(r"(\d)", df$V1)))
df$V1 = regmatches(df$V1, regexpr("(?<=] ).*", df$V1, perl=TRUE))
df
#> V1 value
#> 1 Strongly disagree 1
#> 2 Somewhat disagree 2
#> 3 Neither 3
#> 4 Somewhat agree 4
#> 5 Strongly agree 5
Created on 2020-09-05 by the reprex package (v0.3.0)
regmatches is a base R function that returns the matched substrings from a character vector. It takes the vector and the match data returned by regexpr as input.
In the first case (the value column), \d is used to extract the digit. In the second case, (?<=] ).* is used to return everything that follows "] ".
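To make the two-step mechanism above more concrete, here is a minimal sketch on a shortened input. The names x, m, digits and labels are made up for illustration and are not part of the original answer.

```r
x <- c("[1] Strongly disagree", "[3] Neither")

# regexpr() returns the position of the first match in each string;
# the match length is stored in the "match.length" attribute.
m <- regexpr("\\d", x)

# regmatches() uses those positions to pull out the matched text.
digits <- regmatches(x, m)

# A lookbehind needs perl = TRUE; this matches everything after "] ".
labels <- regmatches(x, regexpr("(?<=\\] ).*", x, perl = TRUE))

print(digits)
print(labels)
```

One thing to keep in mind: with a regexpr result, regmatches drops elements that have no match, so if some rows lacked the "[n] " prefix the result would be shorter than the input and could not be assigned straight back into the data frame.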
Try this approach:
library(tidyverse)
#Data
df <- data.frame(V1 = c("[1] Strongly disagree",
"[2] Somewhat disagree",
"[3] Neither",
"[4] Somewhat agree",
"[5] Strongly agree"))
#Mutate
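# Split on "]" plus any following whitespace so the label has no leading space
df %>% separate(V1, into = c('V1', 'V2'), sep = '\\]\\s*') %>%
  mutate(V1 = gsub("[[:punct:]]", '', V1))  # drop the remaining "["
Output: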
V1 V2
1 1 Strongly disagree
2 2 Somewhat disagree
3 3 Neither
4 4 Somewhat agree
5 5 Strongly agree
If you also want different column names, you can use rename():
#Mutate 2
# Same split as above, then rename and reorder the columns
df %>% separate(V1, into = c('V1', 'V2'), sep = '\\]\\s*') %>%
  mutate(V1 = gsub("[[:punct:]]", '', V1)) %>%
  rename(Label = V2, Value = V1) %>% select(c(2, 1))
Output:
Label Value
1 Strongly disagree 1
2 Somewhat disagree 2
3 Neither 3
4 Somewhat agree 4
5 Strongly agree 5
Another way: you can use str_extract to get the value and str_remove to get rid of the square brackets in the label column.
library(dplyr)
library(stringr)
df %>%
  transmute(value = str_extract(V1, "\\d+"),          # first run of digits
            label = str_remove(V1, "\\[.*\\]\\s*"))   # drop "[n]" and the space after it
# value label
# 1 1 Strongly disagree
# 2 2 Somewhat disagree
# 3 3 Neither
# 4 4 Somewhat agree
# 5 5 Strongly agree
An option with extract:
library(tidyr)
library(dplyr)
df %>%
extract(V1, into = c("Value", "Label"), "^\\[(\\d+)\\]\\s*(.*)")
# Value Label
#1 1 Strongly disagree
#2 2 Somewhat disagree
#3 3 Neither
#4 4 Somewhat agree
#5 5 Strongly agree
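The extract call above leaves Value as a character column. If a numeric column is wanted, as in the layout requested in the question, a possible variation is to pass convert = TRUE, which is a documented argument of tidyr::extract. The sketch below is not part of the original answer; the shortened df and the name res are made up for illustration.

```r
library(tidyr)

df <- data.frame(V1 = c("[1] Strongly disagree",
                        "[2] Somewhat disagree",
                        "[3] Neither"))

# convert = TRUE runs type conversion on the new columns,
# so Value comes back as integer rather than character.
res <- extract(df, V1, into = c("Value", "Label"),
               regex = "^\\[(\\d+)\\]\\s*(.*)", convert = TRUE)

# Reorder the columns to match the layout asked for in the question.
res <- res[, c("Label", "Value")]
str(res)
```

Because the regex consumes the brackets and the whitespace after them, no separate cleanup step is needed for either column.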