
How to split column into two columns by extracting?

I want to split a column into two columns, with the number extracted on its own and kept in one of them.

df <- data.frame(V1 = c("[1] Strongly disagree", "[2] Somewhat disagree", "[3] Neither", "[4] Somewhat agree", "[5] Strongly agree"))
                  V1
 [1] Strongly disagree
 [2] Somewhat disagree
 [3] Neither
 [4] Somewhat agree
 [5] Strongly agree

I tried using the tidyr separate function:

tidyr::separate(df, V1, into = c("Value", "Label"), sep = "] ")

Value   Label
[1      Strongly disagree           
[2      Somewhat disagree           
[3      Neither         
[4      Somewhat agree          
[5      Strongly agree

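I could probably remove the [ with another function, but I would like to know whether this can be solved in one step, and whether there is another function that does the job.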

I am trying to end up with this:

        Label        Value
 Strongly disagree     1
 Somewhat disagree     2
 Neither               3
 Somewhat agree        4
 Strongly agree        5

If you prefer base R, here is a base R solution:

df <- data.frame(V1 = c("[1] Strongly disagree", "[2] Somewhat disagree", "[3] Neither", "[4] Somewhat agree", "[5] Strongly agree"))

df$value = as.numeric(regmatches(df$V1, regexpr(r"(\d)", df$V1)))

df$V1 = regmatches(df$V1, regexpr("(?<=] ).*", df$V1, perl=TRUE))
df
#>                  V1 value
#> 1 Strongly disagree     1
#> 2 Somewhat disagree     2
#> 3           Neither     3
#> 4    Somewhat agree     4
#> 5    Strongly agree     5

Created on 2020-09-05 by the reprex package (v0.3.0)

regmatches is a base R function that returns the matched values from a vector. It takes the vector and a regexpr object as input.

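In the first case (the value column), \d is used to extract the digit. The code writes it as the raw string r"(\d)", which requires R 4.0 or later; in older versions write "\\d". In the second case, (?<=] ).* is a lookbehind that returns everything matched after "] ", which is why perl=TRUE is needed.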
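To see the two steps separately, here is a minimal sketch of what regexpr returns and how regmatches consumes it. The vector x and the names m, digits and labels are made up for illustration and are not part of the answer above.

```r
# Illustrative input with the same shape as the question's data
x <- c("[1] Strongly disagree", "[3] Neither")

# regexpr() returns the start position of the first match in each string;
# the length of each match is stored in the "match.length" attribute
m <- regexpr("\\d", x)
as.integer(m)             # start positions
attr(m, "match.length")   # match lengths

# regmatches() uses those positions to pull out the matched substrings
digits <- as.numeric(regmatches(x, m))

# The lookbehind (?<=] ) needs perl = TRUE; it keeps what follows "] "
labels <- regmatches(x, regexpr("(?<=] ).*", x, perl = TRUE))
```

One thing to watch: with a regexpr result, regmatches drops elements that have no match, so the output can be shorter than the input if some rows lack a "[n] " prefix.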

Try this approach:

library(tidyverse)
#Data
df <- data.frame(V1 = c("[1] Strongly disagree",
                        "[2] Somewhat disagree",
                        "[3] Neither", 
                        "[4] Somewhat agree",
                        "[5] Strongly agree"))
#Mutate
df %>% separate(V1,into = c('V1','V2'),sep = ']') %>%
  mutate(V1=gsub("[[:punct:]]",'',V1))

Output:

  V1                 V2
1  1  Strongly disagree
2  2  Somewhat disagree
3  3            Neither
4  4     Somewhat agree
5  5     Strongly agree

If you also want different column names, you can use rename():

#Mutate 2
df %>% separate(V1,into = c('V1','V2'),sep = ']') %>%
  mutate(V1=gsub("[[:punct:]]",'',V1)) %>%
  rename(Label=V2,Value=V1) %>% select(c(2,1))

Output:

               Label Value
1  Strongly disagree     1
2  Somewhat disagree     2
3            Neither     3
4     Somewhat agree     4
5     Strongly agree     5
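In the output above the labels keep a leading space (the split is on "]" only) and the value column is still character. A possible variation, sketched here under the assumption that an integer value and a clean label are wanted, splits on "] " instead and converts the value explicitly. The name res is illustrative.

```r
library(tidyr)
library(dplyr)

# Shortened version of the question's data
df <- data.frame(V1 = c("[1] Strongly disagree", "[3] Neither"))

res <- df %>%
  # sep is a regular expression; "\\] " consumes the bracket and the space
  separate(V1, into = c("Value", "Label"), sep = "\\] ") %>%
  # remove the remaining "[" and turn the digit string into an integer
  mutate(Value = as.integer(gsub("[[:punct:]]", "", Value))) %>%
  select(Label, Value)
```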

Another way: you can use str_extract to get the value and str_remove to get rid of the square brackets in the label column.

library(dplyr)
library(stringr)
df %>% 
  transmute(value = str_extract(V1, "\\d+"),
         label = str_remove(V1, "\\[.*\\]"))
#    value              label
# 1      1  Strongly disagree
# 2      2  Somewhat disagree
# 3      3            Neither
# 4      4     Somewhat agree
# 5      5     Strongly agree
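Two details of the snippet above may matter in practice: str_extract returns character, and the greedy pattern "\\[.*\\]" matches from the first "[" to the last "]", so it would swallow a label that itself contains brackets. A hedged sketch with a tighter pattern follows; the third input string is an invented edge case, not from the question.

```r
library(stringr)

x <- c("[1] Strongly disagree", "[3] Neither", "[2] Agree [mostly]")

# Convert the extracted digits to integer
value <- as.integer(str_extract(x, "\\d+"))

# Anchor at the start and remove only the "[n]" prefix plus trailing spaces
label <- str_remove(x, "^\\[\\d+\\]\\s*")

# For comparison: the greedy pattern removes everything up to the last "]"
greedy <- str_remove(x, "\\[.*\\]")
```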

An option with extract:

library(tidyr)
library(dplyr)
df %>% 
   extract(V1, into = c("Value", "Label"), "^\\[(\\d+)\\]\\s*(.*)")
#  Value             Label
#1     1 Strongly disagree
#2     2 Somewhat disagree
#3     3           Neither
#4     4    Somewhat agree
#5     5    Strongly agree
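As written, extract leaves Value as a character column. If a numeric column is wanted, the convert argument of tidyr's extract can handle it in the same step. A minimal sketch, with res as an illustrative name and tidyr::extract spelled out to avoid clashing with other packages that export an extract function:

```r
library(tidyr)

# Shortened version of the question's data
df <- data.frame(V1 = c("[1] Strongly disagree", "[3] Neither"))

# Each capture group in the regex becomes one column;
# convert = TRUE runs type conversion on the new columns
res <- tidyr::extract(df, V1, into = c("Value", "Label"),
                      regex = "^\\[(\\d+)\\]\\s*(.*)", convert = TRUE)

str(res)
```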
