How to split a character column into multiple columns in R

Question

I have a data frame x:

dput(x)
structure(list(District = structure(c(6L, 6L, 6L, 6L, 6L, 6L), .Label = c("District - Central (06)", 
"District - East (04)", "District - New Delhi (05)", "District - North (02)", 
"District - North East (03)", "District - North West (01)", "District - South (09)", 
"District - South West (08)", "District - West (07)"), class = "factor"), 
    Age = structure(c(103L, 1L, 2L, 14L, 25L, 36L), .Label = c("0", 
    "1", "10", "100+", "11", "12", "13", "14", "15", "16", "17", 
    "18", "19", "2", "20", "21", "22", "23", "24", "25", "26", 
    "27", "28", "29", "3", "30", "31", "32", "33", "34", "35", 
    "36", "37", "38", "39", "4", "40", "41", "42", "43", "44", 
    "45", "46", "47", "48", "49", "5", "50", "51", "52", "53", 
    "54", "55", "56", "57", "58", "59", "6", "60", "61", "62", 
    "63", "64", "65", "66", "67", "68", "69", "7", "70", "71", 
    "72", "73", "74", "75", "76", "77", "78", "79", "8", "80", 
    "81", "82", "83", "84", "85", "86", "87", "88", "89", "9", 
    "90", "91", "92", "93", "94", "95", "96", "97", "98", "99", 
    "Age not stated", "All ages"), class = "factor"), Total = c(3656539L, 
    56131L, 58644L, 63835L, 63859L, 64945L), Rural = c(213950L, 
    3589L, 3757L, 4200L, 4102L, 4223L), Urban = c(3442589L, 52542L, 
    54887L, 59635L, 59757L, 60722L)), .Names = c("District", 
"Age", "Total", "Rural", "Urban"), row.names = c(NA, 6L), class = "data.frame")

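I want to split the District column and extract the district name into a new name column. For example, "District - North West (01)" should become just "North West". I tried str_split_fixed and got: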

x
                    District      Age   Total  Rural   Urban 1    name
1 District - North West (01) All ages 3656539 213950 3442589      North West (01)
2 District - North West (01)        0   56131   3589   52542      North West (01)
3 District - North West (01)        1   58644   3757   54887      North West (01)
4 District - North West (01)        2   63835   4200   59635      North West (01)
5 District - North West (01)        3   63859   4102   59757      North West (01)
6 District - North West (01)        4   64945   4223   60722      North West (01)

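I then tried the same function on the name column to separate the region name from the code, but it gave me the following error:

Error in stri_split_regex(string, pattern, n = n, simplify = TRUE, opts_regex = attr(pattern, : Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)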
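The error comes from stringr treating the pattern as a regular expression: an unescaped "(" opens a group that is never closed. The question does not show the exact pattern used, so the sketch below assumes it was " (" and uses a hypothetical name vector. It shows two ways around the error, matching the separator literally with fixed() or escaping the parenthesis:

```r
library(stringr)

# Hypothetical input: the "name" column produced by the first split
name <- c("North West (01)", "North West (01)")

# str_split_fixed(name, " (", 2) would fail: in a regex, "(" opens a group
# that is never closed, which is what U_REGEX_MISMATCHED_PAREN reports.

# Fix 1: match the separator literally with fixed()
parts <- str_split_fixed(name, fixed(" ("), 2)

# Fix 2: escape the parenthesis so the regex matches it literally
parts_escaped <- str_split_fixed(name, " \\(", 2)

region <- parts[, 1]                        # "North West"
code <- str_remove(parts[, 2], fixed(")"))  # "01)" becomes "01"
```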

Is there a way to split a character column into multiple columns based on a pattern with a single function call?

Answer 1

You can use

library(stringr)

parts <- data.frame(str_split_fixed(x$District, " ", 3))
parts

        X1 X2              X3
1 District  - North West (01)
2 District  - North West (01)
3 District  - North West (01)
4 District  - North West (01)
5 District  - North West (01)
6 District  - North West (01)

You can then use gsub to strip the unwanted parts from X3, for example:

gsub("[[:digit:]]", "", parts$X3)  # drop the digits: "North West ()"
gsub("[[:punct:]]", "", parts$X3)  # drop the parentheses: "North West 01"

and so on.
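Chaining those steps, a minimal sketch that keeps the name and the code as separate columns. The district vector is a hypothetical stand-in for x$District, and the cleanup patterns are one possible choice rather than the answer's exact code:

```r
library(stringr)

# Hypothetical stand-in for x$District
district <- c("District - North West (01)", "District - East (04)")

# Split on spaces into at most 3 pieces: "District", "-", and the rest
parts <- data.frame(str_split_fixed(district, " ", 3))

# Remove the "(digits)" suffix, then trim the leftover trailing space
parts$name <- trimws(gsub("\\([[:digit:]]+\\)", "", parts$X3))

# Keep only the digits for the code
parts$code <- gsub("[^[:digit:]]", "", parts$X3)

parts[, c("name", "code")]
```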

Answer 2

You can use gsub to get what you want:

gsub("^.* +- +([A-Za-z ]+) \\(.*$", "\\1", x$District)
[1] "North West" "North West" "North West" "North West" "North West" "North West"

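The first argument to gsub, "^.* +- +([A-Za-z ]+) \\(.*$", is a regular expression that reads as follows:

Starting at the beginning of the string ("^"), match any characters (".*") followed by at least one space, a hyphen, and at least one space (" +- +"). Then capture ("( )") the next run of text made up of at least one letter or space ("[A-Za-z ]+"). Stop capturing at a space followed by an opening parenthesis (" \\("), then match everything up to the end of the string (".*$").

The second argument, "\\1", tells gsub to replace the whole match with the text captured inside the parentheses.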
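Because the capture group allows spaces, the same pattern also handles one-word and multi-word names. One caveat worth knowing: gsub returns a string unchanged when the pattern does not match. A quick check on a few hypothetical inputs, with non-matches flagged as NA (the flagging step is an addition, not part of the answer):

```r
# Hypothetical inputs, including one that does not fit the pattern
district <- c("District - North West (01)", "District - East (04)",
              "District - New Delhi (05)", "Unexpected format")

pattern <- "^.* +- +([A-Za-z ]+) \\(.*$"

# Values: "North West", "East", "New Delhi", "Unexpected format"
name <- gsub(pattern, "\\1", district)

# The last input came back unchanged because it did not match; mark it NA
matched <- grepl(pattern, district)
name[!matched] <- NA
```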

To store the result in a new column:

x$name <- gsub("^.* +- +([A-Za-z ]+) \\(.*$", "\\1", x$District)
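If you also want the numeric code, both pieces can be pulled out in one pass with base R's regexec() and regmatches(). This is a swapped-in technique, not part of the answer; the sample data frame is hypothetical, and the pattern assumes values look like "District - <name> (<digits>)":

```r
# Hypothetical sample mirroring the question's District column
x <- data.frame(District = c("District - North West (01)",
                             "District - South West (08)",
                             "bad value"),
                stringsAsFactors = FALSE)

pattern <- "^District - (.+) \\(([0-9]+)\\)$"

# For each string: c(full match, group 1, group 2), or character(0) if no match
m <- regmatches(x$District, regexec(pattern, x$District))

# Pad non-matches with NA so every row has 3 entries, then build a matrix
parts <- t(vapply(m, function(v) if (length(v) == 3) v else rep(NA_character_, 3),
                  character(3)))

x$name <- parts[, 2]
x$code <- parts[, 3]
x
```

The NA padding keeps rows aligned; a plain do.call(rbind, m) would silently drop non-matching rows.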

Answer 3

You can also match and extract:

library(stringi)
library(dplyr)
library(purrr)

mutate(x,
       name=map_chr(stri_match_all_regex(District, "- ([[:alpha:]]+ [[:alpha:]]+) "), function(x) x[,2]),
       code=map_chr(stri_match_all_regex(District, "\\(([[:digit:]]+)\\)"), function(x) x[,2]))

##                     District      Age   Total  Rural   Urban       name code
## 1 District - North West (01) All ages 3656539 213950 3442589 North West   01
## 2 District - North West (01)        0   56131   3589   52542 North West   01
## 3 District - North West (01)        1   58644   3757   54887 North West   01
## 4 District - North West (01)        2   63835   4200   59635 North West   01
## 5 District - North West (01)        3   63859   4102   59757 North West   01
## 6 District - North West (01)        4   64945   4223   60722 North West   01
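One caveat: the name pattern above requires exactly two words, so one-word districts such as "District - East (04)" come back as NA. A possible variant, sketched here as an assumption rather than the answer's code, uses stri_match_first_regex(), which returns a matrix directly (so map_chr() is not needed) and lets the name be any run of letters and spaces:

```r
library(stringi)

# Hypothetical sample including a one-word district name
district <- c("District - North West (01)", "District - East (04)")

# Column 1 is the whole match; columns 2 and 3 are the capture groups
# (a row is NA where the pattern does not match)
m <- stri_match_first_regex(district, "- ([A-Za-z ]+) \\(([0-9]+)\\)")

name <- m[, 2]
code <- m[, 3]
```

Inside a dplyr pipeline the same call can be used directly, e.g. name = stri_match_first_regex(District, ...)[, 2] in mutate().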

3 answers

Answer 1
Score 4, 2016-07-19 12:33:10

Answer 2
Score 2, accepted, 2016-07-19 12:32:12

Answer 3
Score 2, 2016-07-19 12:35:43