How to split a character column into two columns by removing the brackets in R?

I have council data on social care expenditure for each geographic area (council), which looks like this:

Council                     Expenditure
Cumbria (102)               100
South Tyneside (109)        200
Bexley (718)                150
Nottingham (512)            178

As you can see, the Council column of the data frame holds each council's name followed by its code in brackets: (102), (109), etc.

I want to split the council names and their codes into two separate columns and remove the brackets around the codes, so that the result looks like this:

Council          Council Code                 Expenditure
Cumbria          102                          100
South Tyneside   109                          200
Bexley           718                          150
Nottingham       512                          178

I have looked at similar posts on Stack Overflow and tried an array of string operations such as strsplit(), gsub(), etc., but to no avail. I am having difficulty with the brackets in particular.

Can you please suggest how I can do this in R?

This is one way of getting it done, using capture groups in a regular expression:

Data:

Council <- read.table(
  text = "Council,Expenditure
Cumbria (102),100
South Tyneside (109),200
Bexley (718),150
Nottingham (512),78",
  header = TRUE,
  sep = ",",
  stringsAsFactors = FALSE
)

Code:

Council <- transform(Council,
       # Get the Council_Code column: keep only the digits inside the brackets
       Council_Code = as.numeric(gsub("([^\\d]+)(\\d+)(\\))", "\\2",
                                      Council,
                                      perl = TRUE)),
       # Clean up the Council column: keep the name, drop the bracketed code
       Council = trimws(gsub("([a-zA-Z\\s]+)([\\d\\(\\)]+)", "\\1",
                             Council,
                             perl = TRUE))
)

Output:

 Council        Expenditure Council_Code
 Cumbria        100         102         
 South Tyneside 200         109         
 Bexley         150         718         
 Nottingham      78         512 

I hope this helps.
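
As a variation on the same capture-group idea, both pieces can be pulled out in one pass with strcapture() from base R's utils package (available from R 3.4.0). This is only a sketch: the object name `councils` and the output column names are assumptions made for illustration, and it presumes every value has the shape `Name (digits)`.

```r
# Hypothetical copy of the data from the question; `councils` is an assumed name.
councils <- data.frame(
  Council = c("Cumbria (102)", "South Tyneside (109)",
              "Bexley (718)", "Nottingham (512)"),
  Expenditure = c(100, 200, 150, 78),
  stringsAsFactors = FALSE
)

# Group 1 is the name before " (", group 2 is the run of digits in the brackets.
# The prototype data frame fixes the name and type of each captured column.
parts <- strcapture(
  pattern = "^(.+) \\(([0-9]+)\\)$",
  x = councils$Council,
  proto = data.frame(Council = character(), Council_Code = integer(),
                     stringsAsFactors = FALSE)
)

result <- cbind(parts, Expenditure = councils$Expenditure)
result
```

Rows that do not match the pattern come back as NA, which makes malformed entries easy to spot.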

Using gsub:

# df is assumed to hold the data frame from the question
res <- setNames(data.frame(trimws(gsub("[[:digit:]()]", "", df$Council)),
                           df$Expenditure,
                           gsub("[^[:digit:]]", "", df$Council)),
                c("Council", "Expenditure", "Council Code"))

#         Council Expenditure Council Code
#1        Cumbria         100          102
#2 South Tyneside         200          109
#3         Bexley         150          718
#4     Nottingham          78          512

  • [[:digit:]()] : matches digits and parentheses; removing them leaves only the names
  • [^[:digit:]] : matches everything except digits; removing it leaves only the codes
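
To make the two character classes concrete, here is a small sketch on a plain character vector. The first class is written without a backslash, since parentheses are literal inside a bracket expression. The string "Area 51 (102)" is a made-up value, used only to show a limitation of the approach.

```r
x <- c("Cumbria (102)", "South Tyneside (109)")

# Delete every digit and parenthesis, then trim the leftover trailing space.
names_only <- trimws(gsub("[[:digit:]()]", "", x))
names_only
# [1] "Cumbria"        "South Tyneside"

# Delete everything that is not a digit, leaving only the code (as text).
codes_only <- gsub("[^[:digit:]]", "", x)
codes_only
# [1] "102" "109"

# Caveat: digits inside a name are kept as well and run into the code.
mixed <- gsub("[^[:digit:]]", "", "Area 51 (102)")
mixed
# [1] "51102"
```

The codes come back as character strings, so wrap the call in as.integer() if a numeric column is needed.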

A tidyr option is extract():

library(tidyr)
extract(df1, Council, into = c("Council", "CouncilCode"), "([^(]+)\\s+\\(([0-9]+).")
#         Council CouncilCode Expenditure
#1        Cumbria         102         100
#2 South Tyneside         109         200
#3         Bexley         718         150
#4     Nottingham         512          78
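
To see what the two capture groups in that pattern pick up, the same regular expression can be tried on a single string. The sketch below uses base R's regexec() and regmatches() as a stand-in for what extract() does internally, so it illustrates the pattern rather than tidyr itself.

```r
x <- "South Tyneside (109)"
pattern <- "([^(]+)\\s+\\(([0-9]+)."

# regexec() returns the positions of the full match and of each group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(x, regexec(pattern, x))[[1]]
m
# [1] "South Tyneside (109)" "South Tyneside"       "109"
```

The first element is the whole match and the next two are the groups that become the new columns. extract() also accepts convert = TRUE if the code column should be an integer rather than a character string.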
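
Using colsplit from reshape2:

library(reshape2)
# Drop the closing bracket, then split on " (" so multi-word names stay intact
colsplit(string = gsub(pattern = "\\)", replacement = "", x = Council$Council),
     pattern = " \\(", names = c("Council", "Council_code"))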

Result:

         Council Council_code
1        Cumbria          102
2 South Tyneside          109
3         Bexley          718
4     Nottingham          512
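
On the difficulty with brackets mentioned in the question: ( and ) are regular expression metacharacters, so strsplit() and gsub() need them escaped unless fixed = TRUE is used to treat the pattern as literal text. A minimal base R sketch of the literal approach follows; the names `pieces` and `split_df` are assumptions for illustration.

```r
x <- c("Cumbria (102)", "South Tyneside (109)")

# Drop the closing bracket, then split on the literal two-character string " (".
pieces <- strsplit(sub(")", "", x, fixed = TRUE), " (", fixed = TRUE)

# Each list element now holds c(name, code); collect them into columns.
split_df <- data.frame(
  Council = vapply(pieces, `[`, character(1), 1),
  Council_code = as.integer(vapply(pieces, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)
split_df
```

Because the split is on " (" rather than on a single space, a multi-word name such as South Tyneside stays in one piece.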
