I have council data on social care expenditure for each geographic area (council), which looks like this:
Council                 Expenditure
Cumbria (102)           100
South Tyneside (109)    200
Bexley (718)            150
Nottingham (512)        178
As you can see, the Council column of the data frame holds each council's name followed by its code in brackets, e.g. (102), (109), etc.
I want to split the council names and their codes into two separate columns and remove the brackets around the codes, so that the result looks like this:
Council           Council Code    Expenditure
Cumbria           102             100
South Tyneside    109             200
Bexley            718             150
Nottingham        512             178
I have looked at similar posts on Stack Overflow for this type of question and tried an array of string operations such as strsplit(), gsub(), etc., but to no avail. I am having difficulty with the brackets in particular.
Can you please suggest how I can do this in R?

This is one way of getting it done, using grouping with regular expressions:
Council <- read.table(
text = "Council,Expenditure
Cumbria (102),100
South Tyneside (109),200
Bexley (718),150
Nottingham (512),78",
  header = TRUE,
  sep = ",",
  stringsAsFactors = FALSE
)
Council <- transform(Council,
  # Get the Council_Code column: keep only the digits inside the brackets
  Council_Code = as.numeric(gsub("([^\\d]+)(\\d+)(\\))", "\\2",
                                 Council,
                                 perl = TRUE)),
  # Clean up the Council column: keep the name, drop the digits and brackets
  Council = trimws(gsub("([a-zA-Z\\s]+)([\\d\\(\\)]+)", "\\1",
                        Council,
                        perl = TRUE))
)
Council           Expenditure    Council_Code
Cumbria           100            102
South Tyneside    200            109
Bexley            150            718
Nottingham        78             512
I hope this helps.
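Since the question mentions trouble with the brackets specifically, here is a minimal base R sketch of the same grouping idea using a single pattern. The vector x and the names pattern, council_name and council_code are made up for illustration. In an R string, "\\(" and "\\)" match literal brackets, while unescaped ( ) mark a capture group that "\\1" or "\\2" refers back to.

```r
# Illustrative input only; in practice this would be the Council column
x <- c("Cumbria (102)", "South Tyneside (109)", "Bexley (718)", "Nottingham (512)")

# Group 1: the name. Then a space, a literal "(", group 2: the digits, a literal ")"
pattern <- "^(.*) \\(([0-9]+)\\)$"

council_name <- sub(pattern, "\\1", x)              # keep group 1
council_code <- as.integer(sub(pattern, "\\2", x))  # keep group 2, convert to integer

print(data.frame(Council = council_name, Council_Code = council_code))
```

This assumes every value ends in " (digits)"; a value that does not match the pattern is returned unchanged by sub(), so it is worth checking for those first.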

Using gsub:
res <- setNames(data.frame(trimws(gsub("[[:digit:]\\()]","",df$Council))
, df$Expenditure, gsub("[^[:digit:]]","",df$Council)),
c("Council","Expenditure","Council Code"))
# Council Expenditure Council Code
#1 Cumbria 100 102
#2 South Tyneside 200 109
#3 Bexley 150 718
#4 Nottingham 78 512
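The snippet above assumes a data frame called df already exists. As a rough, self-contained sketch (the df below is rebuilt from the question's sample and the names names_only and codes_only are only illustrative), this shows what each of the two character classes leaves behind, and that the extracted codes are still character strings until converted:

```r
# Sample data rebuilt for illustration
df <- data.frame(
  Council = c("Cumbria (102)", "South Tyneside (109)", "Bexley (718)", "Nottingham (512)"),
  Expenditure = c(100, 200, 150, 78),
  stringsAsFactors = FALSE
)

# Delete digits and brackets, then trim the leftover trailing space
names_only <- trimws(gsub("[[:digit:]()]", "", df$Council))

# Delete everything that is NOT a digit; the result is character, not numeric
codes_only <- gsub("[^[:digit:]]", "", df$Council)

print(names_only)
print(as.integer(codes_only))
```

One caveat with deleting characters this way: a council name that itself contained a digit or a bracket would be altered too, so this relies on the names being plain text.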
[[:digit:]\\()] : to extract only the names (digits and brackets are removed)
[^[:digit:]] : to extract only the numbers (everything that is not a digit is removed)

A tidyr option is extract:
library(tidyr)
extract(df1, Council, into = c("Council", "CouncilCode"), "([^(]+)\\s+\\(([0-9]+).")
# Council CouncilCode Expenditure
#1 Cumbria 102 100
#2 South Tyneside 109 200
#3 Bexley 718 150
#4 Nottingham 512 78
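If adding a package is not an option, a similar capture-group split can be sketched in base R with utils::strcapture() (available from R 3.4.0). This is an alternative to extract, not what the answer above uses; the vector x and the result name parts are made up for illustration.

```r
# Illustrative input only; in practice this would be df1$Council
x <- c("Cumbria (102)", "South Tyneside (109)", "Bexley (718)", "Nottingham (512)")

# One column per capture group; proto gives the column names and types
parts <- strcapture(
  pattern = "^([^(]+) \\(([0-9]+)\\)$",
  x = x,
  proto = data.frame(Council = character(), CouncilCode = integer(),
                     stringsAsFactors = FALSE)
)

print(parts)
```

The two new columns could then be combined with the Expenditure column, for example with cbind().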
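
library(reshape2)
# Drop the closing bracket, then split on " (" so that multi-word names
# such as "South Tyneside" stay in one piece (splitting on " " would break them)
colsplit(string = gsub(pattern = "\\)", replacement = "", x = Council$Council),
         pattern = " \\(", names = c("Council", "Council_code"))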
Result:
  Council           Council_code
1 Cumbria           102
2 South Tyneside    109
3 Bexley            718
4 Nottingham        512