I want to group each police station in the UK by its region, but as a newbie I don't know how to rename multiple elements at once.
Example: how it currently looks
The police stations of Avon and Somerset, Dorset, Gloucester and Wiltshire are located in the South West. I need a function that renames the police stations above to "South West".
I would do it in the original csv data sets I downloaded from the UK police website, but my analysis ranges from January 2019 to November 2020 and the data can only be downloaded by month and by region (about 900 csv files in total).
I am aware of the function below to select single cells in a data frame, however this data set is way too big for this to be viable.
data[row number, col number] <- "South West"
Any suggestion would be greatly appreciated. Thanks in advance for rescuing a newbie.
PS: I merged every csv data set of every police station throughout 2019 and 2020 using
# full.names = TRUE returns complete paths, so read.csv() works from any working directory
files <- list.files(path = "C:/Users/X/Desktop/Crime data/2019-2020",
                    pattern = "\\.csv$", full.names = TRUE)
crimedata19_20 <- do.call("rbind", lapply(files, read.csv))
Use gsub, with which you can replace a pattern. Example using the iris data set that comes with R:
iris[49:52, ]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 49 5.3 3.7 1.5 0.2 setosa
# 50 5.0 3.3 1.4 0.2 setosa
# 51 7.0 3.2 4.7 1.4 versicolor
# 52 6.4 3.2 4.5 1.5 versicolor
Replace all "setosa" with "South West" in the "Species" column.
res <- transform(iris,
                 Species=gsub(pattern="setosa", replacement="South West", Species))
res[49:52, ]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 49 5.3 3.7 1.5 0.2 South West
# 50 5.0 3.3 1.4 0.2 South West
# 51 7.0 3.2 4.7 1.4 versicolor
# 52 6.4 3.2 4.5 1.5 versicolor
To replace several values at once, separate the patterns with | (or).
res2 <- transform(iris,
                  Species=gsub(pattern="setosa|versicolor", replacement="South West", Species))
res2[49:52, ]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 49 5.3 3.7 1.5 0.2 South West
# 50 5.0 3.3 1.4 0.2 South West
# 51 7.0 3.2 4.7 1.4 South West
# 52 6.4 3.2 4.5 1.5 South West
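Applied to the police data, the same idea might look like the sketch below. The toy data frame and the exact force spellings are assumptions, and the column name Falls.within is borrowed from another answer in this thread. Because gsub matches substrings, the pattern is anchored with ^ and $ so that only whole names are replaced.

```r
# Toy stand-in for the merged crime data; real column names and spellings may differ
crime <- data.frame(
  Falls.within = c("Avon and Somerset", "Dorset", "Gloucester",
                   "Wiltshire", "Kent"),
  stringsAsFactors = FALSE
)

# ^( ... )$ forces a whole-string match, so a name that merely
# contains "Dorset" would be left untouched
south_west <- "^(Avon and Somerset|Dorset|Gloucester|Wiltshire)$"

crime$Falls.within <- gsub(pattern = south_west,
                           replacement = "South West",
                           x = crime$Falls.within)
crime$Falls.within
```

Note that this overwrites the force names in place, so the original station information is lost unless a copy of the column is kept.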
Using the same data as @jay.sf, you could store the unique values in a data frame together with their replacements and then do the replacement with match():
#Keys
Keys <- data.frame(Species=unique(iris$Species),
                   Replace=c('South','North','East'), stringsAsFactors = FALSE)
It will look like this:
Keys
Species Replace
1 setosa South
2 versicolor North
3 virginica East
Next, the replacement:
#Replace
iris$Species <- Keys[match(iris$Species,Keys$Species),"Replace"]
Output:
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 South
2 4.9 3.0 1.4 0.2 South
3 4.7 3.2 1.3 0.2 South
4 4.6 3.1 1.5 0.2 South
5 5.0 3.6 1.4 0.2 South
6 5.4 3.9 1.7 0.4 South
tail(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
145 6.7 3.3 5.7 2.5 East
146 6.7 3.0 5.2 2.3 East
147 6.3 2.5 5.0 1.9 East
148 6.5 3.0 5.2 2.0 East
149 6.2 3.4 5.4 2.3 East
150 5.9 3.0 5.1 1.8 East
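For the crime data, the lookup table would map each police force to its region. The following is a minimal sketch under assumptions: the data frames are toy stand-ins, Falls.within is the column name used in another answer here, and Region is a hypothetical new column. Forces missing from the lookup table come out as NA, which makes gaps in the table easy to spot.

```r
# Hypothetical lookup table: one row per police force with its region
keys <- data.frame(
  Falls.within = c("Avon and Somerset", "Dorset", "Gloucester", "Wiltshire"),
  Region       = "South West",   # length-1 value is recycled to every row
  stringsAsFactors = FALSE
)

# Toy stand-in for the merged crime data
crime <- data.frame(
  Falls.within = c("Dorset", "Kent", "Wiltshire"),
  stringsAsFactors = FALSE
)

# match() gives the row of keys for each force, or NA if the force is not listed
crime$Region <- keys$Region[match(crime$Falls.within, keys$Falls.within)]
crime
```

Adding further regions then only means adding rows to keys; the replacement line stays the same.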
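Just to round out the methods, here is a data.table version:
library(data.table)
# Convert the data frame to a data.table
crimedata19_20 <- data.table(crimedata19_20)
# Names of the forces that belong to the South West
West_cols <- c("name1", "name2", ...)
# Add an Area column for the matching rows; Falls.within itself is left unchanged
crimedata19_20[Falls.within %in% West_cols, Area := "South West"]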
I would not use gsub and would instead create a new column for your areas. You may need the information about the individual stations later on.
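The same "new column" idea also works without extra packages. This is a rough base R sketch, again with a toy data frame and assumed force spellings; Area and Falls.within follow the names used in the snippet above.

```r
# Toy stand-in for the merged crime data
crime <- data.frame(
  Falls.within = c("Dorset", "Kent", "Wiltshire"),
  stringsAsFactors = FALSE
)

# Forces that belong to the South West (spellings are assumptions)
south_west <- c("Avon and Somerset", "Dorset", "Gloucester", "Wiltshire")

# Start with a missing region, then fill in the rows whose force is in the list
crime$Area <- NA_character_
crime$Area[crime$Falls.within %in% south_west] <- "South West"

# The original force names are still there, next to the new region
crime
```

Keeping Falls.within intact means you can still summarise by individual force later, for example with table(crime$Falls.within).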