I have a dataset that contains a column with the state in which a particular office is located. I would like to take that column and make a new column denoting which region of the US that office is located in. The state column has the postal abbreviation for each state (i.e., NY stands for New York), and I am using the US Census Bureau's regions.
Here's a mock example of the data. I don't have a Region column, but I want to create it:
Store State Region
A FL South
B NY Northeast
C CA West
D IL Midwest
E MA Northeast
Let's make it simpler and let's just say I want to denote only offices in the Northeast. I used the following syntax:
stores$Northeast<-if(
stores$state=="ME"|"NH"|"VT"|"MA"|"RI"|"CT"|"NY"|"PA"|"NJ"){
print("Northeast")
} else{print("Non-northeast")
}
but I get an error message saying that the | operation doesn't work on characters. Is there a different function I should be using instead?
I'm posting in the interest of saving people's typing time. There are already two vectors available as part of the base R installation that can be used to do this very efficiently: state.abb and state.region. If you have a named vector, it can be indexed via the names as a look-up facility. state.region is a factor, so it needs to be converted to character (state.abb is already a character vector, so calling as.character() on it is harmless), and the index needs to be de-factorized as well if the column was read in as a factor:
# Do read `?state`. Hey, S was invented in the US, but why not some Yuropean constants?
mock <- read.table(text="Store State
A FL
B NY
C CA
D IL
E MA", header=TRUE)
> stat <- as.character(state.region)
> names(stat) <- as.character(state.abb)
> mock$Region <- stat[as.character(mock$State)]
> mock
Store State Region
1 A FL South
2 B NY Northeast
3 C CA West
4 D IL North Central
5 E MA Northeast
If you want to "edit" the regional assignments, do this:
> stat["IL"] <- "Midwest"
> mock$Region <- stat[as.character(mock$State)]
> mock
Store State Region
1 A FL South
2 B NY Northeast
3 C CA West
4 D IL Midwest
5 E MA Northeast
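A minimal sketch of two follow-ups to the look-up approach above. First, "North Central" covers more states than IL, so it may be more convenient to rename the whole region in one step. Second, any abbreviation that is not in state.abb (DC, for example) comes back as NA, which is worth checking for. The `offices` data frame here is hypothetical and only serves as an illustration:

```r
# Look-up vector: names are postal abbreviations, values are region names
stat <- setNames(as.character(state.region), state.abb)

# Rename every "North Central" state at once, not just IL
stat[stat == "North Central"] <- "Midwest"

# Hypothetical data; "DC" is included because it is not in state.abb
offices <- data.frame(Store = c("A", "B", "C"),
                      State = c("IL", "OH", "DC"),
                      stringsAsFactors = FALSE)

# unname() drops the names carried over from the look-up vector
offices$Region <- unname(stat[offices$State])

# Rows whose abbreviation was not found end up as NA
offices[is.na(offices$Region), ]
```

Whether NA is the right result for unmatched rows depends on the data; they could also be assigned to a region by hand, as with stat["IL"] above.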
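You should probably use the %in% operator here. Note that if() needs its condition in parentheses and tests a single TRUE/FALSE value, so this form checks one office at a time (the first row in this example):
NE = c("ME","NH","VT","MA","RI","CT","NY","PA","NJ")
if (stores$state[1] %in% NE) {
  print("Northeast")
} else {
  print("Non-northeast")
}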
You can also define a new variable this way, especially if you are going to go on to define other regions:
stores$region = "Non-northeast"
stores$region[stores$state %in% NE] = "Northeast"
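To extend this assignment pattern to the other regions without typing each list of states, one possibility is to derive the lists from the built-in state.abb and state.region vectors. This is only a sketch: the `stores` data frame below is a hypothetical stand-in for the real data, and the region names are the ones R ships with (which use "North Central" rather than "Midwest"):

```r
# Hypothetical data frame using the question's column name `state`
stores <- data.frame(store = c("A", "B", "C", "D", "E"),
                     state = c("FL", "NY", "CA", "IL", "MA"),
                     stringsAsFactors = FALSE)

# One character vector of abbreviations per region
by_region <- split(state.abb, state.region)

# Start with NA so unmatched states are easy to spot afterwards
stores$region <- NA_character_
for (r in names(by_region)) {
  stores$region[stores$state %in% by_region[[r]]] <- r
}
stores
```

Starting from NA instead of a default label makes it visible when a state code did not match any region.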
You need the %in% operator!
stores$Northeast <- ifelse(stores$state %in% c("ME", "NH", "VT", "MA", "RI", "CT", "NY", "PA", "NJ"), "Northeast", "Non-northeast")
cheers
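For context on the original error, here is a small sketch (with made-up values in `state`) of why the `|` chain failed and how %in% relates to it. The `|` operator combines logical vectors, so each side has to be a full comparison and not a bare string; %in% expresses the same membership test more compactly, and ifelse() then works element by element:

```r
state <- c("FL", "NY", "CA", "IL", "MA")  # hypothetical values

# Each side of `|` must be a logical vector, so every comparison is spelled out
long_form <- state == "NY" | state == "MA"

# %in% performs the same membership test in one step
short_form <- state %in% c("NY", "MA")

# Both approaches give the same logical vector
identical(long_form, short_form)

# ifelse() is vectorized, unlike if (), so it labels every element
label <- ifelse(short_form, "Northeast", "Non-northeast")
label
```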