
Making a new variable column using if/else statements

I have a dataset with a column giving the state in which each office is located. I would like to use that column to make a new column denoting the region of the US the office is in. The state column holds the postal abbreviation for each state (e.g., NY stands for New York), and I am using the US Census Bureau's regions.

Here's a mock example of the data. I don't have a Region column, but I want to create it:

Store    State    Region
A        FL       South
B        NY       Northeast
C        CA       West
D        IL       Midwest
E        MA       Northeast

To keep it simple, let's say I only want to flag offices in the Northeast. I used the following syntax:

stores$Northeast<-if(
        stores$state=="ME"|"NH"|"VT"|"MA"|"RI"|"CT"|"NY"|"PA"|"NJ"){
print("Northeast")
} else{print("Non-northeast")
}

but I get an error message saying that the | operation doesn't work on characters. Is there a different function I should be using instead?

I'm posting in the interest of saving people typing time. Two vectors that ship with base R can do this very efficiently: state.abb and state.region. A named vector can be indexed by its names, which makes it work as a look-up table. state.region is a factor, so it needs to be converted to character; state.abb is already character, so as.character() does no harm there. The index column should be de-factorized as well, since read.table returns it as a factor in older versions of R:

> # Do read `?state`. Hey, S was invented in the US, but why not some Yuropean constants?
> mock <- read.table(text="Store    State
+ A        FL
+ B        NY
+ C        CA
+ D        IL
+ E        MA", header=TRUE)
> stat <- as.character(state.region)
> names(stat) <- as.character(state.abb)

> mock$Region  <- stat[as.character(mock$State)]
> mock
  Store State        Region
1     A    FL         South
2     B    NY     Northeast
3     C    CA          West
4     D    IL North Central
5     E    MA     Northeast

If you want to "edit" the regional assignments, do this:

> stat["IL"] <- "Midwest"
> mock$Region  <- stat[as.character(mock$State)]
> mock
  Store State    Region
1     A    FL     South
2     B    NY Northeast
3     C    CA      West
4     D    IL   Midwest
5     E    MA Northeast
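
One thing worth checking with the look-up approach is what happens to an abbreviation that is not in state.abb, which covers only the 50 states. The sketch below is a minimal illustration, not part of the original answer: the `offices` data frame and the names `region_by_state` and `unmatched` are made up for the example, and "DC" is included on purpose as a code with no match.

```r
# Named look-up vector: names are postal codes, values are region labels.
region_by_state <- setNames(as.character(state.region), state.abb)

# Hypothetical input; "DC" is not one of the 50 entries in state.abb.
offices <- data.frame(Store = c("A", "B", "F"),
                      State = c("FL", "NY", "DC"),
                      stringsAsFactors = FALSE)

# Index by name; unname() drops the names carried over from the look-up.
offices$Region <- unname(region_by_state[offices$State])

# Codes without a match come back as NA, so they are easy to list.
unmatched <- offices$State[is.na(offices$Region)]
```

Any code that ends up in `unmatched` can then be assigned by hand, for example with `region_by_state["DC"] <- "South"` before repeating the look-up.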

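You should probably use the %in% operator here. Note that if() needs parentheses around its condition and tests a single value, not a whole column, so this version checks only the first row:

NE = c("ME","NH","VT","MA","RI","CT","NY","PA","NJ")

# if() handles one value at a time, so pick a single row
if (stores$state[1] %in% NE) {
    print("Northeast")
} else {
    print("Non-northeast")
}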

You can also define a new variable this way, especially if you are going to go on to define other regions:

stores$region = "Non-northeast"
stores$region[stores$state %in% NE] = "Northeast"
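
The same pattern extends to all four regions by keeping one vector of abbreviations per region and looping over them. This is only a sketch under a few assumptions: the `stores` data frame is rebuilt from the mock data in the question, the `regions` list is a name chosen for the example, and the state lists were typed by hand from the Census Bureau's grouping, so they are worth verifying before use.

```r
# Hypothetical store data matching the mock example in the question.
stores <- data.frame(Store = c("A", "B", "C", "D", "E"),
                     state = c("FL", "NY", "CA", "IL", "MA"),
                     stringsAsFactors = FALSE)

# Census Bureau regions, typed by hand here (assumption: please verify).
regions <- list(
  Northeast = c("ME", "NH", "VT", "MA", "RI", "CT", "NY", "PA", "NJ"),
  Midwest   = c("IL", "IN", "MI", "OH", "WI", "IA", "KS", "MN", "MO",
                "NE", "ND", "SD"),
  South     = c("DE", "FL", "GA", "MD", "NC", "SC", "VA", "DC", "WV",
                "AL", "KY", "MS", "TN", "AR", "LA", "OK", "TX"),
  West      = c("AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY", "AK",
                "CA", "HI", "OR", "WA")
)

# Start with NA so that unknown codes stay visibly unassigned.
stores$region <- NA_character_
for (r in names(regions)) {
  stores$region[stores$state %in% regions[[r]]] <- r
}
```

Starting from NA rather than a default label means a typo in the state column shows up as a missing region instead of being silently placed in one.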

You need the %in% operator!

stores$Northeast <- ifelse(stores$state %in% c("ME", "NH", "VT", "MA", "RI", "CT", "NY", "PA", "NJ"), "Northeast", "Non-northeast")

cheers
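
For anyone wondering why the original attempt failed: `|` combines logical values, so each side has to be a complete comparison, and `"ME"|"NH"` is not one. The sketch below, with a made-up `state` vector taken from the mock data, shows the spelled-out form next to the `%in%` shorthand; the names `long_form`, `short_form`, and `label` are only for illustration.

```r
state <- c("FL", "NY", "CA", "IL", "MA")
ne_states <- c("ME", "NH", "VT", "MA", "RI", "CT", "NY", "PA", "NJ")

# Long form: every operand of `|` is a full comparison against `state`.
long_form <- state == "ME" | state == "NH" | state == "VT" |
  state == "MA" | state == "RI" | state == "CT" |
  state == "NY" | state == "PA" | state == "NJ"

# Short form: one membership test per element of `state`.
short_form <- state %in% ne_states

# ifelse() maps the logical vector to labels element by element.
label <- ifelse(short_form, "Northeast", "Non-northeast")
```

Both forms give one TRUE or FALSE per row, which is what ifelse() needs; a plain if() would only accept a single value.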
