
Assigning Value to New Variable Based on Specific Values in Another Variable in R

I have a data.frame that contains state names and I would like to create a new variable called "region" in which a value is assigned based on the state that is found under the "state" variable.

For example, if the "state" variable has "Alabama" or "Georgia", I would like "region" to be assigned "South". If the state is "Washington" or "California", I would like it to be assigned "West". I have to do this for each of the 48 contiguous US states, and I'm having difficulty figuring out the best way to do it. Any help with this (I'm sure simple) procedure would be great. What I am looking for in the end is something like this:

State      Region
Wyoming    West
Michigan   Midwest
Alabama    South
Georgia    South
California West
Texas      Central

And to be clear, I don't have the regions in a separate file; I have to create this as a new variable and define the region names myself. I'm just looking for a way for the code to go through all 3000 rows that I have and automatically assign the region name once I tell it how to do so.

Rather than typing the region for every state, you can use the built-in "state.name" and "state.region" variables from the 'datasets' package (as Jon Spring suggests in his comment), e.g.:

library(tidyverse)
library(datasets)

state_lookup_table <- data.frame(name = state.name,
                                 region = state.region)

my_df <- data.frame(place = c("Washington", "California"),
                    value = c(1000, 2000))
my_df
#>        place value
#> 1 Washington  1000
#> 2 California  2000

my_df %>%
  left_join(state_lookup_table, by = c("place" = "name"))
#>        place value region
#> 1 Washington  1000   West
#> 2 California  2000   West

Created on 2022-09-02 by the reprex package (v2.0.1)
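
One caveat worth noting: state.region uses the four US Census regions (Northeast, South, North Central, West), so Michigan comes back as "North Central" and Texas as "South" rather than the "Midwest" and "Central" labels shown in the question. Below is a rough base R sketch of the same lookup using match(), followed by an override step. The names custom_region and has_override are purely illustrative, and the override values are assumptions taken from the example table in the question.

```r
# Example data using the states from the question
my_df <- data.frame(State = c("Wyoming", "Michigan", "Alabama",
                              "Georgia", "California", "Texas"),
                    stringsAsFactors = FALSE)

# match() returns each state's position in state.name;
# that position is then used to pick the corresponding entry of state.region
my_df$Region <- as.character(state.region[match(my_df$State, state.name)])

# Hypothetical overrides for states where a different label is wanted
custom_region <- c(Michigan = "Midwest", Texas = "Central")
has_override <- my_df$State %in% names(custom_region)
my_df$Region[has_override] <- unname(custom_region[my_df$State[has_override]])

my_df
```

States that are not in state.name (for example a misspelled name) get NA from match(), so they are easy to spot afterwards with is.na(my_df$Region).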

I would go this way:

df <- data.frame(name = c("john", "will", "thomas", "Ali"),
                 state = c("California", "Alabama", "Washington", "Georgia"))

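region_df <- data.frame(state = c("Alabama", "Georgia", "Washington"),
                        region = c("south", "south", "west"))

# merge() takes `by`, not `on`; all.x = TRUE keeps rows of df that have
# no match in region_df (their region is NA, e.g. California here)
merged.df <- merge(df, region_df, by = "state", all.x = TRUE)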
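
With 48 states to type in, it is easy to miss one or misspell a name, and a left merge will quietly give those rows an NA region. A small sketch of how one might check for that after merging, reusing the example data from this answer (missing_states is just an illustrative name):

```r
df <- data.frame(name = c("john", "will", "thomas", "Ali"),
                 state = c("California", "Alabama", "Washington", "Georgia"),
                 stringsAsFactors = FALSE)

region_df <- data.frame(state = c("Alabama", "Georgia", "Washington"),
                        region = c("south", "south", "west"),
                        stringsAsFactors = FALSE)

# Left merge: every row of df is kept, unmatched states get region = NA
merged_df <- merge(df, region_df, by = "state", all.x = TRUE)

# States present in the data but absent from the lookup table
missing_states <- unique(merged_df$state[is.na(merged_df$region)])
missing_states
```

Note that merge() sorts the result by the key column by default, so the row order of merged_df may differ from that of df.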

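I think you need a lookup reference to do this. For your specific question, a dict-like structure would be the best solution; in R, a named character vector plays that role.

# Names are the states, values are the regions
ref_ge <- character()
ref_ge["Georgia"] <- "South"
ref_ge["Alabama"] <- "South"
ref_ge["California"] <- "West"
ref_ge["Georgia"]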

# Or, if you can read the state -> region information from an Excel file into a data frame:
df <- data.frame(state = c("Georgia", "Alabama", "California"),
                 region = c("South", "South", "West"))
ref2 <- df$region
names(ref2) <- df$state
ref2["Georgia"]
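
The same lookup works on a whole column at once, which is what you need for the 3000 rows. A minimal sketch, assuming your data frame is called my_data and has a character column named state (both names are placeholders for illustration):

```r
# Lookup vector: names are states, values are regions
ref <- c(Georgia = "South", Alabama = "South", California = "West")

# Stand-in for the real data; Texas is deliberately missing from the lookup
my_data <- data.frame(state = c("Alabama", "California", "Georgia",
                                "Alabama", "Texas"),
                      stringsAsFactors = FALSE)

# Index the lookup by name for every row; as.character() guards against
# the column being a factor, which would index by integer code instead
my_data$region <- unname(ref[as.character(my_data$state)])

my_data
```

Any state that is not in the lookup ends up as NA, so you can see which entries still need to be added.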


