My data set looks like this:
                   key      date   census  j
 1: 01_35004_10-14_+_M 11NOV2001 2.934397 01
 2: 01_35004_10-14_+_M 06JAN2002 3.028231 01
 3: 01_35004_10-14_+_M 07APR2002 3.180712 01
 4: 01_35004_10-14_+_M 02JUN2002 3.274546 01
 5: 01_35004_10-14_+_M 28JUL2002 3.368380 01
 6: 01_35004_10-14_+_M 22SEP2002 3.462214 01
 7: 01_35004_10-14_+_M 22DEC2002 3.614694 01
 8: 01_35004_10-14_+_M 16FEB2003 3.708528 01
 9: 01_35004_10-14_+_M 13JUL2003 3.954843 01
10: 01_35004_10-14_+_M 07SEP2003 4.048677 01
The underscore-separated fields in the "key" column encode different variables. For instance, 01 is the State, 35004 is the Zip Code, 10-14 is the Age Group, + is the Race, and M is the Gender.
I want to extract each of these fields into its own variable (i.e. a column for State filled with 01, a column for Zip Code filled with 35004, etc.).
Here is my code:
Var = c("State","Zip_Code", "Age_Group", "Race", "Gender")
for(j in Var){
play$j = gsub("_.*$","",play$key)
}
Clearly this is not correct. I would like the loop to go through each observation in the "key" column and create one variable per field, holding the value extracted for that field.
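A note on why the loop above does not do what is wanted: `play$j` always refers to a column literally named "j" (which already exists in the data), not to the name stored in the loop variable, and the pattern `"_.*$"` only ever keeps the first field. A minimal sketch of a corrected loop follows, assuming every key has exactly five underscore-separated fields; the two-row `play` here is a made-up stand-in for the real data.

```r
# Toy stand-in for `play`; only the key column matters for this sketch.
play <- data.frame(
  key = c("01_35004_10-14_+_M", "01_35004_10-14_+_F"),
  stringsAsFactors = FALSE
)

Var <- c("State", "Zip_Code", "Age_Group", "Race", "Gender")

# Split every key on "_" once; each list element holds the fields of one row.
parts <- strsplit(play$key, "_", fixed = TRUE)

for (i in seq_along(Var)) {
  # play[[name]] uses the *value* of the name as the column name,
  # whereas play$j would write to a column literally called "j".
  play[[Var[i]]] <- vapply(parts, `[`, character(1), i)
}
```

Because the fields stay character strings, codes such as "01" keep their leading zero.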
The basic solution (without expecting good performance) uses read.csv:
# excerpt of your data (only the "key" column)
x <- c("01_35004_10-14_+_M", "01_35004_10-14_+_M")
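# simple way is treating each string as a CSV row with "_" as the separator :-)
# colClasses = "character" keeps leading zeros ("01") and stops "F"/"T" being read as logical
y <- read.csv(text = x, sep = "_", header = FALSE, colClasses = "character")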
# Fix the wrong column names
names(y) <- c("State","Zip_Code", "Age_Group", "Race", "Gender")
# Now recode one example column by using translation ("lookup") table
gender.lookup <- data.frame(gender.code = c("M", "F"), gender.name = c("Male", "Female"), stringsAsFactors = FALSE)
# Add the recoded value as new column. Note: codes missing from the lookup table become NA
y$GenderName <- gender.lookup$gender.name[match(y$Gender, gender.lookup$gender.code)]
I am leaving the implementation of the loop to you, since your question does not include any more lookup data (e.g. use lapply and a list of lookup tables with the same index positions as the column indices).
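As a rough sketch of that lapply idea: the lookup tables below are hypothetical (the question supplies none), and they are keyed by column name rather than by column index, which is a small deviation from the suggestion above that avoids depending on column order. The three-row `y` is a made-up stand-in for the split data.

```r
# Stand-in for the split data frame `y`; all columns kept as character.
y <- data.frame(
  State  = c("01", "01", "02"),
  Gender = c("M", "F", "X"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup tables: one named vector (code -> label) per column.
lookups <- list(
  State  = c("01" = "Alabama", "02" = "Alaska"),
  Gender = c(M = "Male", F = "Female")
)

# Translate each column that has a lookup table; unmatched codes become NA.
recoded <- lapply(names(lookups), function(col) {
  tab <- lookups[[col]]
  unname(tab[match(y[[col]], names(tab))])
})
names(recoded) <- paste0(names(lookups), "Name")

# Append the recoded columns (StateName, GenderName) to the data.
y <- cbind(y, as.data.frame(recoded, stringsAsFactors = FALSE))
```

Columns without an entry in `lookups` are simply left as they are, so tables can be added one at a time.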