I have wage data in which about 95% of the values are hourly wages, but some are annual salaries. I wrote a function to convert the annual salaries to hourly wages, but it takes 1 min 40 sec to run on a dataset of 43,000 rows x 12 columns, which I didn't think was big enough to take that long.
Is there a better way to do this than my current function? I am new to dplyr and the tidyverse, so ideally the answer would use those.
Here is some sample data:
NOC4 Region Region_Name Wage_2012 Wage_2013 Wage_2014
0011 ER10 National 28.1 65000 NA
0011 ER1010 Northern NA 30.5 18
0011 ER1020 Southern 42.3 72000 22
0011 ER1030 Eastern 12 NA 45500
0011 ER1040 Western 8 NA 99000
0011 ER10 National NA 65000 NA
Here is what it should look like after the function:
NOC4 Region Region_Name Wage_2012 Wage_2013 Wage_2014
0011 ER10 National 28.1 33.33 NA
0011 ER1010 Northern NA 30.5 18
0011 ER1020 Southern 42.3 36.92 22
0011 ER1030 Eastern 12 NA 23.33
0011 ER1040 Western 8 NA 50.77
0011 ER10 National NA 33.33 NA
Here is the function:
year_to_hour <- function(dataset, salary, startcol){
  # where "startcol" should be the first column containing the numeric
  # values that you are trying to convert.
  for(i in startcol:ncol(dataset)){
    for(j in 1:nrow(dataset)){
      if(is.na(dataset[j, i])){
        j = j+1
      }else if(as.numeric(dataset[j, i]) >= as.numeric(salary)){
        dataset[j, i] = dataset[j, i]/1950
      }
      else{
        dataset[j, i] = dataset[j, i]
      }
    }
  }
  return(as_tibble(dataset))
}
converted <- year_to_hour(wage_data_messy, 1000, 4)
R will work much faster if you let it handle the loops under the hood through "vectorized" code.
http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html
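To make "vectorized" concrete, here is a minimal sketch on a single numeric vector. The values are made up for illustration, and the names wages, is_annual and hours_per_year are hypothetical; the 1000 threshold and the 1950 divisor are taken from the question. The comparison and the division each run over the whole vector at once, with no explicit loop:

```r
# Illustrative values only: a mix of hourly wages, annual salaries and a missing value
wages <- c(28.1, 65000, NA, 72000, 12)

salary <- 1000          # values at or above this are treated as annual salaries
hours_per_year <- 1950  # divisor used in the question

# One logical vector flags every annual salary; NA entries are excluded
is_annual <- !is.na(wages) & wages >= salary

# Divide only the flagged entries, all in a single step
wages[is_annual] <- wages[is_annual] / hours_per_year
```

Hourly values and missing values are left untouched, which is the same behaviour the nested loops were aiming for.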
Here's an approach using dplyr:
library(dplyr)
salary <- 1000
df %>%
  mutate_at(vars(Wage_2012:Wage_2014),        # For these columns...
            ~ . / if_else(. > salary, 1950, 1))   # Divide by 1950 if > salary
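Since dplyr 1.0.0, mutate_at has been superseded by across(), so the same idea could be sketched as below. This assumes dplyr 1.0.0 or later. The data frame is the question's sample data rebuilt by hand, and the names hours_per_year and converted are hypothetical. It uses >= rather than > so that it matches the comparison in the question's function:

```r
library(dplyr)

# Sample data from the question, rebuilt by hand for illustration
df <- data.frame(
  NOC4        = "0011",
  Region      = c("ER10", "ER1010", "ER1020", "ER1030", "ER1040", "ER10"),
  Region_Name = c("National", "Northern", "Southern", "Eastern", "Western", "National"),
  Wage_2012   = c(28.1, NA, 42.3, 12, 8, NA),
  Wage_2013   = c(65000, 30.5, 72000, NA, NA, 65000),
  Wage_2014   = c(NA, 18, 22, 45500, 99000, NA)
)

salary <- 1000          # values at or above this are treated as annual salaries
hours_per_year <- 1950  # divisor used in the question

# Apply the conversion to every column whose name starts with "Wage_";
# if_else() returns NA where the value itself is NA
converted <- df %>%
  mutate(across(starts_with("Wage_"),
                ~ if_else(.x >= salary, .x / hours_per_year, .x)))
```

Selecting the columns by name with starts_with("Wage_") keeps the conversion away from identifier columns such as NOC4, whatever type they were read in as.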
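Using dplyr, I would use mutate_if. Note that this applies to every numeric column, so identifier columns such as NOC4 need to be stored as character rather than numeric: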
salary <- 1000
df %>% mutate_if(is.numeric, ~ifelse(. > salary, ./1950, .))