
Apply quantile() function on a dataframe

I need to filter a data frame so that only the rows that fall in the third quartile (Q3, the 0.75 quantile) of some specific columns are kept. Let me explain. I have the following data frame:

https://drive.google.com/file/d/1blYWBXCrXpH37Wz4r0mVJGbwFsdesGi-/view?usp=sharing


I need the code to return a table with all the columns, and with only the rows that meet the condition of being in Q3 (0.75) for each of the following columns:

educ, salario, salini, tiempemp, expprev

Any ideas? Thanks in advance, everyone!


I have temporarily resolved the issue by calculating the quantiles manually and filtering on the resulting values, as shown below. Is there any way to improve this solution?

quantile(empleados$educ, .75)
quantile(empleados$salario, .75)
quantile(empleados$salini, .75)
quantile(empleados$tiempemp, .75)
quantile(empleados$expprev, .75)


data.frame(empleados)
arrange(filter(empleados, educ >= 12, salario >= 28500, salini >= 14250, tiempemp >= 88, expprev >= 122.25), salario)


ok <- arrange(filter(empleados, educ >= 12, salario >= 28500, salini >= 14250, tiempemp >= 88, expprev >= 122.25), salario)
View(ok)
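
As a small illustration of what the quantile() calls above return, here is a minimal sketch on a made-up vector. The names x, q3, thr and top are hypothetical and the numbers are not from the Empleados data. It assumes R's default quantile method (type 7), which interpolates between neighbouring values; that would explain a fractional threshold such as 122.25. Storing the result in a variable avoids copying numbers into filter() by hand.

```r
# Hypothetical toy vector, not taken from the Empleados data
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# quantile() returns a named numeric vector (the name here is "75%").
# With the default type 7 method the result is interpolated, so it does
# not have to be one of the observed values.
q3 <- quantile(x, 0.75)
print(q3)

# names = FALSE drops the "75%" label, which is handy for comparisons
thr <- quantile(x, 0.75, names = FALSE)

# Keep only the values at or above the stored threshold
top <- x[x >= thr]
print(top)
```

The same pattern applies to a data frame column: compute the threshold once, keep it in a variable, and compare against that variable instead of a typed-in number.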


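We can use mutate_at to flag, for each of the chosen columns, whether the value exceeds that column's 0.75 quantile, and then use filter_at to keep only the rows where all of the flags are TRUE. Note that this uses a strict > comparison, whereas the code in the question uses >=.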

library(dplyr)
cols <- c("educ", "salario", "salini", "tiempemp", "expprev")

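empleados %>% 
  # add one logical helper column per variable, named <variable>_col
  mutate_at(cols, list(col = ~. > quantile(., 0.75))) %>%
  # keep rows where every helper column is TRUE
  filter_at(vars(ends_with('col')), all_vars(.)) %>%
  # drop the helper columns again
  select(-ends_with('col'))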

#   id sexo    fechnac educ catlab salario salini tiempemp expprev
#1  11    2   2/7/1950   16      1   30300  16500       98     143
#2 134    2 11/10/1941   16      3   41550  24990       89     285
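
In dplyr 1.0.0 and later the scoped verbs (mutate_at, filter_at) are superseded, and from dplyr 1.0.4 the same filter can be written in one step with if_all(). The following is a minimal sketch under those version assumptions; demo is a hypothetical toy data frame standing in for the real file, and it uses >= as in the question rather than the strict > above.

```r
library(dplyr)

# Hypothetical toy data standing in for the Empleados file
demo <- data.frame(
  id      = 1:8,
  educ    = c(8, 12, 12, 12, 15, 15, 16, 19),
  salario = c(800, 200, 300, 400, 500, 600, 700, 100)
)
cols <- c("educ", "salario")

# Keep rows where every column in `cols` is at or above its own Q3.
# if_all() combines the per-column tests with a logical AND.
res <- demo %>%
  filter(if_all(all_of(cols), ~ .x >= quantile(.x, 0.75)))

print(res)
```

No helper columns are created, so there is nothing to drop afterwards. In this toy data only id 7 passes both tests: id 1 has a high salario but a low educ, and id 8 has a high educ but a low salario.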

A version that uses base R

# downloaded data file located here...
df <- read.csv('~/Downloads/Empleados.dat', sep = '\t')
numerics <- c("educ", "salario", "salini", "tiempemp", "expprev")
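# quantile() defaults to probs 0, 0.25, 0.5, 0.75, 1, so element 4 is the 75% quantile
quantiles <- sapply(numerics, function(n) quantile(df[,n])[4])
quantilenames <- names(quantiles)
# one logical column per variable: TRUE where the value is >= that variable's Q3
comparison <- data.frame(mapply(function(x,y) df[,y] >= quantiles[x], quantilenames, numerics))
# TRUE only for rows that pass every comparison
comparison$alltrue <- apply(comparison, MARGIN = 1, all)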

df.1 <- cbind(df, comparison)

df.1[df.1$alltrue,]
#    id sexo    fechnac educ catlab salario salini tiempemp expprev educ.75. salario.75. salini.75. tiempemp.75. expprev.75. alltrue
#6   11    2   2/7/1950   16      1   30300  16500       98     143     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
#7   14    2  2/26/1949   15      1   35100  16800       98     137     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
#21  74    2  4/28/1933   15      1   33900  19500       93     192     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
#50 134    2 11/10/1941   16      3   41550  24990       89     285     TRUE        TRUE       TRUE         TRUE        TRUE    TRUE
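
If the extra TRUE/FALSE columns are not wanted in the result, a more compact base R variant is possible. This is only a sketch on a hypothetical toy data frame (demo2, q3, at_or_above and res2 are made-up names): it computes one threshold per column with sapply(), compares each column against its threshold with sweep(), and keeps the rows that pass every test.

```r
# Hypothetical toy data standing in for the Empleados file
demo2 <- data.frame(
  id      = 1:8,
  educ    = c(8, 12, 12, 12, 15, 15, 16, 19),
  salario = c(800, 200, 300, 400, 500, 600, 700, 100)
)
cols <- c("educ", "salario")

# One Q3 threshold per column, returned as a vector named after the columns
q3 <- sapply(demo2[cols], quantile, probs = 0.75, names = FALSE)
print(q3)

# Logical matrix: TRUE where a value is at or above its column's threshold
at_or_above <- sweep(as.matrix(demo2[cols]), 2, q3, FUN = `>=`)

# Keep the original columns only, for rows where every test is TRUE
res2 <- demo2[rowSums(at_or_above) == length(cols), ]
print(res2)
```

Because the logical matrix is kept separate from the data, the filtered result has exactly the original columns.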

