
How can I select the countries whose number of data points is in the top 25% of the distribution, using subset?

I have to select the countries whose number of data points falls in the top 25% of the distribution of the number of data points, using subset() and quantile() together with the %in% operator.

My dataset has this form:

head(drugs1)
  LOCATION TIME PC_HEALTHXP PC_GDP USD_CAP TOTAL_SPEND
1      AUS 1971      15.992  0.727  35.720      462.11
2      AUS 1972      15.091  0.686  36.056      475.11
3      AUS 1973      15.117  0.681  39.871      533.47
4      AUS 1974      14.771  0.755  47.559      652.65
5      AUS 1975      11.849  0.682  47.561      660.76
6      AUS 1976      10.920  0.630  46.908      658.26

where the first column is the country and the second is the year, so each row is one data point for that country in that year.

I tried the command a <- subset(drugs1, quantile(drugs1$TIME, 0.25), 1), but the result is NULL. Can you help me with this?

Start by counting the number of data points for each country using table().

n <- table(drugs1$LOCATION)  # column names are case-sensitive: LOCATION, not location

Find the 75th percentile of the number of data points; countries above this cut-off are in the top 25%.

q <- quantile(n, .75)

Find the countries that have more than q datapoints.

countries <- names(n)[n > q]

Subset the original data to include only the countries in countries.

drugs2 <- subset(drugs1, LOCATION %in% countries)
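To check the logic without the real data, here is a minimal sketch of the same steps on a made-up data frame. The name drugs_toy, the country codes and the row counts are all invented for illustration, and the example reads "top 25%" as "count strictly above the 75th percentile of the per-country counts".

```r
# Toy data (invented for illustration): four countries with 2, 3, 5 and 8 rows
drugs_toy <- data.frame(
  LOCATION = rep(c("AAA", "BBB", "CCC", "DDD"), times = c(2, 3, 5, 8)),
  TIME     = c(1971:1972, 1971:1973, 1971:1975, 1971:1978)
)

# Number of rows (data points) per country
n <- table(drugs_toy$LOCATION)

# 75th percentile of the counts: the cut-off for the top 25%
q <- quantile(as.vector(n), 0.75, names = FALSE)

# Countries whose count is strictly above the cut-off
top_countries <- names(n)[as.vector(n) > q]

# Keep only the rows that belong to those countries
drugs_top <- subset(drugs_toy, LOCATION %in% top_countries)
```

With counts of 2, 3, 5 and 8, the default quantile() method gives a cut-off of 5.75, so only DDD is kept. If several countries tie exactly at the cut-off, whether to use > or >= depends on how you want ties handled.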
