I have to select the countries whose number of data points is in the top 25% of the distribution of the number of data points, using subset and quantile with the %in% operator.
My dataset has this form:
head(drugs1)
  LOCATION TIME PC_HEALTHXP PC_GDP USD_CAP TOTAL_SPEND
1      AUS 1971      15.992  0.727  35.720      462.11
2      AUS 1972      15.091  0.686  36.056      475.11
3      AUS 1973      15.117  0.681  39.871      533.47
4      AUS 1974      14.771  0.755  47.559      652.65
5      AUS 1975      11.849  0.682  47.561      660.76
6      AUS 1976      10.920  0.630  46.908      658.26
where the first column is the country and the second is the year, so each row is one data point for that country.
I tried the command a<-subset(drugs1, quantile(drugs1$TIME, 0.25),1) but the result is NULL. Can you help me with this?
Start by figuring out the number of datapoints for each country using table().
n <- table(drugs1$LOCATION)
Find the 75th percentile of the number of datapoints; countries in the top 25% lie above it.
q <- quantile(n, 0.75)
Find the countries that have more than q datapoints.
countries <- names(n)[n > q]
Subset the original data to only include the countries in countries.
drugs2 <- subset(drugs1, LOCATION %in% countries)
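To check the logic end to end, here is a minimal sketch on a made-up data frame. The name toy, its four countries and their row counts are assumptions for illustration only, not the real drugs1 data. It uses the 75th percentile as the cutoff, which is one reading of "top 25%"; as.vector() is used only to hand quantile() a plain numeric vector.

```r
# Hypothetical toy data: four countries with 6, 3, 4 and 2 rows respectively
toy <- data.frame(
  LOCATION = rep(c("AUS", "BEL", "CAN", "DNK"), times = c(6, 3, 4, 2)),
  TIME     = c(1971:1976, 1971:1973, 1971:1974, 1971:1972)
)

# Number of rows (data points) per country
n <- table(toy$LOCATION)

# 75th percentile of the per-country counts (plain numeric vector passed in)
q <- quantile(as.vector(n), 0.75)

# Countries whose count is strictly above the cutoff
countries <- names(n)[n > q]

# Keep only the rows belonging to those countries
toy_top <- subset(toy, LOCATION %in% countries)
```

With these counts only the country with six rows exceeds the cutoff, so toy_top keeps just its rows. If ties at the cutoff should be included, n >= q could be used instead of n > q.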