
How to take a Probability Proportional to Size (PPS) Unequal Probability sample using R?

I have very little programming experience, but I'm working on a statistics project and would like to generate an unequal probability sample in which the inclusion probability of a unit is proportional to its size (PPS).

Basically, I have two datasets:

  • ds1 lists US states and the parameter I'm trying to estimate
  • ds2 has the population size of each state.

My questions:

  1. I want to use R to select a random sample from the first dataset, with inclusion probabilities based on each state's population (from the second dataset).

  2. Also, is there any way to use R to calculate these generalized unequal probability estimator formulas?

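(Formula image: the generalized unequal probability estimator)
(Formula image: the estimated variance of the generalized unequal probability estimator)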

A note on the formulas: pi_i is the inclusion probability of unit i, and pi_ij is the joint inclusion probability of units i and j.
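Since the formula images did not survive, here is a minimal base-R sketch under an assumption: that they show the usual generalized (ratio, or Hájek-type) estimator of the population mean, mu_hat = sum(y_i / pi_i) / sum(1 / pi_i), and its linearization variance estimator, i.e. the Horvitz-Thompson variance estimator applied to the residuals y_i - mu_hat and divided by N^2. The function name gup_estimate and its arguments are hypothetical; pi_ij must be the matrix of joint inclusion probabilities for the sampled units (all > 0), with pi_i on its diagonal, and N is the population size (e.g. 50 states).

```r
# Hypothetical helper: generalized unequal probability estimator of the
# population mean and its estimated variance (linearization form).
#   y     : values of the variable of interest for the sampled units
#   pi_i  : inclusion probabilities of the sampled units
#   pi_ij : matrix of joint inclusion probabilities (pi_i on the diagonal)
#   N     : number of units in the population
gup_estimate <- function(y, pi_i, pi_ij, N) {
  # Point estimate: sum(y_i / pi_i) / sum(1 / pi_i)
  mu_hat <- sum(y / pi_i) / sum(1 / pi_i)

  # Residuals around the point estimate
  e <- y - mu_hat

  # Horvitz-Thompson variance estimator applied to the residuals:
  #   sum_i sum_j (pi_ij - pi_i pi_j) / pi_ij * (e_i / pi_i) * (e_j / pi_j)
  # On the diagonal (pi_ii = pi_i) this reduces to (1 - pi_i) / pi_i^2 * e_i^2.
  delta <- (pi_ij - outer(pi_i, pi_i)) / pi_ij
  z <- e / pi_i
  var_hat <- sum(delta * outer(z, z)) / N^2

  list(estimate = mu_hat, variance = var_hat)
}
```

As a sanity check, under simple random sampling without replacement (pi_i = n/N and pi_ij = n(n-1)/(N(N-1))) the function returns the sample mean and the familiar (1 - n/N) * var(y) / n.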

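Yes, that's called weighted sampling. Simply set the weights to the state sizes; strictly speaking you don't even need to normalize them by 1/sum(sizes), because R's sample() rescales prob internally, although it's good practice to. Note that with replace = TRUE each draw selects a state with probability proportional to its size (PPS with replacement), whereas with replace = FALSE the weights are applied sequentially, so the inclusion probabilities are only approximately proportional to size. There are plenty of existing posts on SO showing how to do weighted sampling.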
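A minimal base-R sketch of weighted sampling with sample(), showing one draw with replacement and one without. The data frame states and the sample size n = 10 are illustrative assumptions; the sizes come from R's built-in state.x77 matrix (1975 population estimates, in thousands), standing in for ds2.

```r
# Built-in example data (datasets package): 1975 state population estimates,
# in thousands. This stands in for ds2; substitute your own size column.
states <- data.frame(
  state = rownames(state.x77),
  size  = unname(state.x77[, "Population"])
)

set.seed(1)  # make the draws reproducible
n <- 10      # sample size (illustrative)

# PPS with replacement: each draw picks state i with probability
# size_i / sum(size). sample() rescales prob itself, so dividing by
# sum(size) is optional; it is done here only to make the probabilities explicit.
draw_prob <- states$size / sum(states$size)
idx_wr <- sample(nrow(states), size = n, replace = TRUE, prob = draw_prob)
sample_wr <- states[idx_wr, ]

# Weighted sampling without replacement: the weights are applied
# sequentially among the states not yet drawn, so the resulting inclusion
# probabilities are only approximately proportional to size.
idx_wor <- sample(nrow(states), size = n, replace = FALSE, prob = states$size)
sample_wor <- states[idx_wor, ]
```

The with-replacement sample can contain the same state more than once; the without-replacement sample cannot.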

The only small complication is that you first need to join ds1 and ds2 on the state. Show us the code you've tried if that causes problems. I'd recommend using either dplyr or data.table for the join.
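A minimal sketch of the join step in base R with merge(), assuming both datasets share a column named state (a hypothetical name; use whatever key your data has). The stand-in data again come from state.x77, with 1974 per-capita income playing the role of the variable in ds1; in dplyr the same join would be inner_join(ds1, ds2, by = "state").

```r
# Hypothetical stand-ins for the two datasets, built from state.x77:
# ds1 holds the variable of interest (1974 per-capita income),
# ds2 holds the size measure (1975 population, in thousands).
ds1 <- data.frame(state = rownames(state.x77),
                  y     = unname(state.x77[, "Income"]))
ds2 <- data.frame(state      = rownames(state.x77),
                  population = unname(state.x77[, "Population"]))

# Inner join on the state name: only states present in both are kept.
joined <- merge(ds1, ds2, by = "state")

# Draw a size-weighted sample of states from the joined data.
set.seed(42)
n <- 8
idx <- sample(nrow(joined), size = n, prob = joined$population)
smp <- joined[idx, ]
```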

Your second question should be asked separately; it is off-topic on SO, or at least won't get a great response here. Statistical questions are best asked on the sister site Cross Validated.

There is an R package for exactly this, pps, and its documentation is here.
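To show what a without-replacement PPS design involves, here is a base-R sketch of systematic PPS sampling, one common method that gives inclusion probabilities of exactly pi_i = n * size_i / sum(size), provided no pi_i exceeds 1. This is a swapped-in illustration, not the pps package's code; the function name pps_systematic is hypothetical.

```r
# Hypothetical helper: systematic PPS sampling without replacement.
# Unit i is included with probability pi_i = n * size_i / sum(size),
# which requires every pi_i <= 1.
pps_systematic <- function(size, n) {
  pi_i <- n * size / sum(size)
  if (any(pi_i > 1)) {
    stop("some pi_i exceed 1: take those units with certainty and sample the rest")
  }
  cum <- cumsum(pi_i)              # cumulative inclusion probabilities, ends at n
  points <- runif(1) + 0:(n - 1)   # random start, then points one unit apart
  # Unit i is selected when a point falls in [cum[i-1], cum[i]);
  # pmin() guards against floating-point round-off at the upper end.
  idx <- pmin(findInterval(points, cum) + 1L, length(size))
  list(index = idx, pi = pi_i)
}

# Example with the built-in 1975 state populations (thousands)
set.seed(7)
pop <- unname(state.x77[, "Population"])
res <- pps_systematic(pop, n = 5)
rownames(state.x77)[res$index]     # the sampled states
```

Because the selection points are exactly one unit apart, some pairs of states can never be selected together (pi_ij = 0), so the variance formula from the question cannot be applied directly to a systematic PPS sample; methods such as Sampford's avoid this at the cost of more complex code.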

There is also a package called survey, with some documentation here.

I'm not sure of the difference between the two and haven't used them myself. Hope this is what you're looking for.


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM