
Define sample size using simple random sampling

I am trying to run a PCA, but I have too much data (20k observations) and the resulting plot is too crowded to read. I am using sample_n(df, n, replace = TRUE) [from dplyr] to reduce the size and get a better fit.
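(For illustration only, a minimal sketch of that kind of downsampling in R; `df`, `n` and the simulated numbers are placeholders, and base R's prcomp is assumed for the PCA:)

    library(dplyr)

    # Minimal sketch: subsample a large data frame before running PCA.
    # `df` is a placeholder standing in for the real 20k-observation data set.
    set.seed(1)
    df <- as.data.frame(matrix(rnorm(20000 * 10), ncol = 10))

    n      <- 2000                                # candidate sample size
    df_sub <- sample_n(df, n, replace = TRUE)     # random subsample, as in the question
    pca    <- prcomp(df_sub, center = TRUE, scale. = TRUE)
    summary(pca)                                  # variance explained per component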

My question is: what is the best technique to define (or estimate) the sample size (n)? If I have 20k observations (different sites, different times of the year, relatively homogeneous), which cutoff should I use: 5%, 10%, 20%?

Could you give me a reference to your suggestion?

Thank you in advance for your comments.

I would make a loop with different sample sizes; I don't believe there is a clear cutoff like the one you could use for a train/test split (we have pipelines these days, but you know what I mean: the 70/30 cutoff). The only thing I would check is that the sample_n output is not too clustered and that the values are still relatively equally represented.
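A minimal sketch of what such a loop could look like, assuming a numeric data frame df and using the variance explained by the first two components as the quantity to track (the metric and the candidate fractions are only examples):

    library(dplyr)

    set.seed(1)
    df <- as.data.frame(matrix(rnorm(20000 * 10), ncol = 10))   # placeholder for the real data

    fractions <- c(0.05, 0.10, 0.20, 0.50, 1.00)                # candidate cutoffs
    sizes     <- round(fractions * nrow(df))

    metric <- sapply(sizes, function(n) {
      df_sub <- sample_n(df, n, replace = TRUE)
      pca    <- prcomp(df_sub, center = TRUE, scale. = TRUE)
      sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)      # variance explained by the first two PCs
    })

    data.frame(fraction = fractions, n = sizes, var_explained_pc12 = metric)
    # look for the sample size where the metric stops changing noticeably

Once the metric stabilises as n grows, a smaller sample is probably already enough.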

If you are familiar with k-means clustering, you know the "elbow method": it is a little bit subjective where the best number of clusters is (even though we measure the within-cluster sum of squares), and you just have to try a lot of iterations and loops.
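For comparison, a short sketch of the elbow method with base R's kmeans, on made-up data:

    set.seed(1)
    x <- matrix(rnorm(500 * 2), ncol = 2)          # placeholder data

    ks  <- 1:10
    wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

    plot(ks, wss, type = "b",
         xlab = "number of clusters k",
         ylab = "total within-cluster sum of squares")
    # the "elbow", where the curve flattens, suggests a reasonable k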

You know how with neural networks, when you have e.g. a million observations, you can reduce the test set to e.g. 5 or 10% because in absolute terms you still have a lot of cases.

In summary: I think it needs a practical test, like the elbow method in clustering, because it can be very specific to your data.

I hope my answer is of at least some value to you; I have no journal reference at the moment.
