How to expand data from a big survey in R
I have survey data with these variables:
df <- data.frame(Sex = c("Male", "Female", "Male", "Female", "Male"),
                 Age = c(19, 20, 34, 56, 45),
                 ExpansionFactor = c(123456789, 31256789, 127896543, 251436978, 536294817))
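I want to create a report, but first I need to expand the survey data (repeat each row ExpansionFactor times) without crashing my PC.
The dataset I want: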
Sex Age
Male 19
. .
. .
. .
Female 20
. .
. .
. .
Male 34
. .
. .
. .
Female 56
. .
. .
. .
Male 45
. .
. .
. .
Male 45
dim(df)
[1] 1070341916 2
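For reference, a minimal base R sketch of the expansion being asked for: `rep()` repeats each row index by its expansion factor (tidyr's `uncount()` performs the same row replication). It runs on hypothetical, scaled-down factors (`small_df` and its values are made up for illustration), because with the real factors the result has 1,070,341,916 rows, and a single numeric column of that length needs roughly 8 GiB on its own.

```r
# Question's data
df <- data.frame(Sex = c("Male", "Female", "Male", "Female", "Male"),
                 Age = c(19, 20, 34, 56, 45),
                 ExpansionFactor = c(123456789, 31256789, 127896543,
                                     251436978, 536294817))

# Number of rows the expanded data would have
n_rows <- sum(df$ExpansionFactor)
print(n_rows)

# Approximate memory for ONE double column of that length (8 bytes per value)
print(n_rows * 8 / 2^30)  # about 7.97 GiB

# Hypothetical, scaled-down expansion factors so the expansion fits in memory
small_df <- df
small_df$ExpansionFactor <- c(2, 1, 3, 1, 2)

# Repeat each row index ExpansionFactor times, then keep Sex and Age
idx <- rep(seq_len(nrow(small_df)), times = small_df$ExpansionFactor)
expanded <- small_df[idx, c("Sex", "Age")]
rownames(expanded) <- NULL
print(expanded)
print(dim(expanded))  # 9 rows, 2 columns
```

The same code with the real factors is exactly what exhausts memory, which is why the answer below avoids expanding at all.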
Any suggestions?
Thank you very much for your help.
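I really don't see why you need the data in that form. Expanding it would create 1,070,341,916 rows, and you don't have to: each expansion factor can be used directly as a weight, and weighted summaries give the same counts and means as the expanded data would. You can build the whole report from them, as shown below.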
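Applied to the question's own data, a minimal base R sketch of the idea: population totals come from summing the weights instead of counting rows, and means come from `weighted.mean()`. The variable names (`pop_by_sex`, `overall_mean_age`, `mean_age_by_sex`) are illustrative only.

```r
# Question's data: each row stands for ExpansionFactor people
df <- data.frame(Sex = c("Male", "Female", "Male", "Female", "Male"),
                 Age = c(19, 20, 34, 56, 45),
                 ExpansionFactor = c(123456789, 31256789, 127896543,
                                     251436978, 536294817))

# Estimated population size by sex: sum the weights instead of counting rows
pop_by_sex <- tapply(df$ExpansionFactor, df$Sex, sum)
print(pop_by_sex)  # Female 282693767, Male 787648149

# Weighted mean age, overall and by sex
overall_mean_age <- weighted.mean(df$Age, df$ExpansionFactor)
mean_age_by_sex <- sapply(split(df, df$Sex),
                          function(d) weighted.mean(d$Age, d$ExpansionFactor))
print(overall_mean_age)  # about 42.54
print(mean_age_by_sex)   # Female about 52.02, Male about 39.14
```

Everything here works on 5 rows, so memory is never an issue.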
Data
library(ggplot2)
library(dplyr)
set.seed(123)
df <- data.frame(
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  age = rnorm(100, mean = 25, sd = 10),
  expansion.factor = sample(12:40, size = 100, replace = TRUE)
)
You can create the summary:
df %>%
  group_by(sex) %>%
  summarise(
    count = sum(expansion.factor),
    mean_age = sum(age * expansion.factor) / sum(expansion.factor),
    # Same result with weighted.mean() from base R's stats package
    mean_age2 = weighted.mean(age, expansion.factor)
  )
# A tibble: 2 x 4
sex count mean_age mean_age2
<fct> <int> <dbl> <dbl>
1 Female 1050 28.0 28.0
2 Male 1611 24.3 24.3
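The `mean_age` column above is the usual weighted mean. Writing $x_i$ for respondent $i$'s age and $w_i$ for their expansion factor (symbols introduced here for illustration), `count` is the estimated population size $N$ and `mean_age` is $\bar{x}_w$. This equals the plain mean of the expanded data, because repeating row $i$ exactly $w_i$ times adds $w_i x_i$ to the sum of ages and $w_i$ to the row count:

```latex
N = \sum_{i=1}^{n} w_i,
\qquad
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
```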
Visualization with ggplot2:
df %>%
  ggplot(aes(x = age, weight = expansion.factor)) +
  geom_histogram(bins = 20)
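To check that the weighted approach matches expansion, a small base R sketch can expand the simulated data from this answer (feasible only because its weights are small) and compare both routes. It also shows `xtabs()` with the weight on the left-hand side of the formula, which sums the weights in each cell, a convenient way to get weighted frequency tables for a report. The age bands passed to `cut()` are arbitrary, chosen only for illustration.

```r
set.seed(123)
df <- data.frame(
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  age = rnorm(100, mean = 25, sd = 10),
  expansion.factor = sample(12:40, size = 100, replace = TRUE)
)

# Expanding is affordable here because the weights are small
expanded <- df[rep(seq_len(nrow(df)), times = df$expansion.factor), ]

# Weighted summaries on the compact data
w_count <- tapply(df$expansion.factor, df$sex, sum)
w_mean  <- sapply(split(df, df$sex),
                  function(d) weighted.mean(d$age, d$expansion.factor))

# Plain summaries on the expanded data
e_count <- table(expanded$sex)
e_mean  <- tapply(expanded$age, expanded$sex, mean)

# Both comparisons should print TRUE
print(all.equal(as.numeric(w_count), as.numeric(e_count)))
print(all.equal(as.numeric(w_mean), as.numeric(e_mean)))

# Weighted frequency table: sum of weights by sex and (illustrative) age band
df$age_band <- cut(df$age, breaks = c(-Inf, 18, 30, 45, Inf))
weighted_table <- xtabs(expansion.factor ~ sex + age_band, data = df)
print(weighted_table)
```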