
How to compute proportion of variable in one dataframe based on data in another dataframe

I have two data frames:

df1 <- data.frame(popPerGroup = c(NA, 4153813, 4753258, 6716294, 7079839, 8218844, 8476699, 6453706, 3560646, 1623932, 227805),
                  ageGroup = c("NA","0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99"))

df2 <- data.frame(ageGroup = c("30", "40", "50", "60", "70", "80", "90"),
                  nCases = c(1,2,7,12, 21,25,7))

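For each df2$ageGroup, I want to compute the proportion of df2$nCases relative to the matching df1$popPerGroup value in df1. First, I resolve the differing ageGroup values with a regular-expression operation using sub: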

df1$ageGroupNew <- sub("^(\\d+)-\\d+$", "\\1", df1$ageGroup, perl = TRUE)
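
As a quick check of what that sub call produces, here is a minimal sketch on the ageGroup values alone (the free-standing vectors ageGroup and ageGroupNew are used only for illustration). The pattern keeps the lower bound of each range, and any entry that does not look like "lower-upper", such as the string "NA", is returned unchanged:

```r
# Same age-group labels as in df1, as a plain character vector
ageGroup <- c("NA", "0-9", "10-19", "20-29", "30-39", "40-49",
              "50-59", "60-69", "70-79", "80-89", "90-99")

# Keep only the lower bound of each "lower-upper" range;
# strings that do not match the pattern are left as they are
ageGroupNew <- sub("^(\\d+)-\\d+$", "\\1", ageGroup, perl = TRUE)

print(ageGroupNew)
```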

This works fine. But the next operation, which uses match, (partially) fails, because the proportions for the higher ageGroup values come out as NA:

df2$populProp <- df2$nCases[match(df2$ageGroup, df1$ageGroupNew)] / df1$popPerGroup[match(df2$ageGroup, df1$ageGroupNew)]
df2
  ageGroup nCases    populProp
1       30      1 2.966169e-06
2       40      2 3.041790e-06
3       50      7 8.257932e-07
4       60     12           NA
5       70     21           NA
6       80     25           NA
7       90      7           NA

How can I modify the code so that all proportions are computed correctly?

EDIT:

The solution is actually quite simple:

df2$populProp <- df2$nCases / df1$popPerGroup[match(df2$ageGroup, df1$ageGroupNew)]

It seems you are not matching the right numerator with the right denominator (your match call gives you indices into df1, but you use them to subset df2):

df1 <- transform(df1, ageGroupNew = sub("^(\\d+)-\\d+$", "\\1", ageGroup, 
                                        perl = TRUE))
df2[match(df2$ageGroup, df1$ageGroupNew), ]
#R>      ageGroup nCases
#R> 5          70     21
#R> 6          80     25
#R> 7          90      7
#R> NA       <NA>     NA
#R> NA.1     <NA>     NA
#R> NA.2     <NA>     NA
#R> NA.3     <NA>     NA
df1[match(df2$ageGroup, df1$ageGroupNew), ]
#R>    popPerGroup ageGroup ageGroupNew
#R> 5      7079839    30-39          30
#R> 6      8218844    40-49          40
#R> 7      8476699    50-59          50
#R> 8      6453706    60-69          60
#R> 9      3560646    70-79          70
#R> 10     1623932    80-89          80
#R> 11      227805    90-99          90
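
To make the mismatch concrete, here is a small self-contained sketch that rebuilds the question's data and looks at the index vector directly (the name idx is introduced here only for illustration). Since df2 is already in the desired row order, nCases can be used as is and only the denominator needs the lookup:

```r
df1 <- data.frame(popPerGroup = c(NA, 4153813, 4753258, 6716294, 7079839, 8218844,
                                  8476699, 6453706, 3560646, 1623932, 227805),
                  ageGroup = c("NA", "0-9", "10-19", "20-29", "30-39", "40-49",
                               "50-59", "60-69", "70-79", "80-89", "90-99"))
df2 <- data.frame(ageGroup = c("30", "40", "50", "60", "70", "80", "90"),
                  nCases = c(1, 2, 7, 12, 21, 25, 7))
df1$ageGroupNew <- sub("^(\\d+)-\\d+$", "\\1", df1$ageGroup, perl = TRUE)

# Positions of df2's age groups within df1 (row numbers of df1, not of df2)
idx <- match(df2$ageGroup, df1$ageGroupNew)
print(idx)

# df2 has only 7 rows, so indexing its column with df1 row numbers
# shifts the values and runs past the end, which yields NA
print(df2$nCases[idx])

# Numerator taken in df2's own order, denominator looked up in df1
df2$populProp <- df2$nCases / df1$popPerGroup[idx]
print(df2)
```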

Maybe this will do?

# create ageGroupNew
df1 <- transform(df1, ageGroupNew = sub("^(\\d+)-\\d+$", "\\1", ageGroup, 
                                        perl = TRUE))

# merge into one table and do the calculation
df_merge <- merge(df1, df2, by.x = "ageGroupNew", by.y = "ageGroup")
df_merge <- transform(df_merge, populProp = nCases / popPerGroup)
df_merge[, c("ageGroup", "nCases", "populProp")]
#R>   ageGroup nCases    populProp
#R> 1    30-39      1 1.412461e-07
#R> 2    40-49      2 2.433432e-07
#R> 3    50-59      7 8.257932e-07
#R> 4    60-69     12 1.859397e-06
#R> 5    70-79     21 5.897806e-06
#R> 6    80-89     25 1.539473e-05
#R> 7    90-99      7 3.072803e-05
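
One thing to keep in mind with the merge approach: by default merge keeps only the keys found in both tables, so an age group in df2 without a population row in df1 would silently drop out. A minimal sketch of the difference, using cut-down data in which the "100" age group and its case count are made up purely for illustration:

```r
# Cut-down population table with the key column already created
df1 <- data.frame(popPerGroup = c(7079839, 8218844),
                  ageGroupNew = c("30", "40"))

# "100" is a hypothetical age group with no matching row in df1
df2 <- data.frame(ageGroup = c("30", "40", "100"),
                  nCases = c(1, 2, 3))

# Default merge: the unmatched "100" row is dropped
inner <- merge(df1, df2, by.x = "ageGroupNew", by.y = "ageGroup")

# all.y = TRUE keeps every row of df2; the missing population becomes NA
outer <- merge(df1, df2, by.x = "ageGroupNew", by.y = "ageGroup", all.y = TRUE)
outer <- transform(outer, populProp = nCases / popPerGroup)

print(nrow(inner))
print(outer)
```

With all.y = TRUE the unmatched group shows up with an NA proportion instead of disappearing, which makes gaps in the population table easier to spot.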

