How to compute proportion of variable in one dataframe based on data in another dataframe
I have two data frames:
df1 <- data.frame(popPerGroup = c(NA, 4153813, 4753258, 6716294, 7079839, 8218844, 8476699, 6453706, 3560646, 1623932, 227805),
ageGroup = c("NA","0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99"))
df2 <- data.frame(ageGroup = c("30", "40", "50", "60", "70", "80", "90"),
nCases = c(1,2,7,12, 21,25,7))
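I want to compute the proportion of df2$nCases for each df2$ageGroup relative to the matching df1$popPerGroup value in df1. First, I deal with the differing ageGroup labels by using a regex operation with sub: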
df1$ageGroupNew <- sub("^(\\d+)-\\d+$", "\\1", df1$ageGroup, perl = TRUE)
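To make the effect of that pattern concrete, here is a minimal sketch run on the same labels as df1$ageGroup. The standalone vector names are only for illustration; the point is that each "low-high" label is reduced to its lower bound, while a label that does not fit the pattern (the string "NA") passes through unchanged.

```r
# Same labels as df1$ageGroup in the question
ageGroup <- c("NA", "0-9", "10-19", "20-29", "30-39", "40-49",
              "50-59", "60-69", "70-79", "80-89", "90-99")

# Keep only the lower bound of each "low-high" label; labels that do not
# fit the pattern (here the string "NA") are returned unchanged
ageGroupNew <- sub("^(\\d+)-\\d+$", "\\1", ageGroup, perl = TRUE)
print(ageGroupNew)
```

The resulting keys ("30", "40", ...) are character strings, which is why they can be compared directly with df2$ageGroup.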
This works fine. But the next operation, which uses match, (partly) fails: the proportions for the higher ageGroup values come out as NA:
df2$populProp <- df2$nCases[match(df2$ageGroup, df1$ageGroupNew)] / df1$popPerGroup[match(df2$ageGroup, df1$ageGroupNew)]
df2
  ageGroup nCases    populProp
1       30      1 2.966169e-06
2       40      2 3.041790e-06
3       50      7 8.257932e-07
4       60     12           NA
5       70     21           NA
6       80     25           NA
7       90      7           NA
How can the code be modified so that all proportions are computed correctly?
EDIT:
The solution is actually quite simple:
df2$populProp <- df2$nCases / df1$popPerGroup[match(df2$ageGroup, df1$ageGroupNew)]
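As a quick check of why this works, the sketch below rebuilds both data frames from the question and stores the match result in a helper variable (idx is a name introduced here, not part of the original code). match returns one row position in df1 for each row of df2, so the denominator is already aligned with df2$nCases and the numerator needs no subsetting.

```r
df1 <- data.frame(
  popPerGroup = c(NA, 4153813, 4753258, 6716294, 7079839, 8218844,
                  8476699, 6453706, 3560646, 1623932, 227805),
  ageGroup = c("NA", "0-9", "10-19", "20-29", "30-39", "40-49",
               "50-59", "60-69", "70-79", "80-89", "90-99"))
df2 <- data.frame(ageGroup = c("30", "40", "50", "60", "70", "80", "90"),
                  nCases = c(1, 2, 7, 12, 21, 25, 7))

df1$ageGroupNew <- sub("^(\\d+)-\\d+$", "\\1", df1$ageGroup, perl = TRUE)

# idx holds row positions in df1, one per row of df2 (here 5 to 11)
idx <- match(df2$ageGroup, df1$ageGroupNew)

# Numerator stays in df2's own row order; only the denominator is looked up
df2$populProp <- df2$nCases / df1$popPerGroup[idx]
print(df2)
```

Applying idx to df2$nCases as well, as in the failing version, asks for rows 5 to 11 of a 7-row column, which is where the NA values came from.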
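You don't seem to be matching the right numerator with the right denominator (your match call gives you indices into df1, but you are also using them to subset df2):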
df1 <- transform(df1, ageGroupNew = sub("^(\\d+)-\\d+$", "\\1", ageGroup,
                                        perl = TRUE))
df2[match(df2$ageGroup, df1$ageGroupNew), ]
#R>      ageGroup nCases
#R> 5          70     21
#R> 6          80     25
#R> 7          90      7
#R> NA       <NA>     NA
#R> NA.1     <NA>     NA
#R> NA.2     <NA>     NA
#R> NA.3     <NA>     NA
df1[match(df2$ageGroup, df1$ageGroupNew), ]
#R>    popPerGroup ageGroup ageGroupNew
#R> 5      7079839    30-39          30
#R> 6      8218844    40-49          40
#R> 7      8476699    50-59          50
#R> 8      6453706    60-69          60
#R> 9      3560646    70-79          70
#R> 10     1623932    80-89          80
#R> 11      227805    90-99          90
Maybe this will do?
# create ageGroupNew
df1 <- transform(df1, ageGroupNew = sub("^(\\d+)-\\d+$", "\\1", ageGroup,
                                        perl = TRUE))
# merge into one table and do the calculation
df_merge <- merge(df1, df2, by.x = "ageGroupNew", by.y = "ageGroup")
df_merge <- transform(df_merge, populProp = nCases / popPerGroup)
df_merge[, c("ageGroup", "nCases", "populProp")]
#R>   ageGroup nCases    populProp
#R> 1    30-39      1 1.412461e-07
#R> 2    40-49      2 2.433432e-07
#R> 3    50-59      7 8.257932e-07
#R> 4    60-69     12 1.859397e-06
#R> 5    70-79     21 5.897806e-06
#R> 6    80-89     25 1.539473e-05
#R> 7    90-99      7 3.072803e-05
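One difference between the match and merge approaches may be worth keeping in mind: they treat age groups that have no counterpart in df1 differently. The sketch below uses a cut-down df1 and a made-up group "100" (not part of the original data, added purely for illustration) to show that match keeps the row with an NA proportion, while merge drops it unless all.y = TRUE is given.

```r
df1 <- data.frame(popPerGroup = c(8476699, 6453706),
                  ageGroupNew = c("50", "60"))
# "100" is a hypothetical group with no counterpart in df1
df2 <- data.frame(ageGroup = c("50", "60", "100"),
                  nCases = c(7, 12, 3))

# match(): every row of df2 is kept; the unmatched group yields NA
viaMatch <- df2$nCases / df1$popPerGroup[match(df2$ageGroup, df1$ageGroupNew)]

# merge(): an inner join by default, so the unmatched group disappears
inner <- merge(df1, df2, by.x = "ageGroupNew", by.y = "ageGroup")

# all.y = TRUE keeps all rows of df2 and fills popPerGroup with NA
outer <- merge(df1, df2, by.x = "ageGroupNew", by.y = "ageGroup",
               all.y = TRUE)

print(viaMatch)
print(inner)
print(outer)
```

With the data in the question every group in df2 has a match, so both approaches give the same proportions.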