简体   繁体   中英

weighted median by_group()

I have a survey which includes a weight. I want to calculate the weighted median of costs of housing per status group (tenant, owner, subtenant), year and region(Ost_B). Ideally the value is then assigned to the right observation in an extra column of the dataframe. I used the packages dplyr and spatstat to achieve it.

My code works fine, when I use the weighted.mean function, however if I use the weighted.median function I get the following warning.

Error in mutate_impl(.data, dots) : 
  Evaluation error: All weights are zero.

This is how my data looks like:

       Ost_B         hgowner    bWohnk syear      hhrf
109457  West [2] Main Tenant        NA  2009 2249.8874
277839  West       [1] Owner  883.3333  2002 1104.3820
245712   Ost [2] Main Tenant  352.0000  1997  863.4752
122197   Ost       [1] Owner        NA  2005 2618.3637
188236  West [2] Main Tenant  231.0000  2008 1397.6606
195256  West       [1] Owner 1339.5833  1996 4940.1033
271210   Ost [2] Main Tenant  546.0000  2000 3711.1919
158995  West       [1] Owner  788.0833  2014 3916.8540
20414   West [2] Main Tenant  706.0000  2015  385.5527
251343  West [2] Main Tenant  521.0000  1993 1082.2479

This is the code I used so far:

library('ggplot2')
library('dplyr')
library('tidyr')
library('ggrepel')
library('spatstat')

data.stack <- data.stack %>% 
  group_by(syear, hgowner, Ost_B) %>% 
  mutate(mean_Wohnk=weighted.mean(bWohnk, hhrf, na.rm=TRUE)) ##works fine

data.stack <- data.stack %>% 
  group_by(syear, hgowner, Ost_B) %>% 
  mutate(med_Wohnk=weighted.median(bWohnk, hhrf, na.rm=TRUE)) ##doesn't work

I suppose it has something to do with how group_by an the weighted.median function work together. However, I can't figure out how to get it to work.

So help would be much appreciated, I would also use another package but had the same problem with wtd.median() from the Hmisc package

Later on I need to perform the same thing for deciles.

Thank your very much for your help!

Here is a sample of my data:

structure(list(Ost_B = c("West", "West", "West", "West", "West", 
"West", "West", "West", "West", "West", "West", "Ost", "Ost", 
"West", "West", "Ost", "West", "West", "Ost", "Ost", "West", 
"West", "West", "West", "Ost", "West", "West", "Ost", "West", 
"West", "West", "Ost", "West", "West", "West", "West", "West", 
"West", "Ost", "Ost", "West", "West", "West", "Ost", "West", 
"West", "West", "West", "West", "West", "West", "West", "West", 
"West", "West", "West", "Ost", "West", "West", "West", "Ost", 
"Ost", "West", "West", "West", "West", "Ost", "West", "West", 
"West", "West", "Ost", "West", "West", "West", NA, "Ost", "West", 
"Ost", "West", "West", "Ost", "West", "West", "West", "West", 
"Ost", "West", "West", "West", "West", "West", "West", "West", 
NA, "West", "West", "West", "West", "West"), hgowner = structure(c(8L, 
9L, 9L, 9L, 9L, 9L, 9L, 8L, 9L, 9L, 9L, 9L, 8L, 9L, 8L, 9L, 9L, 
8L, 9L, 9L, 8L, 9L, 9L, 8L, 9L, 8L, 8L, 9L, 8L, 9L, 8L, 9L, 8L, 
9L, 9L, 9L, 9L, 8L, 9L, 9L, 8L, 9L, 9L, 8L, 9L, 9L, 8L, 8L, 9L, 
9L, 9L, 8L, 9L, 8L, 10L, 8L, 10L, 9L, 9L, 9L, 8L, 8L, 9L, 9L, 
9L, 8L, 8L, 8L, 8L, 9L, 8L, 9L, 9L, 9L, 8L, 9L, 9L, 9L, 8L, 9L, 
9L, 9L, 9L, 9L, 8L, 9L, 9L, 9L, 8L, 8L, 9L, 9L, 8L, 8L, 8L, 9L, 
8L, 8L, 9L, 8L), .Label = c("[-8] Question this year not part of Survey program", 
"[-6] Questionnaire Version With Modified Filter", "[-5] Not Included In Questionnaire Version", 
"[-4] Forbidden Multiple Response", "[-3] Not Valid", "[-2] Does Not Apply", 
"[-1] No Answer", "[1] Owner", "[2] Main Tenant", "[3] Sub-Tenant", 
"[4] Tenant", "[5] Living In A Home"), class = "factor"), bWohnk = c(741.666666666667, 
536, 675, NA, 316, 348, 301, 191.75, 723, 1035, 990, 41, 72.5, 
640, 200, 351, 613, 666.666666666667, 280, 280, 1486.91666666667, 
496, 367, NA, 756, 221.666666666667, NA, 407, 345.583333333333, 
229, 584, 266, 1074.08333333333, 742, 62, 570, 564, 250, 306, 
392, 3251.66666666667, NA, 560, 1041.66666666667, 500, 580, 349.166666666667, 
1250, 383, 470, 620, NA, 510, 711.666666666667, NA, 400, NA, 
521, 350, 385, 333.916666666667, 1637.5, 229, 780, 467, 844.083333333333, 
NA, 199.166666666667, 262.916666666667, 249, 345.583333333333, 
343, 255, 306, 860, NA, 360, 728, 81, 670, 590, 345, NA, 147, 
NA, 800, 390, 409, 2279.16666666667, 858.833333333333, 252, 240, 
129.166666666667, 1158.33333333333, NA, 370, 247.083333333333, 
225, 350, 2048.33333333333), syear = c(2013L, 2000L, 2010L, 1991L, 
1990L, 1999L, 1992L, 1996L, 2006L, 2013L, 2012L, 1991L, 2001L, 
2014L, 2009L, 2008L, 1998L, 2007L, 2004L, 2010L, 2009L, 2013L, 
2000L, 2004L, 1998L, 2005L, 2015L, 2006L, 2007L, 1988L, 1996L, 
2015L, 2000L, 2013L, 2006L, 2001L, 2003L, 2008L, 1995L, 2015L, 
2004L, 1985L, 2011L, 2007L, 2009L, 2008L, 2007L, 2007L, 1996L, 
2015L, 2014L, 2001L, 2013L, 2008L, 2008L, 2007L, 1998L, 1997L, 
2012L, 2009L, 1993L, 2010L, 1999L, 2010L, 2015L, 1991L, 2001L, 
2004L, 1990L, 2000L, 1986L, 2012L, 2009L, 1991L, 2013L, 1984L, 
2005L, 1991L, 1993L, 2010L, 2013L, 2012L, 1985L, 1988L, 2000L, 
2013L, 2005L, 1999L, 2004L, 1994L, 1991L, 2000L, 2012L, 2002L, 
1984L, 2006L, 1989L, 2004L, 2005L, 2009L), hhrf = c(921.252509823361, 
2532.30483182391, 1087.25498433646, 5360.12838117604, 4452.44496134467, 
7171.86195726776, 6954.99482158864, 5293.69413376784, 954.587233459912, 
770.057906194315, 890.961226274402, 1722.1699042344, 3461.79376227083, 
581.158893623584, 664.806040953876, 675.625047925726, 1574.07382275557, 
3704.07702821766, 2314.05879164323, 9140.05504547961, 1005.97073867107, 
681.798909716727, 3677.30043072148, 3520.56757967251, 2204.70262296716, 
2713.90174652616, 2590.78246030509, 1596.21626704554, 1121.92427792032, 
6002.35785636737, 2767.4864747415, 344.526432577656, 2086.31596835848, 
4291.12236938177, 6949.80424056846, 1863.21490205662, 2931.02198881285, 
3053.75478333405, 4740.24457203908, 3108.19722193157, 848.300965349622, 
6782.34863226538, 5728.42824611808, 529.221471230179, 0, 3894.38448724548, 
6191.35484028036, 5087.22552915895, 10586.6710234059, 310.171496020192, 
395.575830034726, 3119.21033893799, 760.153320489404, 3242.17431586629, 
938.995243924511, 1516.05334458071, 795.550785734506, 4310.61532022633, 
449.076675395395, 2379.3439598687, 6342.0726785268, 0, 10800.2931706199, 
10001.6994303796, 868.391937643362, 4053.41583759845, 2128.14552166684, 
1992.9512907382, 5749.10182602249, 384.310560302864, 4499.40446551783, 
3552.06832694212, 569.031301845848, 2192.44389667968, 142.161789518523, 
5093.0220545886, 1890.93497387136, 2484.45841921283, 4870.87928765949, 
0, 487.614412884924, 2649.97192542962, 736.845961792243, 894.230153879632, 
4765.82775377142, 562.849351807804, 2365.25925889019, 2378.82322189005, 
1087.29238473588, 6727.53539570428, 22616.3513598868, 1996.36668869512, 
4727.07641240618, 992.560599469024, 5716.97578413211, 1774.61201799785, 
5633.78476350193, 6772.43186294665, 3854.55479513783, 2691.89802789182
)), row.names = c(37169L, 72350L, 37533L, 269481L, 170828L, 38185L, 
290578L, 85410L, 275082L, 52371L, 42623L, 229396L, 206395L, 65165L, 
76781L, 13795L, 198625L, 103913L, 225237L, 171747L, 299594L, 
38141L, 120416L, 272962L, 227829L, 91750L, 48097L, 218908L, 266404L, 
157923L, 195375L, 44572L, 192725L, 150943L, 267952L, 263899L, 
104001L, 91037L, 230106L, 135626L, 118682L, 75336L, 220588L, 
221938L, 292125L, 103957L, 87702L, 281763L, 255340L, 115224L, 
5651L, 114331L, 26278L, 78108L, 245838L, 90290L, 246453L, 265473L, 
52804L, 180766L, 229731L, 292792L, 32416L, 111111L, 7194L, 28970L, 
142415L, 265198L, 271291L, 1989L, 77440L, 131535L, 22310L, 177106L, 
59567L, 11177L, 226138L, 177877L, 206061L, 294689L, 16921L, 151844L, 
195428L, 237935L, 87825L, 285575L, 131939L, 267954L, 280806L, 
145326L, 193033L, 96553L, 149493L, 77576L, 174725L, 101594L, 
193089L, 116896L, 87604L, 156306L), class = "data.frame")

If you test for the condition that is being found in one or more groups and return an NA instead, then this code succeeds:

# Apparently `spatstat::weighted.median` needs at least 3 points or it fails

   > dat3 <- dat %>% 
+   group_by(syear, hgowner, Ost_B) %>% 
+   mutate(med_Wohnk=if( sum(!is.na(hhrf))<2 ){NA} else { weighted.median(bWohnk, hhrf, na.rm=TRUE)})
Error in mutate_impl(.data, dots) : 
  Evaluation error: All weights are zero.

dat3 <- dat %>% 
   group_by(syear, hgowner, Ost_B) %>% 
   mutate(med_Wohnk=if( sum(!is.na(hhrf))<3 ){NA} else { weighted.median(bWohnk, hhrf, na.rm=TRUE)})

dat3
# A tibble: 100 x 6
# Groups:   syear, hgowner, Ost_B [57]
   Ost_B hgowner         bWohnk syear  hhrf med_Wohnk
   <chr> <fct>            <dbl> <int> <dbl>      <dbl>
 1 West  [1] Owner         742.  2013  921.        NA 
 2 West  [2] Main Tenant   536   2000 2532.       310.
 3 West  [2] Main Tenant   675   2010 1087.       722.
 4 West  [2] Main Tenant    NA   1991 5360.       252 
 5 West  [2] Main Tenant   316   1990 4452.        NA 
 6 West  [2] Main Tenant   348   1999 7172.       229 
 7 West  [2] Main Tenant   301   1992 6955.        NA 
 8 West  [1] Owner         192.  1996 5294.        NA 
 9 West  [2] Main Tenant   723   2006  955.        62 
10 West  [2] Main Tenant  1035   2013  770.       655.
# ... with 90 more rows

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM