I have ~4000 observations in my data frame, test_11, and have pasted part of the data frame below:
The k_hidp column represents matching households, the k_fihhmnnet1_dv column is their reported household income and the percentage_income_rounded reports each participant's income contribution to the total household income
I want to filter my data to remove all k_hidp observations where their collective income in the percentage_income_rounded does not equal 100.
So for example, the first household 68632420 reported a contribution of 83% (65+13) instead of the 100% as the other households report.
Is there any way to remove these household observations so I am only left with households with a collective income of 100%?
Thank you!
Try this:
## Creating the dataframe
df=data.frame(k_hidp = c(68632420,68632420,68632420,68632420,68632420,68632420,68632422,68632422,68632422,68632422,68632428,68632428),
percentage_income_rounded = c(65,18,86,14,49,51,25,25,25,25,50,50))
## Loading the libraries
library(dplyr)
## Aggregating and determining which household collective income is 100%
df1 = df %>%
group_by(k_hidp) %>%
mutate(TotalPercentage = sum(percentage_income_rounded)) %>%
filter(TotalPercentage == 100)
> df1
# A tibble: 6 x 3
# Groups: k_hidp [2]
k_hidp percentage_income_rounded TotalPercentage
<dbl> <dbl> <dbl>
1 68632422 25 100
2 68632422 25 100
3 68632422 25 100
4 68632422 25 100
5 68632428 50 100
6 68632428 50 100
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.