I'm trying to subset a data set with the following requirements:
1. ethnicity is xyz
2. education is Bachelor's Degree and above, i.e. Bachelor's Degree or Graduate Degree
3. income, such as $30,000 - $39,999, or $100,000 - $124,999
4. religious status, i.e. religious and not religious

So it would look something like this:
income               religious
$30,000 - $39,999    not religious
$50,000 - $59,999    religious
....                 ....
....                 ....
Keep in mind that those listed satisfy requirements 1 and 2.
Please bear in mind that I am new to programming. I've tried to figure this out for a very long time and have dug through many posts. I can't seem to get anything to work. How do I fix this? Someone please help.
So as to not take away from the clarity of the post, I'll post what I've tried below (but feel free to ignore it as it's probably rubbish).
I have tried many variations of the following just to get to step 3, but have failed miserably, and am about to bash my head with the keyboard:
df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]
I've also tried:
race <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")
income_sub <- df$income[ba_ma_phd & race]
I believe income_sub gets me up to step 3, but I have no idea how to get it to step 4.
library(dplyr)

df %>%
  filter(ethnicity == "xyz" &
           education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  group_by(religious) %>%
  summarize(lower_bound = min(income),
            upper_bound = max(income))
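One caveat with min() and max() on income: they only return a sensible lowest and highest bracket if R knows the order of the bracket labels. A minimal sketch, assuming income is stored as text; the `brackets` vector and the sample values are made up for illustration and would need to match the labels in your own data:

```r
# Hypothetical bracket labels, listed from lowest to highest
brackets <- c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999")

# Hypothetical income values, invented for illustration only
income <- c("$100,000 - $124,999", "$30,000 - $39,999", "$50,000 - $59,999")

# As plain text the comparison is alphabetical, so "$100,000 ..." sorts first
min(income)

# As an ordered factor the comparison follows the order given in `levels`
income_ord <- factor(income, levels = brackets, ordered = TRUE)
lowest  <- as.character(min(income_ord))
highest <- as.character(max(income_ord))
```

If income is an unordered factor, min() and max() stop with an error instead, so converting to an ordered factor first is probably the safer route.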
Changing my comment to an answer, as it was a bit too long.
First, your code: you are almost there. As income is a vector rather than a data frame, you do not need the trailing comma, i.e. you can use
df$income[which(df$ethnicity == "xyz" &
                df$education %in% c("Bachelor's Degree", "Graduate Degree"))]
# note: no comma before the closing square bracket
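To see why the comma matters, here is a small sketch on a made-up data frame (`toy` and its values are invented for illustration, not taken from your data). A vector has one dimension, so it takes a single index; a data frame has rows and columns, so it takes [rows, columns].

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  ethnicity = c("xyz", "abc", "xyz"),
  education = c("Graduate Degree", "Bachelor's Degree", "High School"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999"),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows that meet requirements 1 and 2
keep <- toy$ethnicity == "xyz" &
  toy$education %in% c("Bachelor's Degree", "Graduate Degree")

# A vector has one dimension: a single index, no comma
income_vec <- toy$income[keep]

# A data frame has two dimensions: rows before the comma, columns after it
rows_df <- toy[keep, ]

# toy$income[keep, ] would fail with "incorrect number of dimensions"
```

Here only the first row passes both conditions, so `income_vec` holds one value and `rows_df` holds one row with all three columns.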
If you want to create a subsetted data frame, do not include df$income at the start; just use df and keep the comma this time. This will subset your data but keep all columns:
sub_df <- df[which(df$ethnicity == "xyz" &
                   df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]
To then look at the income levels for the subsetted data, you can use table:
table(sub_df$income)
You can again use table to examine the counts of observations for each income by religious status.
table(sub_df$income, sub_df$religious)
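As a rough illustration of what the two table calls return, here is a sketch on a made-up subset (`sub_toy` and its values are invented, so the counts will differ from your data):

```r
# Hypothetical subset, invented for illustration only
sub_toy <- data.frame(
  income    = c("$30,000 - $39,999", "$30,000 - $39,999", "$50,000 - $59,999"),
  religious = c("religious", "not religious", "religious"),
  stringsAsFactors = FALSE
)

# One-way table: number of rows per income bracket
income_counts <- table(sub_toy$income)

# Two-way table: income brackets in rows, religious status in columns
cross <- table(sub_toy$income, sub_toy$religious)

# Look up a single cell by its row and column labels
cross["$30,000 - $39,999", "religious"]
```

Income brackets that show up in the one-way table are the ones present among the rows that met requirements 1 and 2; a zero cell in the two-way table means no row has that income and religious combination.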
If you just want to select the income and religious columns, you can also do this using [:
sub_df[c("religious", "income")]
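If the goal is the two-column listing from the question with each income/religious pair shown once, unique() on those two columns is one option. A minimal sketch on a made-up subset (`sub_toy` and its values are invented for illustration):

```r
# Hypothetical subset, invented for illustration only
sub_toy <- data.frame(
  ethnicity = c("xyz", "xyz", "xyz"),
  income    = c("$30,000 - $39,999", "$30,000 - $39,999", "$50,000 - $59,999"),
  religious = c("not religious", "not religious", "religious"),
  stringsAsFactors = FALSE
)

# A single index with no comma selects columns by name
two_cols <- sub_toy[c("income", "religious")]

# The same selection written with the [rows, columns] form
same_cols <- sub_toy[, c("income", "religious")]

# Drop repeated income/religious pairs, keeping the first occurrence of each
pairs <- unique(two_cols)
```

The first two rows are the same pair, so `pairs` ends up with two rows.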