
Having difficulty obtaining subset in R

I'm trying to subset a data set with the following requirements:

  1. ethnicity is xyz
  2. education is Bachelor's Degree or above, i.e. Bachelor's Degree or Graduate Degree
  3. I then want to look at the income bracket of those who meet the above requirements. A bracket would be something like $30,000 - $39,999 or $100,000 - $124,999.
  4. Finally, as my final output, I want to see the subset obtained in step 3 together with the column saying whether or not those individuals are religious. In the data set, that column takes the values religious and not religious.

So it would look something like this:

   income               religious
$30,000 - $39,999      not religious
$50,000 - $59,999         religious
  ....                    ....
  ....                    ....

Keep in mind that everyone listed satisfies requirements 1 and 2.

Please bear in mind that I am new to programming. I've tried to figure this out for a very long time and have dug through many posts. I can't seem to get anything to work. How do I fix this? Someone please help.


So as to not take away from the clarity of the post, I'll post what I've tried below (but feel free to ignore it as it's probably rubbish).

I have tried many variations of the following just to get to step 3, but have failed miserably, and am about to bash my head with the keyboard:

df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]

I've also tried:

race <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")
income_sub <- df$income[ba_ma_phd & race]

I believe income_sub gets me up to step 3, but I have no idea how to get it to step 4.
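Because `race` and `ba_ma_phd` are logical vectors with one value per row of df, they can index the rows of the whole data frame rather than only df$income; putting the wanted column names in the second slot then gives step 4 directly. A minimal sketch on a small hypothetical data frame (the values are invented; only the column names come from the question):

```r
# Hypothetical toy data standing in for the real df
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School", "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999",
                "$100,000 - $124,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# The logical vectors from the question: one TRUE/FALSE per row of df
race      <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")

# Index rows of the whole data frame (not df$income), then keep two columns
step4 <- df[race & ba_ma_phd, c("income", "religious")]
step4
```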

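A dplyr version of the same filter, summarising the income range within each religious group:

library(dplyr)

df %>%
  filter(ethnicity == "xyz" &
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  group_by(religious) %>%
  # min()/max() are only meaningful if income is numeric or an ordered factor;
  # on "$30,000 - $39,999"-style text they compare alphabetically
  summarize(lower_bound = min(income),
            upper_bound = max(income))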
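The dplyr pipeline reports the lowest and highest income per religious group rather than listing each person. For the listing asked for in step 4, replacing group_by()/summarize() with select(income, religious) would do it in dplyr; in base R, subset() handles the row filter and the column selection in one call. A sketch on hypothetical toy data (replace df with your own):

```r
# Hypothetical toy data; only the column names come from the question
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School", "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999",
                "$100,000 - $124,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside df, so no df$ prefixes are needed;
# select = keeps only the listed columns, in that order
result <- subset(df,
                 ethnicity == "xyz" &
                   education %in% c("Bachelor's Degree", "Graduate Degree"),
                 select = c(income, religious))
result
```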

I turned my comment into an answer because it was a bit too long.

First, your code: you are almost there. Because df$income is a vector rather than a data frame, you do not need the trailing comma (you are also missing the closing parenthesis of which()), i.e. you can use

df$income[which(df$ethnicity == "xyz" &
                df$education %in% c("Bachelor's Degree", "Graduate Degree"))]
# note: no comma before the closing ]
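To see why the comma matters: a vector has only one dimension, so it takes a single index inside [ ]; the [rows, cols] form is for two-dimensional objects such as data frames. A small sketch with a hypothetical income vector, capturing the error R raises when the comma is present:

```r
# Hypothetical income vector and row filter
income <- c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999")
keep   <- c(TRUE, FALSE, TRUE)

# One dimension, one index, no comma
selected <- income[keep]
selected

# With a trailing comma R expects two dimensions and signals an error
msg <- tryCatch(income[keep, ], error = function(e) conditionMessage(e))
msg
```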

If you want to create a subsetted data frame, do not start with df$income; use df and keep the comma this time. This subsets the rows but keeps all columns:

sub_df <- df[which(df$ethnicity == "xyz" &
                   df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]
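One reason to keep which() here: if ethnicity contains missing values, the comparison returns NA for those rows, and indexing a data frame with a logical NA yields a row filled with NAs. which() returns only the positions that are TRUE, so such rows are dropped. A sketch with hypothetical data:

```r
# Hypothetical data with one missing ethnicity
df <- data.frame(
  ethnicity = c("xyz", NA, "abc"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999"),
  stringsAsFactors = FALSE
)

cond <- df$ethnicity == "xyz"        # TRUE NA FALSE: comparing NA gives NA

with_logical <- df[cond, ]           # the NA position comes back as an all-NA row
with_which   <- df[which(cond), ]    # which() keeps only the TRUE positions

nrow(with_logical)
nrow(with_which)
```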

To then look at the income levels in the subsetted data, you can use table:

table(sub_df$income)

You can again use table to count the observations in each income bracket by religious status:

table(sub_df$income, sub_df$religious)
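One thing to watch when tabulating: if income is stored as text, table() orders the brackets alphabetically, so "$100,000 - $124,999" comes before "$30,000 - $39,999". Converting income to a factor with the brackets listed in their real order fixes this. A sketch, with the bracket list and data invented for illustration:

```r
# Hypothetical subset; income brackets stored as plain text
sub_df <- data.frame(
  income    = c("$100,000 - $124,999", "$30,000 - $39,999",
                "$30,000 - $39,999", "$100,000 - $124,999"),
  religious = c("religious", "not religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# As text, "$100,000 ..." sorts before "$30,000 ..." ("1" < "3"),
# so supply the brackets in their real order as factor levels
bracket_order <- c("$30,000 - $39,999", "$100,000 - $124,999")
sub_df$income <- factor(sub_df$income, levels = bracket_order)

tab <- table(sub_df$income, sub_df$religious)
tab
```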

If you just want to select the income and religious columns, you can also do this with [:

sub_df[c("income", "religious")]
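[ returns the columns in the order they are named, so listing income first reproduces the layout from the question, and print() with row.names = FALSE hides the row numbers. A sketch on a hypothetical two-row subset:

```r
# Hypothetical subset, with the columns stored in the "wrong" order
sub_df <- data.frame(
  religious = c("not religious", "religious"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999"),
  stringsAsFactors = FALSE
)

# Single brackets with a character vector keep a data frame,
# with columns in the order listed
out <- sub_df[c("income", "religious")]

# Print without row names to match the two-column layout in the question
print(out, row.names = FALSE)
```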
