
Drop a Categorical Level <- Dplyr <- Statistics With R

Using the dplyr package in R, I'm trying to reduce a categorical variable from 3 levels to 2. I'm using the famous iris data set and trying to turn the class variable (containing "Iris-versicolor", "Iris-setosa", and "Iris-virginica") into one with only two levels ("Iris-versicolor" and "Iris-setosa"). To create a new data set without the third class, I've come up with this:

IRIS_TEST2 <- IRIS_TEST %>%
   filter(class != "Iris-virginica")

But when I try to run a hypothesis test on the filtered data:

inference(y = sepal_length, x = class, data = IRIS_TEST2, statistic = "mean",
          type = "ci", method = "theoretical", conf_level = .95)

I continue to get an error:

Error: Categorical variable has more than 2 levels, confidence interval is undefined,
         use ANOVA to test for a difference between means

Alternatively, is there a way to change the "x =" argument so that it includes only "Iris-versicolor" and "Iris-setosa"?

inference(y = sepal_length, x = class, data = IRIS_TEST2, statistic = "mean",
          type = "ci", method = "theoretical", conf_level = .95)

Any help would be greatly appreciated!

After filtering out the class I did not want (and storing the result in a new data frame), I was able to run this code:

IRIS_TEST2$class <- factor(IRIS_TEST2$class)

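This left the variable with only two levels, so I was able to run my hypothesis test and find the confidence interval. The extra step is needed because filter() removes the rows but keeps "Iris-virginica" as an unused factor level; calling factor() again rebuilds the factor from the values that are actually present.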
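
For anyone who wants to reproduce this without the original file, here is a minimal sketch using R's built-in iris data frame. It is an assumption that this stands in for IRIS_TEST: the built-in column is called Species (not class) and its levels are "setosa", "versicolor", and "virginica". The sketch uses base R subsetting and droplevels(), which should have the same effect on the levels as filter() followed by factor(), and base R's t.test() in place of inference(), which comes from a separate package.

```r
# Built-in iris data; Species plays the role of `class` in the question.
# Removing the virginica rows does NOT remove the level itself.
filtered <- iris[iris$Species != "virginica", ]
nlevels(filtered$Species)   # 3: "virginica" survives as an empty level
table(filtered$Species)     # the virginica count is 0

# droplevels() discards unused levels from every factor column in the data frame.
two_species <- droplevels(filtered)
levels(two_species$Species) # "setosa" "versicolor"

# With exactly two groups, a two-sample confidence interval is defined.
# t.test() defaults to the Welch (unequal variance) interval.
ci <- t.test(Sepal.Length ~ Species, data = two_species, conf.level = 0.95)$conf.int
ci
```

In a dplyr pipeline the same idea can be written as filter(...) followed by droplevels(), since droplevels() accepts a data frame.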
