
How do I change numeric values in a subset of columns in an R data frame to other numeric values?

Disclaimer: I am an R newbie and thus some information I provide might be redundant. But after 2 hours of failed attempts at such a seemingly easy endeavour, I deemed it appropriate to ask a question in this forum.

So I have a dataset with currently 4 rows/subjects (more to come, as this is ongoing research) and 259 variables/columns. 240 variables of this dataset are ratings of fit ("How well does the following adjective match the dimension X?") and 19 variables are sociodemographic.

For these 240 rating variables, my subjects could give a rating ranging from 1 ("fits very badly") to 7 ("fits very well"). Consequently, I have 240 variables with values from 1 to 7. I would like to change these numeric values as follows (the procedure being the same for all of the 240 columns):

1 should change to 0, 2 should change to 1/6, 3 should change to 2/6, 4 should change to 3/6, 5 should change to 4/6, 6 should change to 5/6 and 7 should change to 1. So no matter where in the 240 columns, a 1 should change to 0 and so on.

I have tried the following approaches:

Recode numeric values in R

In this post, it says that

x <- 1:10

# With recode function using backquotes as arguments
dplyr::recode(x, `2` = 20L, `4` = 40L)
# [1]  1 20  3 40  5  6  7  8  9 10

# With case_when function
dplyr::case_when(
  x %in% 2 ~ 20,
  x %in% 4 ~ 40,
  TRUE ~ as.numeric(x)
)
#  [1]  1 20  3 40  5  6  7  8  9 10

Consequently, I tried this:

df = ds %>% select(AD01_01:AD01_20,AD02_01:AD02_20,AD03_01:AD03_20,AD04_01:AD04_20,AD05_01:AD05_20,AD06_01:AD06_20,                      AD09_01:AD09_20,AD10_01:AD10_20,AD11_01:AD11_20,AD12_01:AD12_20,AD13_01:AD13_20,AD14_01:AD14_20)
                   %>% recode(.,`1`=0,`2`=-1/6,`3`=-2/6, `4`=3/6,`5`=4/6, `6`=5/6, `7`=1))

with AD01_01 etc. being the column names for the adjectives my subjects should rate. I also tried it without the ".," after recode(, to no avail.

This code is flawed because it omits the 19 columns of sociodemographic data I want to keep in my dataset. Moreover, I get the error "unexpected SPECIAL in " %>%". I thought R might accept my selected columns with the pipe operator as the "x" in the recode function. Apparently, this is not the case. I also tried to read up on the R documentation of the recode function, but it made things much more confusing for me, as there were a lot of technical terms I don't understand.

As there is another option mentioned in the post, I also tried this:

df = df %>% select(AD01_01:AD01_20,AD02_01:AD02_20,AD03_01:AD03_20,AD04_01:AD04_20,AD05_01:AD05_20,AD06_01:AD06_20,                     AD09_01:AD09_20,AD10_01:AD10_20,AD11_01:AD11_20,AD12_01:AD12_20,AD13_01:AD13_20,AD14_01:AD14_20) %>% case_when (.,%in% 1~0,%in% 2~1/6,%in%3~2/6,%in%4~3/6,%in%5~4/6,%in%6~5/6,%in%7~1)

I thought I could give the output of the select function to the case_when function. Apparently, this is also not the case.

When I execute this command, I get

Error: unexpected SPECIAL in:
"df = df %>% select(AD01_01:AD01_20,AD02_01:AD02_20,AD03_01:AD03_20,AD04_01:AD04_20,AD05_01:AD05_20,AD06_01:AD06_20,                      AD09_01:AD09_20,AD10_01:AD10_20,AD11_01:AD11_20,AD12_01:AD12_20,AD13_01:AD13_20,AD14_01:AD14_20) %>% case_when (%in%"

Reading up on other possibilities, I found this

https://rstudio-education.github.io/hopr/modify.html

exemplary dataset:

head(dplyr::storms)

## # A tibble: 6 x 13
##   name   year month   day  hour   lat  long status category  wind pressure
##   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>  <ord>    <int>    <int>
## 1 Amy    1975     6    27     0  27.5 -79   tropi… -1          25     1013
## 2 Amy    1975     6    27     6  28.5 -79   tropi… -1          25     1013
## 3 Amy    1975     6    27    12  29.5 -79   tropi… -1          25     1013
## 4 Amy    1975     6    27    18  30.5 -79   tropi… -1          25     1013
## 5 Amy    1975     6    28     0  31.5 -78.8 tropi… -1          25     1012
## 6 Amy    1975     6    28     6  32.4 -78.7 tropi… -1          25     1012
## # ... with 2 more variables: ts_diameter <dbl>, hu_diameter <dbl>

We decide that we want to recode all NAs to 9999.

storm <- storms

storm$ts_diameter[is.na(storm$ts_diameter)] <- 9999
summary(storm$ts_diameter)

Based on this, I tried the following:

ds$AD01_01:AD01_20[1(ds$AD01_01:AD01_20)] <- 0, ds$AD01_01:AD01_20[2(ds$AD01_01:AD01_20)] <- 1/6, ds$AD01_01:AD01_20[3(ds$AD01_01:AD01_20)] <- 2/6, 
ds$AD01_01:AD01_20[4(ds$AD01_01:AD01_20)] <- 3/6, ds$AD01_01:AD01_20[5(ds$AD01_01:AD01_20)] <- 4/6, ds$AD01_01:AD01_20[6(ds$AD01_01:AD01_20)] <- 5/6, 
ds$AD01_01:AD01_20[7(ds$AD01_01:AD01_20)] <- 1

My idea in this case was to use the assignment approach for multiple columns at a time (this effort just concerns 20 of my 240 columns), and it also didn't work. I got the error "could not find function ":<-"", which is weird because I thought this was a basic command. The only noteworthy thing that might explain this is that I executed library(readr) and library(tidyverse) beforehand.

After 2 hours, I finally give up. I would appreciate it if you found the time to help me. I would also like to know where I went wrong and why my code doesn't work (or alternatively please explain why your code works).

How about using mutate(across())? For example, if all your "adjective rating" columns start with "AD", you can do something like this:

library(dplyr)
ds %>% mutate(across(starts_with("AD"), ~(.x-1)/6))
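
The one-liner works because the requested mapping (1 to 0, 2 to 1/6, ..., 7 to 1) is the same as (x - 1) / 6. As a minimal sketch on made-up data (the column names and values below are assumptions for illustration, not the real dataset), this shows that the other columns stay in place and that the result has to be assigned to keep it. The second variant is an option if some columns that start with "AD" are not ratings:

```r
library(dplyr)

# Toy stand-in for ds: column names and values are made up for illustration.
ds <- tibble(
  AD01_01 = c(1, 4, 7),
  AD01_02 = c(2, 3, 6),
  age     = c(23, 31, 45)   # stands in for a sociodemographic column
)

# Rescale every column whose name starts with "AD"; other columns are untouched.
# The result is assigned, otherwise it is only printed and then lost.
ds_scaled <- ds %>%
  mutate(across(starts_with("AD"), ~ (.x - 1) / 6))

# Alternative: name the rating columns explicitly with ranges inside c().
ds_scaled2 <- ds %>%
  mutate(across(c(AD01_01:AD01_02), ~ (.x - 1) / 6))
```

With the real data, the ranges from the question (AD01_01:AD01_20, AD02_01:AD02_20, and so on) could presumably be listed inside c() in the same way.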

Explanation of where you went wrong with your code:

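First, your select(...) %>% recode(...) was close. However, select() reduces ds to only the selected columns, so recoding those values and assigning the result to df leaves df without the demographic variables. The "unexpected SPECIAL" error is a separate syntax problem: in the first attempt the second line begins with %>%, so R treats the first line as a complete statement and cannot parse the next one; in the case_when attempt, %in% has nothing on its left-hand side. Keep the pipe at the end of a line to continue the statement.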
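
A small sketch of the difference, again on hypothetical toy data: select() drops the unselected columns while mutate() keeps them. It also checks, using base R's parse(), that a statement whose continuation line starts with %>% does not parse, whereas the same statement with the pipe at the end of the line does.

```r
library(dplyr)

# Hypothetical toy data: two rating columns plus one demographic column.
ds <- tibble(AD01_01 = c(1, 7), AD01_02 = c(3, 5), age = c(23, 31))

# select() returns only the chosen columns, so "age" is gone.
selected_names <- names(select(ds, starts_with("AD")))

# mutate() returns all columns, changing only the ones named in across().
mutated_names <- names(mutate(ds, across(starts_with("AD"), ~ (.x - 1) / 6)))

# A line that starts with %>% is parsed as a new statement and fails.
bad  <- "df = ds %>% select(AD01_01)\n  %>% names()"
good <- "df = ds %>% select(AD01_01) %>%\n  names()"
bad_result  <- try(parse(text = bad), silent = TRUE)  # parse error is caught
good_result <- parse(text = good)                     # parses without error
```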

Second, if you want to use recode you can, but you can't feed it an entire data frame/tibble, as you are doing when you pipe (%>%) the selected columns to it. Instead, you can use recode() iteratively in .fns, on each of the columns in the .cols param of across(), like this:

ds %>%
  mutate(across(
    .cols = starts_with("AD"),
    .fns = ~recode(.x, `1` = 0, `2` = 1/6, `3` = 2/6, `4` = 3/6, `5` = 4/6, `6` = 5/6, `7` = 1))
  )
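
As for the third attempt: ds$AD01_01:AD01_20 does not select a range of columns. The $ operator extracts a single column, and assigning to an expression of the form a:b makes R look for a replacement function called `:<-`, which does not exist, hence the error message. A base R sketch of what that attempt was aiming for, assuming (as above) that the rating columns can be identified by the "AD" prefix and using made-up toy data, could look like this:

```r
# Hypothetical toy data standing in for ds.
ds <- data.frame(AD01_01 = c(1, 4, 7), AD01_02 = c(2, 3, 6), age = c(23, 31, 45))

# Pick the rating columns by name pattern instead of with `$` and `:`.
rating_cols <- grep("^AD", names(ds), value = TRUE)

# Lookup vector: position k holds the new value for old rating k.
new_values <- c(0, 1/6, 2/6, 3/6, 4/6, 5/6, 1)

# Replace each rating column by indexing the lookup vector with the old values.
ds[rating_cols] <- lapply(ds[rating_cols], function(x) new_values[x])
```

The lookup vector is handy if the new values ever stop following a simple formula; missing ratings (NA) stay NA.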

