

How do I change numeric values in a subset of columns in an R data frame to other numeric values?

Disclaimer: I am an R newbie, so some of the information I provide might be redundant. But after 2 hours of failed attempts at such a seemingly easy task, I thought it appropriate to ask a question in this forum.

I have a dataset with currently 4 rows (subjects; more to come, as this is ongoing research) and 259 columns (variables). 240 variables of this dataset are ratings of fit ("How well does the following adjective match the dimension X?") and 19 variables are sociodemographic.

For these 240 rating variables, my subjects could give a rating ranging from 1 ("fits very badly") to 7 ("fits very well"). Consequently, I have 240 variables with values from 1 to 7. I would like to change these numeric values as follows (the procedure being the same for all 240 columns):

1 should change to 0, 2 should change to 1/6, 3 should change to 2/6, 4 should change to 3/6, 5 should change to 4/6, 6 should change to 5/6 and 7 should change to 1. So no matter where in the 240 columns it appears, a 1 should change to 0, and so on.

I have tried the following approaches:

Recode numeric values in R

In this post, it says that

x <- 1:10

# With recode function using backquotes as arguments
dplyr::recode(x, `2` = 20L, `4` = 40L)
# [1]  1 20  3 40  5  6  7  8  9 10

# With case_when function
dplyr::case_when(
  x %in% 2 ~ 20,
  x %in% 4 ~ 40,
  TRUE ~ as.numeric(x)
)
#  [1]  1 20  3 40  5  6  7  8  9 10

Consequently, I tried this:

df = ds %>% select(AD01_01:AD01_20,AD02_01:AD02_20,AD03_01:AD03_20,AD04_01:AD04_20,AD05_01:AD05_20,AD06_01:AD06_20,                      AD09_01:AD09_20,AD10_01:AD10_20,AD11_01:AD11_20,AD12_01:AD12_20,AD13_01:AD13_20,AD14_01:AD14_20)
                   %>% recode(.,`1`=0,`2`=-1/6,`3`=-2/6, `4`=3/6,`5`=4/6, `6`=5/6, `7`=1))

with AD01_01 etc. being the column names for the adjectives my subjects should rate. I also tried it without the ".," after recode(, to no avail.

This code is flawed because it omits the 19 columns of sociodemographic data I want to keep in my dataset. Moreover, I get the error "unexpected SPECIAL in " %>%"". I thought R might accept my selected columns, passed along by the pipe operator, as the "x" in the recode function. Apparently, this is not the case. I also tried to read up on the R documentation of the recode function, but it made things much more confusing for me, as there were a lot of technical terms I don't understand.

As there is another option mentioned in the post, I also tried this:

df = df %>% select(AD01_01:AD01_20,AD02_01:AD02_20,AD03_01:AD03_20,AD04_01:AD04_20,AD05_01:AD05_20,AD06_01:AD06_20,                     AD09_01:AD09_20,AD10_01:AD10_20,AD11_01:AD11_20,AD12_01:AD12_20,AD13_01:AD13_20,AD14_01:AD14_20) %>% case_when (.,%in% 1~0,%in% 2~1/6,%in%3~2/6,%in%4~3/6,%in%5~4/6,%in%6~5/6,%in%7~1)

I thought I could pass the output of the select function to the case_when function. Apparently, this is also not the case.

When I execute this command, I get

Error: unexpected SPECIAL in:
"df = df %>% select(AD01_01:AD01_20,AD02_01:AD02_20,AD03_01:AD03_20,AD04_01:AD04_20,AD05_01:AD05_20,AD06_01:AD06_20,                      AD09_01:AD09_20,AD10_01:AD10_20,AD11_01:AD11_20,AD12_01:AD12_20,AD13_01:AD13_20,AD14_01:AD14_20) %>% case_when (%in%"

Reading up on other possibilities, I found this:

https://rstudio-education.github.io/hopr/modify.html

Example dataset:

head(dplyr::storms)

## # A tibble: 6 x 13
##   name   year month   day  hour   lat  long status category  wind pressure
##   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>  <ord>    <int>    <int>
## 1 Amy    1975     6    27     0  27.5 -79   tropi… -1          25     1013
## 2 Amy    1975     6    27     6  28.5 -79   tropi… -1          25     1013
## 3 Amy    1975     6    27    12  29.5 -79   tropi… -1          25     1013
## 4 Amy    1975     6    27    18  30.5 -79   tropi… -1          25     1013
## 5 Amy    1975     6    28     0  31.5 -78.8 tropi… -1          25     1012
## 6 Amy    1975     6    28     6  32.4 -78.7 tropi… -1          25     1012
## # ... with 2 more variables: ts_diameter <dbl>, hu_diameter <dbl>

We decide that we want to recode all NAs to 9999.

storm <- storms

storm$ts_diameter[is.na(storm$ts_diameter)] <- 9999
summary(storm$ts_diameter)

ds$AD01_01:AD01_20[1(ds$AD01_01:AD01_20)] <- 0, ds$AD01_01:AD01_20[2(ds$AD01_01:AD01_20)] <- 1/6, ds$AD01_01:AD01_20[3(ds$AD01_01:AD01_20)] <- 2/6, 
ds$AD01_01:AD01_20[4(ds$AD01_01:AD01_20)] <- 3/6, ds$AD01_01:AD01_20[5(ds$AD01_01:AD01_20)] <- 4/6, ds$AD01_01:AD01_20[6(ds$AD01_01:AD01_20)] <- 5/6, 
ds$AD01_01:AD01_20[7(ds$AD01_01:AD01_20)] <- 1

My idea in this case was to use the "assign" function for multiple columns at a time (this attempt only covers 20 of my 240 columns), and it also didn't work. I got the error "could not find function ":<-"", which is weird because I thought this was a basic command. The only noteworthy thing that might explain it is that I executed library(readr) and library(tidyverse) beforehand.

After 2 hours, I finally gave up. I would appreciate it if you found the time to help me. I would also like to know where I went wrong and why my code doesn't work (or alternatively, please explain why your code works).

How about using mutate(across())? For example, if all your "adjective rating" columns start with "AD", you can do something like this:

library(dplyr)
ds %>% mutate(across(starts_with("AD"), ~(.x-1)/6))
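
To see what this does, here is a minimal sketch on a made-up data frame. The name ds_demo, its values, and the age column are assumptions for illustration only, standing in for your real ds with its 240 rating columns and 19 demographic columns.

```r
library(dplyr)

# Hypothetical toy data: two rating columns plus one demographic column.
ds_demo <- data.frame(
  AD01_01 = c(1, 4, 7),
  AD01_02 = c(2, 6, 3),
  age     = c(23, 31, 45)
)

# (.x - 1) / 6 maps 1 -> 0, 2 -> 1/6, ..., 7 -> 1 in every "AD" column.
rescaled <- ds_demo %>%
  mutate(across(starts_with("AD"), ~ (.x - 1) / 6))

rescaled$AD01_01  # 0, 0.5, 1
rescaled$age      # unchanged: 23, 31, 45
```

The arithmetic works because your mapping is linear: each step up on the 1 to 7 scale adds exactly 1/6, so there is no need to list the seven cases one by one.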

Explanation of where you went wrong with your code:

First, your select(...) %>% recode(...) was close. However, when you use select, you reduce ds to only the selected columns, so recoding those values and assigning the result to df leaves df without the demographic variables.
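
The difference between select() and mutate(across()) can be checked on a small example. This is only a sketch; ds_demo and the age column are hypothetical stand-ins for your data.

```r
library(dplyr)

# Hypothetical toy data: two rating columns and one demographic column.
ds_demo <- data.frame(AD01_01 = c(1, 7), AD01_02 = c(3, 5), age = c(23, 31))

# select() returns only the chosen columns, so `age` is dropped.
only_ratings <- ds_demo %>% select(AD01_01:AD01_02)
names(only_ratings)  # "AD01_01" "AD01_02"

# mutate(across()) changes the chosen columns and keeps all the others.
all_columns <- ds_demo %>%
  mutate(across(AD01_01:AD01_02, ~ (.x - 1) / 6))
names(all_columns)   # "AD01_01" "AD01_02" "age"
```

Note that across() accepts the same column ranges you used in select(). If your data contains columns that start with "AD" but should not be rescaled (your select() call skips AD07 and AD08, for instance), you could pass your ranges wrapped in c(...) to across() instead of starts_with("AD").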

Second, you can use recode if you want, but you can't feed it an entire data frame/tibble, as you are doing when you pipe (%>%) the selected columns to it. Instead, you can apply recode() to each column in turn by passing it as .fns to across(), with the columns chosen through the .cols param, like this:

ds %>%
  mutate(across(
    .cols = starts_with("AD"),
    # values follow the mapping stated in the question (2 -> 1/6, 3 -> 2/6)
    .fns = ~recode(.x, `1`=0, `2`=1/6, `3`=2/6, `4`=3/6, `5`=4/6, `6`=5/6, `7`=1)
  ))
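
As a quick check that the recode() version and the arithmetic version agree, here is a sketch on a made-up data frame (ds_demo and age are hypothetical names), using the positive values from the mapping in the question:

```r
library(dplyr)

# Hypothetical toy data covering every rating value from 1 to 7.
ds_demo <- data.frame(AD01_01 = c(1, 2, 3, 4, 5, 6, 7), age = 21:27)

# Recode each "AD" column value by value.
via_recode <- ds_demo %>%
  mutate(across(
    starts_with("AD"),
    ~ recode(.x, `1` = 0, `2` = 1/6, `3` = 2/6, `4` = 3/6,
             `5` = 4/6, `6` = 5/6, `7` = 1)
  ))

# Rescale each "AD" column with the linear formula.
via_arithmetic <- ds_demo %>%
  mutate(across(starts_with("AD"), ~ (.x - 1) / 6))

all.equal(via_recode, via_arithmetic)  # TRUE
```

Both give the same result here, so the choice is mostly about readability. The recode() form is the one to reach for when a mapping is not linear.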
