

How do I create two new variables out of one variable, and attach dummy values to it in R?

I am completely new to any kind of coding, never mind R in particular, so my days of googling have not been very helpful. I would really appreciate any kind of help/insights!

I would like to know how to get two new variables out of the original variable and attach new values to them. Basically, I start with this:

(Image: starting data)

and want to obtain this:

(Image: desired result)

I managed to get it into long format with melt(dataname, id.vars=c("ID")), and the ID and value columns I get are fine. But there is only one variable, in which my four headers (loudHot, quietHot, loudCold, quietCold) are repeated. How do I create two new variables out of it and assign values to them? For example, "Volume" should be 1 when the original variable is loudHot or loudCold and 0 when it is quietHot or quietCold, and "Temp" should be 1 when the original variable is loudHot or quietHot and 0 when it is loudCold or quietCold.
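A minimal base R sketch of the recoding step described above, assuming the melted data has the columns ID, variable and value (the layout reshape2::melt() returns for a data frame). The small data frame below is a hypothetical stand-in for that output, built from the first two IDs of the example data:

```r
# Hypothetical stand-in for the melt() output: columns ID, variable, value
long <- data.frame(
  ID       = rep(c(2, 4), times = 4),
  variable = factor(rep(c("loudHot", "quietHot", "loudCold", "quietCold"), each = 2)),
  value    = c(14, 19, 16, 15, 16, 10, 15, 8)
)

# Volume: 1 for the two "loud" conditions, 0 for the two "quiet" ones
long$Volume <- ifelse(long$variable %in% c("loudHot", "loudCold"), 1, 0)

# Temp: 1 for the two "Hot" conditions, 0 for the two "Cold" ones
long$Temp <- ifelse(long$variable %in% c("loudHot", "quietHot"), 1, 0)

long
```

%in% compares the factor by its labels, so the same two lines should work directly on the real melt() result.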

Don't be too hard on yourself; this isn't really trivial. Anyway, you can use pivot_longer from tidyr and some data manipulation with dplyr to achieve your desired outcome:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-ID) %>%
  mutate(Volume = as.numeric(grepl("loud", name)),
         Temp   = as.numeric(grepl("Hot",  name))) %>%
  select(ID, Volume, Temp, value)
#> # A tibble: 32 x 4
#>       ID Volume  Temp value
#>    <dbl>  <dbl> <dbl> <dbl>
#>  1     2      1     1    14
#>  2     2      0     1    16
#>  3     2      1     0    16
#>  4     2      0     0    15
#>  5     4      1     1    19
#>  6     4      0     1    15
#>  7     4      1     0    10
#>  8     4      0     0     8
#>  9     6      1     1    11
#> 10     6      0     1    17
#> # ... with 22 more rows
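The two mutate() calls rely on grepl(), which returns TRUE wherever the pattern occurs in a string, and as.numeric(), which turns TRUE/FALSE into 1/0. A small standalone check on the four condition names (the vector name conditions is just for illustration):

```r
conditions <- c("loudHot", "quietHot", "loudCold", "quietCold")

volume <- as.numeric(grepl("loud", conditions))  # 1 0 1 0
temp   <- as.numeric(grepl("Hot",  conditions))  # 1 1 0 0

# grepl() is case-sensitive by default, so "hot" matches nothing here
grepl("hot", conditions)
# ignore.case = TRUE makes the match case-insensitive
grepl("hot", conditions, ignore.case = TRUE)
```

This is why the patterns "loud" and "Hot" must match the capitalisation used in the column names.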

Data

df <- data.frame(ID        = (1:8) * 2,
                 loudHot   = c(14, 19, 11, 20, 18, 17, 16, 2),
                 quietHot  = c(16, 15, 17, 5, 10, 10, 15, 0),
                 loudCold  = c(16, 10, 10, 4, 3, 2, 14, 2),
                 quietCold = c(15, 8, 17, 8, 10, 12, 5, 0))

As a tip for any future SO questions, please don't post images of data. Folks here need to be able to cut and paste the text of your data to test and verify solutions. Ideally, you should paste the output of the dput function into a code block. People rarely go to the effort of manually transcribing data from your images.
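To see why dput() output is so convenient, here is a minimal sketch: dput() prints R code that rebuilds the object, and running that text recreates the data. The excerpt d is a hypothetical subset of the example data, and eval(parse(...)) stands in for a reader pasting the text into their console:

```r
# Small excerpt of the example data
d <- data.frame(ID = c(2, 4), loudHot = c(14, 19), quietHot = c(16, 15))

# Print the text you would paste into the question
dput(d)

# Simulate a reader copying that text and running it
dput_text <- paste(capture.output(dput(d)), collapse = "\n")
recreated <- eval(parse(text = dput_text))

all.equal(recreated, d)
```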

Created on 2022-02-04 by the reprex package (v2.0.1)

Let's approach your problem using the dplyr and tidyr packages.

My first recommendation is to always add a minimal reproducible example of your data so that we can use it and help you faster. This is not complicated: you can use dput(head(yourdata, 10)), for example, or simulate some observations.

I simulated some data as follows:

library(dplyr)
library(tidyr)

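# Simulated wide data: one row per id, one column per condition
# (column names must match the ones used later for the dummies)
data <- data.frame(
  id = 1:5,
  loudHot = sample(10:20, 5, replace = TRUE),
  quietHot = sample(10:20, 5, replace = TRUE),
  loudCold = sample(0:12, 5, replace = TRUE),
  quietCold = sample(0:12, 5, replace = TRUE)
)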
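Because sample() draws different numbers on every run, readers will not see the same simulated values as you unless the random seed is fixed first. A minimal sketch with base R's set.seed(); the seed value 123 is arbitrary:

```r
# Same seed -> same pseudo-random draws
set.seed(123)
first <- sample(10:20, 5, replace = TRUE)

set.seed(123)
second <- sample(10:20, 5, replace = TRUE)

identical(first, second)  # TRUE
```

Putting a set.seed() call before the data.frame() above would make the simulated example reproducible for anyone who copies it.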

Now that we have the data, let's convert it to long format using tidyr::pivot_longer. This function takes the data frame in wide format and the columns you want to gather (or, prefixed with the - symbol, the columns you do not want to gather).

# Data to long format
data_long <- pivot_longer(
  data, cols = -id, 
  names_to = 'variable', values_to = 'value'
  )
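For comparison, base R's reshape() can produce the same long layout without extra packages. This is a sketch on a hypothetical two-column excerpt, not part of the tidyr approach; the argument comments describe how each one maps onto the names_to/values_to idea:

```r
# Hypothetical two-column excerpt in wide format
wide <- data.frame(id = 1:2, loudHot = c(14, 19), quietHot = c(16, 15))

long <- reshape(
  wide,
  direction = "long",
  varying   = c("loudHot", "quietHot"),  # columns to gather
  v.names   = "value",                   # like values_to
  timevar   = "variable",                # like names_to
  times     = c("loudHot", "quietHot"),  # key value for each gathered column
  idvar     = "id"
)

long
```

The rows come out grouped by condition (all ids for loudHot, then all ids for quietHot), so you may want to sort by id afterwards.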

Now you only have to create the dummy variables, which is simple:

# Adding new variables: 1 if the condition is in the list, 0 otherwise
data_with_dummy <- mutate(
  data_long,
  volume = as.numeric(variable %in% c("loudHot", "loudCold")),
  temp = as.numeric(variable %in% c("loudHot", "quietHot"))
  )
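Since a typo in one of the %in% lists silently yields a wrong dummy, one hedged way to double-check the lists is to compare them against a pattern match on the condition names. A base R sketch on the four names alone (the vector names are illustrative):

```r
conditions <- c("loudHot", "quietHot", "loudCold", "quietCold")
volume <- as.numeric(conditions %in% c("loudHot", "loudCold"))
temp   <- as.numeric(conditions %in% c("loudHot", "quietHot"))

# Cross-check: the %in% dummies should agree with a match on the names
stopifnot(all(volume == grepl("loud", conditions)),
          all(temp   == grepl("Hot",  conditions)))

# Side-by-side view for a visual check
data.frame(conditions, volume, temp)
```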

Here's a base R approach:

# Original data
df <- data.frame(
  ID = c(2, 4, 5, 7, 8, 11, 12, 16),
  loudHot = c(14, 19, 11, 20, 18, 17, 16, 2),
  quietHot = c(16, 15, 17, 5, 10, 10, 15, 0),
  loudCold = c(16, 10, 10, 4, 3, 2, 14, 2),
  quietCold = c(15, 8, 17, 8, 10, 12, 5, 0)
)

# Stacked data
df_stacked <- stack(
  df,
  select = c(
    "loudHot", "quietHot", "loudCold", "quietCold"
  )
)

# New variable for volume
df_stacked$Volume <- as.numeric(grepl("loud", df_stacked$ind))

# New variable for Temp
df_stacked$Temp <- as.numeric(grepl("Hot", df_stacked$ind))

# Replace "ind" values with "ID"
df_stacked$ind <- rep(df$ID, times = 4)

# Reorder columns
new_df <- df_stacked[,c(2:4,1)]

# Rename columns
colnames(new_df) <- c("ID", "Volume", "Temp", "Value")

# Order by ID
new_df[order(new_df$ID),]
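The stack() version above hard-codes times = 4 and reorders columns by position. A variation (a sketch, not the answerer's code) that derives the column list from the data and builds the result directly, so adding or removing a condition column needs no other changes:

```r
df <- data.frame(
  ID = c(2, 4, 5, 7, 8, 11, 12, 16),
  loudHot = c(14, 19, 11, 20, 18, 17, 16, 2),
  quietHot = c(16, 15, 17, 5, 10, 10, 15, 0),
  loudCold = c(16, 10, 10, 4, 3, 2, 14, 2),
  quietCold = c(15, 8, 17, 8, 10, 12, 5, 0)
)

cols <- setdiff(names(df), "ID")  # every column except ID
stacked <- stack(df[cols])        # columns: values, ind (condition name)

new_df <- data.frame(
  ID     = rep(df$ID, times = length(cols)),
  Volume = as.numeric(grepl("loud", stacked$ind)),
  Temp   = as.numeric(grepl("Hot", stacked$ind)),
  Value  = stacked$values
)

# Sort by ID; ties keep the original condition order
new_df <- new_df[order(new_df$ID), ]
head(new_df)
```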

I believe your columns for "Volume" and "Temp" should be alternating sequences:

(Image: result of the R code)
