How do I create two new variables out of one variable, and attach dummy values to them in R?
I am completely new to any kind of coding, never mind R in particular, so my days of googling have not been very helpful. I would really appreciate any kind of help or insights!
I would like to know how to get two new variables out of the original variable, and attach new values to them. Basically I start with this:
and want to obtain this:
I managed to get it in long format with melt(dataname, id.vars=c("ID")), and the ID and value I get are good. But there is only one variable, with my four headers (loudHot, quietHot, loudCold, quietCold) repeated. How do I create two new variables out of this and assign values to them (e.g. "Volume" is 1 when the original variable is loudHot or loudCold and 0 when it is quietHot or quietCold, and "Temp" is 1 when the original variable is loudHot or quietHot and 0 when it is loudCold or quietCold)?
I wouldn't be too hard on yourself - this isn't really trivial. Anyway, you can use pivot_longer from tidyr and some data manipulation with dplyr to achieve your desired outcome:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(-ID) %>%
  mutate(Volume = as.numeric(grepl("loud", name)),
         Temp = as.numeric(grepl("Hot", name))) %>%
  select(ID, Volume, Temp, value)
#> # A tibble: 32 x 4
#> ID Volume Temp value
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1 1 14
#> 2 2 0 1 16
#> 3 2 1 0 16
#> 4 2 0 0 15
#> 5 4 1 1 19
#> 6 4 0 1 15
#> 7 4 1 0 10
#> 8 4 0 0 8
#> 9 6 1 1 11
#> 10 6 0 1 17
#> # ... with 22 more rows
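To see why the two grepl calls produce the dummy columns, here is a minimal base R sketch applied to just the four original column names. The vector `name` is a stand-in for the `name` column that pivot_longer creates by default; nothing else is assumed.

```r
# The four original column names, as they would appear in the long-format 'name' column
name <- c("loudHot", "quietHot", "loudCold", "quietCold")

# grepl() returns TRUE/FALSE per element; as.numeric() maps TRUE to 1 and FALSE to 0
Volume <- as.numeric(grepl("loud", name))  # 1 for loudHot and loudCold
Temp   <- as.numeric(grepl("Hot", name))   # 1 for loudHot and quietHot

# Show the mapping from each original name to its two dummy values
data.frame(name, Volume, Temp)
```

Note that the patterns are case-sensitive, so "Hot" matches loudHot and quietHot but would not match a lowercase "hot".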
Data
df <- data.frame(ID = (1:8) * 2,
                 loudHot = c(14, 19, 11, 20, 18, 17, 16, 2),
                 quietHot = c(16, 15, 17, 5, 10, 10, 15, 0),
                 loudCold = c(16, 10, 10, 4, 3, 2, 14, 2),
                 quietCold = c(15, 8, 17, 8, 10, 12, 5, 0))
As a tip for any future SO questions, please don't post images of data. Folks here need to be able to cut and paste the text of your data to test and verify solutions. Ideally, you should do this by pasting the output of the dput function into a code block. People rarely go to the effort of manually transcribing data from your images.
Created on 2022-02-04 by the reprex package (v2.0.1)
Let's approach your problem using the dplyr and tidyr packages.
The first recommendation for you is to always add a minimal reproducible example of your data so that we can use it and help you faster. This is not complicated: you can use dput(head(yourdata, 10)), for example, or simulate some observations.
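As a rough illustration of the dput suggestion, the sketch below uses a small made-up data frame (`yourdata` and its values are placeholders, not the asker's real data). It prints the text you would paste into a question, then checks that evaluating that text rebuilds the same data.

```r
# Hypothetical small data frame standing in for 'yourdata'
yourdata <- data.frame(
  ID = c(2, 4, 6),
  loudHot = c(14, 19, 11),
  quietHot = c(16, 15, 17)
)

# dput() prints R code that recreates the object; paste this text into your question
dput(head(yourdata, 10))

# Round trip: capture the printed code as text, evaluate it, and compare with the original
txt <- capture.output(dput(head(yourdata, 10)))
restored <- eval(parse(text = txt))
round_trip_ok <- isTRUE(all.equal(restored, head(yourdata, 10)))
round_trip_ok
```

Anyone answering can then assign the pasted text to a variable and work with exactly the same data.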
I did a simulation as follows:
library(dplyr)
library(tidyr)
# Simulated data; the values differ on each run unless a seed is set
data <- data.frame(
  id = 1:5,
  loudHot = sample(10:20, 5, replace = TRUE),
  quietHot = sample(10:20, 5, replace = TRUE),
  loudCold = sample(0:12, 5, replace = TRUE),
  quietCold = sample(0:12, 5, replace = TRUE)
)
Now that we have the data, let's turn it into long format using tidyr::pivot_longer. This function receives as arguments the data frame in wide format and the columns you want to gather (or those you do not want to gather, using the - symbol).
# Data to long format
data_long <- pivot_longer(
  data, cols = -id,
  names_to = 'variable', values_to = 'value'
)
With that, now you only have to create the dummies, which is simple.
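# Adding new variables
data_with_dummy <- mutate(
  data_long,
  # volume is 1 for the two "loud" columns, 0 otherwise
  volume = as.numeric(variable %in% c("loudHot", "loudCold")),
  # temp is 1 for the two "Hot" columns, 0 otherwise
  temp = as.numeric(variable %in% c("loudHot", "quietHot"))
)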
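As a possible variation on the two steps above (not the method used in this answer), pivot_longer can split each column name into its two parts while reshaping, via its names_pattern argument. The sketch below assumes tidyr is installed; the small data frame and the helper column names `volume_label` and `temp_label` are made up for illustration.

```r
library(tidyr)

# Small made-up data set with the same column layout as the question
data <- data.frame(
  id = 1:2,
  loudHot = c(14, 19),
  quietHot = c(16, 15),
  loudCold = c(16, 10),
  quietCold = c(15, 8)
)

# Each capture group in names_pattern fills one of the columns listed in names_to
data_split <- pivot_longer(
  data, cols = -id,
  names_to = c("volume_label", "temp_label"),
  names_pattern = "(loud|quiet)(Hot|Cold)",
  values_to = "value"
)

# Turn the text labels into 0/1 dummies
data_split$volume <- as.numeric(data_split$volume_label == "loud")
data_split$temp <- as.numeric(data_split$temp_label == "Hot")

data_split
```

This keeps the readable labels next to the dummies, which can make it easier to confirm the coding is the one you intended.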
Here's a base R approach:
# Original data
df <- data.frame(
  ID = c(2, 4, 5, 7, 8, 11, 12, 16),
  loudHot = c(14, 19, 11, 20, 18, 17, 16, 2),
  quietHot = c(16, 15, 17, 5, 10, 10, 15, 0),
  loudCold = c(16, 10, 10, 4, 3, 2, 14, 2),
  quietCold = c(15, 8, 17, 8, 10, 12, 5, 0)
)
# Stacked data
df_stacked <- stack(
  df,
  select = c(
    "loudHot", "quietHot", "loudCold", "quietCold"
  )
)
# New variable for volume
df_stacked$Volume <- as.numeric(grepl("loud", df_stacked$ind))
# New variable for Temp
df_stacked$Temp <- as.numeric(grepl("Hot", df_stacked$ind))
# Replace "ind" values with "ID"
df_stacked$ind <- rep(df$ID, times = 4)
# Reorder columns
new_df <- df_stacked[,c(2:4,1)]
# Rename columns
colnames(new_df) <- c("ID", "Volume", "Temp", "Value")
# Order by ID
new_df[order(new_df$ID),]
I believe your columns for "Volume" and "Temp" should be alternating sequences: