R: reshaping data frame - add column(s), while keeping row index values consistent
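I'd like to reshape the following data frame using dplyr::mutate(). The goal is to create a new column variable, percent, while making sure its values correspond to the correct subject. The data is currently in "long" format. In detail: I want to extract the rows associated with the pattern "percentQ" and create a new column named percent for each subject, so that the scores line up with the corresponding subject.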
df_long <- structure(list(id = c(NA, NA, "scoreQ1", NA, "scoreQ2", NA, NA, "percentQ1", "percentQ2", NA, "GPA"),
subject = c(NA, NA, "Chris", NA, "Liz", NA, NA, "Chris", "Liz", NA, NA),
grade = c(NA, NA, 45L, NA, 60L, NA, NA, 75L, 100L, NA, NA)), row.names = c(NA, -11L), class = c("data.table", "data.frame"))
print(df_long)
#id subject grade
#<NA> <NA> NA
#<NA> <NA> NA
#scoreQ1 Chris 45
#<NA> <NA> NA
#scoreQ2 Liz 60
#<NA> <NA> NA
#<NA> <NA> NA
#percentQ1 Chris 75
#percentQ2 Liz 100
#<NA> <NA> NA
#GPA <NA> NA
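Please suggest an R script that would reshape the data frame into the following. As can be seen, the percent values correspond to the correct subject: in this case 75 for Chris and 100 for Liz. I keep running into problems and cannot assign the percent values to the correct subject.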
df_wide <- structure(list(id = c(NA,NA, "scoreQ1", NA, "scoreQ2", NA, NA, NA, "GPA"),
subject = c(NA,NA, "Chris", NA, "Liz", NA, NA, NA, NA),
grade = c(NA,NA, 45L, NA, 60L, NA, NA, NA, NA),
percent = c(NA,NA, 75L, NA, 100L, NA, NA, NA, NA)), row.names = c(NA,-9L), class = c("data.table", "data.frame"))
print(df_wide)
#id subject grade percent
#<NA> <NA> NA NA
#<NA> <NA> NA NA
#scoreQ1 Chris 45 75
#<NA> <NA> NA NA
#scoreQ2 Liz 60 100
#<NA> <NA> NA NA
#<NA> <NA> NA NA
#<NA> <NA> NA NA
#GPA <NA> NA NA
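I think your task can be accomplished with a join. Here we filter out the rows whose id column contains "percent", rename their grade column to percent, and then left_join them back onto the remaining rows by subject.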
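library(dplyr)

# Flag the rows whose id contains "percent" (percentQ1, percentQ2, ...);
# grepl() returns FALSE for NA ids, so those rows are kept
is_percent <- grepl("percent", df_long$id)

df_long |>
  # keep everything except the percent rows
  filter(!is_percent) |>
  # attach the percent rows back as a new column, matched by subject
  left_join(
    filter(df_long, is_percent) |>
      rename(percent = grade) |>
      select(-id),
    by = "subject"
  )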
id subject grade percent
1: <NA> <NA> NA NA
2: <NA> <NA> NA NA
3: scoreQ1 Chris 45 75
4: <NA> <NA> NA NA
5: scoreQ2 Liz 60 100
6: <NA> <NA> NA NA
7: <NA> <NA> NA NA
8: <NA> <NA> NA NA
9: GPA <NA> NA NA
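For comparison, here is a minimal base-R sketch of the same lookup without dplyr. It assumes each subject has at most one percentQ row, rebuilds the question's data as a plain data.frame (dropping the data.table class) so it runs without extra packages, and uses match() in place of the join. The names is_percent, pct and df_wide are illustrative only.

```r
# Same data as in the question, built as a plain data.frame
df_long <- data.frame(
  id = c(NA, NA, "scoreQ1", NA, "scoreQ2", NA, NA,
         "percentQ1", "percentQ2", NA, "GPA"),
  subject = c(NA, NA, "Chris", NA, "Liz", NA, NA, "Chris", "Liz", NA, NA),
  grade = c(NA, NA, 45L, NA, 60L, NA, NA, 75L, 100L, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for rows whose id contains "percent"; grepl() gives FALSE for NA ids
is_percent <- grepl("percent", df_long$id)

# Lookup table: one row per subject holding its percent score
pct <- df_long[is_percent, c("subject", "grade")]

# Drop the percent rows, then look up each remaining subject in pct.
# Rows with an NA subject get NA, because pct$subject contains no NA.
df_wide <- df_long[!is_percent, ]
df_wide$percent <- pct$grade[match(df_wide$subject, pct$subject)]
rownames(df_wide) <- NULL

df_wide
```

Because match() returns only the first match, a subject with several percentQ rows would silently keep just the first score, whereas left_join() would duplicate the matching row; it is worth checking which behaviour you want.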
Data
df_long <- structure(list(id = c(NA, NA, "scoreQ1", NA, "scoreQ2", NA, NA,
"percentQ1", "percentQ2", NA, "GPA"), subject = c(NA, NA, "Chris",
NA, "Liz", NA, NA, "Chris", "Liz", NA, NA), grade = c(NA, NA,
45L, NA, 60L, NA, NA, 75L, 100L, NA, NA)), row.names = c(NA,
-11L), class = c("data.table", "data.frame"))
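I may be missing something, but you are not really going from long to wide format; you are just adding an extra column. Assuming the maximum score is 60, then: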
library(tidyverse)
df2 <- df_long |>
  mutate(percent = (grade / 60) * 100)
should do the trick.
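As a quick arithmetic check of that answer's assumption, here is the same formula applied to the question's grade column with plain R vectors. With a maximum of 60 the scoreQ grades give 75 and 100, matching the desired output, but because mutate() alone does not remove the percentQ rows, those rows receive values too (for example 75 / 60 * 100 = 125).

```r
# grade column from the question's df_long (11 rows)
grade <- c(NA, NA, 45L, NA, 60L, NA, NA, 75L, 100L, NA, NA)

# Same formula as the mutate() call, assuming a maximum score of 60
percent <- (grade / 60) * 100

percent[c(3, 5)]  # scoreQ1 (Chris) and scoreQ2 (Liz): returns 75 100
percent[c(8, 9)]  # percentQ1 and percentQ2 rows, still present in the data
```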