How can I transform my data frame from wide to long in R?
I have issues with transforming my data frame from wide to long. I'm well aware that there are plenty of excellent vignettes out there that explain gather() or pivot_longer() very precisely (e.g. https://www.storybench.org/pivoting-data-from-columns-to-rows-and-back-in-the-tidyverse/ ). Nevertheless, I have been stuck for days now and it is driving me crazy. Thus, I decided to ask the internet. You.
I have a data frame that looks like this:
id <- c(1,2,3)
year <- c(2018,2003,2011)
lvl <- c("A","B","C")
item.1 <- factor(c("A","A","C"),levels = lvl)
item.2 <- factor(c("C","B","A"),levels = lvl)
item.3 <- factor(c("B","B","C"),levels = lvl)
df <- data.frame(id,year,item.1,item.2,item.3)
So we have an id variable for each observation (e.g. movies). We have a year variable, indicating when the observation took place (e.g. when the movie was released). And we have three factor variables that assess different characteristics of the observation (e.g. cast, storyline and film music). Those three factor variables share the same factor levels "A", "B" or "C" (e.g. the cast of the movie was "excellent", "okay" or "shitty").
But in my wildest dreams, the data would look more like this:
id.II <- c(rep(1, 9), rep(2, 9), rep(3,9))
year.II <- c(rep(2018, 9), rep(2003, 9), rep(2011,9))
item.II <- rep(c(c(1,1,1),c(2,2,2),c(3,3,3)),3)
rating.II <- rep(c("A", "B", "C"), 9)
number.II <- c(1,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1)
df.II <- data.frame(id.II,year.II,item.II,rating.II,number.II)
So now the data frame would be way more usable for further analysis. For example, the next step would be to calculate for each year the number (or, even better, the percentage) of movies that were rated as "excellent".
year.III <- factor(c(rep(2018, 3), rep(2003, 3), rep(2011,3)))
item.III <- factor(rep(c(1, 2, 3), 3))
number.A.III <- c(1,0,0,1,0,0,0,1,0)
df.III <- data.frame(year.III,item.III,number.A.III)
library(ggplot2)
# One line per item: number of "A" ratings by year
ggplot(data = df.III, aes(x = year.III, y = number.A.III, group = item.III)) +
  geom_line(aes(color = item.III)) +
  geom_point(aes(color = item.III)) +
  theme(panel.background = element_blank(),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        legend.position = "bottom") +
  labs(colour = "Item")
Or, even more important to me, show for each item (cast, storytelling, film music) the percentage of being rated as "excellent", "okay" and "shitty".
# 3 items x 3 ratings = 9 rows, matching the 9 counts in number.IV
item.IV <- factor(c(1,1,1,2,2,2,3,3,3))
rating.IV <- factor(rep(c("A", "B", "C"), 3))
number.IV <- c(2,0,1,1,1,1,0,2,1)
df.IV <- data.frame(item.IV,rating.IV,number.IV)
df.IV
ggplot(df.IV, aes(fill = rating.IV, y = number.IV, x = item.IV)) +
  geom_bar(position = position_fill(reverse = TRUE), stat = "identity") +
  theme(axis.title.y = element_text(size = rel(1.2), angle = 0),
        axis.title.x = element_blank(),
        panel.background = element_blank(),
        legend.title = element_blank(),
        legend.position = "bottom") +
  labs(x = "Item") +
  coord_flip() +
  scale_x_discrete(limits = rev(levels(df.IV$item.IV))) +
  scale_y_continuous(labels = scales::percent)
My primary question is: How do I transform the data frame df into df.II? That would make my day. Wrong. My weekend.
And if you could then also give a hint on how to proceed from df.II to df.III and df.IV, that would be absolutely mind-blowing. However, I don't want to burden you too much with my problems.
Best wishes, Jascha
Does this achieve what you need?
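library(tidyverse)

# Wide to long: one row per id, year and item
df_long <- df %>%
  pivot_longer(cols = item.1:item.3, names_to = "item", values_to = "rating") %>%
  mutate(
    # fixed() matches "item." literally rather than as a regex
    item = str_remove(item, fixed("item."))
  )

# Pair every row with every rating level, then flag the observed rating (df.II)
df2 <- crossing(
  df_long,
  rating_all = unique(df_long$rating)
) %>%
  mutate(n = rating_all == rating) %>%
  group_by(id, year, item, rating_all) %>%
  summarise(n = sum(n), .groups = "drop")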
# Keep only the "A" ("excellent") counts per year and item, as in df.III
df3 <- df2 %>%
  filter(rating_all == "A")
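For the follow-up steps towards df.III and df.IV, here is a minimal sketch of one possible route. It is an alternative built on count() and complete(), not necessarily what the code above intends. The object names df_II, df_III and df_IV and the column names number, number_A, share_A and share are my own assumptions, chosen to mirror the question. It rebuilds the toy data so it can be run on its own.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question
lvl <- c("A", "B", "C")
df <- data.frame(
  id     = c(1, 2, 3),
  year   = c(2018, 2003, 2011),
  item.1 = factor(c("A", "A", "C"), levels = lvl),
  item.2 = factor(c("C", "B", "A"), levels = lvl),
  item.3 = factor(c("B", "B", "C"), levels = lvl)
)

# Wide to long: one row per movie and item
df_long <- df %>%
  pivot_longer(
    cols = starts_with("item."),
    names_to = "item",
    names_prefix = "item\\.",  # strip the literal "item." prefix
    values_to = "rating"
  )

# df.II shape: one row per movie, item and rating level.
# count() gives 1 for the observed rating; complete() adds the
# unobserved factor levels with a count of 0.
df_II <- df_long %>%
  count(id, year, item, rating, name = "number") %>%
  complete(nesting(id, year, item), rating, fill = list(number = 0L))

# df.III shape: number and share of movies rated "A" per year and item
df_III <- df_II %>%
  filter(rating == "A") %>%
  group_by(year, item) %>%
  summarise(number_A = sum(number), share_A = mean(number), .groups = "drop")

# df.IV shape: counts and within-item shares of each rating
df_IV <- df_II %>%
  group_by(item, rating) %>%
  summarise(number = sum(number), .groups = "drop_last") %>%
  mutate(share = number / sum(number)) %>%
  ungroup()
```

Since number is 0 or 1 for each movie, mean(number) within a year is the share of that year's movies rated "A". With only one movie per year in the toy data it is always 0 or 1, but it should scale to real data. df_IV can be passed to the stacked bar plot from the question, either with y = number and position_fill() or with y = share directly.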