
How can I transform my data frame from wide to long in R?

I have issues with transforming my data frame from wide to long. I'm well aware that there are plenty of excellent vignettes out there which explain gather() or pivot_longer() very precisely (e.g. https://www.storybench.org/pivoting-data-from-columns-to-rows-and-back-in-the-tidyverse/). Nevertheless, I've been stuck for days now and it's driving me crazy. So I decided to ask the internet. You.

I have a data frame that looks like this:

id     <- c(1,2,3)
year   <- c(2018,2003,2011)
lvl    <- c("A","B","C")
item.1 <- factor(c("A","A","C"),levels = lvl)
item.2 <- factor(c("C","B","A"),levels = lvl)
item.3 <- factor(c("B","B","C"),levels = lvl)
df     <- data.frame(id,year,item.1,item.2,item.3)

So we have an id variable for each observation (e.g. movies). We have a year variable indicating when the observation took place (e.g. when the movie was released). And we have three factor variables that assess different characteristics of the observation (e.g. cast, storyline and film music). Those three factor variables share the same factor levels "A", "B" and "C" (e.g. the cast of the movie was "excellent", "okay" or "shitty").

But in my wildest dreams, the data would look more like this:

id.II     <- c(rep(1, 9), rep(2, 9), rep(3,9))
year.II   <- c(rep(2018, 9), rep(2003, 9), rep(2011,9))
item.II   <- rep(c(c(1,1,1),c(2,2,2),c(3,3,3)),3)
rating.II <- rep(c("A", "B", "C"), 9)
number.II  <- c(1,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1)
df.II     <- data.frame(id.II,year.II,item.II,rating.II,number.II)

The data frame would then be far more usable for further analysis. For example, the next step would be to calculate, for each year, the number (or, even better, the percentage) of movies that were rated "excellent".

year.III   <- factor(c(rep(2018, 3), rep(2003, 3), rep(2011,3)))
item.III   <- factor(rep(c(1, 2, 3), 3))
number.A.III <- c(1,0,0,1,0,0,0,1,0)
df.III     <- data.frame(year.III,item.III,number.A.III)

library(ggplot2)

ggplot(data=df.III, aes(x=year.III, y=number.A.III, group=item.III)) +
  geom_line(aes(color=item.III))+
  geom_point(aes(color=item.III))+
  theme(panel.background = element_blank(),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        legend.position = "bottom")+
  labs(colour="Item")
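A minimal sketch (not from the original thread) of how a df.III-style table could be computed from `df` instead of typed by hand. It assumes dplyr ≥ 1.0 and tidyr ≥ 1.0 (for `names_prefix` and `.groups`), and treats level "A" as "excellent"; the names `df_long`, `df_III`, `number_A` and `share_A` are illustrative only. Because each year holds a single movie in this toy data, the share equals the count.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's wide data frame `df`
lvl <- c("A", "B", "C")
df <- data.frame(
  id     = c(1, 2, 3),
  year   = c(2018, 2003, 2011),
  item.1 = factor(c("A", "A", "C"), levels = lvl),
  item.2 = factor(c("C", "B", "A"), levels = lvl),
  item.3 = factor(c("B", "B", "C"), levels = lvl)
)

# Wide -> long: one row per movie and item; "item.1" becomes "1"
df_long <- df %>%
  pivot_longer(cols = item.1:item.3, names_to = "item",
               names_prefix = "item\\.", values_to = "rating")

# Per year and item: count and share of ratings equal to "A" ("excellent")
df_III <- df_long %>%
  group_by(year, item) %>%
  summarise(number_A = sum(rating == "A"),
            share_A  = mean(rating == "A"),
            .groups  = "drop")

df_III
```

To reuse the line plot above, one could map `x = factor(year)`, `y = number_A` (or `share_A` together with `scale_y_continuous(labels = scales::percent)`) and `colour = item`.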

Or, even more important to me: show, for each item (cast, storytelling, film music), the percentage of ratings that are "excellent", "okay" and "shitty".

# 9 rows: one per item x rating combination
item.IV   <- factor(rep(c(1, 2, 3), each = 3))
rating.IV <- factor(rep(c("A", "B", "C"), 3))
number.IV <- c(2,0,1,1,1,1,0,2,1)
df.IV     <- data.frame(item.IV,rating.IV,number.IV)
df.IV

ggplot(df.IV,aes(fill=rating.IV,y=number.IV,x=item.IV))+
  geom_bar(position= position_fill(reverse = TRUE), stat="identity")+
  theme(axis.title.y = element_text(size = rel(1.2), angle = 0),
        axis.title.x = element_blank(),
        panel.background = element_blank(),
        legend.title = element_blank(),
        legend.position = "bottom")+
  labs(x = "Item")+
  coord_flip()+
  scale_x_discrete(limits = rev(levels(df.IV$item.IV)))+
  scale_y_continuous(labels = scales::percent)
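A possible way (a sketch, not the asker's or answerer's code) to derive a df.IV-style table from `df`: count ratings per item, use tidyr's `complete()` to add explicit zero rows for levels that never occur (e.g. "B" for item 1), then divide by the item total. It assumes dplyr ≥ 1.0 and tidyr ≥ 1.0; `df_IV`, `number` and `percent` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's wide data frame `df`
lvl <- c("A", "B", "C")
df <- data.frame(
  id     = c(1, 2, 3),
  year   = c(2018, 2003, 2011),
  item.1 = factor(c("A", "A", "C"), levels = lvl),
  item.2 = factor(c("C", "B", "A"), levels = lvl),
  item.3 = factor(c("B", "B", "C"), levels = lvl)
)

# Wide -> long: one row per movie and item
df_long <- df %>%
  pivot_longer(cols = item.1:item.3, names_to = "item",
               names_prefix = "item\\.", values_to = "rating")

# Count ratings per item, add zero rows for unused factor levels,
# then turn counts into within-item shares
df_IV <- df_long %>%
  count(item, rating, name = "number") %>%
  complete(item, rating, fill = list(number = 0L)) %>%
  group_by(item) %>%
  mutate(percent = number / sum(number)) %>%
  ungroup()

df_IV
```

The result should plug into the stacked bar chart above via `aes(fill = rating, y = number, x = item)`; since `position_fill()` already normalises the bars, the `percent` column is mainly useful for tables.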

My primary question is: how do I transform the data frame df into df.II? That would make my day. Wrong. My weekend.

And if you could also give a hint on how to proceed from df.II to df.III and df.IV, that would be absolutely mind-blowing. However, I don't want to burden you too much with my problems.

Best wishes,
Jascha

Does this achieve what you need?

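library(tidyverse)

# Wide -> long: one row per movie (id) and item
df_long <- df %>%
  pivot_longer(cols = item.1:item.3, names_to = "item", values_to = "rating") %>%
  mutate(
    item = str_remove(item, fixed("item."))  # "item.1" -> "1"
  )

# Pair every row with every possible rating, then flag the rating actually given
df2 <- crossing(
  df_long,
  rating_all = unique(df_long$rating)
) %>%
  mutate(n = rating_all == rating) %>%
  group_by(id, year, item, rating_all) %>%
  summarise(n = sum(n), .groups = "drop")

# Example: keep only item 3
df3 <- df2 %>%
  filter(item == "3")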
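An alternative sketch for the df → df.II step (not part of the original answer), assuming tidyr ≥ 1.0: mark each observed rating with `number = 1`, then let `complete()` generate the missing (id/year, item, rating) combinations with `number = 0`. `nesting(id, year)` keeps only the id/year pairs that actually occur. `df_II` and `number` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's wide data frame `df`
lvl <- c("A", "B", "C")
df <- data.frame(
  id     = c(1, 2, 3),
  year   = c(2018, 2003, 2011),
  item.1 = factor(c("A", "A", "C"), levels = lvl),
  item.2 = factor(c("C", "B", "A"), levels = lvl),
  item.3 = factor(c("B", "B", "C"), levels = lvl)
)

# One row per (id, item, rating): number = 1 for the rating that was given,
# 0 for the other factor levels
df_II <- df %>%
  pivot_longer(cols = item.1:item.3, names_to = "item",
               names_prefix = "item\\.", values_to = "rating") %>%
  mutate(number = 1L) %>%
  complete(nesting(id, year), item, rating, fill = list(number = 0L)) %>%
  arrange(id, item, rating)

df_II
```

Compared with `crossing()` plus `summarise()`, this avoids the grouping step; `complete()` takes all levels of the `rating` factor, so levels that never occur would still get zero rows.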


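Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.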

Related questions:
How can I transform a long-formated data frame to a wide-formated one with multiple values within a cell in R?
transform employment data in R - from wide to long
How to transform from long to wide data set in r by creating means
How to reshape a data frame from wide to long format in R?
How can I transform my data frame into this specific format in R?
transform a wide panel data.frame to long from-to (source-destination) transition format, in R
R Conditionally transform data frame from long to wide based on multiple unique variables
Using dcast in R to transform data frame from long to wide format not working
How do I convert a data frame with multiple variables in rows from wide to long in R?
How can I spread a data frame (from long to wide) and preserve two fields' data?
 
© 2020-2024 STACKOOM.COM