Reshaping wide dataframe to long format
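I have a df in the following format:
name | other_info | revenues_2015 | ebitda_2015 | ebitda_2016 | revenues_2015 | other_2017 |
---|---|---|---|---|---|---|
A | Info1 | 1 | 2 | 3 | 4 | 5 |
B | Info2 | 6 | 7 | 8 | 9 | 10 |
C | Info3 | 11 | 12 | 13 | 14 | 15 |
I would like to change it to long format, structured as follows:
Name | Info | Year | Metric name | Value
Could you tell me how to do this in R? Since the real dataframe has more than 300 columns, is there a way to create the Year column automatically?
Data: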
structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1",
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L,
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L,
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L,
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L,
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L,
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
Does this work for you?
library(dplyr)
library(tidyr)
structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1",
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L,
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L,
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L,
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L,
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L,
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)) %>%
  # split each column name into a metric and a four-digit year
  pivot_longer(revenues_2015:other_2017, names_pattern = "(.+)_(\\d{4})", names_to = c("metric", "year"))
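For a frame with 300+ columns it may be easier to select "everything except the identifier columns" than to name a column range. The following is a minimal sketch of that idea, not the answerer's exact code. It assumes a hypothetical toy frame `wide` with plain numeric values and unique column names (the duplicated `revenues_2015` is left out), and a tidyr version that supports `names_transform` (1.1.0 or later).

```r
library(tidyr)

# Hypothetical toy data: numeric values, unique column names (an assumption,
# the data in the question uses factors and repeats revenues_2015)
wide <- data.frame(
  name          = c("A", "B", "C"),
  other_info    = c("Info1", "Info2", "Info3"),
  revenues_2015 = c(1, 6, 11),
  ebitda_2015   = c(2, 7, 12),
  ebitda_2016   = c(3, 8, 13),
  other_2017    = c(5, 10, 15)
)

long <- pivot_longer(
  wide,
  cols            = -c(name, other_info),       # all columns except the identifiers
  names_to        = c("metric", "year"),
  names_pattern   = "(.+)_(\\d{4})",            # metric prefix, then a four-digit year
  names_transform = list(year = as.integer),    # store the year as an integer
  values_to       = "value"
)

print(long)
```

Because the column selection is negative, the same call should keep working when more `metric_year` columns are added, as long as they follow the same naming scheme.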
You have two options: you can use reshape() from the stats package (a base R function, so you don't need a library() call for it) or the melt function from the reshape2 package.
Using the function reshape():
data = structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1",
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L,
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L,
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L,
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L,
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L,
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
# "revenues_2015" appears twice in the sample data; make the names unique first,
# otherwise both entries would refer to the same column
names(data) = make.unique(names(data))
measure_cols = setdiff(names(data), c("name", "other_info"))
LF_data = reshape(data = data, idvar = c("name", "other_info"), varying = measure_cols,
                  v.names = c("Value"), times = measure_cols, direction = "long")
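The reshape() call above leaves the original column name (for example `revenues_2015`) in the time column, so the year still has to be extracted. A minimal base R sketch of that extra step follows. The toy frame `wide`, the `timevar` name `variable`, and the final column order are assumptions made for illustration.

```r
# Hypothetical toy data with numeric values and unique column names
wide <- data.frame(
  name          = c("A", "B", "C"),
  other_info    = c("Info1", "Info2", "Info3"),
  revenues_2015 = c(1, 6, 11),
  ebitda_2015   = c(2, 7, 12),
  ebitda_2016   = c(3, 8, 13),
  other_2017    = c(5, 10, 15)
)

id_cols      <- c("name", "other_info")
measure_cols <- setdiff(names(wide), id_cols)   # works for any number of columns

long <- reshape(
  wide,
  direction = "long",
  idvar     = id_cols,
  varying   = measure_cols,
  v.names   = "value",
  timevar   = "variable",     # will hold the original column names
  times     = measure_cols
)

# Split "metric_year" into its two parts
long$metric <- sub("_[0-9]{4}$", "", long$variable)
long$year   <- as.integer(sub("^.*_([0-9]{4})$", "\\1", long$variable))

rownames(long) <- NULL
long <- long[, c("name", "other_info", "year", "metric", "value")]
print(long)
```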
Using the melt() function from the reshape2 package:
data=data.frame(structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1",
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L,
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L,
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L,
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L,
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L,
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)),stringsAsFactors=FALSE)
Then:
# measure.vars: every column that is not an id column (names as they exist in data)
LF_data=reshape2::melt(data,id.vars=c("name","other_info"), measure.vars=setdiff(names(data), c("name","other_info")))
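melt will not let you have combinations of "name", "other_info" and "variable" unless they are unique. In your example, the second revenues_2015 column ends up as revenues_2015.1.
A bit late, and similar to mad-statter's solution, but slightly different in that it uses mutate: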
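The renaming described above can be reproduced with base R's make.unique(), and the metric and year can still be recovered from a suffixed name. This is a small sketch under the assumption that all measure columns follow the `metric_year` scheme. The vector `cols` and the regular expression are illustrative, not part of the original answer.

```r
# Column names as in the question, with revenues_2015 repeated
cols <- c("name", "other_info", "revenues_2015", "ebitda_2015", "revenues_2015")

unique_cols <- make.unique(cols)
print(unique_cols)   # the repeated name gets a ".1" suffix

measure <- unique_cols[-(1:2)]

# Allow an optional ".<n>" suffix after the four-digit year
pattern <- "^(.+)_([0-9]{4})(\\.[0-9]+)?$"
metric  <- sub(pattern, "\\1", measure)
year    <- as.integer(sub(pattern, "\\2", measure))

print(data.frame(column = measure, metric = metric, year = year))
```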
library(tidyr)
library(dplyr)
df <- structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1",
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L,
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L,
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L,
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L,
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L,
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, -3L)) %>%
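  # split each column name at "_" into the metric name and the year
  pivot_longer(revenues_2015:other_2017, names_to = c("Metric name", "Year"),
               names_sep = "_", values_to = "Value") %>%
  # drop the first non-digit character from Year, if there is one
  dplyr::mutate(Year = stringr::str_remove(Year, "\\D")) %>%
  rename(Name = name, Info = other_info)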
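One thing to watch with the sample data: every value column is stored as a factor, so the reshaped Value column will generally not be numeric either. A minimal sketch of the usual conversion, using a hypothetical factor `value` with the same labels as `revenues_2015` in the question:

```r
# Hypothetical factor with the labels used for revenues_2015 in the question
value <- factor(c("1", "11", "6"))

codes   <- as.numeric(value)                 # 1 2 3  (internal level codes, not the data)
numbers <- as.numeric(as.character(value))   # 1 11 6 (the actual numbers)

print(codes)
print(numbers)
```

Going through as.character() first avoids silently replacing the data with the level codes.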