How can I convert a list object to a data frame in long format with the tidyverse in R?
dd <- list(c("2020-1-2", "SDF", "fff,33"),
           c("2020-1-3", "KKK", "ffd,23", "fffdf,23", "ssfds,43"))
dd
# [[1]]
# [1] "2020-1-2" "SDF" "fff,33"
# [[2]]
# [1] "2020-1-3" "KKK" "ffd,23" "fffdf,23" "ssfds,43"
ddtarget <- data.frame(date = c("2020-1-2", "2020-1-3", "2020-1-3", "2020-1-3"),
                       category = c("SDF", "KKK", "KKK", "KKK"),
                       element = c("fff,33", "ffd,23", "fffdf,23", "ssfds,43"))
ddtarget
# date category element
# 1 2020-1-2 SDF fff,33
# 2 2020-1-3 KKK ffd,23
# 3 2020-1-3 KKK fffdf,23
# 4 2020-1-3 KKK ssfds,43
I want to convert dd to ddtarget using tidyverse functions such as map() or something similar, but I could not manage it myself. Can anyone help me?
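Assuming the first two elements of each vector are always single values (date and category) and everything from the third position onward is an element, here is one way using map_df:
purrr::map_df(dd, ~ tibble::tibble(date = .x[1], category = .x[2], element = .x[3:length(.x)]))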
# A tibble: 4 x 3
# date category element
# <chr> <chr> <chr>
#1 2020-1-2 SDF fff,33
#2 2020-1-3 KKK ffd,23
#3 2020-1-3 KKK fffdf,23
#4 2020-1-3 KKK ssfds,43
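One caveat with the call above: `3:length(.x)` counts backwards when a vector has only two entries (`3:2` is `c(3, 2)`), so such a row would pick up an `NA` and the category value instead of being empty. A minimal sketch that sidesteps this with negative indexing is shown below. It assumes purrr 1.0.0 or later for `list_rbind()`, and the name `res` is only illustrative.

```r
library(purrr)
library(tibble)

dd <- list(c("2020-1-2", "SDF", "fff,33"),
           c("2020-1-3", "KKK", "ffd,23", "fffdf,23", "ssfds,43"))

# .x[-(1:2)] drops the first two entries; it yields character(0)
# (and therefore a zero-row tibble) when nothing follows them.
res <- list_rbind(
  map(dd, ~ tibble(date = .x[1], category = .x[2], element = .x[-(1:2)]))
)
res
```

A vector with no trailing elements then simply contributes no rows to the result.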
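If the first element is always the date, the second element is always the category, and the remaining elements are always the element values, you can do the following: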
do.call(rbind, lapply(dd, function(x) {
  data.frame(date = x[1L], category = x[2L], element = tail(x, -2L))
}))
# date category element
# 1 2020-1-2 SDF fff,33
# 2 2020-1-3 KKK ffd,23
# 3 2020-1-3 KKK fffdf,23
# 4 2020-1-3 KKK ssfds,43
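Under the same assumption about element order, the reshaping can also be sketched in base R without building one data frame per list entry, by repeating each date and category as many times as there are trailing elements. The helper names `n` and `res` are only illustrative.

```r
dd <- list(c("2020-1-2", "SDF", "fff,33"),
           c("2020-1-3", "KKK", "ffd,23", "fffdf,23", "ssfds,43"))

# Number of trailing "element" values in each list entry
n <- lengths(dd) - 2L

res <- data.frame(
  date     = rep(vapply(dd, `[`, character(1), 1L), n),  # first entry, repeated
  category = rep(vapply(dd, `[`, character(1), 2L), n),  # second entry, repeated
  element  = unlist(lapply(dd, tail, -2L))               # everything after the first two
)
res
```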
If length(dd) is large, you may want to consider data.table::rbindlist, which is more optimized than do.call(rbind):
data.table::rbindlist(lapply(...))
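The call above is abbreviated. Spelled out with the same per-entry function as the do.call(rbind) version, it might look like the sketch below. This assumes the data.table package is installed; the names `rows` and `res` are only illustrative.

```r
library(data.table)

dd <- list(c("2020-1-2", "SDF", "fff,33"),
           c("2020-1-3", "KKK", "ffd,23", "fffdf,23", "ssfds,43"))

# Build one small data frame per list entry, as in the base-R answer
rows <- lapply(dd, function(x) {
  data.frame(date = x[1L], category = x[2L], element = tail(x, -2L))
})

# rbindlist() stacks the list of data frames and returns a data.table
res <- rbindlist(rows)
res
```

Note that the result is a data.table rather than a plain data frame; as.data.frame(res) converts it back if needed.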
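I also suspect that something went wrong earlier in your pipeline. Where does dd come from, and why is it in this format? If you control the steps that create dd, you may want to design the pipeline more holistically.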
Another option is unnest_wider followed by pivot_longer:
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)
tibble(dat = dd) %>%
  unnest_wider(c(dat), names_repair = ~c('date', 'category', str_c('V', 3:length(.)))) %>%
  pivot_longer(cols = V3:V5, values_to = "element", values_drop_na = TRUE) %>%
  select(-name)
# A tibble: 4 x 3
# date category element
# <chr> <chr> <chr>
#1 2020-1-2 SDF fff,33
#2 2020-1-3 KKK ffd,23
#3 2020-1-3 KKK fffdf,23
#4 2020-1-3 KKK ssfds,43