
Create New Lists Based on List Structure Pattern

I have some data that looks like this:

   dat <- c("Sales","Jim","Halpert","","",
            "Reception","Pam","Beasley","","",
            "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
            "Manager","Michael","Scott","","")

Each "chunk" of data is contiguous, with some blanks between chunks. I want to convert the data into a list of lists that looks like this:

iwant <- list(
           c("Sales","Jim","Halpert"),
           c("Reception","Pam","Beasley"),
           c("Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica"),
           c("Manager","Michael","Scott")
           )

Suggestions? I'm using rvest and stringi. I'd rather not add more packages.

I would suggest the following approach. You will end up with a data frame whose variables are laid out similarly to what you want:

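# Join everything with single spaces, then split on the double spaces left by
# the blank entries (assumes no value itself contains a space)
L1 <- strsplit(paste0(dat, collapse = " "), split = "  ")
# Split each chunk back into its individual values
L2 <- lapply(L1[[1]], function(x) strsplit(trimws(x), split = " "))
# Convert each chunk to a one-column data frame
L2 <- lapply(L2, as.data.frame)
# Drop chunks with zero rows
L2[which(sapply(L2, nrow) == 0)] <- NULL
# Give every chunk the same column name
L2 <- lapply(L2, function(x) {names(x) <- 'v'; return(x)})
# Bind the chunks as columns; shorter chunks are recycled to the longest one
D1 <- as.data.frame(do.call(cbind, L2))
# Rename the columns V1, V2, ...
names(D1) <- paste0('V', seq_len(ncol(D1)))
# Replace the recycled (duplicated) values with NA
D1 <- as.data.frame(apply(D1, 2, function(x) {x[duplicated(x)] <- NA; return(x)}))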

Output:

       V1        V2                  V3      V4
1   Sales Reception         Not.Manager Manager
2     Jim       Pam              Dwight Michael
3 Halpert   Beasley             Schrute   Scott
4    <NA>      <NA>               Bears    <NA>
5    <NA>      <NA>               Beets    <NA>
6    <NA>      <NA> BattlestarGalactica    <NA>
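
The recycling step above appears to work here only because the shorter chunks (3 values) divide evenly into the longest one (6 values). As a hedged alternative sketch, assuming the chunks are already available as a list of character vectors (the names `blocks`, `padded`, and `D2` are hypothetical, not from the answer), each vector can be padded with NA to a common length before binding:

```r
# Hypothetical input: the chunks as a list of character vectors.
blocks <- list(
  c("Sales", "Jim", "Halpert"),
  c("Reception", "Pam", "Beasley"),
  c("Not.Manager", "Dwight", "Schrute", "Bears", "Beets", "BattlestarGalactica"),
  c("Manager", "Michael", "Scott")
)

# Pad every chunk with NA up to the length of the longest one.
n_max  <- max(lengths(blocks))
padded <- lapply(blocks, function(x) { length(x) <- n_max; x })

# Bind the padded vectors as columns and name them V1, V2, ...
D2 <- as.data.frame(do.call(cbind, padded), stringsAsFactors = FALSE)
names(D2) <- paste0("V", seq_along(blocks))
D2
```

Because nothing is recycled, this variant does not need the duplicated() clean-up, so a value that legitimately repeats inside one chunk is kept.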

You can use lapply, rle, and split:

lapply(split(dat, with(rle(dat != ''), 
             rep(cumsum(values), lengths))), function(x) x[x!= ''])

#$`1`
#[1] "Sales"   "Jim"     "Halpert"

#$`2`
#[1] "Reception" "Pam"       "Beasley"  

#$`3`
#[1] "Not.Manager"         "Dwight"              "Schrute"            
#[4] "Bears"               "Beets"               "BattlestarGalactica"

#$`4`
#[1] "Manager" "Michael" "Scott"  

The rle part creates the groups to split on:

with(rle(dat != ''), rep(cumsum(values), lengths))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4
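
To see where those group numbers come from, the intermediate pieces of the rle call can be inspected one at a time. This is a minimal sketch on the same dat; the variable names `r` and `grp` are illustrative only:

```r
dat <- c("Sales","Jim","Halpert","","",
         "Reception","Pam","Beasley","","",
         "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
         "Manager","Michael","Scott","","")

# Run-length encode the "is non-empty" flags.
r <- rle(dat != "")
r$values   # alternates TRUE (a chunk) and FALSE (the blanks after it)
r$lengths  # how long each run is

# cumsum() over the logical values increments at every TRUE run, so each chunk
# and the blanks that follow it share one group number.
grp <- rep(cumsum(r$values), r$lengths)
grp
```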

After the split, we use lapply to remove any empty elements from each list.
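
Wrapped up as a small helper, the same idea might look like the sketch below. The function name `split_blocks` is hypothetical, and unname() is added only so that the result is a plain unnamed list like the one asked for in the question:

```r
# Hypothetical helper: split a character vector into chunks separated by "".
split_blocks <- function(x) {
  runs   <- rle(x != "")
  groups <- rep(cumsum(runs$values), runs$lengths)
  # Drop the empty strings from each group and discard the group names.
  unname(lapply(split(x, groups), function(g) g[g != ""]))
}

dat <- c("Sales","Jim","Halpert","","",
         "Reception","Pam","Beasley","","",
         "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
         "Manager","Michael","Scott","","")

res <- split_blocks(dat)
length(res)
res[[3]]
```

One caveat with this sketch: if the input starts with blank entries, they form a group of their own and the result gains an empty first element, which would need to be filtered out separately.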

