R: Converting specific rows into columns

Question

I imported some fairly messy data from a JSON file, and it looks like this:

raw_df <- data.frame(text = c(paste0('text', 1:3), '---------- OUTCOME LINE ----------', paste0('text', 4:6), '---------- OUTCOME LINE ----------'),
                     demand = c('cat1', rep('', 2), 'info', 'cat2', rep('', 2), 'info2'))



raw_df
                                text demand
1                              text1   cat1
2                              text2       
3                              text3       
4 ---------- OUTCOME LINE ----------   info
5                              text4   cat2
6                              text5       
7                              text6       
8 ---------- OUTCOME LINE ----------  info2

(BTW, ---------- OUTCOME LINE ---------- is the actual string in my text column.)

I would like to tidy it up so that it has the following format:

final_df
                  text demand outcome
1 text1. text2. text3.   cat1    info
2 text4. text5. text6.   cat2   info2

What is the quickest and most efficient way to do this? Thanks for any hints.

Answer 1

A dplyr & tidyr solution:

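library(dplyr)
library(tidyr)

raw_df %>% 
    # copy demand into outcome, turn blanks into NA, keep only the info values in outcome
    mutate(outcome = demand,
           demand = replace(demand, demand == '', NA),
           outcome = replace(outcome, outcome == '', NA),
           outcome = gsub("^cat\\d+", NA, outcome)) %>% 
    fill(demand) %>%                          # carry cat1 / cat2 downwards
    fill(outcome, .direction = "up") %>%      # carry info / info2 upwards
    filter(!grepl("-----", text)) %>%         # drop the separator rows
    group_by(demand, outcome) %>% 
    summarize(text = gsub(",", "\\.", toString(text))) %>%  # "a, b" becomes "a. b"
    select(text, everything())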

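Fix up the values so they display as required: replace the blanks with NA and prepare the outcome column.
fill the demand column in the default downward direction, and fill the outcome column in the upward direction.
filter out the ----- OUTCOME LINE ------ rows based on their hyphens.
Build a group_concat-style string for the text column, then swap the default , for a . .
select the columns in the required order.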

# A tibble: 2 x 3
# Groups:   demand [2]
                 text demand outcome
                <chr> <fctr>   <chr>
1 text1. text2. text3   cat1    info
2 text4. text5. text6   cat2   info2
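The effect of the two fill() calls is easiest to see by stopping the pipeline just before the filter() step. The following is only a minimal sketch of that intermediate state. It assumes the columns are plain character vectors (stringsAsFactors = FALSE), it uses ifelse() with grepl() in place of the gsub() call, and `filled` is a name introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, kept as character columns (assumption)
raw_df <- data.frame(
  text = c(paste0("text", 1:3), "---------- OUTCOME LINE ----------",
           paste0("text", 4:6), "---------- OUTCOME LINE ----------"),
  demand = c("cat1", rep("", 2), "info", "cat2", rep("", 2), "info2"),
  stringsAsFactors = FALSE
)

# `filled` is a hypothetical name for the state just before filter()
filled <- raw_df %>%
  mutate(
    # outcome keeps only the non-blank, non-"cat" values; everything else is NA
    outcome = ifelse(demand == "" | grepl("^cat\\d+", demand),
                     NA_character_, demand),
    # blanks in demand become NA so that fill() has gaps to fill
    demand = replace(demand, demand == "", NA)
  ) %>%
  fill(demand) %>%                     # last non-NA value is carried down
  fill(outcome, .direction = "up")     # next non-NA value is carried up

filled
```

After this step every text row carries both its category and its outcome, so the separator rows can be dropped without losing information.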

Answer 2

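Here, we use 'grepl' to create a logical index based on the presence of - in the 'text' column and subset 'raw_df' to remove those rows. After replacing '' with NA and filling each NA with the previous non-NA value using na.locf, we aggregate the 'text' column grouped by 'demand', pasting the values together. Then we create 'outcome' from 'demand' by subsetting with 'indx'.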

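indx <- grepl("-", raw_df$text)   # TRUE for the OUTCOME LINE rows
# drop those rows, then fill the blank demand values downwards
filled <- transform(raw_df[!indx, ],
                    demand = zoo::na.locf(replace(demand, demand == "", NA)))
# collapse text within each demand, then attach the outcome values
transform(aggregate(text ~ demand, filled, toString),
          outcome = raw_df$demand[indx])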
#  demand                text outcome
#1   cat1 text1, text2, text3    info
#2   cat2 text4, text5, text6   info2
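Another way to form the groups, sketched here in base R only, is to take a running count of the separator rows with cumsum(), so that no fill step is needed. This assumes character columns (stringsAsFactors = FALSE) and that every block ends with a separator row. The names grp, content and result are introduced for illustration, and the trailing period is added only to match the format requested in the question.

```r
# Same sample data as in the question, kept as character columns (assumption)
raw_df <- data.frame(
  text = c(paste0("text", 1:3), "---------- OUTCOME LINE ----------",
           paste0("text", 4:6), "---------- OUTCOME LINE ----------"),
  demand = c("cat1", rep("", 2), "info", "cat2", rep("", 2), "info2"),
  stringsAsFactors = FALSE
)

indx <- grepl("-", raw_df$text)          # TRUE for the separator rows

# Count the separators seen before each row, so a separator stays in the
# block it closes: 0 0 0 0 1 1 1 1
grp <- cumsum(c(FALSE, head(indx, -1)))

content <- raw_df[!indx, ]               # the ordinary text rows
content_grp <- grp[!indx]                # their block ids

result <- data.frame(
  # join the texts of each block with ". " and append a final "."
  text = vapply(split(content$text, content_grp),
                function(t) paste0(paste(t, collapse = ". "), "."),
                character(1), USE.NAMES = FALSE),
  # the category sits on the first row of each block
  demand = vapply(split(content$demand, content_grp),
                  function(d) d[1],
                  character(1), USE.NAMES = FALSE),
  # the outcome sits in the demand column of the separator rows
  outcome = raw_df$demand[indx],
  stringsAsFactors = FALSE
)

result
```

Because the block id comes from row position rather than from the demand values, this variant would also keep two blocks apart if they happened to share the same category.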

Alternatively, this can be done with data.table:

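library(data.table)
indx <- grepl("-", raw_df$text)  # TRUE for the OUTCOME LINE rows
# note: setDT() converts raw_df by reference, so raw_df itself is modified
setDT(raw_df)[demand == "", demand := NA][!indx, .(text= paste(text, collapse='. ')),
          .(demand = zoo::na.locf(demand))][, outcome := raw_df$demand[indx]][]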


2 solutions

Solution 1
2 2017-11-14 00:06:01

Solution 2
1 accepted 2017-11-13 11:05:22
