Split the contents of a dataframe column into different columns based on value

Question

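I am trying to split the following data frame column into 3 columns, depending on the content. I tried using dplyr and mutate because I want to learn them better, but any suggestion is welcome.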

exampledf<-data.frame(c("Argentina","2005/12","2005/11","Bolivia","2006/12"),stringsAsFactors=F)
mutate(exampledf,month=strsplit(exampledf[,1],"/")[1],month=strsplit(exampledf[,1],"/")[2])

My goal:

Year     Month    Country
2005     12       Argentina
2005     11       Argentina
2006     12       Bolivia

This is very close to this SO post, but that post does not solve my problem of repeating the country for each row.

Answer 1

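We create a logical index ('i1') for the rows that contain no digits, take its cumulative sum, and split the dataset by that grouping index. Within each group we extract 'year' and 'month' with sub, take 'Country' from the first element, and build a data.frame. Finally we rbind the contents of the list.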

 i1 <- grepl('^[^0-9]+$', exampledf$Col1)
 lst <- lapply(split(exampledf, cumsum(i1)), function(x) 
   data.frame(year= as.numeric(sub('\\/.*', '',   x[-1,1])), 
              month = as.numeric(sub('.*\\/', '', x[-1,1])),
              Country = x[1,1] ) )
 res <- do.call(rbind, lst)
 row.names(res) <- NULL

 res
 # year month   Country
 #1 2005    12 Argentina
 #2 2005    11 Argentina
 #3 2006    12   Bolivia
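The grouping step is the core of this approach, so here is a minimal base R sketch of what the index and its cumulative sum look like on the sample data. The names `grp` and `country_filled` are illustrative and not part of the original answer.

```r
# Same sample data as in the "Data" section of this answer.
exampledf <- data.frame(
  Col1 = c("Argentina", "2005/12", "2005/11", "Bolivia", "2006/12"),
  stringsAsFactors = FALSE
)

# TRUE for rows that contain no digits at all, i.e. the country rows.
i1 <- grepl("^[^0-9]+$", exampledf$Col1)
i1
# [1]  TRUE FALSE FALSE  TRUE FALSE

# The running total goes up by one at every country row, so a country
# and the date rows below it share one group id.
grp <- cumsum(i1)
grp
# [1] 1 1 1 2 2

# Indexing the country names by the group id repeats each country
# once per row of its group (hypothetical helper, for illustration).
country_filled <- exampledf$Col1[i1][grp]
country_filled
```

This only works if the first row is a country row; otherwise the first group id would be 0 and the indexing would drop those rows.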

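Or, using data.table, we convert the 'data.frame' to a 'data.table' (setDT(exampledf)) and group by the cumsum of the index from above. Within each group we drop the first element of 'Col1' and split the rest (tstrsplit) on the delimiter (/), which gives two columns. We then concatenate the first element to get three columns and rename the columns with setnames. If the grouping variable is not needed, it can be assigned (:=) to NULL.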

library(data.table)
res1 <- setDT(exampledf)[, c(tstrsplit(Col1[-1], 
        '/'),Country = Col1[1L]), .(i2=cumsum(i1))][,i2:= NULL][]
setnames(res1, 1:2, c('year', 'month'))
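Note that the data.table version above leaves year and month as character columns, while the base R version converts them with as.numeric. As a small sketch, assuming numeric columns are wanted, tstrsplit can convert the pieces itself through its `type.convert` argument. The object name `parts` is illustrative.

```r
library(data.table)

# tstrsplit() splits every string and transposes the result, so it
# returns one list element per piece: all years, then all months.
parts <- tstrsplit(c("2005/12", "2005/11", "2006/12"), "/",
                   fixed = TRUE, type.convert = TRUE)

# With type.convert = TRUE the pieces come back as numbers
# instead of character strings.
str(parts)
```

The same argument could be added to the tstrsplit call inside the grouped expression.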

Data

 exampledf<-data.frame(Col1=c("Argentina","2005/12","2005/11",
          "Bolivia","2006/12"),stringsAsFactors=FALSE)

Answer 2

My approach is not very elegant, but it tries to clean the data step by step...

edf<-data.frame(c("Argentina","2005/12","2005/11","Bolivia","2006/12"),
                stringsAsFactors=F)

names(edf) <- "x"  # just to give a concise name

# flag if the row shows the month or not
edf$isMonth <- (regexpr("^[0-9]+/[0-9]+$", edf$x) > 0)

# expand the country 
# (i.e. if the row is month, reuse the country from the previous row)
edf$country <- edf$x
for (i in seq(2, nrow(edf))) {
  if (edf$isMonth[i]) {
    edf$country[i] <- edf$country[i-1]
  }
}

# now only the rows with month are relevant
edf <- edf[edf$isMonth,]

This gives you:

     x isMonth   country
2005/12    TRUE Argentina
2005/11    TRUE Argentina
2006/12    TRUE   Bolivia

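Now the remaining task is to split the year-month variable into year and month. In your example code, strsplit fails because strsplit returns a list (one element per row), and mutate works on whole column vectors rather than element by element.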
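To make the point about the list concrete, here is a short base R sketch, using an illustrative vector `x`, that shows what strsplit returns and one way to pull a given piece out of every element.

```r
x <- c("2005/12", "2005/11", "2006/12")

# strsplit() returns a list with one character vector per input string.
pieces <- strsplit(x, "/", fixed = TRUE)
class(pieces)
# [1] "list"

# pieces[1] is still a list holding only the first row's pieces;
# it is not "the first piece of every row".
pieces[1]

# Taking the n-th piece of every element gives an ordinary vector
# with one value per row, which is what a new column needs.
year  <- vapply(pieces, function(p) p[1], character(1))
month <- vapply(pieces, function(p) p[2], character(1))
```

Vectors built this way have one value per row, so they can be assigned as columns directly.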

In this particular case, I find stringr::str_match useful.

library(stringr)
matched <- str_match(edf$x, "([0-9]+)/([0-9]+)")
edf$year <- matched[, 2]
edf$month <- matched[, 3]

The result is:

      x isMonth   country year month    
2005/12    TRUE Argentina 2005    12
2005/11    TRUE Argentina 2005    11
2006/12    TRUE   Bolivia 2006    12

Answer 3

An alternative strategy. It is not concise, but it is easy to follow.

library(tidyr)
df <-data.frame(Country = c("Argentina","2005/12","2005/11","Bolivia","2006/12"),stringsAsFactors=F)
df$dates[grep("[0-9]",df$Country)] <- df$Country[grep("[0-9]",df$Country)]
df$Country[grep("[0-9]",df$Country)] <- NA

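# Fill each country name down over the date rows that follow it.
x <- df$Country                 # start from the column so country rows keep their value
replace_with <- NA_character_   # no country seen yet
for (i in seq_along(x)) {
  if (!is.na(x[i])) {
    replace_with <- x[i]        # remember the most recent country
  } else {
    x[i] <- replace_with        # date row: reuse that country
  }
}
df$Country <- x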
df <- separate(df, dates, c("Year", "Month"), "/")
df <- na.omit(df)
df
    Country Year Month
2 Argentina 2005    12
3 Argentina 2005    11
5   Bolivia 2006    12
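Since the question asked about dplyr, the same fill-down idea can also be sketched as a pipeline, with tidyr::fill standing in for the loop. This is one possible arrangement under the assumption that dplyr and tidyr are installed, not the method of any answer here. The column name `is_date` is illustrative.

```r
library(dplyr)
library(tidyr)

exampledf <- data.frame(
  Col1 = c("Argentina", "2005/12", "2005/11", "Bolivia", "2006/12"),
  stringsAsFactors = FALSE
)

res <- exampledf %>%
  # Flag the date rows and keep the country name only on country rows.
  mutate(is_date = grepl("^[0-9]+/[0-9]+$", Col1),
         Country = ifelse(is_date, NA_character_, Col1)) %>%
  # Carry the last seen country down over the NA entries.
  fill(Country) %>%
  # Keep only the date rows, then split them into two columns.
  filter(is_date) %>%
  separate(Col1, into = c("Year", "Month"), sep = "/", convert = TRUE) %>%
  select(Year, Month, Country)

res
```

With `convert = TRUE`, separate turns the Year and Month pieces into numbers. Leave it out to keep them as character.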

Answer 4

Here is another option. You can use read.mtable from my "SOfun" package, together with cSplit from "splitstackshape" and rbindlist from "data.table".

Assuming you have at least loaded the read.mtable function (if you do not want to install the package), the approach would be:

library(SOfun)
library(splitstackshape)

rbindlist(lapply(read.mtable(textConnection(exampledf[[1]]), "[a-z]"), 
                 cSplit, "V1", "/"), idcol = TRUE)
#          .id V1_1 V1_2
# 1: Argentina 2005   12
# 2: Argentina 2005   11
# 3:   Bolivia 2006   12

Alternatively, you can let read.mtable itself split the data (although I suspect cSplit may be faster). The approach would then be:

# library(SOfun)
# library(data.table)
rbindlist(read.mtable(textConnection(exampledf[[1]]), "[a-z]", 
                      sep = "/", col.names = c("Year", "Month")), idcol = TRUE)
#          .id Year Month
# 1: Argentina 2005    12
# 2: Argentina 2005    11
# 3:   Bolivia 2006    12

With this approach, you can name the columns in the process.

Answer 1: score 4, accepted, 2016-02-02 02:53:21
Answer 2: score 4, 2016-02-02 03:06:41
Answer 3: score 3, 2016-02-02 03:02:56
Answer 4: score 3, 2016-02-02 05:19:45
