Splitting a column in R

Question

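I'm facing the following problem. I have a column named title.

The title column contains rows with values such as To kill a mockingbird (1960).

So basically the format of the column is [title] ([year]). What I need is two columns: title and year, with year stripped of its parentheses.

A further complication is that some rows contain titles that themselves include parentheses. But basically the last 6 characters of every row are the year wrapped in parentheses.

How do I create the two columns, title and year?

What I have: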

Books$title <- c("To kill a mockingbird (1960)", "Harry Potter and the order of the phoenix (2003)", "Of mice and men (something something) (1937)")

title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)

What I need:

Books$title <- c("To kill a mockingbird", "Harry Potter and the order of the phoenix", "Of mice and men (something something)")
Books$year <- c("1960", "2003", "1937")

title                                             year
To kill a mockingbird                             1960
Harry Potter and the order of the phoenix         2003
Of mice and men (something something)             1937

Answer 1

We can solve this by substr-ing on the last 6 characters.

First, we recreate your data.frame:

df <- read.table(header = TRUE, sep = "\n", stringsAsFactors = FALSE,
text="
Title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)")

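Then we create a new one. The first column, Title, is everything from df$Title except the last 7 characters (so the trailing space is dropped as well). The second column, Year, comes from the end of df$Title: the code takes the last 7 characters, i.e. the space plus the 6-character parenthesized year, and then removes any space and any opening or closing parenthesis. (gsub("[[:punct:]]", ...) would also work for the parentheses.)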

data.frame(Title=substr(df$Title, 1, nchar(df$Title)-7),
           Year=gsub(" |\\(|\\)", "", substr(df$Title, nchar(df$Title)-6, nchar(df$Title))))


                                      Title Year
1                     To kill a mockingbird 1960
2 Harry Potter and the order of the phoenix 2003
3     Of mice and men (something something) 1937

Does this solve your problem?
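As a cross-check on the fixed-width approach above, here is a minimal sketch of a different technique that is not part of this answer: a single anchored regular expression with sub(). It assumes every row ends in a space followed by a 4-digit year in parentheses; the names titles, pattern and Books are only illustrative.

```r
# Sample titles copied from the question
titles <- c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)")

# Anchored pattern: group 1 is the title, group 2 is the 4-digit year
# found inside the final pair of parentheses
pattern <- "^(.*) \\(([0-9]{4})\\)$"

Books <- data.frame(title = sub(pattern, "\\1", titles),
                    year  = sub(pattern, "\\2", titles),
                    stringsAsFactors = FALSE)
print(Books)
```

Because the pattern is anchored at the end of the string, parentheses inside the title are left alone. Note that sub() returns a string unchanged when it does not match, so rows that break the assumed format would show up with the full original text in both columns.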

Answer 2

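Try using substrRight(df$Title, 6) in a loop to extract the last 6 characters, so that the year, including its parentheses, is saved as a new column.

See: Extracting the last n characters from a string in R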
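substrRight() is not a base R function, so it has to be defined before use. The sketch below assumes a definition along the lines of the one usually given for "last n characters" helpers; the exact body is an assumption, as are the variable names. Since substr() and nchar() are vectorized, the sketch applies the helper to the whole vector without an explicit loop.

```r
# Assumed helper (not base R): last n characters of each string in x
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

# Sample titles copied from the question
titles <- c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)")

# Last 6 characters: the year still wrapped in parentheses
year_with_parens <- substrRight(titles, 6)

# Optional extra step: strip the parentheses, as the question asks
year <- gsub("[()]", "", year_with_parens)
print(year)
```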

Answer 3

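Similar to @Vincent Bonhomme:

I assume the data lives in some text file, which I will call so.dat. From there I read the data into a data.frame that also contains two columns for the extracted title and year. Then I use substring() to separate the title from the fixed-length year at the end, leaving the () in place, since the OP apparently needs them: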

titles      <- data.frame( orig = readLines( "so.dat" ), 
               text = "", yr = "", stringsAsFactors = FALSE )
titles$text <- substring( titles[ , 1 ], 
               1, nchar( titles[ , 1 ] ) - 7 )
titles$yr   <- substring( titles[ , 1 ], 
               nchar( titles[ , 1 ] ) - 5, nchar( titles[ , 1 ] ) )

The original data can be dropped or kept, depending on further needs.
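If the year is wanted without parentheses and the original column is no longer needed, the same fixed-width idea can be adjusted slightly. This is only a sketch under the assumption that every row ends in exactly " (YYYY)"; it builds the data from an in-memory vector instead of the so.dat file so that it runs on its own.

```r
# Sample rows standing in for the contents of so.dat
orig <- c("To kill a mockingbird (1960)",
          "Harry Potter and the order of the phoenix (2003)",
          "Of mice and men (something something) (1937)")

titles <- data.frame(orig = orig, stringsAsFactors = FALSE)
n <- nchar(titles$orig)

# Title: everything except the trailing " (YYYY)" (7 characters)
titles$text <- substring(titles$orig, 1, n - 7)

# Year: the 4 digits between the final parentheses
titles$yr <- substring(titles$orig, n - 4, n - 1)

# Drop the original column once it is no longer needed
titles$orig <- NULL
print(titles)
```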
