Create an R data.frame column based on the difference between two character columns

Question

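I have a data.frame, df, with two columns: one holds the song title and the other holds the title and artist combined. I want to create a separate artist field. The first three rows are shown here: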

title                               titleArtist
I'll Never Smile Again  I'll Never Smile Again TOMMY DORSEY & HIS ORCHESTRA / FRANK SINATRA & PIED PIPERS
Imagination         Imagination GLENN MILLER & HIS ORCHESTRA / RAY EBERLE
The Breeze And I    The Breeze And I JIMMY DORSEY & HIS ORCHESTRA / BOB EBERLY

This code works without any problems on this subset:

library(stringr)
library(dplyr)

 df %>% 
 head(3) %>% 
 mutate(artist=str_to_title(str_trim(str_replace(titleArtist,title,"")))) %>% 
 select(artist,title)

                                                      artist                  title
1 Tommy Dorsey & His Orchestra / Frank Sinatra & Pied Pipers I'll Never Smile Again
2                  Jimmy Dorsey & His Orchestra / Bob Eberly       The Breeze And I
3                  Glenn Miller & His Orchestra / Ray Eberle            Imagination

However, when I apply it to several thousand rows, I get the error

Error: Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)

#or for part of the mutation

df$artist <-str_replace(df$titleArtist,df$title,"")

Error in stri_replace_first_regex(string, pattern, replacement, opts_regex =    attr(pattern,  : 
 Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)

I removed all parentheses from the columns, and the code seemed to work for a while before I got this error

Error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

Is another special character causing the problem, or is it something else?

TIA

Answer 1

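Your general problem is that str_replace treats your title values as regular expressions, so any regex special character in a title, not just parentheses, can cause an error. The stringi library, which stringr wraps and simplifies, allows finer-grained control, including treating the pattern as a fixed string rather than a regular expression. I don't have your original data, but this works when I throw in some characters that cause errors: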

library(dplyr)
library(stringi)


# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df <- tibble(title = c("I'll Never Smile Again (",  "Imagination.*", "The Breeze And I(?>="),
             titleArtist = c("I'll Never Smile Again ( TOMMY DORSEY & HIS ORCHESTRA / FRANK SINATRA & PIED PIPERS",
                             "Imagination.* GLENN MILLER & HIS ORCHESTRA / RAY EBERLE",
                             "The Breeze And I(?>= JIMMY DORSEY & HIS ORCHESTRA / BOB EBERLY"))

df %>%
  mutate(artist=stri_trans_totitle(stri_trim(stri_replace_first_fixed(titleArtist,title,"")))) %>% 
  select(artist,title)

Result:

Source: local data frame [3 x 2]

artist                     title
(chr)                     (chr)
1 Tommy Dorsey & His Orchestra / Frank Sinatra & Pied Pipers I'll Never Smile Again (
2                  Glenn Miller & His Orchestra / Ray Eberle             Imagination.*
3                  Jimmy Dorsey & His Orchestra / Bob Eberly      The Breeze And I(?>=
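
For anyone who prefers to stay within stringr, the same fixed-string matching is available through its fixed() modifier. The following is a minimal sketch on made-up sample rows (the data and the regex_result variable are assumptions for illustration, not the asker's data): it first shows the regex interpretation failing, then the literal match succeeding.

```r
library(stringr)

# Hypothetical sample data: titles containing regex metacharacters
df <- data.frame(
  title = c("I'll Never Smile Again (", "Imagination.*"),
  titleArtist = c("I'll Never Smile Again ( TOMMY DORSEY & HIS ORCHESTRA",
                  "Imagination.* GLENN MILLER & HIS ORCHESTRA"),
  stringsAsFactors = FALSE
)

# Interpreting title as a regex fails on the unbalanced "("
regex_result <- tryCatch(
  str_replace(df$titleArtist, df$title, ""),
  error = function(e) "regex error"
)

# fixed() asks stringr to match each title literally instead
df$artist <- str_to_title(
  str_trim(str_replace(df$titleArtist, fixed(df$title), ""))
)

print(df[, c("artist", "title")])
```

Because fixed() compares the strings literally, no escaping of the titles is needed, whatever punctuation they contain.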

Answer 2

 df <- data.frame(ID=11:13, T_A=c('a/b','b/c','x/y'))  # T_A Title/Artist 
   ID T_A
 1 11 a/b
 2 12 b/c
 3 13 x/y

 # Title and Artist are separated by /; fixed=TRUE treats "/" as a literal string, not a regex
 > within(df, T_A<-data.frame(do.call('rbind', strsplit(as.character(T_A), '/', fixed=TRUE))))
  ID T_A.X1 T_A.X2
 1 11      a      b
 2 12      b      c
 3 13      x      y
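
The within() call above stores a nested data frame inside T_A. As a variation, here is a minimal sketch that writes the two pieces to plain columns instead; the column names title and artist are chosen for illustration, and it assumes every value contains exactly one "/" separator.

```r
df <- data.frame(ID = 11:13, T_A = c("a/b", "b/c", "x/y"),
                 stringsAsFactors = FALSE)

# fixed = TRUE makes "/" a literal separator rather than a regex
parts <- do.call(rbind, strsplit(df$T_A, "/", fixed = TRUE))

# parts is a character matrix with one row per input value
df$title  <- parts[, 1]
df$artist <- parts[, 2]

print(df)
```

Note that this only helps when a reliable separator exists between title and artist, which is not the case for the data shown in the question.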

Question description

2 solutions

Solution 1
2, accepted, 2016-06-01 17:00:06

Solution 2
0, 2016-05-29 20:37:45