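How to get R to create a new column (named from the left part of the string in an old column), then put the right part of the string from the old column into the new column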

Question

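Given an existing dataframe that contains a character column like the one shown below (oldColumn1), I want R to automatically create a new column in the same data frame, named from the left part of the string (e.g. COLOR).

Then, for each row, the right part of the string that appears after the ":" (e.g. RED, BLUE, etc.) should go into the new column named "COLOR".

There are many old columns (oldColumn1, oldColumn2, etc.) that need to be split out like this, so doing it manually is impractical. Thanks in advance for any help you can offer.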

# Here is an example of 3 oldColumns that already exist in dataframe.
# There are thousands of these columns, need to auto create a new
# column for each one as described.
# Maybe hoping to have the oldColumn names in a vector, to then pass
# to a function that creates a new column for each oldColumn. 

oldColumn1 <- c('COLOR: RED', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE')
oldColumn2 <- c('SIZE: LARGE', 'SIZE: MEDIUM','SIZE: XLARGE','SIZE: MEDIUM','SIZE: SMALL')
oldColumn3 <- c('DESIGNSTYLE: STYLED', 'DESIGNSTYLE: ORIGINAL MAKER', 'DESIGNSTYLE: COUTURE','DESIGNSTYLE: COUTURE','DESIGNSTYLE: STYLED')
COLOR <- c('RED', 'RED', 'BLUE', 'GREEN', 'BLUE')
SIZE <- c('LARGE', 'MEDIUM', 'XLARGE', 'MEDIUM', 'SMALL')
DESIGNSTYLE <- c('STYLED', 'ORIGINAL MAKER', 'COUTURE', 'COUTURE', 'STYLED')
dat <- data.frame(oldColumn1, oldColumn2, oldColumn3, COLOR, SIZE, DESIGNSTYLE)
dat

Answer 1

You can create a new column with $, then use gsub() to remove "COLOR: " from the target column.

yourdf$COLOR <- gsub("COLOR: ", "", yourdf$oldColumn1)

If you also want to remove the old column:

yourdf$oldColumn1 <- NULL

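Edit

If you have many columns, you can apply the gsub function to all of the target columns. If the target columns share a common name pattern, such as oldColumn in the example, you can subset the data frame by identifying that pattern with grep. Afterwards, you can rename the edited columns to COLOR1, COLOR2, and so on.

Here are the full steps: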

# Remove "COLOR: " from the targeted columns
colname_pattern <- grep("oldColumn", colnames(yourdf))
yourdf[, colname_pattern] <- apply(yourdf[, colname_pattern], 2, 
                                   gsub, pattern = "COLOR: ", 
                                   replacement = "")
# Rename the edited columns
index <- seq_along(colname_pattern)
newnames <- paste0("COLOR", index)
colnames(yourdf)[colname_pattern] <- newnames
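The steps above hard-code the "COLOR: " prefix and number the new names. As a rough sketch of how the same idea might extend to columns with different prefixes, the pattern can match any text up to the first ":" and the new names can be taken from that text. The sample `yourdf` and the variable `target` are made up for illustration, and the sketch assumes every row in a column shares the same prefix.

```r
# Hypothetical sample data; the column names follow the question's example
yourdf <- data.frame(
  oldColumn1 = c("COLOR: RED", "COLOR: BLUE"),
  oldColumn2 = c("SIZE: LARGE", "SIZE: SMALL"),
  stringsAsFactors = FALSE
)

# Positions of the columns to edit
target <- grep("oldColumn", colnames(yourdf))

# New name = text before the first ":" in the first row of each column
first_rows <- vapply(yourdf[target], function(x) x[1], character(1),
                     USE.NAMES = FALSE)
newnames <- sub(":.*$", "", first_rows)

# Drop everything up to and including the first ":" plus any spaces after it
yourdf[target] <- lapply(yourdf[target], sub,
                         pattern = "^[^:]*:\\s*", replacement = "")

# Rename the edited columns in place
colnames(yourdf)[target] <- newnames
yourdf
```

Because the names come from the data, a column whose first row lacks a ":" would end up named after its whole first value, so checking the input first is probably worthwhile.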

Answer 2

Starting with

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE")), class = "data.frame", row.names = c(NA, -5L))

The naive approach is

data.frame(COLOR = trimws(sub("COLOR:", "", quux$oldColumn1)))
#   COLOR
# 1   RED
# 2   RED
# 3  BLUE
# 4 GREEN
# 5  BLUE

But I'm assuming you have a more general need. Let's assume the column holds more than one kind of thing to parse out, such as

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE", "SIZE: 1", "SIZE: 3", "SIZE: 5")), class = "data.frame", row.names = c(NA, -8L))
quux
#     oldColumn1
# 1   COLOR: RED
# 2   COLOR: RED
# 3  COLOR: BLUE
# 4 COLOR: GREEN
# 5  COLOR: BLUE
# 6      SIZE: 1
# 7      SIZE: 3
# 8      SIZE: 5

Then we can generalize it to

tmp <- strcapture("(.*)\\s*:\\s*(.*)", quux$oldColumn1, list(k="", v=""))
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
reshape2::dcast(tmp, ign ~ k, value.var = "v")[,-1,drop=FALSE]
#   COLOR SIZE
# 1   RED    1
# 2   RED    3
# 3  BLUE    5
# 4 GREEN <NA>
# 5  BLUE <NA>
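To make the intermediate steps easier to follow, here is a small sketch that stops before the reshape and looks at what the helper columns hold. The input vector `x` is a shortened, made-up version of the data above, and the prototype is written as an empty data frame, the form used in the `strcapture` documentation.

```r
x <- c("COLOR: RED", "COLOR: BLUE", "SIZE: 1")

# Split each string into the key (before ":") and the value (after ":")
tmp <- strcapture("(.*)\\s*:\\s*(.*)", x,
                  data.frame(k = character(), v = character()))

# Running counter within each key: 1, 2, ... for COLOR, then 1, ... for SIZE.
# dcast() needs it as a row identifier so that the n-th COLOR and the
# n-th SIZE land in the same output row.
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
tmp
```

Without the counter there would be nothing to tell the reshape which values belong on which row, which is why it is created and then dropped again with `[,-1,drop=FALSE]`.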

--

Edit: an alternative for the updated data:

do.call(cbind, lapply(dat, function(X) {
  nm <- sub(":.*", "", X[1])
  out <- data.frame(trimws(sub(".*:", "", X)))
  names(out) <- nm
  out
}))
#   COLOR   SIZE    DESIGNSTYLE
# 1   RED  LARGE         STYLED
# 2   RED MEDIUM ORIGINAL MAKER
# 3  BLUE XLARGE        COUTURE
# 4 GREEN MEDIUM        COUTURE
# 5  BLUE  SMALL         STYLED
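The call above walks over every column of `dat`, so it fits a data frame that holds only the old columns. If the frame also contains other columns, one possible variation is to pick the old columns by name and append the results to the same data frame. This is only a sketch: the sample `dat`, the `id` column, and the names `old_cols` and `new_cols` are invented for illustration.

```r
# Hypothetical data frame with two old columns plus an unrelated column
dat <- data.frame(
  oldColumn1 = c("COLOR: RED", "COLOR: BLUE"),
  oldColumn2 = c("SIZE: LARGE", "SIZE: SMALL"),
  id = 1:2,
  stringsAsFactors = FALSE
)

# Only the columns whose names start with "oldColumn" are processed
old_cols <- grep("^oldColumn", names(dat), value = TRUE)

# Values: everything after the ":" with surrounding whitespace trimmed
new_cols <- lapply(dat[old_cols], function(X) trimws(sub(".*:", "", X)))

# Names: everything before the ":" in the first row of each old column
names(new_cols) <- vapply(dat[old_cols], function(X) sub(":.*", "", X[1]),
                          character(1), USE.NAMES = FALSE)

# Append the new columns; the old columns and id stay as they were
dat[names(new_cols)] <- new_cols
dat
```

If the old columns should be dropped afterwards, `dat[old_cols] <- NULL` would remove them in one step.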


Solution 1
2 2022-04-12 21:24:45

Solution 2
1 Accepted 2022-04-12 21:27:37
