R Error in `$<-.data.frame`(`*tmp*`, "newCol", value = "categories") : replacement has 1 row, data has 0

I'm trying to categorize messy data using a data frame with four columns:

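  1. "company_name", the messy data I want to categorize
  2. "categories", the categories I want to sort the messy data into
  3. "search", the keywords I want to search for in the messy data
  4. "company_type", the column that should end up holding the correct company type for each row

 company_name        categories    search   company_type
 John landscaping    Landscaping   lawn     NA
 Brother Lawn care   Cleaning      clean    NA
 Top cleaning        Painting      paint    NA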

I want my final result to look like this:

 company_name        categories    search   company_type
 John landscaping    Landscaping   lawn     Landscaping
 Brother Lawn care   Cleaning      clean    Landscaping
 Top cleaning        Painting      paint    Cleaning

I'm using this function, created by Chris Leonard: https://r-dir.com/blog/2015/01/quickly-categorize-messy-data.ZFC35FDC70D5FC69D239888A
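For anyone reproducing the problem, the sample input can be rebuilt as an R data frame. The values are read off the input table; using plain character columns (`stringsAsFactors = FALSE`) is an assumption, since the original object's column types are not shown.

```r
# Sample input reconstructed from the question's table (assumption:
# character columns; company_type starts out as all NA).
df <- data.frame(
  company_name = c("John landscaping", "Brother Lawn care", "Top cleaning"),
  categories   = c("Landscaping", "Cleaning", "Painting"),
  search       = c("lawn", "clean", "paint"),
  company_type = NA,            # length-1 NA is recycled to every row
  stringsAsFactors = FALSE
)
str(df)
```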

Here is the code:

df$company_type <- NA
  
categorizeDF <- function(df, searchColName, searchList, catList, newColName="Category") {
  catDF <- data.frame(matrix(ncol=ncol(df), nrow=0))
  colnames(catDF) <- paste0(names(df))
  df$sequence <- seq(nrow(df))
  for (i in seq_along(searchList)) {
    rownames(df) <- NULL
    index <- grep(searchList[i], df[,which(colnames(df) == searchColName)], ignore.case=TRUE)
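    tempDF <- df[index,]
    tempDF$newCol <- catList[i]   # fails when index is integer(0): tempDF then has 0 rows
    catDF <- rbind(catDF, tempDF)
    df <- df[-index,]             # with index = integer(0), this drops every row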
  }
  if (nrow(df) > 0) {
    df$newCol <- "OTHER"
    catDF <- rbind(catDF, df)
  }
  catDF <- catDF[order(catDF$sequence),]
  catDF$sequence <- NULL
  rownames(catDF) <- NULL
  catDF$newCol <- as.factor(catDF$newCol)
  colnames(catDF)[which(colnames(catDF) == "newCol")] <- newColName
  catDF
}

sorted <- categorizeDF(df, "company_name", "search", "categories", "company_type")

However, I get an error (traceback):

Error in `$<-.data.frame`(`*tmp*`, "newCol", value = "categories") : 
replacement has 1 row, data has 0
4.
stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
"replacement has %d rows, data has %d"), N, nrows), domain = NA)
3.
`$<-.data.frame`(`*tmp*`, "newCol", value = "categories")
2.
`$<-`(`*tmp*`, "newCol", value = "categories")
1.
categorizeDF(df, "company_name", "search", "categories", "company_type")
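The `value = "categories"` in the traceback shows what reached the failing line: in the call above, `searchList` and `catList` are the literal one-element strings "search" and "categories", not the contents of the `search` and `categories` columns. A small check, assuming the three company names from the sample table; `search_col` is a hypothetical stand-in for `df$search`:

```r
# Company names from the question's sample table.
company_name <- c("John landscaping", "Brother Lawn care", "Top cleaning")

# What the call actually passes: one literal string each.
searchList <- "search"
catList    <- "categories"
print(length(searchList))                                      # 1: the loop runs once
print(grep(searchList[1], company_name, ignore.case = TRUE))   # integer(0): no match
print(catList[1])                                              # "categories", as in the traceback

# Passing the column values instead gives three search terms.
# Count how many company names each term matches.
search_col <- c("lawn", "clean", "paint")
counts <- sapply(search_col, function(p) length(grep(p, company_name, ignore.case = TRUE)))
print(counts)  # lawn 1, clean 1, paint 0
```

Note that even with the columns passed, "paint" matches no company name, so the original function would still hit the same error on that term.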

Any help would be appreciated.

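This is caused by a search string that does not occur anywhere in the messy-data column: grep() then returns integer(0), so tempDF <- df[index,] has zero rows, and assigning the one-element catList[i] to tempDF$newCol fails.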
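The mechanism can be reproduced in isolation. This is a minimal sketch on a hypothetical one-column data frame; it shows the zero-row assignment failure and a second, silent pitfall in the same loop: `-integer(0)` is still `integer(0)`, so `df[-index, ]` selects no rows instead of removing nothing.

```r
# Hypothetical one-column stand-in for the messy data.
df <- data.frame(company_name = c("John landscaping", "Top cleaning"),
                 stringsAsFactors = FALSE)

# grep() returns integer(0) because "paint" matches no company name.
index <- grep("paint", df$company_name, ignore.case = TRUE)

# Row-subsetting with integer(0) yields a zero-row data frame ...
tempDF <- df[index, , drop = FALSE]

# ... and a length-1 value cannot be recycled into 0 rows, so this errors.
err <- tryCatch({
  tempDF$newCol <- "Painting"
  NULL
}, error = function(e) conditionMessage(e))
print(err)  # "replacement has 1 row, data has 0" (in an English locale)

# Silent pitfall: df[-index, ] with index = integer(0) keeps no rows.
remaining <- df[-index, , drop = FALSE]
print(nrow(remaining))  # 0

# Checking for an empty index first, as the update below does, avoids both.
if (length(index) == 0) message("no match: skip this search term")
```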

Updated, and it works (search terms with no matches are now skipped):

categorizeDF <- function(df, searchColName, searchList, catList, newColName="Category") {
  catDF <- data.frame(matrix(ncol=ncol(df), nrow=0))
  colnames(catDF) <- paste0(names(df))
  df$sequence <- seq(nrow(df))
  for (i in seq_along(searchList)) {
    rownames(df) <- NULL
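    index <- grep(searchList[i], df[,which(colnames(df) == searchColName)], ignore.case=TRUE)

    # Skip search terms that match none of the remaining rows. Otherwise
    # tempDF has 0 rows (so the newCol assignment fails) and df[-index,]
    # would drop every row.
    if (length(index) == 0) {
      next
    }
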
    tempDF <- df[index,]
    tempDF$newCol <- catList[i]
    catDF <- rbind(catDF, tempDF)
    df <- df[-index,]
  }
  if (nrow(df) > 0) {
    df$newCol <- "OTHER"
    catDF <- rbind(catDF, df)
  }
  catDF <- catDF[order(catDF$sequence),]
  catDF$sequence <- NULL
  rownames(catDF) <- NULL
  catDF$newCol <- as.factor(catDF$newCol)
  colnames(catDF)[which(colnames(catDF) == "newCol")] <- newColName
  catDF
}
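A more compact alternative, sketched here under stated assumptions, avoids growing a data frame with rbind() and the zero-row pitfalls entirely: build a character vector of labels and fill it with logical indexing, where an all-FALSE match simply assigns nothing. The helper name `categorize_first_match` is hypothetical and this is not Chris Leonard's function; it assumes, like the original, that search terms are regular expressions, that each row is assigned the category of the first matching term in `searchList` order, and that the `search` and `categories` columns pair up row by row. It returns a character vector; wrap it in `factor()` if you want the original's factor output.

```r
# Hypothetical helper (not the original function): label each element of x
# with the category of the first search term that matches it, or `other`.
categorize_first_match <- function(x, searchList, catList, other = "OTHER") {
  searchList <- as.character(searchList)  # guard against factor columns
  catList    <- as.character(catList)
  stopifnot(length(searchList) == length(catList))
  out <- rep(NA_character_, length(x))
  for (i in seq_along(searchList)) {
    # Only rows not yet categorized are eligible, mirroring how the
    # original removes matched rows from df after each term.
    hit <- is.na(out) & grepl(searchList[i], x, ignore.case = TRUE)
    out[hit] <- catList[i]                # no-op when nothing matches
  }
  out[is.na(out)] <- other
  out
}

df <- data.frame(
  company_name = c("John landscaping", "Brother Lawn care", "Top cleaning"),
  categories   = c("Landscaping", "Cleaning", "Painting"),
  search       = c("lawn", "clean", "paint"),
  stringsAsFactors = FALSE
)

# Pass the columns' values, not the strings "search" and "categories".
df$company_type <- categorize_first_match(df$company_name, df$search, df$categories)
print(df$company_type)
```

With the sample data this gives "OTHER", "Landscaping", "Cleaning". "John landscaping" contains none of "lawn", "clean" or "paint", so any keyword match on these three terms labels it OTHER rather than Landscaping as in the desired table.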


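Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.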

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM