Issue when subsetting a data frame based on a column value in R

Question

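I am running into a problem when subsetting a data frame in R. The data frame is att2, and it has a column filter_name that I want to subset on. The unique values of this column are shown below.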

unique(att2[["filter_name"]])
# [1] title             Type        Operating_System         Occasion           Brand
# 148 Levels: Accessories Age Antennae Art_Style Aspect_ratio ... Zoom

This shows that Brand is one of the values of the filter_name column. But when I subset the data frame with the code below, I get 0 rows, as shown here.

att3 <- subset(att2, filter_name == 'Brand')
att3
# [1] a           b           c           filter_name
# <0 rows> (or 0-length row.names)

I can't figure out why. Has anyone run into this kind of problem?

Answer 1

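Without a reproducible example, all we can do is guess at the source of the problem.

Here is my best guess: the values in your "filter_name" column contain leading or trailing whitespace, so a test for exactly "Brand" matches nothing until that whitespace is stripped.

Here is a minimal example that, if my guess is correct, reproduces your problem.

First, some sample data: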

mydf <- data.frame(Param =  c("   Brand   ", "Operating System", 
                              "Type ", "   Brand   ", "Type ", 
                              "Type ", "   Brand   ", "Type ", 
                              "   Brand   "), Value = 1:9)
unique(mydf[["Param"]])
# [1]    Brand         Operating System Type            
# Levels:    Brand    Operating System Type 

subset(mydf, Param == "Brand")
# [1] Param Value
# <0 rows> (or 0-length row.names)

Use print with the argument quote = TRUE to make the whitespace in the data.frame visible:

print(mydf, quote = TRUE)
#                Param Value
# 1      "   Brand   "   "1"
# 2 "Operating System"   "2"
# 3            "Type "   "3"
# 4      "   Brand   "   "4"
# 5            "Type "   "5"
# 6            "Type "   "6"
# 7      "   Brand   "   "7"
# 8            "Type "   "8"
# 9      "   Brand   "   "9"
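If the data frame is too large to eyeball, the same check can be done programmatically. This is only a sketch: the vector vals and its contents are made up for illustration, and in your case you would pass as.character(att2$filter_name) instead.

```r
# Hypothetical sample values; in practice use as.character(att2$filter_name).
vals <- c("   Brand   ", "Operating System", "Type ", "Brand")

# TRUE where a value starts or ends with a whitespace character.
has_padding <- grepl("^\\s|\\s$", vals)

# Print the offending values quoted so the spaces are visible.
print(unique(vals[has_padding]), quote = TRUE)

# Number of whitespace characters that trimming would remove from each value.
padding_chars <- nchar(vals) - nchar(gsub("^\\s+|\\s+$", "", vals))
padding_chars
```

Any value flagged in has_padding will never compare equal to its trimmed form.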

If that does turn out to be your problem, a quick gsub should fix it:

mydf$Param <- gsub("^\\s+|\\s+$", "", mydf$Param)
unique(mydf[["Param"]])
# [1] "Brand"            "Operating System" "Type"  

subset(mydf, Param == "Brand")
#   Param Value
# 1 Brand     1
# 4 Brand     4
# 7 Brand     7
# 9 Brand     9
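As an alternative to the regular expression, base R (3.2.0 and later) ships trimws(), which does the same trimming. The sketch below uses a separate, made-up data frame mydf2 so that it runs on its own; the as.character() call is there on the assumption that the column may be a factor.

```r
# Hypothetical sample data with padded values, similar to mydf above.
mydf2 <- data.frame(Param = c("   Brand   ", "Operating System",
                              "Type ", "   Brand   "),
                    Value = 1:4)

# Strip leading and trailing whitespace; as.character() guards against factors.
mydf2$Param <- trimws(as.character(mydf2$Param))

# The exact comparison now matches the trimmed values.
brand_rows <- subset(mydf2, Param == "Brand")
brand_rows
```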

You might also want to look at the strip.white argument of read.table and family, which defaults to FALSE. Try re-reading your data with strip.white = TRUE and then subsetting again.
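To illustrate the effect of strip.white, here is a small sketch that reads the same text twice. The inline CSV string raw_csv and its column names are invented for the example; with a real file you would pass the file path instead of text =. Note that strip.white only affects unquoted fields.

```r
# Hypothetical CSV content with padded, unquoted values.
raw_csv <- "filter_name,value\n   Brand   ,1\nType ,2\n   Brand   ,3"

# Default behaviour: whitespace around the fields is kept.
untrimmed <- read.csv(text = raw_csv, strip.white = FALSE)

# With strip.white = TRUE the padding is removed while reading.
trimmed <- read.csv(text = raw_csv, strip.white = TRUE)

n_untrimmed <- nrow(subset(untrimmed, filter_name == "Brand"))
n_trimmed   <- nrow(subset(trimmed, filter_name == "Brand"))
n_untrimmed  # 0
n_trimmed    # 2
```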

Answer 2

First of all, you should really read this Stack Overflow post on how to ask a good question.

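As for your problem, something like this should work (it is hard to say more when you haven't posted a reproducible example, as Arun also pointed out):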

 att2 <- (data.frame(v=rnorm(10), filter_name=c('Brand','Not Brand')))

 att2[att2$filter_name == 'Brand', ]
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

 subset(att2, filter_name == 'Brand')
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

There is a lot written about subsetting here.

Answer 3

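With the stringr package, you can do something like this:

   library(stringr)
   # Trim whitespace into a new column of att2, then subset on that column
   att2$filter_name_trim <- str_trim(att2$filter_name)
   att3 <- subset(att2, filter_name_trim == 'Brand')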

3 answers

Answer 1: 2, accepted, 2013-02-05 07:44:15

Answer 2: 0, 2013-02-05 07:26:04

Answer 3: 0, 2013-02-05 07:50:33
