Error in data.frame, unused argument

Question

I have this data frame:

> head(merged.tables)
  Store DayOfWeek       Date Sales Customers Open Promo StateHoliday SchoolHoliday StoreType
1     1         5 2015-07-31  5263       555    1     1            0             1         c
2     1         6 2013-01-12  4952       646    1     0            0             0         c
3     1         5 2014-01-03  4190       552    1     0            0             1         c
4     1         3 2014-12-03  6454       695    1     1            0             0         c
5     1         3 2013-11-13  3310       464    1     0            0             0         c
6     1         7 2013-10-27     0         0    0     0            0             0         c
  Assortment CompetitionDistance CompetitionOpenSinceMonth CompetitionOpenSinceYear Promo2
1          a                1270                         9                     2008      0
2          a                1270                         9                     2008      0
3          a                1270                         9                     2008      0
4          a                1270                         9                     2008      0
5          a                1270                         9                     2008      0
6          a                1270                         9                     2008      0
  Promo2SinceWeek Promo2SinceYear PromoInterval
1              NA              NA              
2              NA              NA              
3              NA              NA              
4              NA              NA              
5              NA              NA              
6              NA              NA

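Then I want to extract a data frame that shows the mean of the Sales vector where Open equals 1, grouped by StoreType. I used this command because I think it is the fastest way: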

merged.tables[StateHoliday==1,mean(na.omit(Sales)),by=StoreType]

But I got this error:

Error in `[.data.frame`(merged.tables, StateHoliday == 0, mean(na.omit(Sales)),  : unused argument (by = StoreType)

I searched but could not find an answer for this error. Thanks for your help!

Answer 1

I had the same error.

The problem was solved when I realized that my data was not in data.table format.

Example: copy <- data.table(data)
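
To make the point above concrete: the `x[i, j, by]` form belongs to the data.table package, while `[` on a plain data.frame has no `by` argument, which is what triggers "unused argument". The sketch below assumes the data.table package is installed and uses a small made-up data frame (not the asker's real data) with hypothetical names such as merged.dt and AverageSales.

```r
# Assumes the data.table package is installed: install.packages("data.table")
library(data.table)

# Hypothetical toy data standing in for the asker's merged.tables
merged.tables <- data.frame(
  StoreType = c("s", "m", "l", "s", "m", "l"),
  Sales     = c(4608, 4017, 4210, 4833, 3818, 3090),
  Open      = c(1, 1, 0, 0, 1, 1)
)

# A plain data.frame: `[` has no `by` argument here
class(merged.tables)

# Convert to a data.table; as.data.table() returns a converted copy
merged.dt <- as.data.table(merged.tables)

# i filters rows, j computes the summary, by defines the groups
avg.sales <- merged.dt[Open == 1,
                       .(AverageSales = mean(Sales, na.rm = TRUE)),
                       by = StoreType]
avg.sales
```

If copying a large table is a concern, setDT(merged.tables) converts the object in place by reference instead.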

Answer 2

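Overview

There are many ways to apply a function to groups of values in a data frame. I present two:

Using the dplyr package to arrange your data in a way that answers your question.
Using tapply(), which applies a function to each group of values.

Reproducible Example

For each store type, I want the average sales of those stores whose Open value equals 1.

I present the dplyr method first, followed by tapply.

Note: the data frame below includes only a few of the columns posted by the OP.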

# install necessary package
install.packages( pkgs = "dplyr" )

# load necessary package
library( dplyr )

# create data frame
# (Sales is drawn with runif(), so your values will differ from the output shown below)
merged.tables <-
  data.frame(
    Store = c( 1, 1, 1, 2, 2, 2 )
    , StoreType = rep( x = c( "s", "m", "l" ) , times = 2)
    , Sales = round( x = runif( n = 6, min = 3000, max = 6000 ) , digits = 0 )
    , Open = c( 1, 1, 0, 0, 1, 1 )
    , stringsAsFactors = FALSE
  )

# view the data
merged.tables
#   Store StoreType Sales Open
# 1     1         s  4608    1
# 2     1         m  4017    1
# 3     1         l  4210    0
# 4     2         s  4833    0
# 5     2         m  3818    1
# 6     2         l  3090    1

# dplyr method
merged.tables %>%
  group_by( StoreType ) %>%
  filter( Open == 1 ) %>%
  summarise( AverageSales = mean( x = Sales , na.rm = TRUE ) )
# A tibble: 3 x 2
#   StoreType AverageSales
#   <chr>            <dbl>
# 1 l                 3090
# 2 m                 3918
# 3 s                 4608


# tapply method

# create the condition
# that 'Open' must be equal to one
Open.equals.one <- which( merged.tables$Open == 1 )

# apply the condition to
# both X and INDEX
tapply( X = merged.tables$Sales[ Open.equals.one ]
        , INDEX = merged.tables$StoreType[ Open.equals.one ]
        , FUN = mean
        , na.rm = TRUE # just in case your data does have NA values in the `Sales` column, this removes them from the calculation
)
# l      m      s 
# 3090.0 3917.5 4608.0 

# end of script #

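Resources

If you need more conditions later, I encourage you to look at other related SO posts, such as "How to combine multiple conditions to subset a data-frame using "OR"?" and "Why is `[` better than `subset`?".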
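
As a rough illustration of combining conditions when subsetting with `[`, the sketch below uses only base R and a made-up data frame shaped like the reproducible example. The 4500 threshold and the object names are hypothetical. Note the element-wise operators `|` and `&`; the scalar forms `||` and `&&` are not meant for row selection.

```r
# Hypothetical toy data in the same shape as the reproducible example
merged.tables <- data.frame(
  StoreType = c("s", "m", "l", "s", "m", "l"),
  Sales     = c(4608, 4017, 4210, 4833, 3818, 3090),
  Open      = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# OR: rows that are open, or whose sales exceed 4500 (hypothetical threshold)
open.or.high <- merged.tables[
  merged.tables$Open == 1 | merged.tables$Sales > 4500, ]

# AND: rows that are open and belong to store type "m"
open.and.m <- merged.tables[
  merged.tables$Open == 1 & merged.tables$StoreType == "m", ]

# average sales per store type on the OR subset
or.means <- tapply(X = open.or.high$Sales,
                   INDEX = open.or.high$StoreType,
                   FUN = mean)
or.means
```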

Answer 1: 6, 2020-02-10 11:42:09
Answer 2: 1, accepted, 2018-02-10 02:38:02