Error in data.frame, unused argument
I have this data frame:
> head(merged.tables)
Store DayOfWeek Date Sales Customers Open Promo StateHoliday SchoolHoliday StoreType
1 1 5 2015-07-31 5263 555 1 1 0 1 c
2 1 6 2013-01-12 4952 646 1 0 0 0 c
3 1 5 2014-01-03 4190 552 1 0 0 1 c
4 1 3 2014-12-03 6454 695 1 1 0 0 c
5 1 3 2013-11-13 3310 464 1 0 0 0 c
6 1 7 2013-10-27 0 0 0 0 0 0 c
Assortment CompetitionDistance CompetitionOpenSinceMonth CompetitionOpenSinceYear Promo2
1 a 1270 9 2008 0
2 a 1270 9 2008 0
3 a 1270 9 2008 0
4 a 1270 9 2008 0
5 a 1270 9 2008 0
6 a 1270 9 2008 0
Promo2SinceWeek Promo2SinceYear PromoInterval
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
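Then I want to extract a data frame that shows the mean of the Sales vector when Open equals 1, grouped by StoreType. I used this command because I think it is the fastest way: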
merged.tables[StateHoliday==1,mean(na.omit(Sales)),by=StoreType]
But I got this error:
Error in `[.data.frame`(merged.tables, StateHoliday == 0, mean(na.omit(Sales)), : unused argument (by = StoreType)
I searched but could not find an answer for this error. Thanks for your help!
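I had the same error.
The problem was solved when I realised that my data was not in data.table format. The `by` argument inside `[` is understood by data.table only; a plain data.frame rejects it as an unused argument.
Example: copy <- data.table(data)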
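To make the fix above concrete, here is a minimal sketch, assuming the data.table package is installed. The toy data and the names `merged.dt` and `AverageSales` are made up for illustration and are not the OP's real table; the filter uses `Open == 1` as described in the question text.

```r
library(data.table)

# Toy data, invented for illustration (not the OP's full table)
merged.tables <- data.frame(
  StoreType = c("a", "a", "c", "c"),
  Sales     = c(100, 200, 300, NA),
  Open      = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Convert the data.frame to a data.table so that `[` accepts `by`
merged.dt <- data.table(merged.tables)

# i = row filter, j = expression to compute, by = grouping column
result <- merged.dt[Open == 1,
                    .(AverageSales = mean(Sales, na.rm = TRUE)),
                    by = StoreType]
result
```

Running the same `[ ... , by = StoreType]` call on `merged.tables` itself would reproduce the "unused argument" error, because the object would still be a plain data.frame.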
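There are many ways to apply a function to a set of values in a data frame. Here are two: dplyr and tapply.
For each store type, I want the average sales of those stores whose Open value equals 1.
Note: the data frame below uses only a few of the columns posted in the OP.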
# install necessary package
install.packages( pkgs = "dplyr" )
# load necessary package
library( dplyr )
# create data frame
merged.tables <-
  data.frame(
    Store = c( 1, 1, 1, 2, 2, 2 )
    , StoreType = rep( x = c( "s", "m", "l" ) , times = 2)
    , Sales = round( x = runif( n = 6, min = 3000, max = 6000 ) , digits = 0 )
    , Open = c( 1, 1, 0, 0, 1, 1 )
    , stringsAsFactors = FALSE
  )
# view the data
merged.tables
# Store StoreType Sales Open
# 1 1 s 4608 1
# 2 1 m 4017 1
# 3 1 l 4210 0
# 4 2 s 4833 0
# 5 2 m 3818 1
# 6 2 l 3090 1
# dplyr method
merged.tables %>%
  group_by( StoreType ) %>%
  filter( Open == 1 ) %>%
  summarise( AverageSales = mean( x = Sales , na.rm = TRUE ) )
# A tibble: 3 x 2
# StoreType AverageSales
# <chr> <dbl>
# 1 l 3090
# 2 m 3918
# 3 s 4608
# tapply method
# create the condition
# that 'Open' must be equal to one
Open.equals.one <- which( merged.tables$Open == 1 )
# apply the condition to
# both X and INDEX
tapply( X = merged.tables$Sales[ Open.equals.one ]
        , INDEX = merged.tables$StoreType[ Open.equals.one ]
        , FUN = mean
        , na.rm = TRUE # just in case your data does have NA values in the `Sales` column, this removes them from the calculation
)
# l m s
# 3090.0 3917.5 4608.0
# end of script #
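If you need more conditions later, I encourage you to look at other related SO posts, such as "How to combine multiple conditions to subset a data-frame using "OR"?" and "Why is `[` better than `subset`?".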
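As a rough illustration of combining conditions with `[`, here is a small base R sketch. The data frame `stores` and the threshold of 5500 are hypothetical and chosen only for the example.

```r
# Hypothetical toy data, for illustration only
stores <- data.frame(
  StoreType = c("s", "m", "l", "s", "m", "l"),
  Sales     = c(4000, 5000, 3000, 4500, 3500, 6000),
  Open      = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# AND (&): rows that are open and of type "m"
open.and.m <- stores[ stores$Open == 1 & stores$StoreType == "m", ]

# OR (|): rows that are of type "s" or have Sales above 5500
s.or.high <- stores[ stores$StoreType == "s" | stores$Sales > 5500, ]

open.and.m
s.or.high
```

Note that `&` and `|` are the vectorised operators needed for row subsetting; `&&` and `||` are meant for single logical values and are not suitable here.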