Using anonymous functions with lapply/sapply in R?
I'm trying to use sapply to take each item in a list (e.g. "Golf", "Malibu", "Corvette") and create a new list holding the highest value from the data frame that list was split from (e.g. cars$sale_price). I'm trying to use an anonymous function to do so, but I can't get that function to work.
The basic issue here is that I'm not very good at writing functions.
First, I took the original data frame cars and used split to create a list of unique car names, which I called car_names.
Now I'm trying to use sapply to create a new list of the highest sale price of each type of car in the list. I'm sure I'm starting the thing correctly ...
price_list <- sapply(car_names,
... but I can't for the life of me get an anonymous function to simply apply max to all instances of each car name in cars$sale_price.
I've tried a bunch of stuff, all of which has returned an error. Here's an example:
price_list <- sapply(car_names, function(x) {
max(cars$saleprice[x])
})
Which returns:
Error in h115$nominate_dim1[x] : invalid subscript type 'list'
I'm sure this is trivially simple for even moderately experienced programmers, but I'm ... not one of those! I suspect that I'm pointing to something incorrectly, but I can't get past it. Any ideas?
Edit: Here's a reproducible example.
First, the "source" data frame:
cars1 <- data.frame("car_names" = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf", "Malibu", "Malibu", "Malibu"),"saleprice" = c(32000,45000,72000,7500,16000,22000,33000,21000,26500))
Next, splitting the df by car_names:
cars1_split <- split(cars1, cars1$car_names)
Now, attempting to pass max to sapply and getting an error:
maxes <- sapply(cars1_split, function(x){
max(cars1$saleprice[x])
})
Hopefully this gives you guys something to work with!
You have a few options here. Let's start with aggregate, which is not what you asked for, but I want to keep your attention high ;)
aggregate(saleprice ~ car_names, cars1, max)
# car_names saleprice
#1 Corvette 72000
#2 Golf 22000
#3 Malibu 33000
This returns a data.frame (which you can easily split if you need a list).
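If a list is what you are after, one way to do that split is sketched below. This is only a minimal illustration using the cars1 data from the question; the names max_df and max_list are made up for the example.

```r
# Sample data from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette",
                "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# One row per car name, holding the maximum sale price
max_df <- aggregate(saleprice ~ car_names, data = cars1, FUN = max)

# Split the price column by car name to get a named list
max_list <- split(max_df$saleprice, max_df$car_names)
max_list
```

Each element of max_list is then reachable by name, e.g. max_list$Golf.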
aggregate is similar to tapply, which comes next:
tapply(cars1$saleprice, cars1$car_names, FUN = max)
#Corvette Golf Malibu
# 72000 22000 33000
Or try by and which.max:
by(cars1, cars1$car_names, FUN = function(x) x[which.max(x$saleprice), ])
#cars1$car_names: Corvette
# car_names saleprice
#3 Corvette 72000
#-------------------------------
#cars1$car_names: Golf
# car_names saleprice
#6 Golf 22000
#-------------------------------
#cars1$car_names: Malibu
# car_names saleprice
#7 Malibu 33000
Finally, you can also use lapply and split (for which by is something of a shorthand):
lapply(split(cars1, cars1$car_names), function(x) x[which.max(x$saleprice), ])
#$Corvette
# car_names saleprice
#3 Corvette 72000
#$Golf
# car_names saleprice
#6 Golf 22000
#$Malibu
# car_names saleprice
#7 Malibu 33000
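As for why the attempt in the question fails: each element of cars1_split is itself a data frame (which is a kind of list), so inside the anonymous function x is a data frame, and cars1$saleprice[x] tries to index a vector with a list. That is where "invalid subscript type 'list'" comes from. A minimal sketch of the sapply version, assuming the cars1 data from the question, takes the column from x instead:

```r
# Sample data from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette",
                "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# One data frame per car name
cars1_split <- split(cars1, cars1$car_names)

# x is the data frame for one car name, so read saleprice from x itself
maxes <- sapply(cars1_split, function(x) max(x$saleprice))
maxes
```

sapply simplifies the result to a named numeric vector; swapping in lapply with the same anonymous function should give a named list instead, if a list is really what you need.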