
Using anonymous functions with lapply/sapply in R?

I'm trying to use sapply to take each item in a list (e.g. "Golf", "Malibu", "Corvette") and create a new list with the highest value in the dataframe that list was split from (e.g. cars$sale_price). I'm trying to use an anonymous function to do so, but I can't get that function to work.

The basic issue here is that I'm not very good at writing functions.

First, I took the original dataframe cars and used split to create a list of unique car names - I called this car_names.

Now, I'm trying to create a new list, using sapply, of the highest sale price of each type of car in the list. I'm sure I'm starting the thing correctly ...

price_list <- sapply(car_names, 

... but I can't for the life of me get an anonymous function to simply apply max to all instances of each car name in cars$sale_price.

I've tried a bunch of stuff, all of which has returned an error. Here's an example:

price_list <- sapply(car_names, function(x) {
    max(cars$saleprice[x])
})

Which returns:

Error in h115$nominate_dim1[x] : invalid subscript type 'list'

I'm sure this is trivially simple for even moderately experienced programmers, but I'm ... not one of those! I suspect that I'm pointing to something incorrectly, but I can't get past it. Any ideas?


Edit: Here's a reproducible example.

First, the "source" dataframe:

cars1 <- data.frame("car_names" = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf", "Malibu", "Malibu", "Malibu"),"saleprice" = c(32000,45000,72000,7500,16000,22000,33000,21000,26500))

Next, splitting the df by car_names:

cars1_split <- split(cars1, cars1$car_names)

Now, attempting to pass max to sapply and getting an error:

maxes <- sapply(cars1_split, function(x){
  max(cars1$saleprice[x])
})

Hopefully this gives you guys something to work with!

You have a few options here. Let's start with aggregate - not what you asked for, but I want to keep your attention high ;)

aggregate(saleprice ~ car_names, cars1, max)
#  car_names saleprice
#1  Corvette     72000
#2      Golf     22000
#3    Malibu     33000

This returns a data.frame (which you can easily split if you need a list).
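As a rough sketch of that remark, one way to get a list out of the aggregate result is to split the price column by the name column. The names max_by_car and max_list are made up for illustration and are not part of the original post.

```r
# Same data as in the question's reproducible example
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette",
                "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# One row per car name, holding the maximum sale price
max_by_car <- aggregate(saleprice ~ car_names, cars1, max)

# Split the price column by car name to get a named list (hypothetical names)
max_list <- split(max_by_car$saleprice, max_by_car$car_names)

max_list$Golf
# [1] 22000
```

Each list element holds a single number here because aggregate has already reduced every group to one row.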

aggregate is similar to tapply, which comes next:

tapply(cars1$saleprice, cars1$car_names, FUN = max)
#Corvette     Golf   Malibu 
#   72000    22000    33000

Or try by and which.max:

by(cars1, cars1$car_names, FUN = function(x) x[which.max(x$saleprice), ])
#cars1$car_names: Corvette
#  car_names saleprice
#3  Corvette     72000
#-------------------------------
#cars1$car_names: Golf
#  car_names saleprice
#6      Golf     22000
#-------------------------------
#cars1$car_names: Malibu
#  car_names saleprice
#7    Malibu     33000

Finally, you can also use lapply and split (for which by is somewhat of a shorthand):

lapply(split(cars1, cars1$car_names), function(x) x[which.max(x$saleprice), ])
#$Corvette
#  car_names saleprice
#3  Corvette     72000

#$Golf
#  car_names saleprice
#6      Golf     22000

#$Malibu
#  car_names saleprice
#7    Malibu     33000
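For completeness, here is a minimal sketch of the sapply call from the question with the anonymous function adjusted. It assumes the cars1 example data from the question. Each element of the split list is itself a data frame, so passing it as a subscript to cars1$saleprice is what seems to trigger the "invalid subscript type 'list'" error; taking the column from x instead should be enough.

```r
# Same data as in the question's reproducible example
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette",
                "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# split() returns a list of data frames, one per car name
cars1_split <- split(cars1, cars1$car_names)

# x is one of those data frames, so read the price column from x itself
maxes <- sapply(cars1_split, function(x) max(x$saleprice))

maxes
# Corvette     Golf   Malibu 
#    72000    22000    33000 
```

sapply simplifies the result to a named numeric vector; swapping in lapply with the same function would keep it as a list.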
