lapply() and spline() on two data frames in R, no merging
I have two data frames (df and df5) that share a factor, Auction_ID. df has num.bidders, res.bid, and Auction_ID; df5 has bid.points and Auction_ID.
I used the smooth.spline() function to get spline estimates and saved the result as a new column in df (I am not sure whether I should save it in df5 instead):
spline <- smooth.spline(df$c_bidders,df$res.bid)
The question is how to use predict() on df$spline1 and df5$bid.points for each level of Auction_ID. I tried to use lapply() with df and df5 as the inputs to the function, but it seems I can't do that. For example:
lapply(df,df5, function(t,t1)
{
tt<-predict(t$spline,t1$bid.points,deriv=0)$y
return(tt)}
)
I don't know whether introducing a list variable would help.
If I use merge(df, df5, by = "Auction_ID"), I end up with a very large data frame:
str(df1):
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3967 obs. of 17 variables:
str(df5)
'data.frame': 18338 obs. of 2 variables:
x <- merge(df5, df1, by = "Auction_ID")
str(x)
'data.frame': 501367 obs. of 19 variables:
(I have already tried merge() with the "all" options, e.g. all.y = TRUE; they give the same number of observations, which is no good for my purpose.)
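The row count reported above is what merge() produces when Auction_ID repeats in both data frames: every row for a key in one table is paired with every row for that key in the other. A minimal sketch on made-up data (df1_toy and df5_toy are hypothetical stand-ins, not the real data):

```r
# Hypothetical toy data: both tables have several rows per Auction_ID.
df1_toy <- data.frame(Auction_ID = c(1, 1, 2, 2, 2),
                      res.bid    = c(10, 12, 7, 8, 9))
df5_toy <- data.frame(Auction_ID = c(1, 1, 1, 2, 2),
                      bid.points = c(9, 10, 11, 7, 8))

merged <- merge(df5_toy, df1_toy, by = "Auction_ID")

# For each key, the join yields (rows in df1_toy) * (rows in df5_toy) rows.
n1 <- table(df1_toy$Auction_ID)
n5 <- table(df5_toy$Auction_ID)
expected_rows <- sum(as.vector(n1) * as.vector(n5[names(n1)]))

nrow(merged)     # same value as expected_rows
expected_rows
```

Because the extra rows come from repeated keys rather than unmatched keys, the all, all.x, and all.y options would not be expected to reduce the count.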
Is the issue that you don't want to deal with the large merged data frame of roughly 500k rows?
Maybe a merge (a.k.a. join) isn't what you need. Perhaps you just need the match() function, which essentially performs a VLOOKUP, to match each value of df$spline1 to the corresponding value of df5$bid.points (based on Auction_ID).
See if this works for your purposes:
# assuming df5 is the target df (look up each df5 row's Auction_ID in df):
df5$spline1 <- df$spline1[match(df5$Auction_ID, df$Auction_ID)]
## OR
# assuming df is the target df:
df$bid.points <- df5$bid.points[match(df$Auction_ID,df5$Auction_ID)]
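To make the argument order of match() concrete, here is a small sketch on made-up data (lookup and target are hypothetical names). match(x, table) returns, for each element of x, the position of its first occurrence in table, so the target's keys go first and the lookup table's keys second:

```r
# Hypothetical lookup table: one row per Auction_ID.
lookup <- data.frame(Auction_ID = c("a", "b", "c"),
                     spline1    = c(1.5, 2.5, 3.5))

# Hypothetical target: keys may repeat or be missing from the lookup table.
target <- data.frame(Auction_ID = c("b", "b", "a", "d"),
                     bid.points = c(10, 11, 12, 13))

# Position of each target key in the lookup table (NA when absent).
idx <- match(target$Auction_ID, lookup$Auction_ID)

# Pull the looked-up values into the target, VLOOKUP style.
target$spline1 <- lookup$spline1[idx]
target
```

Since only the first match is used, this behaves like a lookup only when the table being searched has one row per Auction_ID; if it has several rows per auction, the remaining rows are silently ignored.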
Consider using Map() to pass both data frames; it returns a list of the values returned from predict():
List return
Map(function(t, t1) predict(t$spline, t1$bid.points,deriv=0)$y, df, df5)
The above is similar to passing the second data frame as a third argument to lapply(), except that lapply() hands df5 whole to every call instead of iterating over it in parallel:
lapply(df, function(t,t1) {
predict(t$spline, t1$bid.points, deriv=0)$y
}, df5)
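Note that lapply() and Map() applied directly to a data frame iterate over its columns, so a per-auction computation needs the data split by Auction_ID first. The following is a minimal sketch of that idea, not the original poster's code: the toy data, the names df_toy, df5_toy, fits, pts, and preds, and the choice to fit one smooth.spline() per auction are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical stand-in for df: 10 observations per auction.
df_toy <- data.frame(Auction_ID  = rep(c("A1", "A2"), each = 10),
                     num.bidders = rep(1:10, times = 2))
df_toy$res.bid <- 5 + 2 * df_toy$num.bidders + rnorm(20, sd = 0.5)

# Hypothetical stand-in for df5: a different number of bid points per auction.
df5_toy <- data.frame(Auction_ID = c("A1", "A1", "A2", "A2", "A2"),
                      bid.points = c(2.5, 7.5, 1.5, 5, 9.5))

# One spline fit per auction, stored in a named list.
fits <- lapply(split(df_toy, df_toy$Auction_ID),
               function(d) smooth.spline(d$num.bidders, d$res.bid))

# The bid points of each auction, in a list with the same names.
pts <- split(df5_toy$bid.points, df5_toy$Auction_ID)

# Pair each fit with its own bid points; no merge is involved.
ids   <- intersect(names(fits), names(pts))
preds <- Map(function(fit, x) predict(fit, x, deriv = 0)$y,
             fits[ids], pts[ids])

# Write the predictions back into df5_toy in its original row order.
df5_toy$pred <- NA_real_
for (id in ids) {
  df5_toy$pred[df5_toy$Auction_ID == id] <- preds[[id]]
}
df5_toy
```

Keeping the fits in a named list avoids storing a model object in a data frame column, and the work scales with the number of rows in df5 rather than with the size of a many-to-many join.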
Matrix return
Alternatively, use sapply(), which simplifies the result to a matrix when the returned vectors all have the same length:
sapply(df, function(t,t1) {
predict(t$spline, t1$bid.points, deriv=0)$y
}, df5)
Or mapply(), the base function behind Map() (Map() is its non-simplifying wrapper):
mapply(function(t,t1) predict(t$spline, t1$bid.points, deriv=0)$y, df, df5)
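The difference between the list and matrix forms can be seen on a small made-up example (xs and ys are hypothetical inputs, unrelated to the auction data):

```r
# Two parallel lists with matching names.
xs <- list(a = 1:3, b = 4:6)
ys <- list(a = 10, b = 20)

# Map() always returns a list, one element per pair of inputs.
as_list <- Map(function(x, y) x + y, xs, ys)

# mapply() simplifies equal-length results into a matrix (one column per pair).
as_matrix <- mapply(function(x, y) x + y, xs, ys)

# With SIMPLIFY = FALSE, mapply() gives the same list as Map().
no_simplify <- mapply(function(x, y) x + y, xs, ys, SIMPLIFY = FALSE)

str(as_list)
dim(as_matrix)
identical(as_list, no_simplify)
```

When the results have different lengths, as they would if auctions have different numbers of bid points, mapply() and sapply() cannot build a matrix and return a list instead, so the list form is the safer one to program against.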