lapply() and spline() on two data frames in R, no merging

I have two data frames (df, df5) that share a factor, Auction_ID. df has num.bidders, res.bid and Auction_ID; df5 has bid.points and Auction_ID.

I used the smooth.spline() function to get spline estimates and saved the result as a new column in df (I am not sure whether I should save it in df5 instead):

    spline  <- smooth.spline(df$c_bidders,df$res.bid)

The question is how to use predict() on df$spline1 and df5$bid.points for each level of Auction_ID. I tried to use lapply() and pass df and df5 as the input data for the function, but it seems I can't do that. For example:

 lapply(df,df5, function(t,t1)
   {
    tt<-predict(t$spline,t1$bid.points,deriv=0)$y 
   return(tt)}
    )

Would introducing a list variable help?

If I use merge(df, df5, by = "Auction_ID"), I end up with a very large data frame:

   str(df1):
   Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    3967 obs. of  17 variables:

   str(df5)
   'data.frame':    18338 obs. of  2 variables:

    x <- merge(df5, df1, by = "Auction_ID")
    str(x)
    'data.frame':   501367 obs. of  19 variables:

(I have already tried merge() with the "all" options, such as all.y = TRUE. They give the same number of observations, which is not good for my purpose.)
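The row count after merge() grows because Auction_ID repeats in both data frames, so each auction contributes (rows in df) times (rows in df5) rows to the result. A minimal sketch of that effect, using made-up toy data (df_toy and df5_toy are hypothetical stand-ins, not the real data):

```r
# Toy data (hypothetical values): two auctions, several rows each.
df_toy  <- data.frame(Auction_ID = c("a", "a", "a", "b", "b"),
                      res.bid    = c(10, 12, 11, 20, 22))
df5_toy <- data.frame(Auction_ID = c("a", "a", "b", "b", "b", "b"),
                      bid.points = c(1, 2, 1, 2, 3, 4))

# Many-to-many join on the shared key.
merged <- merge(df5_toy, df_toy, by = "Auction_ID")

# Predicted size: per auction, rows in df_toy times rows in df5_toy.
n_df  <- table(df_toy$Auction_ID)
n_df5 <- table(df5_toy$Auction_ID)
expected_rows <- sum(as.integer(n_df) * as.integer(n_df5[names(n_df)]))

nrow(merged)    # 3 * 2 + 2 * 4 = 14
expected_rows   # 14
```

The all, all.x and all.y options only add rows for unmatched keys, which is why they do not shrink the result.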

Is the issue that you don't want to deal with the large merged data frame of roughly 500k rows?

Maybe a merge (aka join) isn't what you need. Perhaps you just need the match() function, which essentially performs a vlookup, to pair each value of df$spline1 with the corresponding value of df5$bid.points based on Auction_ID.

See if this works for your purposes:

# assuming df5 is the target df: look up each df5 row's Auction_ID in df
df5$spline1 <- df$spline1[match(df5$Auction_ID, df$Auction_ID)]

## OR

# assuming df is the target df:
df$bid.points <- df5$bid.points[match(df$Auction_ID,df5$Auction_ID)]
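To see what the lookup does, here is a small sketch on made-up data (df_toy and df5_toy are hypothetical). It assumes the lookup table has one row per Auction_ID; match() returns only the position of the first match, so if an ID repeats in the lookup table, the later rows are silently ignored.

```r
# Hypothetical lookup table: one row per auction.
df_toy <- data.frame(Auction_ID = c("a", "b", "c"),
                     spline1    = c(0.5, 1.5, 2.5))

# Hypothetical target: repeated IDs, plus one ID ("d") missing from df_toy.
df5_toy <- data.frame(Auction_ID = c("b", "a", "a", "c", "d"),
                      bid.points = c(1, 2, 3, 4, 5))

# For each target row, the row position of its ID in the lookup table.
idx <- match(df5_toy$Auction_ID, df_toy$Auction_ID)
idx   # 2 1 1 3 NA

# Index the lookup column by those positions; unmatched IDs become NA.
df5_toy$spline1 <- df_toy$spline1[idx]
df5_toy
```

The target keeps its original number of rows, which is the main difference from merge().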

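Consider using Map() to pass both inputs; it returns a list of the values returned from predict(). Map() iterates over the elements of its arguments in parallel, and iterating over a data frame walks its columns, so for per-auction results df and df5 should be lists split by Auction_ID rather than whole data frames: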

List return

Map(function(t, t1) predict(t$spline, t1$bid.points,deriv=0)$y, df, df5)

A similar call passes the second data frame as a third argument to lapply(). Note that lapply() hands the whole of df5 to every call as t1 instead of pairing elements:

lapply(df, function(t,t1) { 
     predict(t$spline, t1$bid.points, deriv=0)$y
}, df5)

Matrix return

Alternatively, use sapply(), which simplifies the result to a matrix when every call returns a vector of the same length:

sapply(df, function(t,t1) { 
     predict(t$spline, t1$bid.points, deriv=0)$y
}, df5)

Or use mapply(), the function that Map() wraps (Map() calls mapply() without simplifying the result):

mapply(function(t,t1) predict(t$spline, t1$bid.points, deriv=0)$y, df, df5)
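Putting the pieces together, one possible per-auction workflow is sketched below. It is not the asker's actual code: the toy data, the names df_toy, df5_toy, fits and preds, and the choice of num.bidders as the x variable are all assumptions. The fitted smooth.spline objects are kept in a named list rather than in a data frame column, and split() is used so that Map() pairs each fit with the bid points of the same auction.

```r
set.seed(1)

# Hypothetical toy data: two auctions with 8 observations each.
df_toy <- data.frame(
  Auction_ID  = rep(c("a", "b"), each = 8),
  num.bidders = rep(1:8, times = 2),
  res.bid     = c(1:8 + rnorm(8, sd = 0.1), 2 * (1:8) + rnorm(8, sd = 0.1))
)
df5_toy <- data.frame(
  Auction_ID = c("a", "a", "a", "b", "b"),
  bid.points = c(1.5, 3.5, 6.0, 2.0, 7.5)
)

# Split both frames by auction; keep only IDs present in both.
df_by_auction  <- split(df_toy,  df_toy$Auction_ID)
df5_by_auction <- split(df5_toy, df5_toy$Auction_ID)
ids <- intersect(names(df_by_auction), names(df5_by_auction))

# One smooth.spline fit per auction, stored in a named list.
fits <- lapply(df_by_auction[ids],
               function(d) smooth.spline(d$num.bidders, d$res.bid))

# Evaluate each auction's spline at that auction's own bid points.
preds <- Map(function(fit, d5) predict(fit, d5$bid.points, deriv = 0)$y,
             fits, df5_by_auction[ids])

# Write predictions back to df5_toy, auction by auction, in original row order.
df5_toy$pred <- NA_real_
for (id in ids) {
  df5_toy$pred[df5_toy$Auction_ID == id] <- preds[[id]]
}
df5_toy
```

No merged data frame is created, so the result has exactly one prediction per row of df5_toy. Note that smooth.spline() needs at least four distinct x values per auction, so very small auctions would have to be filtered out or handled separately.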
