
Using a for loop to get a list of data frames in R

splitted is a list of data frames produced by calling split() on the main data frame.

After splitting, I apply a function to every data frame in the splitted list.

Here is the function:

library(dplyr)  # provides %>%, arrange(), top_n(), select(), filter(), row_number()

getCustomer <- function(df, numberOfProducts = 3) {

  # Per-customer summary values
  Gender <- unique(df$gender)
  Segment <- unique(df$Segment)
  Net_Discount <- sum(df$Discount * df$Sales)
  Number_of_Discounts <- sum(df$Discount > 0)
  Customer.ID <- unique(df$Customer.ID)
  Sales <- sum(df$Sales)
  Profit <- sum(df$Profit)
  lat <- mean(df$lat)
  lon <- mean(df$lon)

  productsData <- df %>% arrange(Order.Date) %>% top_n(n = numberOfProducts)

  Products <- 0
  Products_Category <- 0
  Products_Order_Date <- 0

  for (j in 1:numberOfProducts) {

    Products[j] <- productsData %>% select(Product.ID) %>% filter(row_number() == j)
    Products_Category[j] <- productsData %>% select(Category) %>% filter(row_number() == j)
    Products_Order_Date[j] <- productsData %>% select(Order.Date) %>% filter(row_number() == j)

    names(Products)[j] <- paste("Product", j)
    names(Products_Category)[j] <- paste("Category Product", j)
    names(Products_Order_Date)[j] <- paste("Order Date Product", j)

  }

  output <- data.frame(Customer.ID, Gender, Segment, Net_Discount, Number_of_Discounts, Sales, Profit,
                       Products, Products_Category, Products_Order_Date, lon, lat)

  return(output[1, ])
}

I get the right answer for any single element of splitted:

getCustomer(splitted[[687]],2)

This also works:

customer <- list()
customer[[1]]<- getCustomer(splitted[[1]],2)
customer[[2]]<- getCustomer(splitted[[2]],2)
.
.
.
customer[[1576]]<- getCustomer(splitted[[1576]],2)

That is, I can build the whole customer list by assigning element by element.

However, I certainly don't have time for that (1576 single-row data frames to assign to the customer list), so I'm trying:

customer <- list()

for (i in 1:length(splitted)){

  customer[[i]]<-getCustomer(splitted[[i]],2)

}

After running this last chunk of code, I get:

Error in data.frame(Customer.ID, Gender,  Segment, Net_Discount, Number_of_Discounts, : arguments imply differing number of rows: 0, 1

I can't understand this error, since I can build the customer list one element at a time.

I would appreciate your help.

Solution

Editing this question to let you know the problem was indeed that some data frames in splitted had no rows. So I removed them (only 3).

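# Count the rows of every data frame in the list
l <- integer(length(splitted))
for (i in seq_along(splitted)) {
  l[i] <- nrow(splitted[[i]])
}

# Positions of the empty data frames
indices <- which(l == 0)

# Drop them (guarded: splitted[-integer(0)] would return an empty list)
if (length(indices) > 0) splitted <- splitted[-indices]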

I just had to delete 3 samples. Got no error this time. Thank you all for your time.
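For readers who hit the same message, here is a minimal sketch of why an empty data frame triggers it. The data and the names empty, toy and kept are made up for illustration and are not from the original post. On a zero-row input, unique() returns a length-0 vector while sum() still returns a single value, so data.frame() is asked to combine columns of length 0 and 1. Filter() is shown as one base R alternative to the loop for dropping empty elements.

```r
# Toy stand-in for one empty element of `splitted` (hypothetical data)
empty <- data.frame(Customer.ID = character(0), Sales = numeric(0))

length(unique(empty$Customer.ID))  # length 0: there are no IDs
length(sum(empty$Sales))           # length 1: sum() of nothing is still one value

# Combining a length-0 and a length-1 column reproduces the error
msg <- tryCatch(
  {
    data.frame(Customer.ID = unique(empty$Customer.ID), Sales = sum(empty$Sales))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping empty elements without an explicit loop
toy <- list(
  a = data.frame(x = 1:2),
  b = data.frame(x = integer(0)),
  c = data.frame(x = 3L)
)
kept <- Filter(function(d) nrow(d) > 0, toy)
print(names(kept))
```

Unlike negative indexing with which(), Filter() also behaves sensibly when no element is empty.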

Just use lapply, which applies a function to every element of a list and returns a list:

numberOfProducts <- 2
result <- lapply(splitted, function(x) getCustomer(x, numberOfProducts))
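Because each call returns a one-row data frame, the list that lapply produces can be stacked into a single data frame afterwards. The sketch below uses made-up data and a hypothetical stand-in function (summarise_one, orders and pieces are illustrative names, not part of the question) to show the pattern.

```r
# Hypothetical stand-in for getCustomer(): returns one summary row per data frame
summarise_one <- function(df, n = 2) {
  data.frame(id = df$id[1], total = sum(head(df$sales, n)))
}

# Hypothetical stand-in for `splitted`
orders <- data.frame(id = c("a", "a", "b"), sales = c(10, 5, 7))
pieces <- split(orders, orders$id)

# lapply() returns a list with one element per input; extra arguments
# after the function are passed on to it
rows <- lapply(pieces, summarise_one, n = 2)

# Stack the one-row data frames into a single data frame
combined <- do.call(rbind, rows)
print(combined)
```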

Edit:

It looks like your function has logic that can sometimes produce a data frame with no rows. In that case, you can check for an empty data frame and return NA. Use a plain if/else here rather than ifelse(), which is vectorised and would return only the first element of the row:

output <- data.frame(Customer.ID, Gender, Segment, Net_Discount, Number_of_Discounts, Sales,
    Profit, Products, Products_Category, Products_Order_Date, lon, lat)
return(if (nrow(output) > 0) output[1, ] else NA)
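One caveat: in the question's function, the row-count mismatch is raised inside data.frame() itself when the input has zero rows, so a check placed after that call may never be reached. A possible alternative is to test the input at the top of the function. The sketch below is a simplified, hypothetical version (getCustomerSafe and toy are illustrative names) and returns NULL instead of NA, an assumption made here so that empty results drop out when the list is bound together.

```r
# Hypothetical simplified version of the summary function, guarded at the top
getCustomerSafe <- function(df) {
  # Return NULL early so data.frame() is never called with length-0 columns
  if (nrow(df) == 0) return(NULL)
  data.frame(Customer.ID = unique(df$Customer.ID), Sales = sum(df$Sales))
}

# Made-up input: one customer with two orders, and one empty data frame
toy <- list(
  data.frame(Customer.ID = "c1", Sales = c(2, 3)),
  data.frame(Customer.ID = character(0), Sales = numeric(0))
)

res <- lapply(toy, getCustomerSafe)

# NULL entries are ignored when the results are bound together
combined <- do.call(rbind, res)
print(combined)
```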


My usual strategy for troubleshooting something like this is to run it in chunks. If you use the for loop, check the value of i when the error occurs. With lapply, run it on chunks of around 20 elements and keep going until you find which data frame in your list is causing the error.

Then, run through your function manually with that data frame and look at what output you get. For example:

df <- splitted[[30]] # assuming #30 is the problem
numberOfProducts <- 3

Now walk through the body of the function line by line and check each output until you find what causes the error. Keep in mind that if problems can occur in several places, it might take more than one pass through this process to solve them all.
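The search for the failing element can also be automated. The following is a rough sketch, using a made-up function f and a made-up list toy in place of getCustomer and splitted, that wraps each call in tryCatch() and records which positions raise an error.

```r
# Hypothetical function that fails on empty input, standing in for getCustomer()
f <- function(df) {
  data.frame(id = unique(df$id), total = sum(df$sales))
}

# Hypothetical stand-in for `splitted`; the second element is empty
toy <- list(
  data.frame(id = "a", sales = 1),
  data.frame(id = character(0), sales = numeric(0)),
  data.frame(id = "c", sales = 3)
)

# TRUE where the call throws an error, FALSE otherwise
failed <- vapply(
  toy,
  function(d) tryCatch({ f(d); FALSE }, error = function(e) TRUE),
  logical(1)
)

# Positions of the problematic data frames
bad <- which(failed)
print(bad)
```

Each index in bad can then be inspected by hand, as described above.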
