Error in checkForRemoteErrors(val): 7 nodes produced errors; first error: could not find function "fread"

Question

All of the code included in this question is from the script called "LASSO code (Version for Antony)" in my GitHub Repo for this project. You can run it on the file folder called "last 40" to verify my claim that it does run on limited-size sets of datasets. If you really feel like going the extra mile, message me here and I'll share a zipped 10k-scale folder full of datasets via OneDrive or Google Drive (whichever you prefer), so you can also verify that the same script doesn't work on folders of that volume.

This is absolutely going to drive me mad, I swear. I have been using the lapply call below without issue for a week now, and starting several hours ago, it is giving me this error:

> datasets <- parLapply(CL, paths_list, function(i) {fread(i)})
Error in checkForRemoteErrors(val) : 
  7 nodes produced errors; first error: could not find function "fread"

Here is the rest of the script I am working with up until this line (after the lines I use to load all of the libraries I utilize):

# these 2 lines together create a simple character list of 
# all the file names in the file folder of datasets you created
folderpath <- "C:/Users/Spencer/Documents/EER Project/12th & 13th 10k"
paths_list <- list.files(path = folderpath, full.names = T, recursive = T)

# reformat the names of each of the csv file formatted datasets
DS_names_list <- basename(paths_list)
DS_names_list <- tools::file_path_sans_ext(DS_names_list)


# sort both of the list of file names so that they are in the proper order
my_order = DS_names_list |> 
  # split apart the numbers, convert them to numeric 
  strsplit(split = "-", fixed = TRUE) |>  unlist() |> as.numeric() |>
  # get them in a data frame
  matrix(nrow = length(DS_names_list), byrow = TRUE) |> as.data.frame() |>
  # get the appropriate ordering to sort the data frame
  do.call(order, args = _)

DS_names_list = DS_names_list[my_order]
paths_list = paths_list[my_order]

# this line reads all of the data in each of the csv files 
# using the name of each store in the list we just created
CL <- makeCluster(detectCores() - 2L)
clusterExport(CL, c('paths_list'))
library(data.table)
system.time( datasets <- parLapply(CL, paths_list, fread) )

After looking up the documentation for the 3rd time today, I am thinking of trying:

system.time( datasets <- parLapply(CL, paths_list, fun = fread) )

Will that work??

P.S. Here are all of the libraries I load as the first thing I do:

# load all necessary packages
library(plyr)
library(dplyr)
library(tidyverse)
library(readr)
library(stringi)
library(purrr)
library(stats)
library(leaps)
library(lars)
library(elasticnet)
library(data.table)
library(parallel)

Also, I have already tried the following and none worked:

datasets <- parLapply(CL, paths_list, function(i) {fread(i)})
datasets <- parLapply(CL, paths_list, function(i) {fread[i]})
datasets <- parLapply(CL, paths_list, function(i) {fread[[i]]})

datasets <- parLapply(CL, paths_list, \(ds) 
                      {fread(ds)})

system.time( datasets <- lapply(paths_list, fread) )

And when I run that last one, datasets <- lapply(paths_list, fread), I get the same error. This was exactly the original successful version I ran at the beginning of last week, and I only chose to use the parallel version because the datasets folder I am importing/loading has 260,000 csv file-formatted datasets in it. So, this means two versions which have already worked dozens of times just suddenly stopped working today!

Answer 1

See if this works consistently. It hasn't failed yet on my Windows desktop with 20k files (I copied & pasted your 40 files a bunch). It's run 5 times, and I've restarted the R session and RStudio each time.

It's too bad that the problem arises non-deterministically, but that's part of the parallel-computation game. See if this stripped-down example runs consistently.

Notice I'm avoiding library() to eliminate naming collisions caused by packages with identically-named functions. Also, I closed the cluster connection at the end.

# Enumerate files
paths_list <- 
  "~/Documents/delete-me/EER-Research-Project-main/20k" |> 
  list.files(full.names = T, recursive = T)

# Establish cluster
CL <- parallel::makeCluster(parallel::detectCores() - 2L)
parallel::clusterExport(CL, c('paths_list'))

# Read files
system.time({
  datasets <- parallel::parLapply(CL, paths_list, data.table::fread)
})

# Stop cluster
parallel::stopCluster(CL)

#>    user  system elapsed 
#>    7.09    1.22  101.93
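
The snippet above sidesteps the lookup problem by handing data.table::fread itself to parLapply. If you would rather keep the function(i) {fread(i)} form from the question, one option is to attach data.table on every worker first with parallel::clusterEvalQ(). Here is a minimal sketch, assuming the default PSOCK cluster from makeCluster(), where each worker starts as a fresh R session and does not inherit the packages attached in your main session. The demo_dir folder and the three tiny CSV files are made up purely for illustration; swap in your own paths_list.

```r
# Sketch only: demo_dir and its CSV files are hypothetical stand-ins
# for the real folder of datasets.
library(parallel)

demo_dir <- file.path(tempdir(), "fread-demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(data.frame(x = 1:5, y = k),
            file.path(demo_dir, paste0("demo-", k, ".csv")),
            row.names = FALSE)
}
paths_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Two workers are enough for the demo
CL <- makeCluster(2L)

# Attach data.table on every worker; calling library() in the main
# session alone does not change what the workers can see.
invisible(clusterEvalQ(CL, library(data.table)))

# The anonymous function is evaluated on the workers, where fread
# can now be found on the search path.
datasets <- parLapply(CL, paths_list, function(i) fread(i))

# Always release the workers when done
stopCluster(CL)
```

Either approach should work; the clusterEvalQ() route is mainly useful when the worker function calls several data.table functions and you don't want to prefix each of them.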
