
OSError: [Errno 107] in Jupyter docker container when processing large files

I am working with Jupyter notebook (.ipynb) files in a docker container. I have mounted a backup file containing ~16 million SQL entries, which I want to operate on. The entries in the backup file are in the PostgreSQL dialect and are raw SQL, roughly as follows:

INSERT INTO table_name([column_names separated by comma]) VALUES (values separated by comma);

I am having problems dealing with this file in Jupyter. I tried to read it line by line and insert the lines as raw SQL into a PostgreSQL database. This worked for a few tens of thousands of rows, but then OSError: [Errno 107] Transport endpoint is not connected appeared. I then tried to resume writing from the line where I had left off, but the error soon appeared again.

Next, I tried Python's readlines:

with open("/path/to/file.backup", "r") as f:
    content = f.readlines()

but that fails as well, with the same error.

It might be important that the backup file is mounted into the docker container as a volume. The container is run using docker-compose, and the relevant parts of the docker-compose.yml are here:

version: '2.4'
services:
  volumes:
    - /c/Users/myuser/path/to/notebooks:/tf/notebooks
    - /c/Users/myuser/path/to/backup_file:/tf/data
  ports:
    - 8888:8888
  user: root

The .ipynb file I am working with resides on the same level as the backup file but in a different directory. Both directories are mounted from outside the container.

I am wondering whether this could be a memory or CPU issue (I ran docker stats on my container and saw a lot of CPU usage while the file was being handled), but it still seems weird to me that it gives this error.

Whenever this error occurs, I am unable to access my notebooks directory in Jupyter, and I have to restart Docker entirely (taking the container down and up is not enough).

Relevant logs from Jupyter container:

[E 11:48:39.552 LabApp] Error while saving file: notebooks/populate_db.ipynb [Errno 107] Transport endpoint is not connected: '/tf/notebooks/populate_db.ipynb'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/notebook/services/contents/filemanager.py", line 471, in save
    self._save_notebook(os_path, nb)
  File "/usr/local/lib/python3.6/dist-packages/notebook/services/contents/fileio.py", line 293, in _save_notebook
    with self.atomic_writing(os_path, encoding='utf-8') as f:
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/notebook/services/contents/fileio.py", line 213, in atomic_writing
    with atomic_writing(os_path, *args, log=self.log, **kwargs) as f:
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/notebook/services/contents/fileio.py", line 103, in atomic_writing
    copy2_safe(path, tmp_path, log=log)
  File "/usr/local/lib/python3.6/dist-packages/notebook/services/contents/fileio.py", line 51, in copy2_safe
    shutil.copyfile(src, dst)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
OSError: [Errno 107] Transport endpoint is not connected: '/tf/notebooks/populate_db.ipynb'
[W 11:48:39.556 LabApp] 500 PUT /api/contents/notebooks/populate_db.ipynb?1591098519530 (172.18.0.1): Unexpected error while saving file: notebooks/populate_db.ipynb [Errno 107] Transport endpoint is not connected: '/tf/notebooks/populate_db.ipynb'
[W 11:48:39.556 LabApp] Unexpected error while saving file: notebooks/populate_db.ipynb [Errno 107] Transport endpoint is not connected: '/tf/notebooks/populate_db.ipynb'
[E 11:48:39.556 LabApp] {
  "Host": "localhost:8888",
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0",
  "Accept": "*/*",
  "Accept-Language": "en-US,en;q=0.5",
  "Accept-Encoding": "gzip, deflate",
  "Referer": "http://localhost:8888/lab?",
  "Content-Type": "application/json",
  "X-Xsrftoken": "2|695ba9e2|ce49a1a69d34d68ccd56352b0c805223|1591003618",
  "Origin": "http://localhost:8888",
  "Content-Length": "6836",
  "Connection": "keep-alive",
  "Cookie": "username-localhost-8888=\"2|1:0|10:1590754680|23:username-localhost-8888|44:MDI2OWM5NjVlMjhmNGJjOTgzZjZkNDg3ZDMyNmMyMDc=|f2b616ab71fecbbeac60cfa57455e7d49cdf3563ef00729e1adc3f0c4d17f86e\"; pga4_session=34c2a786-ecfe-48b3-8682-c56b64236b64;3p1/6xx3V2lJFdQouLkwpXmndM8=; _xsrf=2|695ba9e2|ce49a1a69d34d68ccd56352b0c805223|1591003618, PGADMIN_LANGUAGE=en",
  "Pragma": "no-cache",
  "Cache-Control": "no-cache"
}
[E 11:48:39.556 LabApp] 500 PUT /api/contents/notebooks/populate_db.ipynb?1591098519530 (172.18.0.1) 20.92ms referer=http://localhost:8888/lab?

I have not experienced this before and seem unable to solve it. Thanks to anyone who can help!

Since no one took this up, I decided to post my own non-conclusive answer.

The problem (a socket-level error) seems to occur when too many I/O operations hit a mounted volume in a short time. As far as I understand it, the container does not reach host-mounted directories as a local disk but through a socket-based file-sharing layer (the exact transport depends on the Docker setup), so there appears to be a limit to the number of operations per unit of time that can take place. Once that connection drops, every path under the mount fails with Errno 107.

In my case there was a lot of reading from a volume and, simultaneously, a lot of writing to a volume.
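To tell this failure apart from other I/O errors, the errno on the exception can be checked against errno.ENOTCONN, which is the constant behind "Transport endpoint is not connected". The following is only a sketch: read_until_disconnect and handle_line are hypothetical names, and the handler stands in for whatever is done with each line (for example, executing it against the database). It returns the index of the next unprocessed line so that a later run can resume from there.

```python
import errno
from typing import Callable, Tuple


def read_until_disconnect(
    path: str,
    handle_line: Callable[[str], None],
    start_line: int = 0,
) -> Tuple[int, bool]:
    """Process lines from start_line on.

    Returns (next_line, disconnected): next_line is the index of the first
    line that was NOT handled, disconnected is True if ENOTCONN was raised.
    """
    next_line = start_line
    try:
        with open(path, "r") as f:
            for index, line in enumerate(f):
                if index < start_line:
                    continue  # already handled in an earlier run
                handle_line(line)
                next_line = index + 1
    except OSError as exc:
        if exc.errno != errno.ENOTCONN:
            raise  # a different I/O problem; do not hide it
        return next_line, True
    return next_line, False
```

Note that an ENOTCONN raised inside the handler itself is caught as well, so the returned index always points at the line that still needs to be processed.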

Possible solutions include:

  1. limiting the number of operations per unit of time
  2. building the volumes into the docker image that is being run, if that is possible

Option 1 could be too slow for some applications, and option 2 might not always be possible because the contents of the volumes change.
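Option 1 could be sketched roughly as follows: read the file in fixed-size batches instead of all at once, and pause between batches so that fewer operations hit the mounted volume per second. The names iter_batches and process_throttled, the batch size, and the pause length are all assumptions for illustration; suitable values would have to be found by experiment, and I have not verified that this avoids the error.

```python
import time
from itertools import islice
from typing import Callable, Iterator, List


def iter_batches(path: str, batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of up to batch_size lines without loading the whole file."""
    with open(path, "r") as f:
        while True:
            batch = list(islice(f, batch_size))
            if not batch:
                return
            yield batch


def process_throttled(
    path: str,
    handle_batch: Callable[[List[str]], None],
    batch_size: int = 1000,
    pause_seconds: float = 0.5,
) -> int:
    """Pass each batch to handle_batch, sleeping between batches.

    Returns the total number of lines processed.
    """
    total = 0
    for batch in iter_batches(path, batch_size):
        handle_batch(batch)  # e.g. execute the INSERT statements in one transaction
        total += len(batch)
        time.sleep(pause_seconds)  # crude rate limit on I/O against the volume
    return total
```

Here handle_batch is whatever consumes the lines, such as a function that sends them to PostgreSQL. Unlike readlines, this keeps at most one batch in memory at a time.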

