
How to manage multiple 'executable and datadir' profiles for launching scrapers in parallel?

I am using Playwright and am having difficulty launching 4 scripts at the same time. I have used these variables for the local browser:

let CHROMIUM_DATA_DIR = `/Users/yo/dataDir/datadir${this.cmd}`
let CHROMIUM_EXEC_PATH = `/Applications/Google-Chrome${this.cmd}.app/Contents/MacOS/Google Chrome`

I made 4 copies of the same datadir and of the same executable, and just renamed the files/directories.

It does not work well. What would you recommend to quickly scale up the launch of the scrapers? How could I install several Chrome instances and manage a matching datadir for each one (to save login sessions, etc.)?

Thanks.


Since you are using Playwright, you can use persistent contexts.

You do not need to create your own data directories or executables by copying them. Simply pass the location of an empty directory when launching the browser, and Playwright will populate it itself and store any session data there.

I do not use Node.js, but just to give an idea, here is sample code in Python:

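from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # launch_persistent_context returns a browser context, not a Browser object.
    # Playwright fills the (initially empty) directory with the session data.
    context = p.chromium.launch_persistent_context(user_data_dir=r'C:\Users\me\Desktop\dir', headless=False)

    page = context.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    context.close()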
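
To run several scrapers at once, the same idea can be extended to one persistent context per worker, each with its own data directory, since a single user data directory cannot be shared by several running browser instances. The following is only a rough sketch using Playwright's async API. The helper names (profile_dirs, scrape_one, scrape_all) and the "profileN" directory layout are made up for illustration and do not come from the question.

```python
import asyncio
from pathlib import Path


def profile_dirs(base_dir, count):
    """Create (if missing) and return one user data directory per worker."""
    dirs = []
    for i in range(count):
        d = Path(base_dir) / f"profile{i}"  # hypothetical naming scheme
        d.mkdir(parents=True, exist_ok=True)
        dirs.append(d)
    return dirs


async def scrape_one(playwright, user_data_dir, url):
    # Each worker launches its own persistent context on its own directory,
    # so login sessions are kept separately per profile.
    context = await playwright.chromium.launch_persistent_context(
        user_data_dir=str(user_data_dir), headless=True
    )
    try:
        page = await context.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        await context.close()


async def scrape_all(base_dir, urls):
    # Imported here so the directory helper works without Playwright installed.
    from playwright.async_api import async_playwright

    dirs = profile_dirs(base_dir, len(urls))
    async with async_playwright() as p:
        tasks = [scrape_one(p, d, u) for d, u in zip(dirs, urls)]
        # Run all workers concurrently and collect the page titles in order.
        return await asyncio.gather(*tasks)
```

Under these assumptions, calling asyncio.run(scrape_all("profiles", ["https://playwright.dev"] * 4)) would start four browsers, each bound to its own directory under "profiles". The same Chromium executable is reused for all of them, so no copies of the application are needed.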


 