Please look at my question; I believe it is easy to solve

Question

I tried to use async and await inside the stdout handler of a spawned child process, but it didn't work. Please see below.

Expected output

 *************
http://www.stevecostellolaw.com/
 *************
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/personal-injury.html
 *************
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/#
 *************
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/home.html
 *************
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/about-us.html
 *************
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/

 *************

I expect this because each time the spawned child's handler reaches await, it should go back to the Python script, print ************* and then print the URL. Ignore the fact that the same URL is printed twice here.

Output I am getting

C:\Users\ASUS\Desktop\searchermc>node app.js
server running on port 3000

DevTools listening on ws://127.0.0.1:52966/devtools/browser/933c20c7-e295-4d84-a4b8-eeb5888ecbbf
[3020:120:0402/105304.190:ERROR:device_event_log_impl.cc(214)] [10:53:04.188] USB: usb_device_handle_win.cc:1056 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[3020:120:0402/105304.190:ERROR:device_event_log_impl.cc(214)] [10:53:04.189] USB: usb_device_handle_win.cc:1056 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)

 *************
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/

 *************

Please see the app.js code below:

// form submit request
app.post('/formsubmit', function(req, res){

    csvData = req.files.csvfile.data.toString('utf8');
    filteredArray = cleanArray(csvData.split(/\r?\n/))
    csvData = get_array_string(filteredArray)
    csvData = csvData.trim()
    
    var keywords = req.body.keywords
    keywords = keywords.trim()

    // Send request to python script
    var spawn = require('child_process').spawn;
    var process = spawn('python', ["./webextraction.py", csvData, keywords, req.body.full_search])

    var outarr = []

    // process.stdout.on('data', (data) => {
    //   console.log(`stdout: ${data}`);
    // });

    process.stdout.on('data', async function(data){

      console.log("\n ************* ")
      console.log(data.toString().trim())
      await outarr.push(data.toString().trim())
      console.log("\n ************* ")

    });

});

The Python function that prints the URLs when the if condition matches:

# Function for searching keyword start
def search_keyword(href, search_key):
    extension_list = ['mp3', 'jpg', 'exe', 'jpeg', 'png', 'pdf', 'vcf']
    if(href.split('.')[-1] not in extension_list):
        try:    
            content = selenium_calling(href)
            soup = BeautifulSoup(content,'html.parser')
            search_string = re.sub("\s+"," ", soup.body.text)
            search_string = search_string.lower()
            res = [ele for ele in search_key if(ele.lower() in search_string)]
            outstr = getstring(res)
            outstr = outstr.lstrip(", ")
            if(len(res) > 0):
                print(href)
                found_results.append(href)
                href_key_dict[href] = outstr
                return 1
            else:
                notfound_results.append(href)
        except Exception as err:
            pass

I want to do all this because the Python script takes a long time to execute and therefore causes a timeout error every time, so I am thinking of getting the intermediate output of the Python script in my Node.js script. You can see the error I am getting in the image below.

Answer 1

I'm not sure I completely understand what you're trying to do, but I'll give it a shot since you seem to have asked this question many times already (which usually isn't a good idea). I believe that there's a lack of clarity in your question - it would help a lot if you could clarify what your end goal is (i.e. how do you want this to behave?).

I think you mentioned two separate problems here. The first is that you expect a new line of '******' to be placed before each separate piece of data returned from your script. This is something that can't be relied on - check out the answer to this question for more detail: Order of process.stdout.on( 'data', ... ) and process.stderr.on( 'data', ... ). The data will be passed to your stdout handler in chunks, not line-by-line, and any amount of data can be provided at a time depending on how much is currently in the pipe.
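If you do want one callback per URL regardless of how the chunks arrive, one option is to buffer the incoming text and split it on newlines yourself. The following is only a minimal sketch: createLineSplitter is a hypothetical helper name, the example.com-style URLs are made up, and the chunks are simulated by hand instead of coming from a real child process.

```javascript
// Hypothetical helper (not part of Node): turns arbitrary stdout chunks
// into complete lines and calls onLine once per non-empty line.
function createLineSplitter(onLine) {
  let buffered = '';
  return function handleChunk(chunk) {
    buffered += chunk.toString();
    const parts = buffered.split(/\r?\n/);
    // The last element is either '' or an incomplete line; keep it
    // until the rest of that line arrives in a later chunk.
    buffered = parts.pop();
    for (const part of parts) {
      const line = part.trim();
      if (line !== '') onLine(line);
    }
  };
}

// Simulated chunks: note that the second URL is cut in half.
const lines = [];
const handleChunk = createLineSplitter((line) => lines.push(line));
handleChunk('http://a.example/\nhttp://b.exa');
handleChunk('mple/page.html\r\nhttp://c.example/');
handleChunk('\n');

for (const line of lines) {
  console.log('\n ************* ');
  console.log(line);
}
```

With a real child process this would be wired up as child.stdout.on('data', createLineSplitter(...)). Also keep in mind that Python usually block-buffers stdout when it is writing to a pipe, so all the URLs may still arrive in one chunk at the end unless the script flushes after each print, for example with print(href, flush=True) or by running python with the -u flag.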

The part I'm most confused about is your phrasing of "to get intermediate ouput of the python script in my nodejs script". There's not necessarily any "immediate" data - you can't rely on data coming in at any particular time with your process's stdout handler; it's going to hand you data at a pace determined by the Python script itself and the process it's running in. With that said, it sounds like your main problem here is the timeout happening on your POST. You aren't ever ending your request - that's why you're getting a timeout. I'm going to assume that you want to wait for the first chunk of data - regardless of how many lines it contains - before sending a response back. In that case, you'll need to add res.send, like this:

// form submit request
app.post('/formsubmit', function(req, res){

    csvData = req.files.csvfile.data.toString('utf8');
    filteredArray = cleanArray(csvData.split(/\r?\n/))
    csvData = get_array_string(filteredArray)
    csvData = csvData.trim()
    
    var keywords = req.body.keywords
    keywords = keywords.trim()

    // Send request to python script
    var spawn = require('child_process').spawn;
    var process = spawn('python', ["./webextraction.py", csvData, keywords, req.body.full_search])

    var outarr = []

    // process.stdout.on('data', (data) => {
    //   console.log(`stdout: ${data}`);
    // });
    
    // Keep track of whether we've already ended the request
    let responseSent = false;

    process.stdout.on('data', async function(data){

        console.log("\n ************* ")
        console.log(data.toString().trim())
        outarr.push(data.toString().trim())
        console.log("\n ************* ")
        
        // If the request hasn't already been ended, send back the current output from the script
        // and end the request
        if (!responseSent) {
            responseSent = true;
            res.send(outarr);
        }
    });

});
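If the goal is for the client to receive every URL as it is found instead of only the first chunk, another possibility is to stream the output with res.write and finish with res.end once the child closes. This is a rough sketch under assumptions, not a drop-in fix: streamChildOutput is a hypothetical helper, and fakeChild and fakeRes are stand-ins used here only so the snippet runs without Express or Python.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: forwards child output to the HTTP response as it
// arrives and ends the response exactly once when the child is done.
function streamChildOutput(child, res) {
  let finished = false;
  const finish = (message) => {
    if (finished) return; // guard against both 'error' and 'close' firing
    finished = true;
    res.end(message);
  };

  child.stdout.on('data', (chunk) => res.write(chunk.toString()));
  child.on('close', (code) => finish(`\n[script exited with code ${code}]\n`));
  child.on('error', (err) => finish(`\n[failed to run script: ${err.message}]\n`));
}

// Stand-ins for the spawned process and the Express response object.
const fakeChild = new EventEmitter();
fakeChild.stdout = new EventEmitter();
const written = [];
let ended = false;
const fakeRes = {
  write(text) { written.push(text); },
  end(text) { if (text) written.push(text); ended = true; },
};

streamChildOutput(fakeChild, fakeRes);
fakeChild.stdout.emit('data', Buffer.from('http://a.example/\n'));
fakeChild.stdout.emit('data', Buffer.from('http://b.example/\n'));
fakeChild.emit('close', 0);

console.log(written.join(''));
```

In the real route you would pass the object returned by spawn and the Express res instead of the stand-ins. It may also be worth renaming the local variable process in your handler (to something like child), because it shadows Node's global process object inside that function.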
