Node.js puppeteer - Transforming fetched href string

I'm using Node.js and puppeteer to get some data. ... Now I want to transform one of my outputs. Instead of getting an href like this:

Console:

myURL/data/1344888/156999-18-1605-index.html    

The desired output should have this structure:

myURL/data/1344888/156999181605/156999-18-1605.txt

As you can see ... the first part is identical:

myURL/data/1344888/

... the middle part is the leading ID of the file name (the last part) with the hyphens removed:

                  /156999181605/

... and in the last part, the -index.html should be replaced by .txt:

                               /156999-18-1605.txt
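The three parts described above can be sketched with plain string slicing. This is only an illustration: the function name `rebuildHref` is hypothetical, and it assumes every href ends in `-index.html` and that the file name is the segment after the last slash.

```javascript
// Hypothetical helper (not from the original post): rebuilds the href part by part.
function rebuildHref(href) {
  const suffix = '-index.html';
  const lastSlash = href.lastIndexOf('/');

  const base = href.slice(0, lastSlash + 1); // first part, kept unchanged
  const file = href.slice(lastSlash + 1);    // e.g. '156999-18-1605-index.html'

  // Assumption: leave hrefs that do not follow the pattern untouched.
  if (!file.endsWith(suffix)) return href;

  const id = file.slice(0, -suffix.length);  // e.g. '156999-18-1605'
  const folder = id.replace(/-/g, '');       // middle part: the ID without hyphens

  return `${base}${folder}/${id}.txt`;       // last part: ID plus .txt
}

console.log(rebuildHref('myURL/data/1344888/156999-18-1605-index.html'));
// myURL/data/1344888/156999181605/156999-18-1605.txt
```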

This is how I fetch the original href:

const puppeteer = require('puppeteer');
const fs = require('fs-extra');

(async function main() {
  try {

    // Launch a visible (non-headless) browser and open a new tab.
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('myURL', {waitUntil: 'load'});

    // Wait until the table has been rendered before querying it.
    const table = await page.waitForSelector('#formDiv > div > table');

    // Grab the anchor in row 5, column 3 and read its href inside the page context.
    const link = await page.$('#formDiv > div > table > tbody > tr:nth-child(5) > td:nth-child(3) > a');
    const linkHref = await page.evaluate( link => link.href, link );

    console.log(linkHref);

    // ...

  } catch (e) {
    console.log('our error', e);
  }

})();

How could this be done?

You can use the following solution to convert your original URL into the format you desire:

const original_url = 'myURL/data/1344888/156999-18-1605-index.html';

// Match the "digits-digits-digits-index.html" segment (the dot is escaped so it matches literally),
// then emit "<digits only>/<segment with -index.html replaced by .txt>".
const modified_url = original_url.replace(
  /(\d+-\d+-\d+-index\.html)/,
  match => match.replace( /\D/g, '' ) + '/' + match.replace( '-index.html', '.txt' )
);

console.log( original_url ); // myURL/data/1344888/156999-18-1605-index.html
console.log( modified_url ); // myURL/data/1344888/156999181605/156999-18-1605.txt
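To apply the same idea to the href fetched by puppeteer, the replacement could be wrapped in a small function. The following is a sketch rather than a definitive implementation: the name `toTxtUrl` is hypothetical, and it assumes the href always ends in a hyphenated numeric ID followed by `-index.html`. It uses a capture group for the ID instead of a nested replace.

```javascript
// Hypothetical helper (not from the original answer).
function toTxtUrl(href) {
  // Capture the hyphenated numeric ID (e.g. 156999-18-1605) directly before
  // the trailing "-index.html"; hrefs that do not match are returned unchanged.
  return href.replace(
    /(\d+(?:-\d+)+)-index\.html$/,
    (match, id) => `${id.replace(/-/g, '')}/${id}.txt`
  );
}

console.log(toTxtUrl('myURL/data/1344888/156999-18-1605-index.html'));
// myURL/data/1344888/156999181605/156999-18-1605.txt
```

In the question's script this would be called as `toTxtUrl(linkHref)` right after `linkHref` has been read from the page.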
