
cheerio sometimes returns empty string

I'm scraping Genius.com for lyrics; I've googled and can't find a reason why my code isn't working. I am scraping the text from the div on a Genius.com page (e.g., https://genius.com/Britney-spears-baby-one-more-time-lyrics).

Viewing the page source, it appears that the div exists and is populated with text in the source itself, not by JavaScript or otherwise (if it were, wouldn't cheerio work zero percent of the time in this context?). When I run my code, it works 50% of the time; other times it returns an empty string.

I saw this, but it seems like a hacky solution, and I don't really see why my async/await isn't working for the full response from phin...

Here's the code in question:

const scraperRouter = require('express').Router()
const p = require('phin')
const cheerio = require('cheerio')

scraperRouter.get('/', async (req, res) => {
    // The target page URL is passed in the `geniusUrl` request header
    const url = req.header('geniusUrl')

    try {
        // Await inside the try block so network errors are caught as well
        const _res = await p(url)

        const $ = cheerio.load(_res.body)
        const lyrics = $('.lyrics').text()

        res.send(lyrics)
    }
    catch (e) {
        console.log(e)
        res.json(e)
    }
})

Any advice appreciated. Thanks.

Converting my comment to an answer after the OP confirmed it as the solution:

Sometimes this happens when sites are A/B testing. They might redirect you to one of a couple of DOMs. There might also be regional differences. I recommend trying to access the page from a few different IPs, browsers, regions, etc. to figure out whether there's a pattern. If you can narrow it down to a couple of different DOMs, then you can conditionally try both.
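As a rough sketch of "conditionally try both", the helper below walks a list of candidate selectors and returns the first one that yields non-empty text. The name `extractFirstText` is made up for illustration, and only `.lyrics` comes from the question; any second selector is an assumption that you would need to confirm by inspecting the HTML of a response that came back empty.

```javascript
// Hypothetical helper: try each candidate selector in order and return the
// first one whose text is non-empty. `$` is the function returned by
// cheerio.load(html).
function extractFirstText($, selectors) {
  for (const selector of selectors) {
    const text = $(selector).text().trim()
    if (text.length > 0) {
      return { selector, text }
    }
  }
  return null // no known variant matched; the caller decides how to respond
}

// Possible usage inside the route handler from the question. The second
// selector is an ASSUMED placeholder, not a confirmed Genius class name:
//
//   const $ = cheerio.load(_res.body)
//   const match = extractFirstText($, ['.lyrics', '[class^="Lyrics__Container"]'])
//   if (match) {
//       res.send(match.text)
//   } else {
//       res.status(502).send('No known lyrics container in the response')
//   }
```

Returning the matching selector along with the text makes it easy to log which DOM variant was served, which helps when looking for the pattern mentioned above.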

This can also occur due to rate limiting.
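If rate limiting turns out to be the cause, one common mitigation is to retry with an increasing delay. The sketch below is a generic helper, not part of phin or cheerio; `retryWithBackoff`, its option names, and the default values are all hypothetical, and how Genius actually signals rate limiting (status code, empty body, or something else) is an assumption you would need to verify.

```javascript
// Hypothetical helper: run an async operation, and if its result is not
// usable, wait and try again with exponentially growing delays.
async function retryWithBackoff(operation, isUsable, { retries = 3, baseDelayMs = 500 } = {}) {
  let result
  for (let attempt = 0; attempt <= retries; attempt++) {
    result = await operation()
    if (isUsable(result)) {
      return result
    }
    if (attempt < retries) {
      // Delay doubles on each attempt: baseDelayMs, 2x, 4x, ...
      const delay = baseDelayMs * 2 ** attempt
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
  return result // still not usable after all attempts; caller must handle it
}

// Possible usage with the code from the question, ASSUMING that a usable
// response is one with HTTP status 200:
//
//   const _res = await retryWithBackoff(
//       () => p(url),
//       (r) => r.statusCode === 200
//   )
```

Logging `_res.statusCode` and the body length on each attempt is a cheap way to tell rate limiting apart from an A/B-tested DOM: in the latter case the response is typically a full page that simply lacks the expected element.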
