I am trying to practice web scraping on a betting site for UFC fights. I am using JavaScript with the packages request-promise and cheerio.
Site: https://www.oddsshark.com/ufc/odds
I want to scrape the fighters' names and their respective betting lines from each betting company.
My goal is to end up with something like an array of objects that I can later seed a postgresql database with.
Example of my desired output (it doesn't have to be exactly like this, just similar):
[
  { fighter: 'Khabib Nurmagomedov', openingBetLine: -333, bovadaBetLine: -365, /* etc. */ },
  { fighter: 'Dustin Poirier', openingBetLine: 225, bovadaBetLine: 275, /* etc. */ },
  { fighter: /* etc. */ },
  { fighter: /* etc. */ }
]
Below is the code I have so far. I am a noob at this:
const rp = require("request-promise");
const url = "https://www.oddsshark.com/ufc/odds";
// cheerio to parse HTML
const $ = require("cheerio");

rp(url)
  .then(function(html) {
    // it worked :)
    // console.log("MMA page:", html);
    // console.log($("big > a", html).length);
    // console.log($("big > a", html));
    console.log($(".op-matchup-team-text", html).length);
    console.log($(".op-matchup-team-text", html));
  })
  // why isn't catch working?
  .catch(function(error) {
    // handle error
  });
My code above returns indexes as keys with nested objects as values. Below is just one of them as an example.
{ '0':
{ type: 'tag',
name: 'span',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object: null prototype] { class: 'op-matchup-team-text' },
'x-attribsNamespace': [Object: null prototype] { class: undefined },
'x-attribsPrefix': [Object: null prototype] { class: undefined },
children: [ [Object] ],
parent:
{ type: 'tag',
name: 'div',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object],
'x-attribsNamespace': [Object],
'x-attribsPrefix': [Object],
children: [Array],
parent: [Object],
prev: [Object],
next: [Object] },
prev: null,
next: null },
I don't know what to do from here. Am I selecting the right class (op-matchup-team-text)? If so, how do I extract the fighter names and betting lines from these elements?
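For reference, each entry in the dump above is a parsed DOM node, and the visible name lives in its child text nodes (`children`), not on the node itself. The sketch below walks a hand-built node with the same shape as that dump and concatenates the `data` of its text nodes. `nodeText` is a hypothetical helper for illustration; with cheerio you would normally let `$(el).text()` do this.

```javascript
// Sketch: recursively collect the text of a parsed DOM node, assuming the
// node shape shown in the dump above (type, name, children), where text
// nodes carry their string in `data`. In real code, $(el).text() does this.
function nodeText(node) {
  if (!node) return "";
  // Text nodes hold their content directly in `data`.
  if (node.type === "text") return node.data;
  // Tag nodes: concatenate the text of every child, in document order.
  return (node.children || []).map(nodeText).join("");
}

// Hand-built node mirroring one <span class="op-matchup-team-text"> element.
const span = {
  type: "tag",
  name: "span",
  attribs: { class: "op-matchup-team-text" },
  children: [{ type: "text", data: "Jessica Andrade" }],
};

console.log(nodeText(span)); // Jessica Andrade
```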
UPDATE 1 ON ORIGINAL POST
Update: using Henk's suggestion, I'm able to scrape the fighter names. Using the same code template, I was able to scrape the fighters' betting lines as well.
BUT I don't know how to get both into one object. For example, how do I associate a betting line with the fighter him/herself?
Below is my code for scraping the OPENING company's betting line:
const rp = require("request-promise");
const cheerio = require("cheerio");
const url = "https://www.oddsshark.com/ufc/odds";

rp(url)
  .then(function(html) {
    const $ = cheerio.load(html);
    const openingBettingLine = [];
    // divs holding the OPENING company's moneyline in a data attribute
    $("div.op-item.op-spread.op-opening").each((index, currentDiv) => {
      const openingBet = {
        opening: JSON.parse(currentDiv.attribs["data-op-moneyline"]).fullgame
      };
      openingBettingLine.push(openingBet);
    });
    console.log("openingBettingLine array test 2:", openingBettingLine);
  })
  // why isn't catch working?
  // eslint-disable-next-line handle-callback-err
  .catch(function(error) {
    // handle error
  });
It logs the following:
openingBettingLine array test 2: [ { opening: '-200' },
{ opening: '+170' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '+105' },
{ opening: '-135' },
{ opening: '-165' },
{ opening: '+135' },
{ opening: '-120' },
{ opening: '-110' },
{ opening: '-135' },
{ opening: '+105' },
{ opening: '-165' },
{ opening: '+135' },
{ opening: '-115' },
{ opening: '-115' },
{ opening: '-145' },
{ opening: '+115' },
{ opening: '+208' },
{ opening: '-263' },
etc.
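The scraped values above are strings with a leading sign ("-200", "+170") or empty strings, while the desired output holds numbers. Before seeding PostgreSQL, a small conversion step could look like the sketch below. It assumes every non-empty `data-op-moneyline` value is a plain signed integer, as in the output above, and maps empty strings to `null` (no line posted). `parseMoneyline` is a hypothetical helper name.

```javascript
// Sketch: convert scraped moneyline strings such as "-200", "+170" or ""
// into numbers (or null). Assumes non-empty values are signed integers.
function parseMoneyline(raw) {
  const trimmed = String(raw ?? "").trim();
  // Empty strings mean no line was posted for that fighter.
  if (trimmed === "") return null;
  // Number() accepts a leading "+" or "-"; anything non-integer becomes null.
  const value = Number(trimmed);
  return Number.isInteger(value) ? value : null;
}

const scraped = [{ opening: "-200" }, { opening: "+170" }, { opening: "" }];
const lines = scraped.map((bet) => parseMoneyline(bet.opening));
console.log(lines); // [ -200, 170, null ]
```

Mapping empty strings to `null` rather than `0` keeps "no line yet" distinct from a real value once the data is in a nullable integer column.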
My desired output is still the one below. So how would I get the opening betting line into the object for the matching fighter?
[
  { fighter: 'Khabib Nurmagomedov', openingBetLine: -333, bovadaBetLine: -365, /* etc. */ },
  { fighter: 'Dustin Poirier', openingBetLine: 225, bovadaBetLine: 275, /* etc. */ },
  { fighter: /* etc. */ },
  { fighter: /* etc. */ }
]
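If the names and each company's lines are scraped into separate arrays in the same document order (one entry per fighter), they can be associated by position. The sketch below assumes exactly that ordering; `zipFighters` is a hypothetical helper, and the sample values are taken from the desired output above.

```javascript
// Sketch: associate each fighter with his/her lines by array index.
// Assumes names and every line array share the same order, one entry per fighter.
function zipFighters(names, linesByBook) {
  return names.map((name, i) => {
    const fighter = { fighter: name };
    // linesByBook maps an output key (e.g. "openingBetLine") to an array of lines.
    for (const [key, lines] of Object.entries(linesByBook)) {
      fighter[key] = i < lines.length ? lines[i] : null;
    }
    return fighter;
  });
}

const names = ["Khabib Nurmagomedov", "Dustin Poirier"];
const fighters = zipFighters(names, {
  openingBetLine: [-333, 225],
  bovadaBetLine: [-365, 275],
});
console.log(fighters);
```

Pairing by index fails silently if one selector matches more or fewer elements than another, so it is worth comparing the array lengths before trusting the result.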
UPDATE 2 ON ORIGINAL POST
I can't get the BOVADA company's betting lines to scrape. I isolated the code for just this company below.
// BOVADA betting line array --> not working
const rp = require("request-promise");
const cheerio = require("cheerio");
const url = "https://www.oddsshark.com/ufc/odds";

rp(url)
  .then(function(html) {
    const $ = cheerio.load(html);
    const bovadaBettingLine = [];
    // divs holding BOVADA's moneyline in a data attribute
    $("div.op-item.op-spread.border-bottom.op-bovada.lv").each(
      (index, currentDiv) => {
        const bovadaBet = {
          BOVADA: JSON.parse(currentDiv.attribs["data-op-moneyline"]).fullgame
        };
        bovadaBettingLine.push(bovadaBet);
      }
    );
    console.log("bovadaBettingLine:", bovadaBettingLine);
  })
  // why isn't catch working?
  // eslint-disable-next-line handle-callback-err
  .catch(function(error) {
    // handle error
  });
It logs bovadaBettingLine: [], an empty array.
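One possible cause, assuming the Bovada div's class attribute contains the single class name `op-bovada.lv` (the HTML is not reproduced here, so check this in the page source): in a CSS selector, `.op-bovada.lv` means two classes, `op-bovada` and `lv`, so a div whose class is literally `op-bovada.lv` never matches. Escaping the dot keeps it part of one class name. The hypothetical helper below builds such a selector; an attribute selector like `div[class~="op-bovada.lv"]` would be an alternative.

```javascript
// Sketch: build a CSS class selector that treats special characters in a
// class name (such as ".") literally. Assumes the class is "op-bovada.lv".
function classSelector(className) {
  // Backslash-escape CSS special characters so "op-bovada.lv" stays one class
  // instead of being read as the two classes "op-bovada" and "lv".
  return "." + className.replace(/[!"#$%&'()*+,./:;<=>?@[\\\]^`{|}~]/g, "\\$&");
}

const selector =
  "div" + ["op-item", "op-spread", "border-bottom", "op-bovada.lv"].map(classSelector).join("");
console.log(selector); // div.op-item.op-spread.border-bottom.op-bovada\.lv
```

The resulting string would then be passed to `$(selector)` in place of the original selector; hyphens are left alone because they are valid in class names.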
Below is the HTML code for that part of the website.
Short:
In Detail:
First, analyse the page source around the data you want:
<div class="op-matchup-team op-matchup-text op-team-top" data-op-name="{full_name:Jessica Andrade,short_name:}"><span class="op-matchup-team-text">Jessica Andrade</span></div>
You are trying to get the fighter's name, so you could target either the content of the <span class="op-matchup-team-text">Jessica Andrade</span>
or the attribute of its parent div,
which is data-op-name="{full_name:Jessica Andrade,short_name:}".
Let's try the second one:
Select the divs with the desired content, $("div.op-matchup-team.op-matchup-text.op-team-top"), loop over them with the each() iterator, and push each fighter object into a fighters array. See also the code comments below:
const rp = require("request-promise");
const cheerio = require("cheerio");
const url = "https://www.oddsshark.com/ufc/odds";

rp(url)
  .then(function (html) {
    const $ = cheerio.load(html);
    const fighters = [];
    // Select the opening-line divs once instead of on every iteration.
    const openingDivs = $("div.op-item.op-spread.op-opening");
    $("div.op-matchup-team.op-matchup-text.op-team-top").each((index, currentDiv) => {
      const fighter = {
        name: JSON.parse(currentDiv.attribs["data-op-name"]).full_name,
        // There is no direct selector for the rows of the second column based on the first one.
        // So you need to select all rows of the second column as you did, and then use the current index
        // to get the right row. Put the selected data into your "basket", the fighter object. Done.
        openingBetLine: JSON.parse(openingDivs[index].attribs["data-op-moneyline"]).fullgame
        // Go on the same way with the other rows that you need.
      };
      fighters.push(fighter);
    });
    console.log(fighters);
  })
  .catch(function (error) {
    // The catch does work; you just need to print the error to see it.
    console.log(error);
  });
will give you:
[{ name: 'Jessica Andrade',
openingBetLine: '-200'},...]
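The code above relies on JSON.parse, which only accepts valid JSON (double-quoted keys and strings). If an element lacks the attribute (JSON.parse(undefined) throws a SyntaxError) or its value is not valid JSON, the exception escapes each() and the whole .then callback rejects, so no fighters are logged at all. The attribute as reproduced in this answer shows no quotes, which may just be how it was copied; the raw value needs to be real JSON for this approach to work. A tolerant wrapper, sketched under that assumption with the hypothetical name `parseDataAttr`, returns null instead of throwing:

```javascript
// Sketch: parse a JSON-valued data-* attribute without letting one malformed
// or missing value abort the whole scrape. Optionally pick a single field.
function parseDataAttr(value, field) {
  if (typeof value !== "string" || value === "") return null;
  try {
    const parsed = JSON.parse(value);
    return field === undefined ? parsed : parsed?.[field] ?? null;
  } catch {
    // Not valid JSON (for example, unquoted keys): treat as missing.
    return null;
  }
}

const quoted = '{"full_name":"Jessica Andrade","short_name":""}';
console.log(parseDataAttr(quoted, "full_name")); // Jessica Andrade
console.log(parseDataAttr("{full_name:Jessica Andrade,short_name:}", "full_name")); // null
```

Inside the each() callback, `parseDataAttr(currentDiv.attribs["data-op-name"], "full_name")` would then replace the bare JSON.parse call.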
You have to call .get() to turn the cheerio object returned by .map() into a plain array:
let teamData = $('.op-matchup-wrapper').map((i, div) => ({
time: $(div).find('.op-matchup-time').text(),
teams: $(div).find('.op-matchup-team-text').map((i, t) => $(t).text()).get()
})).get()
Those betting lines are outside the teams area, so you would need to scrape them separately and then merge them with the team data.
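One way to do that merge, assuming the separately scraped lines appear in the same order as the fighters, one line per fighter (the opening output in the question, "-200" then "+170" for the first bout, suggests both fighters of a matchup are listed in turn; verify this on the page): walk the `teams` arrays from teamData and consume the flat list of lines with a running index. `mergeMatchups` is a hypothetical helper, and "Fighter C"/"Fighter D" are placeholders.

```javascript
// Sketch: merge team names scraped per matchup (the `teams` arrays from
// teamData) with a flat list of lines scraped separately, in the same order.
function mergeMatchups(matchups, lines, key = "openingBetLine") {
  let i = 0; // running position in the flat lines array
  return matchups.flatMap((matchup) =>
    matchup.teams.map((name) => {
      // Missing trailing lines become null rather than undefined.
      const line = i < lines.length ? lines[i] : null;
      i += 1;
      return { fighter: name, [key]: line };
    })
  );
}

const matchups = [
  { teams: ["Khabib Nurmagomedov", "Dustin Poirier"] },
  { teams: ["Fighter C", "Fighter D"] },
];
const merged = mergeMatchups(matchups, [-333, 225, -200, 170]);
console.log(merged);
```

Note that selecting only `.op-team-top` divs yields one fighter per matchup, while a per-fighter list of lines has two entries per matchup; iterating over both fighters, as here, keeps the two sequences aligned.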