I'm learning unit testing and have covered the basics.
Now I'm trying to test my scraper. My main function runs multiple scrapers, but I want to check that each scraper works properly on its own. For the test to pass, the scraper should return an array of objects: [{},{},...,{}]. I can't know what is inside the objects (the data is always different), but the result is always an array of objects.
My question is: how do I test this properly? I've tried several approaches, and my test always fails. The closest I've come is this:
const santScraper = require('../scraping/scrapers/sant-scraper');
test('Does scraper works properly', async () => {
  await expect(santScraper(1)).toBe({});
});
But my output is this:
FAIL tests/santScraper.test.js
× Does scraper works properly (20 ms)
● Does scraper works properly
expect(received).toBe(expected) // Object.is equality
- Expected - 1
+ Received + 1
- Object {}
+ Promise {}
2 |
3 | test('Does scraper works properly', async () => {
> 4 | await expect(santScraper(1)).toBe({});
| ^
5 | });
6 |
at Object.<anonymous> (tests/santScraper.test.js:4:32)
Test Suites: 1 failed, 1 total
Tests: 1 failed, 1 total
Snapshots: 0 total
Time: 2.016 s
Ran all test suites.
npm ERR! Test failed. See above for more details.
I really don't know what the error is pointing to.
Also here is my scraper:
//const data_functions = require('../data-functions/data-functions');
const axios = require('axios'); //npm package - promise based http client
const cheerio = require('cheerio'); //npm package - used for web-scraping in server-side implementations
const data_functions = require('../data-functions/data-functions');
// santScraper function; its `count` parameter is sent from the scraping-service file.
const santScraper = async (count) => {
  const url = `https://www.sant.ba/pretraga/prodaja-1/tip-2/cijena_min-20000/stranica-${count}`;
  const santScrapedData = [];
  try {
    await load_url(url, santScrapedData);
  } catch (error) {
    console.log(error);
  }
};
// Loads the URL and starts the process of fetching the raw data.
const load_url = async (url, santScrapedData) => {
  await axios.get(url).then((response) => {
    const $ = cheerio.load(response.data);
    get_article_html_nodes($).each((index, element) => {
      process_single_article($, index, element, santScrapedData);
    });
    data_functions.mergeData(santScrapedData);
    return santScrapedData; //this is where I return array of objects
  });
};
// Selects the raw HTML nodes, limited to the divs that we want.
const get_article_html_nodes = ($) => {
  return $('div[class="col-xxs-12 col-xss-6 col-xs-6 col-sm-6 col-lg-4"]');
};
// All the logic for extracting the data that we want from the raw HTML.
const process_single_article = ($, index, element, santScrapedData) => {
  const getLink = $(element).find('a[class="re-image"]').attr('href');
  const getDescription = $(element).find('a[class="title"]').text();
  const getPrice = $(element)
    .find('div[class="prices"] > h3[class="price"]')
    .text()
    .replace(/\.| ?KM$/g, '')
    .replace(',', '.');
  const getPicture = $(element).find('img').attr('data-original');
  const getSquaremeters = $(element)
    .find('span[class="infoCount"]')
    .first()
    .text()
    .replace(',', '.')
    .split('m')[0];
  const pricepersquaremeter =
    parseFloat(getPrice) / parseFloat(getSquaremeters);
  santScrapedData[index] = {
    id: getLink.substring(42, 46),
    link: getLink,
    description: getDescription,
    price: Math.round(getPrice),
    picture: getPicture,
    squaremeters: Math.round(getSquaremeters),
    pricepersquaremeter: Math.round(pricepersquaremeter),
  };
};
module.exports = santScraper;
You can make use of typeof in this case, like so:
test('Does scraper works properly', async () => {
  await expect(typeof santScraper(1)).toBe('object');
});
Although, if it's always the Promise object you want to check against, you can be more specific by using instanceof, like so:
test('Does scraper works properly', async () => {
  await expect(santScraper(1) instanceof Promise).toBe(true);
});
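Note that both checks above only confirm that calling the scraper yields a Promise; they say nothing about what it resolves to. Below is a minimal sketch of asserting on the resolved value instead. It assumes the scraper actually returns its array: as posted, santScraper has no return statement and load_url does not return the value produced inside .then, so the Promise resolves to undefined. The names fakeScraper and isArrayOfObjects are hypothetical, introduced only for illustration; fakeScraper stands in for the real scraper so the example does not hit the network.

```javascript
// Hypothetical helper: true only for an array whose items are all
// non-null, non-array objects. An empty array also passes.
const isArrayOfObjects = (value) =>
  Array.isArray(value) &&
  value.every(
    (item) => item !== null && typeof item === 'object' && !Array.isArray(item)
  );

// Hypothetical stand-in for santScraper. Unlike the posted scraper, it
// returns the array, so the Promise resolves to the data.
const fakeScraper = async (count) => {
  const scrapedData = [];
  scrapedData.push({ id: String(count), price: 100000 });
  return scrapedData;
};

// Registered only when a Jest-style global `test` exists, so the file can
// also be loaded with plain Node.
if (typeof test === 'function') {
  test('scraper resolves to an array of objects', async () => {
    // Await the call first, then assert on the resolved value.
    const result = await fakeScraper(1);
    expect(Array.isArray(result)).toBe(true);
    expect(isArrayOfObjects(result)).toBe(true);
  });
}
```

With the real scraper, the same assertions should apply once santScraper returns the array, for example by returning the result of load_url and having load_url return the value of the axios.get(...).then(...) chain. Jest's .resolves modifier, as in await expect(santScraper(1)).resolves.toEqual(expect.any(Array)), is another way to unwrap the Promise before matching.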