Web scraping with page_soup.findAll(): I need to extract specific data from a webpage but don't know how to do it

Question

I am doing some web scraping and need to extract the keywords from a webpage. I am trying to use page_soup.findAll() for this, but I don't know what to pass between the parentheses to get what I need.

The code of the page is the following:

var kv = {"seccion": "otros","nivel": "home","nota": "","id_nota": "","tipo": "noticias","keywords" : "IMPUESTOS,  SII,  EXCEDENTES ISAPRES,  INCENDIOS,  COLUSION CONFORT,  COMPENSACION,  PERMISOS DE CIRCULACION,  REVISION TECNICA"};

And I need this data:

"IMPUESTOS, SII, EXCEDENTES ISAPRES, INCENDIOS, COLUSION CONFORT, COMPENSACION, PERMISOS DE CIRCULACION, REVISION TECNICA"

Thanks

Answer 1

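This is not HTML but JavaScript, so findAll() is useless here: it only searches HTML tags, so at best it can return the <script> element that contains this code, not the value inside it.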

You have it as a string, so use string operations to get the value, e.g. slicing [start:end], split(), replace(), etc.
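As a rough sketch of the string-only approach, the snippet below uses index(), slicing and split() on the line from the question. It assumes the key is written exactly as `"keywords" : "` (with those spaces) and that the value contains no escaped quotes; the names marker, raw, keywords and result are only illustrative.

```python
text = 'var kv = {"seccion": "otros","nivel": "home","nota": "","id_nota": "","tipo": "noticias","keywords" : "IMPUESTOS,  SII,  EXCEDENTES ISAPRES,  INCENDIOS,  COLUSION CONFORT,  COMPENSACION,  PERMISOS DE CIRCULACION,  REVISION TECNICA"};'

# Assumption: the key appears exactly like this in the page source.
marker = '"keywords" : "'
start = text.index(marker) + len(marker)  # first character of the value
end = text.index('"', start)              # closing quote of the value

raw = text[start:end]

# Split on commas and strip the extra spaces around each keyword.
keywords = [k.strip() for k in raw.split(',')]
result = ', '.join(keywords)
print(result)
```

This is fragile if the site changes the spacing around the colon, which is one reason to prefer parsing the JSON instead.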

Or you can remove var kv = and ; from the string, which leaves a JSON string. You can convert it to a Python dictionary with the json module and then read the value from the dictionary: dictionary["keywords"]

import json

text = 'var kv = {"seccion": "otros","nivel": "home","nota": "","id_nota": "","tipo": "noticias","keywords" : "IMPUESTOS,  SII,  EXCEDENTES ISAPRES,  INCENDIOS,  COLUSION CONFORT,  COMPENSACION,  PERMISOS DE CIRCULACION,  REVISION TECNICA"};'

text = text[9:-1]  # remove the leading `var kv = ` and the trailing `;`

d = json.loads(text)  # parse the remaining JSON into a dictionary

print(d['keywords'])
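In a real page the `var kv = ...;` line sits somewhere inside the full source, so fixed slice positions will not work directly. A minimal sketch, assuming the page source is available as a string (for example str(page_soup)) and that the object contains no nested braces: locate the assignment with a regular expression, then parse it with json. The html value below is a shortened, made-up stand-in for the downloaded page.

```python
import json
import re

# Hypothetical, shortened page source; a real page would be the downloaded HTML.
html = '''<html><head><script>
var kv = {"seccion": "otros","nivel": "home","keywords" : "IMPUESTOS,  SII,  REVISION TECNICA"};
</script></head><body></body></html>'''

# Capture the object assigned to `kv`, up to the first `}` followed by `;`.
# Assumption: the object has no nested braces.
match = re.search(r'var\s+kv\s*=\s*(\{.*?\})\s*;', html, re.DOTALL)
if match is None:
    raise ValueError('var kv not found in page source')

data = json.loads(match.group(1))

# Turn the comma-separated value into a clean list of keywords.
keywords = [k.strip() for k in data['keywords'].split(',')]
print(keywords)
```

Splitting and stripping also removes the double spaces that appear after each comma in the original value.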

1 answers

solution1
0 2019-11-26 01:16:00
