I need to find all elements like <td alert="0" op="0" class=" es_numero cell_imps24ad"><span>1.204</span></td>
in my HTML. I can't post the full HTML because it contains confidential information.
This is the code I'm trying:
# encoding=utf8
# -*- coding: utf-8 -*-
import random
import requests
from requests.auth import HTTPBasicAuth
import sys
import csv
from bs4 import BeautifulSoup
reload(sys)
sys.setdefaultencoding('utf-8')
lista = []
number = str(random.random())
user = ''
passwd = ''
url = ''
login = requests.get(url, auth=HTTPBasicAuth(user, passwd))
url_sitios = ''
sitios = requests.get(url_sitios, auth=HTTPBasicAuth(user, passwd))
sitios2 = sitios.text
html = sitios2
soup = BeautifulSoup(html)
for item in soup.find_all("td", {"class": " es_numero cell_imps24ad"}):
    print item.text, item.next_sibling
And the output I want is something like this: es_numero cell_imps24ad : 1.204
You need to pass the parser name to BeautifulSoup:
soup = BeautifulSoup(html, 'lxml')  # add the 'lxml' parser
for item in soup.find_all("td", {"class": " es_numero cell_imps24ad"}):
    print item.text, item.next_sibling
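If the exact-string class match still returns nothing, the leading space in " es_numero cell_imps24ad" may be the reason, since how bs4 compares a multi-class string can depend on the installed version. A CSS class selector matches each class on its own and does not depend on the whitespace in the attribute. The following is a minimal sketch in Python 3 syntax, not the asker's actual page: the sample table, the second cell, and the helper name format_cells are made up for illustration, and 'html.parser' is used only so the example runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the <td> from the question; the surrounding
# table and the second cell are hypothetical.
html = """
<table><tr>
<td alert="0" op="0" class=" es_numero cell_imps24ad"><span>1.204</span></td>
<td alert="0" op="0" class=" es_numero cell_other"><span>7</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")


def format_cells(soup):
    """Return 'class names : text' for every td carrying both classes."""
    rows = []
    # td.es_numero.cell_imps24ad requires both classes, in any order.
    for td in soup.select("td.es_numero.cell_imps24ad"):
        # Drop empty entries in case the parser kept one for the leading space.
        class_names = " ".join(name for name in td["class"] if name)
        rows.append("{} : {}".format(class_names, td.get_text(strip=True)))
    return rows


result = format_cells(soup)
for line in result:
    print(line)
```

With the sample markup this prints one line in the format requested in the question; the second cell is skipped because it lacks the cell_imps24ad class.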
Edit: given HTML with nombre and url tags, you can try this:
from bs4 import BeautifulSoup as soup
import re
s = "<url>https://www.google.com.ar/</url>\n<nombre>google.com.ar</nombre>"
data = map(lambda x:x.text, soup(s, 'lxml').find_all(re.compile('nombre|url')))
Output:
[u'https://www.google.com.ar/', u'google.com.ar\u200c\u200b']
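Note that the snippet above relies on Python 2, where map returns a list. In Python 3, map returns a lazy iterator, so a list comprehension is a common substitute. This is a sketch under two assumptions that are not in the original answer: the built-in 'html.parser' is used instead of lxml, and the regular expression is anchored so that only tags named exactly nombre or url match.

```python
import re

from bs4 import BeautifulSoup

s = "<url>https://www.google.com.ar/</url>\n<nombre>google.com.ar</nombre>"
soup = BeautifulSoup(s, "html.parser")

# find_all accepts a compiled regex and tests it against each tag name;
# results come back in document order.
wanted = re.compile(r"^(nombre|url)$")
data = [tag.get_text() for tag in soup.find_all(wanted)]
print(data)
```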
Edit 2: for smaller extractions:
from bs4 import BeautifulSoup as soup
s = '<ultimas24hrs> <item id="imps24ad">0</item>'
new_s = soup(s, 'lxml')
the_id = int(new_s.find('item', {'id':"imps24ad"}).text)
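One caveat with the one-liner above: find returns None when no tag matches, so calling .text on the result raises AttributeError. A guarded variant might look like the sketch below. The helper name item_value is hypothetical, the code uses Python 3 syntax, and 'html.parser' stands in for lxml so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

s = '<ultimas24hrs> <item id="imps24ad">0</item>'
soup = BeautifulSoup(s, "html.parser")


def item_value(soup, item_id):
    """Return the integer text of <item id=item_id>, or None if absent."""
    tag = soup.find("item", {"id": item_id})
    if tag is None:
        return None
    return int(tag.get_text(strip=True))


the_id = item_value(soup, "imps24ad")
missing = item_value(soup, "no_such_id")
print(the_id, missing)
```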