How can I extract a specific element from HTML code using Python?
I'm not very confident with HTML, and I'm having trouble parsing this portion of HTML code (the result of print soup.prettify()) with Python.
$("#global-flash").html(""); $('#reviews-tab-navigation').trigger('repaint'); $('#edit-review-tab').html(' <div class='\\"row-fluid\\"'> \\n <div class='\\"span3\\"'> \\n <div class='\\"label' full-height="" id='\\"review-search-result-panel\\"' use-bootstrap-tables\\"=""> \\n <span class='\\"panel-headline\\"'> Rezensionsdaten<\\/span>\\n <hr/> \\n\\n <table class='\\"table' id='\\"review-search-result-list\\"' table-hover="" table-striped\\"=""> \\n <thead> \\n <tr> \\n <th> \\n <span class='\\"review-count\\"'> 5<\\/span>\\n\\n Rezensionen gefunden\\n <\\/th>\\n <\\/tr>\\n <\\/thead>\\n\\n <tbody> \\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10613555\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span aderisce="" anzi="" bene="" colore="" come="" difettucci="" e="" foto,="" i="" in="" morbidissima,="" non="" pelle,="" piacevole="" rotolini.\\"="" segnare="" senza="" stringe="" sulla="" title='\\"Bel'> Bel colore come in foto, morbidissima, piacevole sulla pelle, non stringe anzi aderisce bene senza segnare i difettucci ei rotolini.<\\/span>\\n <\\/span>\\n <\\/td>\\n <\\/tr>\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10610141\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span title='\\"bella\\"'> bella<\\/span>\\n <\\/span>\\n <\\/td>\\n <\\/tr>\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10575319\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 4\\n <img 2015\\"="" 
alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span buona="" morbido.\\"="" qualità-prezzo,="" rapporto="" tessuto="" title='\\"Buon' vestibilità,=""> Buon rapporto qualità-prezzo, buona vestibilità, tessuto morbido.<\\/span>\\n <\\/span>\\n <\\/td>\\n <\\/tr>\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10554514\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span buon="" capo!="" giusto="" ottima="" peso\\"="" qualità,="" title='\\"Davvero' un=""> Davvero un buon capo! Ottima qualità, giusto peso<\\/span>\\n <\\/span>\\n <\\/td>\\n <\\/tr>\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"9469234\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span ....="" altri="" anche="" bello="" colori="" e="" funzionale.="" in="" regolare.\\"="" taglia="" title='\\"Preso'> Preso anche in altri colori .... bello e funzionale. Taglia regolare.<\\/span>\\n <\\/span>\\n <\\/td>\\n <\\/tr>\\n <\\/tbody>\\n <\\/table>\\n <\\/div>\\n <\\/div>\\n\\n <div class='\\"span9\\"'> \\n <div class='\\"row-fluid\\"'> \\n <div class='\\"span3\\"'> \\n <div class='\\"label' full-height\\"="" id='\\"product-data-panel\\"'> \\n <span class='\\"panel-headline\\"'> Informazioni articolo<\\/span>\\n <hr/> \\n <a href='\\"https://www.bonprix.it/search.htm?qu=95341195\\"' target='\\"_blank\\"'> <img src="\\'http://image01.bonprix.de/bonprixbilder//assets/114x160/13050022.jpg\\'"/> <\\/a>\\n <label> N. 
art.<\\/label>\\n <a class='\\"btn-link\\"' href='\\"https://www.bonprix.it/search.htm?qu=95341195\\"' target='\\"_blank\\"'> 95341195<\\/a>\\n <label> Masterstyle-ID<\\/label>\\n52826321\\n <label> Digistyle-ID<\\/label>\\n12709620\\n <label> Ø Media dei voti<\\/label>\\n4.45 <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""> \\n <label> Lunghezza<\\/label>\\nGiusto\\n <label> Larghezza<\\/label>\\nGiusto\\n <label> Disponibilità<\\/label>\\n\\n(37)\\n\\n <\\/div>\\n <\\/div>\\n\\n <div class='\\"span5\\"'> \\n <div class='\\"label' full-height\\"="" id='\\"single-review-panel\\"'> \\n <span class='\\"panel-headline\\"'> Dati cliente<\\/span>\\n <hr/> \\n <table class='\\"customer-info-table\\"'> \\n <tr> \\n <td> \\n <label> Nome<\\/label>\\n nome\\n <\\/td>\\n <td> \\n <label> Cognome<\\/label>\\n cognome\\n <\\/td>\\n <\\/tr>\\n <tr> \\n <td> \\n <label> Codice cliente<\\/label>\\n N/A\\n <\\/td>\\n <td> \\n <label> Indirizzo e-mail<\\/label>\\n ********@gmail.com\\n <\\/td>\\n <\\/tr>\\n<\\/table>\\n\\n <span class='\\"panel-headline\\"'> Commento articolo<\\/span>\\n <hr/> \\n <i class='\\"rating' r5\\"=""> <\\/i> <br/> \\n\\n <textarea id='\\"review-text\\"' name='\\"text\\"' readonly='\\"readonly\\"' rows='\\"12\\"'>\\nBel colore come in foto, morbidissima, piacevole sulla pelle, non stringe anzi aderisce bene senza segnare i difettucci ei rotolini.<\\/textarea>\\n\\n<span class='\\"panel-headline\\"'>Commenti sulla vestibilità<\\/span>\\n<hr/>\\n<table class='\\"size-info-table\\"'>\\n <tr>\\n <td>\\n <label>Lunghezza<\\/label>\\n Giusto\\n <\\/td>\\n <td>\\n <label>Larghezza<\\/label>\\n Giusto\\n <\\/td>\\n <td>\\n <label>Taglia<\\/label>\\n 62/64\\n <\\/td>\\n <td>\\n <label>Varianti<\\/label>\\n \\n <\\/td>\\n <td>\\n <label>Statura<\\/label>\\n 165-169\\n <\\/td>\\n <\\/tr>\\n<\\/table>\\n<p>\\n <table class='\\"table\\"'>\\n <tr>\\n <td>\\n 
<b>Rezensions-ID:<\\/b>\\n <span id='\\"review-id\\"'>10613555<\\/span>\\n <\\/td>\\n <td>\\n <b>Creata:<\\/b>\\n <span class='\\"utc-date\\"'>\\n 01.10.2017 11:06:26\\n <\\/span>\\n <\\/td>\\n <\\/tr>\\n <tr>\\n <td>\\n <b>Letzte Änderung<\\/b>\\n <span class='\\"utc-date\\"'>\\n 01.10.2017 11:06:26\\n <\\/span>\\n <\\/td>\\n <td>\\n <b>di<\\/b>\\n Kunde\\n <\\/td>\\n <\\/tr>\\n <tr>\\n <td>\\n <b>Data pubblicazione:<\\/b>\\n <span class='\\"utc-date\\"'>\\n 01.10.2017 11:06:26\\n <\\/span>\\n <\\/td>\\n <\\/tr>\\n <\\/table>\\n<\\/p>\\n\\n <\\/div>\\n <\\/div>\\n\\n <div class='\\"span4\\"'>\\n <div class='\\"label' full-height\\"="" id='\\"editing-functions-panel\\"'>\\n <span class='\\"panel-headline\\"'>Modifica<\\/span>\\n<hr/>\\n<div>\\n <label>Scegli un destinatario<\\/label>\\n <a class='\\"btn-link\\"' false;\\"="" href='\\"#\\"' id='\\"reset-recipients-list-link\\"' onclick='\\"reviews.resetRecipientsList(true);' return="">Cancella la lista destinatari<\\/a>\\n <select id='\\"email-recipients-select\\"' name='\\"email-recipients-select\\"'><option value='\\"\\"'><\\/option>\\n<option value='\\"*****@*****.it\\"'>servizio@******.it<\\/option><\\/select>\\n <textarea id='\\"email-recipients-textarea\\"' name='\\"email-recipients-textarea\\"'>\\n<\\/textarea>\\n <a class='\\"btn\\"' data-confirm-translation-modified-text='\\"Die' false;\\"="" gespeichert.="" href='\\"#\\"' id='\\"send-mail-btn\\"' nicht="" noch="" onclick='\\"reviews.sendMail(true);' return="" rezension="" trotzdem="" versenden?\\"="" wurde="" übersetzung="">Invia recensione<\\/a>\\n <label>Traduci<\\/label>\\n <textarea id='\\"review-uebersetzung\\"' name='\\"text\\"'>\\n<\\/textarea>\\n <label>Feedback al cliente<\\/label>\\n <textarea id='\\"review-feedbackToCustomer\\"' name='\\"text\\"'>\\n<\\/textarea>\\n<\\/div>\\n<div>\\n <label>Tipo di recensione<\\/label>\\n <select id='\\"review-meinungstyp\\"' name='\\"meinungstyp\\"'><option selected='\\"selected\\"' 
value='\\"R\\"'>Recensione<\\/option>\\n<option value='\\"G\\"'>Risposte<\\/option>\\n<option value='\\"A\\"'>Archivio<\\/option><\\/select>\\n<\\/div>\\n<div id='\\"aktiv-checkboxes-container\\"'>\\n <div class='\\"control-group' use-bootstrap-groups\\"="">\\n <label class='\\"control-label\\"' for='\\"review_aktiv\\"'>Pubblicata<\\/label>\\n <input id='\\"review_aktiv\\"' name='\\"review_aktiv\\"' type='\\"hidden\\"' value='\\"T\\"'/>\\n <div class='\\"controls\\"'>\\n <div class='\\"btn-group\\"'>\\n <a btn="" btn-success\\"="" class='\\"change-active-state' data-value='\\"T\\"' href='\\"#\\"'>Sì<\\/a>\\n <a \\"="" btn="" class='\\"change-active-state' data-value='\\"F\\"' href='\\"#\\"'>No<\\/a>\\n <\\/div>\\n <\\/div>\\n <\\/div>\\n<\\/div>\\n\\n<div class='\\"row-fluid' form-actions="" possible-multi-line\\"="">\\n <a btn-primary\\"="" class='\\"btn' false;\\"="" href='\\"#\\"' id='\\"save-review-btn\\"' onclick='\\"reviews.saveReview(true);' remote='\\"true\\"' return="">Salva recensione<\\/a>\\n <a btn-danger\\"="" class='\\"btn' data-confirm-dialog-title='\\"Cancella' false;\\"="" href='\\"#\\"' id='\\"delete-review-btn\\"' onclick='\\"reviews.deleteSelectedReview(true);' recensioni\\"="" remote='\\"true\\"' return=""><i class="\\'icon-trash" icon-white\\'=""><\\/i> Cancella recensioni<\\/a>\\n<\\/div>\\n\\n <\\/div>\\n <\\/div>\\n <\\/div>\\n <\\/div>\\n<\\/div>\\n').trigger('repaint'); reviews.initEditReviewTab(); $('#reviews-tab-navigation').tabs('option', 'active', 0); $('.search-tab-buttons').html('<div class='\\"search-tab-buttons\\"'>\\n <table>\\n <tr>\\n <td><a btn-primary\\"="" class='\\"btn' false;\\"="" href='\\"#\\"' onclick='\\"reviews.submitSearchReviews();' remote='\\"true\\"' return="">Cerca<\\/a><\\/td>\\n <td><a btn-default\\"="" class='\\"btn' false;\\"="" href='\\"#\\"' onclick='\\"reviews.setDefaultSearchParams();' remote='\\"true\\"' return="">Ricerca standard<\\/a><\\/td>\\n <td><a btn-default\\"="" class='\\"btn' false;\\"="" 
href='\\"#\\"' onclick='\\"reviews.showStatistics(true);' remote='\\"true\\"' return="">Statistiche<\\/a><\\/td>\\n <\\/tr>\\n <\\/table>\\n<\\/div>'); $('.mini-statistics').replaceWith(' <div class='\\"mini-statistics\\"'>\\n <p>\\n Da controllare: 100 / Pubblicata: 304316 / Non pubblicata: 9207 / Prenotate: [0], mie: [0]\\n <\\/p>\\n <\\/div>\\n'); </p></div></a></td></a></td></a></td></tr></table></div></i></a></a></div></a></a></div></div></label></div></div></option></option></option></select></label></div></textarea></label></textarea></label></a></textarea></option></option></select></a></label></div></span></div></div></span></b></td></tr></b></td></span></b></td></tr></span></b></td></span></b></td></tr></table></p></label></td></label></td></label></td></label></td></label></td></tr></table></span></textarea> </i> </span> </label> </td> </label> </td> </tr> </label> </td> </label> </td> </tr> </table> </span> </div> </div> </label> </label> </label> </img> </label> </label> </label> </a> </label> </a> </span> </div> </div> </div> </div> </span> </span> </td> </tr> </span> </span> </td> </tr> </span> </span> </td> </tr> </span> </span> </td> </tr> </span> </span> </td> </tr> </tbody> </span> </th> </tr> </thead> </table> </span> </div> </div> </div>
Basically, I would like to extract the number after each "data-review-id" (this portion of HTML contains five: 10613555, 10610141, 10575319, 10554514, 9469234), but I don't understand which tags I should select to get the result I want.
I've tried several combinations of soup.find_all, but without any result.
Any help or suggestion would be really appreciated.
Thanks in advance!
The HTML you have is embedded in some JavaScript and appears to have been escaped. If you copy and paste the exact HTML you have given and assign it to html, the following could be used:
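from bs4 import BeautifulSoup

# Paste the escaped HTML from the question between the triple quotes
html = """ ---- add HTML here ---"""
# Strip the stray double quotes and unescape the closing tags (\/ becomes /)
html = html.replace('"', '').replace(r'\/', '/')
soup = BeautifulSoup(html, "html.parser")
# Select every <td> that carries a data-review-id attribute
for td in soup.find_all('td', {'data-review-id': True}):
    print(td['data-review-id'])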
This then displays:
10613555
10610141
10575319
10554514
9469234
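If only the IDs are needed, another option is to skip the HTML parser and pull the numbers straight out of the escaped text with a regular expression. The sketch below swaps in Python's standard re module for BeautifulSoup; it is not part of the approach above. The names sample, REVIEW_ID and extract_review_ids are hypothetical, and sample is a made-up, shortened stand-in for the markup in the question. It assumes each ID is a run of digits directly after data-review-id=, optionally preceded by quotes or backslashes.

```python
import re

# Hypothetical, shortened stand-in for the escaped markup in the question.
# It covers three quoting styles: escaped quotes, the prettified form with
# doubled backslashes, and plain quotes.
sample = r'''
<td class=\"selectable-review-entry\" data-review-id=\"10613555\">
<td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10610141\\"'>
<td data-review-id="9469234">
'''

# After "data-review-id=", skip any mix of backslashes and quotes,
# then capture the run of digits that follows.
REVIEW_ID = re.compile(r"""data-review-id=[\\'"]*(\d+)""")


def extract_review_ids(text):
    """Return every data-review-id number found in text, in document order."""
    return REVIEW_ID.findall(text)


print(extract_review_ids(sample))
```

Because the pattern ignores the surrounding quoting, it does not depend on how the text was escaped. The trade-off is that it knows nothing about tags, so it would also match the attribute on elements other than td.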