
How can I extract a specific element from HTML code using Python

I'm not very confident with HTML, and I'm having trouble parsing this piece of HTML code (the output of print soup.prettify()) with Python.

 $("#global-flash").html(""); $('#reviews-tab-navigation').trigger('repaint'); $('#edit-review-tab').html(' <div class='\\"row-fluid\\"'> \\n <div class='\\"span3\\"'> \\n <div class='\\"label' full-height="" id='\\"review-search-result-panel\\"' use-bootstrap-tables\\"=""> \\n <span class='\\"panel-headline\\"'> Rezensionsdaten&lt;\\/span&gt;\\n <hr/> \\n\\n <table class='\\"table' id='\\"review-search-result-list\\"' table-hover="" table-striped\\"=""> \\n <thead> \\n <tr> \\n <th> \\n <span class='\\"review-count\\"'> 5&lt;\\/span&gt;\\n\\n Rezensionen gefunden\\n &lt;\\/th&gt;\\n &lt;\\/tr&gt;\\n &lt;\\/thead&gt;\\n\\n <tbody> \\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10613555\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span aderisce="" anzi="" bene="" colore="" come="" difettucci="" e="" foto,="" i="" in="" morbidissima,="" non="" pelle,="" piacevole="" rotolini.\\"="" segnare="" senza="" stringe="" sulla="" title='\\"Bel'> Bel colore come in foto, morbidissima, piacevole sulla pelle, non stringe anzi aderisce bene senza segnare i difettucci ei rotolini.&lt;\\/span&gt;\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10610141\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span title='\\"bella\\"'> bella&lt;\\/span&gt;\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10575319\\"'> \\n <span btn-link="" 
btn-small="" class='\\"btn' review-list-link\\"=""> \\n 4\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span buona="" morbido.\\"="" qualità-prezzo,="" rapporto="" tessuto="" title='\\"Buon' vestibilità,=""> Buon rapporto qualità-prezzo, buona vestibilità, tessuto morbido.&lt;\\/span&gt;\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10554514\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span buon="" capo!="" giusto="" ottima="" peso\\"="" qualità,="" title='\\"Davvero' un=""> Davvero un buon capo! Ottima qualità, giusto peso&lt;\\/span&gt;\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n <tr> \\n <td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"9469234\\"'> \\n <span btn-link="" btn-small="" class='\\"btn' review-list-link\\"=""> \\n 5\\n <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""/> \\n\\n <span ....="" altri="" anche="" bello="" colori="" e="" funzionale.="" in="" regolare.\\"="" taglia="" title='\\"Preso'> Preso anche in altri colori .... bello e funzionale. 
Taglia regolare.&lt;\\/span&gt;\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n &lt;\\/tbody&gt;\\n &lt;\\/table&gt;\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n\\n <div class='\\"span9\\"'> \\n <div class='\\"row-fluid\\"'> \\n <div class='\\"span3\\"'> \\n <div class='\\"label' full-height\\"="" id='\\"product-data-panel\\"'> \\n <span class='\\"panel-headline\\"'> Informazioni articolo&lt;\\/span&gt;\\n <hr/> \\n <a href='\\"https://www.bonprix.it/search.htm?qu=95341195\\"' target='\\"_blank\\"'> <img src="\\'http://image01.bonprix.de/bonprixbilder//assets/114x160/13050022.jpg\\'"/> &lt;\\/a&gt;\\n <label> N. art.&lt;\\/label&gt;\\n <a class='\\"btn-link\\"' href='\\"https://www.bonprix.it/search.htm?qu=95341195\\"' target='\\"_blank\\"'> 95341195&lt;\\/a&gt;\\n <label> Masterstyle-ID&lt;\\/label&gt;\\n52826321\\n <label> Digistyle-ID&lt;\\/label&gt;\\n12709620\\n <label> Ø Media dei voti&lt;\\/label&gt;\\n4.45 <img 2015\\"="" alt='\\"Bewertung' src='\\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\\"' stern=""> \\n <label> Lunghezza&lt;\\/label&gt;\\nGiusto\\n <label> Larghezza&lt;\\/label&gt;\\nGiusto\\n <label> Disponibilità&lt;\\/label&gt;\\n\\n(37)\\n\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n\\n <div class='\\"span5\\"'> \\n <div class='\\"label' full-height\\"="" id='\\"single-review-panel\\"'> \\n <span class='\\"panel-headline\\"'> Dati cliente&lt;\\/span&gt;\\n <hr/> \\n <table class='\\"customer-info-table\\"'> \\n <tr> \\n <td> \\n <label> Nome&lt;\\/label&gt;\\n nome\\n &lt;\\/td&gt;\\n <td> \\n <label> Cognome&lt;\\/label&gt;\\n cognome\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n <tr> \\n <td> \\n <label> Codice cliente&lt;\\/label&gt;\\n N/A\\n &lt;\\/td&gt;\\n <td> \\n <label> Indirizzo e-mail&lt;\\/label&gt;\\n ********@gmail.com\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n&lt;\\/table&gt;\\n\\n <span class='\\"panel-headline\\"'> Commento articolo&lt;\\/span&gt;\\n <hr/> \\n <i class='\\"rating' r5\\"=""> &lt;\\/i&gt; 
<br/> \\n\\n <textarea id='\\"review-text\\"' name='\\"text\\"' readonly='\\"readonly\\"' rows='\\"12\\"'>\\nBel colore come in foto, morbidissima, piacevole sulla pelle, non stringe anzi aderisce bene senza segnare i difettucci ei rotolini.&lt;\\/textarea&gt;\\n\\n<span class='\\"panel-headline\\"'>Commenti sulla vestibilità&lt;\\/span&gt;\\n<hr/>\\n<table class='\\"size-info-table\\"'>\\n <tr>\\n <td>\\n <label>Lunghezza&lt;\\/label&gt;\\n Giusto\\n &lt;\\/td&gt;\\n <td>\\n <label>Larghezza&lt;\\/label&gt;\\n Giusto\\n &lt;\\/td&gt;\\n <td>\\n <label>Taglia&lt;\\/label&gt;\\n 62/64\\n &lt;\\/td&gt;\\n <td>\\n <label>Varianti&lt;\\/label&gt;\\n  \\n &lt;\\/td&gt;\\n <td>\\n <label>Statura&lt;\\/label&gt;\\n 165-169\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n&lt;\\/table&gt;\\n<p>\\n <table class='\\"table\\"'>\\n <tr>\\n <td>\\n <b>Rezensions-ID:&lt;\\/b&gt;\\n <span id='\\"review-id\\"'>10613555&lt;\\/span&gt;\\n &lt;\\/td&gt;\\n <td>\\n <b>Creata:&lt;\\/b&gt;\\n <span class='\\"utc-date\\"'>\\n 01.10.2017 11:06:26\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n <tr>\\n <td>\\n <b>Letzte Änderung&lt;\\/b&gt;\\n <span class='\\"utc-date\\"'>\\n 01.10.2017 11:06:26\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n <td>\\n <b>di&lt;\\/b&gt;\\n Kunde\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n <tr>\\n <td>\\n <b>Data pubblicazione:&lt;\\/b&gt;\\n <span class='\\"utc-date\\"'>\\n 01.10.2017 11:06:26\\n &lt;\\/span&gt;\\n &lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n &lt;\\/table&gt;\\n&lt;\\/p&gt;\\n\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n\\n <div class='\\"span4\\"'>\\n <div class='\\"label' full-height\\"="" id='\\"editing-functions-panel\\"'>\\n <span class='\\"panel-headline\\"'>Modifica&lt;\\/span&gt;\\n<hr/>\\n<div>\\n <label>Scegli un destinatario&lt;\\/label&gt;\\n <a class='\\"btn-link\\"' false;\\"="" href='\\"#\\"' id='\\"reset-recipients-list-link\\"' onclick='\\"reviews.resetRecipientsList(true);' return="">Cancella la lista destinatari&lt;\\/a&gt;\\n <select 
id='\\"email-recipients-select\\"' name='\\"email-recipients-select\\"'><option value='\\"\\"'>&lt;\\/option&gt;\\n<option value='\\"*****@*****.it\\"'>servizio@******.it&lt;\\/option&gt;&lt;\\/select&gt;\\n <textarea id='\\"email-recipients-textarea\\"' name='\\"email-recipients-textarea\\"'>\\n&lt;\\/textarea&gt;\\n <a class='\\"btn\\"' data-confirm-translation-modified-text='\\"Die' false;\\"="" gespeichert.="" href='\\"#\\"' id='\\"send-mail-btn\\"' nicht="" noch="" onclick='\\"reviews.sendMail(true);' return="" rezension="" trotzdem="" versenden?\\"="" wurde="" übersetzung="">Invia recensione&lt;\\/a&gt;\\n <label>Traduci&lt;\\/label&gt;\\n <textarea id='\\"review-uebersetzung\\"' name='\\"text\\"'>\\n&lt;\\/textarea&gt;\\n <label>Feedback al cliente&lt;\\/label&gt;\\n <textarea id='\\"review-feedbackToCustomer\\"' name='\\"text\\"'>\\n&lt;\\/textarea&gt;\\n&lt;\\/div&gt;\\n<div>\\n <label>Tipo di recensione&lt;\\/label&gt;\\n <select id='\\"review-meinungstyp\\"' name='\\"meinungstyp\\"'><option selected='\\"selected\\"' value='\\"R\\"'>Recensione&lt;\\/option&gt;\\n<option value='\\"G\\"'>Risposte&lt;\\/option&gt;\\n<option value='\\"A\\"'>Archivio&lt;\\/option&gt;&lt;\\/select&gt;\\n&lt;\\/div&gt;\\n<div id='\\"aktiv-checkboxes-container\\"'>\\n <div class='\\"control-group' use-bootstrap-groups\\"="">\\n <label class='\\"control-label\\"' for='\\"review_aktiv\\"'>Pubblicata&lt;\\/label&gt;\\n <input id='\\"review_aktiv\\"' name='\\"review_aktiv\\"' type='\\"hidden\\"' value='\\"T\\"'/>\\n <div class='\\"controls\\"'>\\n <div class='\\"btn-group\\"'>\\n <a btn="" btn-success\\"="" class='\\"change-active-state' data-value='\\"T\\"' href='\\"#\\"'>Sì&lt;\\/a&gt;\\n <a \\"="" btn="" class='\\"change-active-state' data-value='\\"F\\"' href='\\"#\\"'>No&lt;\\/a&gt;\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n&lt;\\/div&gt;\\n\\n<div class='\\"row-fluid' form-actions="" possible-multi-line\\"="">\\n <a btn-primary\\"="" class='\\"btn' false;\\"="" 
href='\\"#\\"' id='\\"save-review-btn\\"' onclick='\\"reviews.saveReview(true);' remote='\\"true\\"' return="">Salva recensione&lt;\\/a&gt;\\n <a btn-danger\\"="" class='\\"btn' data-confirm-dialog-title='\\"Cancella' false;\\"="" href='\\"#\\"' id='\\"delete-review-btn\\"' onclick='\\"reviews.deleteSelectedReview(true);' recensioni\\"="" remote='\\"true\\"' return=""><i class="\\'icon-trash" icon-white\\'="">&lt;\\/i&gt; Cancella recensioni&lt;\\/a&gt;\\n&lt;\\/div&gt;\\n\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n &lt;\\/div&gt;\\n&lt;\\/div&gt;\\n').trigger('repaint'); reviews.initEditReviewTab(); $('#reviews-tab-navigation').tabs('option', 'active', 0); $('.search-tab-buttons').html('<div class='\\"search-tab-buttons\\"'>\\n <table>\\n <tr>\\n <td><a btn-primary\\"="" class='\\"btn' false;\\"="" href='\\"#\\"' onclick='\\"reviews.submitSearchReviews();' remote='\\"true\\"' return="">Cerca&lt;\\/a&gt;&lt;\\/td&gt;\\n <td><a btn-default\\"="" class='\\"btn' false;\\"="" href='\\"#\\"' onclick='\\"reviews.setDefaultSearchParams();' remote='\\"true\\"' return="">Ricerca standard&lt;\\/a&gt;&lt;\\/td&gt;\\n <td><a btn-default\\"="" class='\\"btn' false;\\"="" href='\\"#\\"' onclick='\\"reviews.showStatistics(true);' remote='\\"true\\"' return="">Statistiche&lt;\\/a&gt;&lt;\\/td&gt;\\n &lt;\\/tr&gt;\\n &lt;\\/table&gt;\\n&lt;\\/div&gt;'); $('.mini-statistics').replaceWith(' <div class='\\"mini-statistics\\"'>\\n <p>\\n Da controllare: 100 / Pubblicata: 304316 / Non pubblicata: 9207 / Prenotate: [0], mie: [0]\\n &lt;\\/p&gt;\\n &lt;\\/div&gt;\\n'); 
</p></div></a></td></a></td></a></td></tr></table></div></i></a></a></div></a></a></div></div></label></div></div></option></option></option></select></label></div></textarea></label></textarea></label></a></textarea></option></option></select></a></label></div></span></div></div></span></b></td></tr></b></td></span></b></td></tr></span></b></td></span></b></td></tr></table></p></label></td></label></td></label></td></label></td></label></td></tr></table></span></textarea> </i> </span> </label> </td> </label> </td> </tr> </label> </td> </label> </td> </tr> </table> </span> </div> </div> </label> </label> </label> </img> </label> </label> </label> </a> </label> </a> </span> </div> </div> </div> </div> </span> </span> </td> </tr> </span> </span> </td> </tr> </span> </span> </td> </tr> </span> </span> </td> </tr> </span> </span> </td> </tr> </tbody> </span> </th> </tr> </thead> </table> </span> </div> </div> </div> 

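Basically, I want to extract the number after each "data-review-id" (there are five in this part of the HTML: 10613555, 10610141, 10575319, 10554514, 9469234), but I can't work out which tag I should select to get the result I want.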

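I've tried several combinations of soup.find_all, but none of them returned anything.

Any help or suggestions would be much appreciated.

Thanks in advance!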

The HTML you have sits inside some JavaScript and appears to have been escaped. If you copy/paste the exact HTML you posted and assign it to html, the following works:

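from bs4 import BeautifulSoup

# Paste the escaped HTML into a regular (non-raw) triple-quoted string.
html = """ ---- add HTML here ---"""

# Remove the escaped quotes (\") and unescape the closing-tag slashes (\/).
html = html.replace('\\"', '').replace(r'\/', '/')
soup = BeautifulSoup(html, "html.parser")

# Select every <td> that carries a data-review-id attribute.
for td in soup.find_all('td', {'data-review-id': True}):
    print(td['data-review-id'])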

This then displays:

10613555
10610141
10575319
10554514
9469234
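
If the ids are the only thing needed, a regular expression over the raw text may be enough and avoids the unescaping step. The following is only a sketch, assuming the ids are always plain digits and that the attribute is written as in the dump above. The names `REVIEW_ID_RE`, `extract_review_ids` and `sample` are made up for illustration, and `sample` is a shortened stand-in for the real text.

```python
import re

# "data-review-id=" followed by any mix of quotes and backslashes, then the digits.
REVIEW_ID_RE = re.compile(r"""data-review-id=['"\\]*(\d+)""")


def extract_review_ids(raw):
    """Return the review ids found in the (possibly escaped) text, in document order."""
    return REVIEW_ID_RE.findall(raw)


# Shortened sample in the same escaped form as the question's dump (hypothetical stand-in).
sample = r"""<td class='\\"selectable-review-entry\\"' data-mastertstyle-id='\\"\\"' data-review-id='\\"10613555\\"'> ... <td data-review-id='\\"9469234\\"'>"""

print(extract_review_ids(sample))
```

Because the pattern skips any run of quotes and backslashes before the digits, it should behave the same whether the text is still escaped or has already been cleaned up. It does not validate the surrounding markup, so BeautifulSoup remains the better choice once more than the ids is needed.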
