Scraped the text inside a div, but no clue how to split the elements into list items so that each appears on a new line in the generated CSV

I am trying to scrape the text from <a class="speciality"></a> and <div class="links"> <a href=""></a> </div> in the block below.

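I managed to scrape the text inside the elements mentioned above. The problem is that I do not know how to split the <a> elements inside <div class="links"> <a href=""></a> </div> into separate list items. Please guide me. Below is the HTML I am trying to parse, followed by the code I used.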

<div class="column-block" id="hematology">
<h3 class="panel-title names strong">
<a class="speciality" rel="hematology" href="https://www.lyfboat.com/hospitals/hematology-hospitals-and-costs/">
Hematology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/procedures/allogenic/">Allogenic Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/aplastic-anemia-treatment-in-india/">Aplastic Anemia</a><a target="_blank" href="https://www.lyfboat.com/procedures/autologous-for-multiple-lymphomas/">Autologous Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/blood-cancer-treatment-hospitals-costs-in-india/">Blood Cancer Treatment</a><a target="_blank" href="https://www.lyfboat.com/bone-marrow-transplant-hospitals-costs-india/">Bone Marrow Transplant (BMT)</a><a target="_blank" href="https://www.lyfboat.com/fanconis-anemia-treatment-in-india/">Fanconi Anemia</a><a target="_blank" href="https://www.lyfboat.com/leukemia-treatment-cost-hospitals-surgeons-in-india/">Leukemia Treatment</a><a target="_blank" href="https://www.lyfboat.com/lymphoma-treatment-costs-hospitals-surgeons-in-india/">Lymphoma Treatment</a><a target="_blank" href="https://www.lyfboat.com/multiple-sclerosis-treatment-in-india/">Multiple Sclerosis</a><a target="_blank" href="https://www.lyfboat.com/hospitals/myeloma-blood-cancer-hospitals-and-costs/">Myeloma Treatment</a><a target="_blank" href="https://www.lyfboat.com/hospitals/pediatric-bone-marrow-transplant-hospitals-and-costs/">Pediatric Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/sickle-cell-anemia-treatment-in-india/">Sickle Cell Disease</a><a target="_blank" href="https://www.lyfboat.com/hospitals/thalassemia-transplant-hospitals-and-costs/">Thalassemia Transplant</a> </div>
</div>
<div class="column-block" id="pediatric-cardiology">
<h3 class="panel-title names strong">
<a class="speciality" rel="pediatric-cardiology" href="https://www.lyfboat.com/hospitals/pediatric-cardiology-hospitals-and-costs/">
Pediatric Cardiology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/hospitals/arterial-switch-operation-truncus-arteriosis-hospitals-and-costs/">Arterial switch operation/ Truncus arteriosis</a><a target="_blank" href="https://www.lyfboat.com/asd-closure-cost-surgeon-hospitals-in-india/">Atrial Septal Defect Closure (ASD)</a><a target="_blank" href="https://www.lyfboat.com/hospitals/atrioventricular-canal-defect-av-canal-hospitals-and-costs/">Atrioventricular Canal Defect</a><a target="_blank" href="https://www.lyfboat.com/hospitals/balloon-atrial-septostomy-hospitals-and-costs/">Balloon Atrial Septostomy</a><a target="_blank" href="https://www.lyfboat.com/hospitals/double-outlet-right-ventricle-dorv-hospitals-and-costs/">Double Outlet Right Ventricle</a><a target="_blank" href="https://www.lyfboat.com/hospitals/fontan-hospitals-and-costs/">Fontan</a><a target="_blank" href="https://www.lyfboat.com/hospitals/glenn-hospitals-and-costs/">Glenn Procedure</a><a target="_blank" href="https://www.lyfboat.com/procedures/patent-ductus-arteriosus-pda-device-closure/">Patent Ductus Arteriosus Device Closure Catheterization</a><a target="_blank" href="https://www.lyfboat.com/fallots-tetralogy-treatment-cost-hospitals-in-india/">Tetralogy of Fallot</a><a target="_blank" href="https://www.lyfboat.com/hospitals/total-anomalous-pulmonary-venous-connection-tapvc-hospitals-and-costs/">Total Anomalous Pulmonary Venous Connection</a><a target="_blank" href="https://www.lyfboat.com/hospitals/transposition-of-the-great-arteries-tga-hospitals-and-costs/">Transposition of the Great Arteries (TGA)</a><a target="_blank" href="https://www.lyfboat.com/hospitals/valvuplasty-hospitals-and-costs/">Valvuplasty</a> </div>
</div>

from bs4 import BeautifulSoup
import requests
import pandas as pd

base_url = "https://www.lyfboat.com/procedures/"

page = requests.get(base_url)
page.raise_for_status()  # stop here if the request failed, so bs is always defined
bs = BeautifulSoup(page.text, 'lxml')  # the 'lxml' parser requires the lxml package

data = {
    "Department": [],
    "Conditions": []
}

containers = bs.find_all('div', class_='column-block')

for department in containers:
    if(department.find('a')):
        data['Department'].append(department.find('a', {'class': 'speciality'}).text)
        # .text on the div returns the text of all its links as one string
        data['Conditions'].append(department.find('div', {'class': 'links'}).text)

print(data['Department'])
print(data['Conditions'])
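
To see why every condition ends up in a single entry, here is a minimal offline sketch. It assumes a trimmed copy of the HTML above: the `sample_html` string and its `#` hrefs are placeholders for illustration, and `html.parser` is used only so the sketch runs without lxml. `.text` on the `div` concatenates the text of all its `<a>` children, while `find_all("a")` yields them one at a time.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for one "column-block" from the page (illustrative only).
sample_html = """
<div class="column-block" id="hematology">
<h3 class="panel-title names strong">
<a class="speciality" rel="hematology" href="#">
Hematology </a>
</h3>
<div class="links">
<a target="_blank" href="#">Aplastic Anemia</a><a target="_blank" href="#">Fanconi Anemia</a> </div>
</div>
"""

# html.parser ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(sample_html, "html.parser")
links_div = soup.find("div", {"class": "links"})

# One string: the link texts run together with no separator.
joined = links_div.text

# One list item per <a> element.
separate = [a.get_text(strip=True) for a in links_div.find_all("a")]

print(repr(joined))
print(separate)
```

The first value is a single string with the names run together; the second is a list that can be iterated to produce one row per condition.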

Replace:

for department in containers:
    if(department.find('a')):
        data['Department'].append(department.find('a', {'class': 'speciality'}).text)
        data['Conditions'].append(department.find('div', {'class': 'links'}).text)

with:

for department in containers:
    if(department.find('a')):
        data['Department'].append(department.find('a', {'class': 'speciality'}).text)
        links = department.find('div', {'class': 'links'})
        for link in links.find_all("a"):
            data['Conditions'].append(link.get_text())
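
One caveat with appending every link to `data['Conditions']`: that list grows longer than `data['Department']`, so the two can no longer be lined up column by column. Below is a minimal sketch of one way around this, assuming the desired output is one CSV row per (department, condition) pair. The `extract_rows` helper, the trimmed `sample_html` with `#` hrefs, and the `Condition` column name are hypothetical, and `html.parser` is used only so the sketch runs without lxml.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed stand-in for the page's HTML (illustrative only).
sample_html = """
<div class="column-block" id="hematology">
<h3 class="panel-title names strong">
<a class="speciality" rel="hematology" href="#">
Hematology </a>
</h3>
<div class="links">
<a target="_blank" href="#">Aplastic Anemia</a><a target="_blank" href="#">Fanconi Anemia</a> </div>
</div>
<div class="column-block" id="pediatric-cardiology">
<h3 class="panel-title names strong">
<a class="speciality" rel="pediatric-cardiology" href="#">
Pediatric Cardiology </a>
</h3>
<div class="links">
<a target="_blank" href="#">Atrial Septal Defect Closure (ASD)</a><a target="_blank" href="#">Fontan</a> </div>
</div>
"""


def extract_rows(html):
    """Return one dict per link, pairing it with its department name."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="column-block"):
        title = block.find("a", class_="speciality")
        links = block.find("div", class_="links")
        if title is None or links is None:
            continue  # skip blocks that lack either part
        department = title.get_text(strip=True)
        for link in links.find_all("a"):
            rows.append({
                "Department": department,
                "Condition": link.get_text(strip=True),
            })
    return rows


rows = extract_rows(sample_html)

# Write the rows as CSV text; swap the buffer for an open file to save to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Department", "Condition"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Since the question already imports pandas, `pd.DataFrame(rows).to_csv(path, index=False)` should produce the same layout from the same `rows` list.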


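Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.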

 