
Scraped the contents of a div, but no idea how to split the elements into list items so that each appears on a new line in the generated CSV

I am trying to scrape the text inside <a class="speciality"></a> and the text of each <a href=""></a> inside <div class="links"> </div> from the block below.

I managed to scrape the text inside the elements mentioned above. The problem is that I don't know how to separate the elements inside <div class="links"> <a href=""></a> </div> into individual list items. Please guide. Below is the HTML I am trying to parse, followed by the code I used. If possible, please also post a solution that pulls them all into a single array.

<div class="column-block" id="hematology">
<h3 class="panel-title names strong">
<a class="speciality" rel="hematology" href="https://www.lyfboat.com/hospitals/hematology-hospitals-and-costs/">
Hematology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/procedures/allogenic/">Allogenic Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/aplastic-anemia-treatment-in-india/">Aplastic Anemia</a><a target="_blank" href="https://www.lyfboat.com/procedures/autologous-for-multiple-lymphomas/">Autologous Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/blood-cancer-treatment-hospitals-costs-in-india/">Blood Cancer Treatment</a><a target="_blank" href="https://www.lyfboat.com/bone-marrow-transplant-hospitals-costs-india/">Bone Marrow Transplant (BMT)</a><a target="_blank" href="https://www.lyfboat.com/fanconis-anemia-treatment-in-india/">Fanconi Anemia</a><a target="_blank" href="https://www.lyfboat.com/leukemia-treatment-cost-hospitals-surgeons-in-india/">Leukemia Treatment</a><a target="_blank" href="https://www.lyfboat.com/lymphoma-treatment-costs-hospitals-surgeons-in-india/">Lymphoma Treatment</a><a target="_blank" href="https://www.lyfboat.com/multiple-sclerosis-treatment-in-india/">Multiple Sclerosis</a><a target="_blank" href="https://www.lyfboat.com/hospitals/myeloma-blood-cancer-hospitals-and-costs/">Myeloma Treatment</a><a target="_blank" href="https://www.lyfboat.com/hospitals/pediatric-bone-marrow-transplant-hospitals-and-costs/">Pediatric Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/sickle-cell-anemia-treatment-in-india/">Sickle Cell Disease</a><a target="_blank" href="https://www.lyfboat.com/hospitals/thalassemia-transplant-hospitals-and-costs/">Thalassemia Transplant</a> </div>
</div>
<div class="column-block" id="pediatric-cardiology">
<h3 class="panel-title names strong">
<a class="speciality" rel="pediatric-cardiology" href="https://www.lyfboat.com/hospitals/pediatric-cardiology-hospitals-and-costs/">
Pediatric Cardiology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/hospitals/arterial-switch-operation-truncus-arteriosis-hospitals-and-costs/">Arterial switch operation/ Truncus arteriosis</a><a target="_blank" href="https://www.lyfboat.com/asd-closure-cost-surgeon-hospitals-in-india/">Atrial Septal Defect Closure (ASD)</a><a target="_blank" href="https://www.lyfboat.com/hospitals/atrioventricular-canal-defect-av-canal-hospitals-and-costs/">Atrioventricular Canal Defect</a><a target="_blank" href="https://www.lyfboat.com/hospitals/balloon-atrial-septostomy-hospitals-and-costs/">Balloon Atrial Septostomy</a><a target="_blank" href="https://www.lyfboat.com/hospitals/double-outlet-right-ventricle-dorv-hospitals-and-costs/">Double Outlet Right Ventricle</a><a target="_blank" href="https://www.lyfboat.com/hospitals/fontan-hospitals-and-costs/">Fontan</a><a target="_blank" href="https://www.lyfboat.com/hospitals/glenn-hospitals-and-costs/">Glenn Procedure</a><a target="_blank" href="https://www.lyfboat.com/procedures/patent-ductus-arteriosus-pda-device-closure/">Patent Ductus Arteriosus Device Closure Catheterization</a><a target="_blank" href="https://www.lyfboat.com/fallots-tetralogy-treatment-cost-hospitals-in-india/">Tetralogy of Fallot</a><a target="_blank" href="https://www.lyfboat.com/hospitals/total-anomalous-pulmonary-venous-connection-tapvc-hospitals-and-costs/">Total Anomalous Pulmonary Venous Connection</a><a target="_blank" href="https://www.lyfboat.com/hospitals/transposition-of-the-great-arteries-tga-hospitals-and-costs/">Transposition of the Great Arteries (TGA)</a><a target="_blank" href="https://www.lyfboat.com/hospitals/valvuplasty-hospitals-and-costs/">Valvuplasty</a> </div>
</div>

from bs4 import BeautifulSoup
import requests
import lxml
import pandas as pd

base_url = "https://www.lyfboat.com/procedures/"

page = requests.get(base_url)
if page.status_code == requests.codes.ok:
  bs = BeautifulSoup(page.text, 'lxml')

data = {
  "Department" : [],
  "Conditions" : []
}

containers = bs.findAll('div', class_='column-block')

for department in containers:
    if(department.find('a')):
      data['Department'].append(department.find('a', {'class': 'speciality'}).text)
      data['Conditions'].append(department.find('div', {'class': 'links'}).text)[0:]

print(data['Department'])
print(data['Conditions'])

Replace:

for department in containers:
    if(department.find('a')):
      data['Department'].append(department.find('a', {'class': 'speciality'}).text)
      data['Conditions'].append(department.find('div', {'class': 'links'}).text)[0:]

With:

for department in containers:
    if(department.find('a')):
        data['Department'].append(department.find('a', {'class': 'speciality'}).text)
        links = department.find('div', {'class': 'links'})
        for link in links.find_all("a"):
            data['Conditions'].append(link.get_text())
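
Note that with the loop above, data['Department'] gets one entry per block while data['Conditions'] gets one entry per link, so the two lists end up with different lengths. If the goal is one CSV line per condition, one possible approach is to build (department, condition) pairs instead. The following is only a sketch under assumptions: SAMPLE_HTML is a trimmed stand-in for the markup in the question with placeholder href values, the helper names extract_rows and rows_to_csv are made up for illustration, and the standard-library "html.parser" and csv module are used here in place of lxml and pandas so the example runs without a network request.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed stand-in for the markup in the question (assumption: the live page
# follows the same structure). The href values are placeholders.
SAMPLE_HTML = """
<div class="column-block" id="hematology">
  <h3 class="panel-title names strong">
    <a class="speciality" rel="hematology" href="#">
    Hematology </a>
  </h3>
  <div class="links">
    <a target="_blank" href="#">Aplastic Anemia</a><a target="_blank" href="#">Fanconi Anemia</a>
  </div>
</div>
<div class="column-block" id="pediatric-cardiology">
  <h3 class="panel-title names strong">
    <a class="speciality" rel="pediatric-cardiology" href="#">
    Pediatric Cardiology </a>
  </h3>
  <div class="links">
    <a target="_blank" href="#">Fontan</a>
  </div>
</div>
"""


def extract_rows(html):
    """Return one (department, condition) tuple per link in each block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="column-block"):
        speciality = block.find("a", class_="speciality")
        links = block.find("div", class_="links")
        if speciality is None or links is None:
            continue  # skip blocks that do not match the expected layout
        department = speciality.get_text(strip=True)
        for link in links.find_all("a"):
            rows.append((department, link.get_text(strip=True)))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text, one condition per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Department", "Condition"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
# All conditions as a single flat list, as asked for in the question.
conditions = [condition for _, condition in rows]
csv_text = rows_to_csv(rows)
print(csv_text)
```

Because every row carries its own department, the same list of tuples could also be passed to pd.DataFrame(rows, columns=["Department", "Condition"]) and written out with to_csv if pandas is preferred.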
