Scraping a dynamic website to get elements in a <script> tag using BeautifulSoup and Selenium

I am trying to scrape a dynamic website using BeautifulSoup and Selenium. The attributes I want to filter and write to a CSV are contained in a <script> tag. I want to extract the attributes contained in the following script:

window.IS24 = window.IS24 || {};
IS24.ssoAppName = "search";
IS24.applicationContext = "/find/errorReporter";
IS24.ab = {};
IS24.feature = {"SEARCH_BY_TELEKOM_SPEED_ENABLED":true,
IS24.resultList = {
  angularDebugInfoEnabled: false,
  navigationBarUrl: "/Suche/S-T/Haus-Kauf",

  nextPage: "/Suche/S-T/P-2/Haus-Kauf?pagerReporting=true",

  searchUrl: "/Haus-Kauf",
  isMobile: false,
  isTablet: false,
  query:     
{"realEstateType":"HOUSE_BUY","otpEnabled":true,"sortingCode":0,"location":      
{"isGeoHierarchySearch":true,
Schulze","referrer":["RESULT_LIST_GROUPED"],"**attributes":[  
{"title":"Kaufpreis","value":"249.012,75 €"}, 
{"title":"Wohnfläche","value":"129,87 m²"},{"title":"Zimmer","value":"4"},
{"title":"Grundstück","value":"400 m²"}],"checkedAttributes":["Gäste-**

I am not sure how to extract the attributes at the end into a CSV. Could you help me with the code?

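This is how you can extract attribute values from a tag using BeautifulSoup.

# Python 3: urllib2 was split into urllib.request and urllib.error
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Fetch the raw HTML of the page (placeholder URL)
req = Request('http://website_to_grab_things_from.com')
with urlopen(req) as response:
    html = response.read()

# Parse the HTML and collect all text on the page
soup = BeautifulSoup(html, "html.parser")
alltext = soup.get_text()

# General form: soup.find_all('TAGNAME', {'ATTR_NAME': 'ATTR_VALUE'})
result = soup.find_all('div', {'class': 'teaser-text'})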
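
For the data inside the <script> tag itself, one possible approach is sketched below. It is not a tested solution for the real site: SAMPLE_HTML is a hypothetical stand-in built from the fragment in the question (with Selenium you would pass driver.page_source instead), and the names extract_listings and write_csv are made up for illustration. The script body is JavaScript rather than pure JSON, so the sketch does not parse the whole thing; it assumes each "attributes" array is valid JSON containing only flat objects, pulls those arrays out with a regular expression, and writes one CSV row per array.

```python
import csv
import io
import json
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the rendered page; with Selenium this would be
# driver.page_source. The structure around "attributes" is an assumption.
SAMPLE_HTML = """
<html><body>
<script>
window.IS24 = window.IS24 || {};
IS24.resultList = {
  searchUrl: "/Haus-Kauf",
  results: [
    {"referrer":["RESULT_LIST_GROUPED"],"attributes":[
      {"title":"Kaufpreis","value":"249.012,75 €"},
      {"title":"Wohnfläche","value":"129,87 m²"},{"title":"Zimmer","value":"4"},
      {"title":"Grundstück","value":"400 m²"}],"checkedAttributes":[]}
  ]
};
</script>
</body></html>
"""

# Captures each "attributes":[ ... ] array up to the first closing bracket.
# Assumes the array holds only flat objects, with no "]" inside any value.
ATTRIBUTES_RE = re.compile(r'"attributes"\s*:\s*(\[.*?\])', re.DOTALL)


def extract_listings(html):
    """Return one {title: value} dict per "attributes" array in the IS24 script."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for script in soup.find_all("script"):
        text = script.string or ""
        if "IS24.resultList" not in text:
            continue  # skip unrelated script tags
        for match in ATTRIBUTES_RE.finditer(text):
            items = json.loads(match.group(1))
            listings.append({item["title"]: item["value"] for item in items})
    return listings


def write_csv(listings, out):
    """Write the listings to a file-like object, one column per attribute title."""
    fieldnames = []
    for listing in listings:
        for title in listing:
            if title not in fieldnames:
                fieldnames.append(title)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(listings)


listings = extract_listings(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(listings, buffer)
print(buffer.getvalue())
```

To write to disk instead of a string buffer, open the target with open("listings.csv", "w", newline="", encoding="utf-8") and pass that file object to write_csv. If the real script nests brackets inside the attribute values, the regular expression would need to be replaced with a proper bracket-matching scan.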

