Scraping a dynamic website for the contents of a <script> tag using BeautifulSoup and Selenium

Question

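I am trying to scrape a dynamic website using BeautifulSoup and Selenium. The attributes I want to filter and write to a CSV are inside a <script> tag. I want to extract the attributes contained in the following script: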

window.IS24 = window.IS24 || {};
IS24.ssoAppName = "search";
IS24.applicationContext = "/Suche/error-reporter";
IS24.ab = {};
IS24.feature = {"SEARCH_BY_TELEKOM_SPEED_ENABLED": true,
IS24.resultList = {
  angularDebugInfoEnabled: false,
  navigationBarUrl: "/Suche/S-T/Haus-Kauf",
  nextPage: "/Suche/S-T/P-2/Haus-Kauf?pagerReporting=true",
  searchUrl: "/Haus-Kauf",
  isMobile: false,
  isTablet: false,
  query: {"realEstateType":"HOUSE_BUY","otpEnabled":true,"sortingCode":0,"location":
{"isGeoHierarchySearch":true,
Schulze","referrer":["RESULT_LIST_GROUPED"],"attributes":[
{"title":"Kaufpreis","value":"249.012,75 €"},
{"title":"Wohnfläche","value":"129,87 m²"},{"title":"Zimmer","value":"4"},
{"title":"Grundstück","value":"400 m²"}],"checkedAttributes":["Gäste-

I'm not sure how to get the attributes at the end of the script into a CSV. Could you help me with the code?
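One way to do the extraction step, sketched with the standard library only and assuming you already have the script's text as a string (for example from BeautifulSoup or Selenium). It assumes each "attributes" array is valid JSON, as it appears to be in the excerpt. A regex locates each "attributes" key, and json.JSONDecoder.raw_decode parses the complete array that follows, so nested brackets are handled. The function names and the sample string are hypothetical.

```python
import csv
import io
import json
import re

# Matches the key that precedes each listing's attribute array, e.g. "attributes":[
ATTRIBUTES_KEY = re.compile(r'"attributes"\s*:\s*')


def extract_attribute_lists(script_text):
    """Hypothetical helper: return every JSON array that follows an "attributes" key."""
    decoder = json.JSONDecoder()
    lists = []
    for match in ATTRIBUTES_KEY.finditer(script_text):
        # raw_decode parses one complete JSON value starting at the given index
        value, _ = decoder.raw_decode(script_text, match.end())
        if isinstance(value, list):
            lists.append(value)
    return lists


def attributes_to_csv(attribute_lists, out):
    """Hypothetical helper: one CSV row per listing, one column per attribute title."""
    rows = []
    for attrs in attribute_lists:
        # Keep only {"title": ..., "value": ...} entries and skip anything else
        rows.append({item["title"]: item["value"] for item in attrs
                     if isinstance(item, dict) and "title" in item and "value" in item})
    # Collect column names in first-seen order across all listings
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Abbreviated sample based on the excerpt in the question
sample = ('"referrer":["RESULT_LIST_GROUPED"],"attributes":['
          '{"title":"Kaufpreis","value":"249.012,75 €"},'
          '{"title":"Wohnfläche","value":"129,87 m²"},{"title":"Zimmer","value":"4"},'
          '{"title":"Grundstück","value":"400 m²"}],"checkedAttributes":[]')

buffer = io.StringIO()
attributes_to_csv(extract_attribute_lists(sample), buffer)
print(buffer.getvalue())
```

Values such as 249.012,75 € contain commas, so csv.DictWriter quotes them automatically. To write a real file instead of a string buffer, open it with open("out.csv", "w", newline="", encoding="utf-8") and pass that file object as out.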

Answer 1

Here is how you can extract attribute values from tags with BeautifulSoup:

import urllib.request
from bs4 import BeautifulSoup

# Download the raw HTML (placeholder URL)
req = urllib.request.Request('http://website_to_grab_things_from.com')
with urllib.request.urlopen(req) as response:
    html = response.read()
soup = BeautifulSoup(html, "html.parser")
alltext = soup.get_text()  # all text content of the page

# General pattern: soup.find_all('TAGNAME', {'ATTR_NAME': 'ATTR_VALUE'})
result = soup.find_all('div', {'class': 'teaser-text'})
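Applied to this question, the same find_all call can list the page's <script> elements, and the one holding the data can be picked out by a piece of text it contains. Below is a minimal sketch, assuming the wanted script contains IS24.resultList as in the question's excerpt; the helper names are hypothetical. For a dynamic page the HTML would come from Selenium instead of urllib: WebDriver's get(url) loads the page and its page_source attribute returns the rendered HTML. The second helper relies only on those two members, so it does not import Selenium itself.

```python
from bs4 import BeautifulSoup


def find_script_text(html, marker="IS24.resultList"):
    """Hypothetical helper: text of the first <script> containing marker, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for an empty <script>, so fall back to get_text()
        text = script.string or script.get_text()
        if marker in text:
            return str(text)
    return None


def html_from_driver(driver, url):
    """Hypothetical helper: load url in a Selenium-style driver and return its HTML."""
    driver.get(url)
    # page_source reflects the DOM after the browser has run the page's JavaScript
    return driver.page_source


sample_html = """
<html><head>
<script>var unrelated = 1;</script>
<script>window.IS24 = window.IS24 || {}; IS24.resultList = {isMobile: false};</script>
</head><body></body></html>
"""
print(find_script_text(sample_html))
```

Searching by content rather than by position avoids breaking when the site adds or reorders other scripts.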

Answer 1 (score 0), 2016-10-13 09:27:58