Extract data from JavaScript using Python

I am new to Python and have inherited a Python notebook from my predecessor that I would like to improve. Its purpose is to scrape product details from a website.

How it works:

  • It uses Beautiful Soup to grab the scripts from the website:

     source = urllib2.urlopen('http://www.testwebsite.html').read()
     soup = bs4.BeautifulSoup(source)
     job_postings = soup.findAll("script")
     job_postings = [jp for jp in job_postings
                     if jp.get('type') is not None
                     and ''.join(jp.get('type')) == "text/javascript"]

It returns all the scripts in the web page (first part of the data):

window.wf=window.wf||{};wf.appData=wf.appData||{};wf.appData.product_data_TEST123=wf.appData.product_data_TEST123||{};wf.appData.product_data_TEST123 = {"sku":"TES123","is_grid_view":false,,"default_img_display":0,"manufacturer_name":"Supplier1","product_name":"product test","part_number":"1234","list_price":1000,"is_price_hidden":false,"base_price":1000,"has_opt":true,"opt_details":[{"option_ids":[],"regular_price":2681.25],"has_free_shipping":false,,"total_qty":1,"display_set_quantity":1,"is_standard_layout":true,"page_type":"ProductPage"};YUI_config.app.product_data_TEST123 = {"sku":"TEST123",........same information here........};

Second part of the data:

\\n wf.extend({"YUI_config":{"app":{"pageAlias":"ProductPage"}},"wf":{"appData":{"pageAlias":"ProductPage",,"mkcName": "AU: FurnitureRoom","productReviews":{"b_show_review_tags":false,"kit_subgroup_price":null,"catalog_currency":"AUD","price_model":null,"colors":"",,"available_after":{"date":"2016-07-28 18:05:16.000000","timezone":"Australia\\\\/Sydney"},"inventory_info":{"sku":"TEST123",,"latest_inventory_update":"2016-07-29 00:45:06","option_ids":[],"available_quantity":17,"display_quantity":17,","quantity_available_string":"More than 10 in stock","short_lead_time_id":2,"short_lead_time_string":"Leaves warehouse in 1 to 3 business days"}}};

Then I extract the data I need:

   jsonfile =  re.findall(r'wf.appData.product_data_[A-Z]{4}[0-9]{4} = (\{.*});YUI_config.app.product_data_',str(job_postings))

This gives me:

{"sku":"TEST123","is_grid_view":false,,"default_img_display":0,"manufacturer_name":"Supplier1","product_name":"product test","part_number":"1234","list_price":1000,"is_price_hidden":false,"base_price":1000,"has_opt":true,"opt_details":[{"option_ids":[],"regular_price":2681.25],"has_free_shipping":false,,"total_qty":1,"display_set_quantity":1,"is_standard_layout":true,"page_type":"ProductPage"}
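Once the object literal has been captured, it can be parsed into a Python dict with json.loads instead of being handled as a string. The following is only a minimal sketch, assuming the captured text is valid JSON (the sample shown here has fields removed, so it would not parse as-is). The script_text value is made up for illustration, and the pattern is a loosened variant of the one above with the dots escaped.

```python
import json
import re

# Hypothetical, cleaned-up script text; the real page carries many more fields.
script_text = (
    'wf.appData.product_data_TEST123 = {"sku":"TEST123","list_price":1000,'
    '"opt_details":[{"option_ids":[],"regular_price":2681.25}],'
    '"page_type":"ProductPage"};'
    'YUI_config.app.product_data_TEST123 = {"sku":"TEST123"};'
)

# Escape the literal dots and capture lazily up to the YUI_config assignment.
pattern = re.compile(
    r'wf\.appData\.product_data_\w+ = (\{.*?\});YUI_config\.app\.product_data_'
)

match = pattern.search(script_text)
# json.loads raises json.JSONDecodeError if the captured text is not valid JSON.
product = json.loads(match.group(1)) if match else None
```

With a dict in hand, individual fields are available as product["sku"] or product["opt_details"][0]["regular_price"], and further keys can be merged in with ordinary dict operations.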

My problem now is that I want to add the "inventory_info" list to my data.

I tried:

     jsonfile =  re.findall(r'inventory_info' = (\{.*}),str(job_postings))

    Jsonfile = re.compile('inventory_info' = ({.*?});', re.DOTALL)

Neither of these works.

My knowledge of Python is very limited, so I am a bit lost right now. Thanks for your help.

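You have probably already found an answer to your question, but here it is anyway.

To get inventory_info, you can always do a split (assuming job_postings has been converted to a string), like this: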

inventory_info = job_postings.split('"inventory_info":')[1].split("}")[0] + "}"

job_postings += inventory_info
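Note that the split stops at the first "}", which works only as long as the inventory_info object contains no nested objects. A possible alternative, sketched here under assumptions, is to let the standard json module read exactly one JSON value starting right after the key. The helper name extract_json_after and the sample string are hypothetical, and the sketch assumes the text is the raw script source with valid JSON at that position (text produced by str() on a list may carry extra escaping).

```python
import json


def extract_json_after(text, key):
    """Return the JSON value that follows '"key":' in text, or None if absent."""
    marker = '"%s":' % key
    start = text.find(marker)
    if start == -1:
        return None
    pos = start + len(marker)
    # raw_decode does not skip leading whitespace, so skip it by hand.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    # raw_decode parses one JSON value at pos and ignores whatever follows it.
    value, _end = json.JSONDecoder().raw_decode(text, pos)
    return value


# Hypothetical sample with a nested object inside inventory_info.
sample = (
    'wf.extend({"wf":{"appData":{"inventory_info":{"sku":"TEST123",'
    '"option_ids":[],"available_quantity":17,'
    '"available_after":{"date":"2016-07-28"}},"pageAlias":"ProductPage"}}});'
)

inventory_info = extract_json_after(sample, "inventory_info")
```

The result is a dict rather than a string, so it can be attached to already-parsed product data with something like product["inventory_info"] = inventory_info. If the text after the key is not valid JSON, raw_decode raises json.JSONDecodeError.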
