Making a pythonic decision on developing a web scraper module

This is a fairly high-level question. I have developed many different web scrapers, each working on a different website.

As a result, I have many different versions of the functions named getName() and getAddress(), one per site.

Is it pythonic (or at least not terrible coding practice) to do the following inside a module function? If this is a bad idea, could someone give me a high-level tip on how to manage this kind of library of scrapers?

def universalNameAdressGrab(url):
    page = pullPage(url)
    if 'Tucson.com' in url:
        # import the Tucson versions of getName/getAddress
        from tucsonScraper import getName, getAddress
        name = getName(page)
        address = getAddress(page)
    elif 'NewYork.com' in url:
        # import the NewYork versions of getName/getAddress
        from newyorkScraper import getName, getAddress
        name = getName(page)
        address = getAddress(page)
    return {'name': name, 'address': address}

It is probably more pythonic to import everything at the top of the file. After that you can reference the functions through their module and remove a lot of duplicated code. You may also run into issues with URL capitalization, so I would standardize that as well; you could use urlparse for that (urllib.parse.urlparse in Python 3). I would consider something like the following more pythonic:

import tucsonScraper
import newyorkScraper

def universalNameAdressGrab(url):
    page = pullPage(url)
    scraper = None

    if 'Tucson.com' in url:
        scraper = tucsonScraper
    elif 'NewYork.com' in url:
        scraper = newyorkScraper
    else:
        raise Exception("No scraper found for url")

    return {'name': scraper.getName(page), 'address': scraper.getAddress(page)}
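
The capitalization point can be handled by matching on the parsed hostname instead of the raw URL string. What follows is only a rough sketch under a few assumptions: the names normalize_host, find_scraper, grab_name_address and SCRAPERS are made up for illustration, the two stub objects stand in for the real tucsonScraper and newyorkScraper modules, and the page fetcher is passed in as a parameter because pullPage is not shown in the question.

```python
from types import SimpleNamespace
from urllib.parse import urlparse

# Hypothetical stand-ins for the real tucsonScraper / newyorkScraper modules.
tucson_stub = SimpleNamespace(
    getName=lambda page: "tucson name",
    getAddress=lambda page: "tucson address",
)
newyork_stub = SimpleNamespace(
    getName=lambda page: "newyork name",
    getAddress=lambda page: "newyork address",
)

# Registry mapping a normalized hostname to the scraper that handles it.
SCRAPERS = {
    "tucson.com": tucson_stub,
    "newyork.com": newyork_stub,
}


def normalize_host(url):
    """Return the lowercase hostname of url, without a leading 'www.'."""
    # .hostname is already lowercased; it is None when the URL has no netloc.
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_scraper(url, scrapers):
    """Return the scraper registered for url's host, or raise ValueError."""
    host = normalize_host(url)
    if host not in scrapers:
        raise ValueError(f"No scraper found for url: {url}")
    return scrapers[host]


def grab_name_address(url, pull_page):
    """Pick the scraper first, then fetch the page and extract the fields."""
    scraper = find_scraper(url, SCRAPERS)  # fails before any network access
    page = pull_page(url)
    return {"name": scraper.getName(page), "address": scraper.getAddress(page)}
```

With a registry like this, supporting another site would mean adding one dictionary entry rather than another elif branch, and a URL such as http://www.Tucson.com/listing/1 resolves to the same scraper as http://tucson.com/listing/1.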
