Making Pythonic decisions when developing a web scraper module

Question

This is a fairly high-level question. I have developed many different web scrapers, each of which works on a different website.

I have many different versions of the functions named getName() and getAddress().

Is it Pythonic, or at least not terrible coding practice, to do the following inside a function of a module? If it is bad practice, could someone give me a high-level tip on how to manage this kind of library of scrapers?

def universalNameAdressGrab(url):
   page = pullPage(url)
   if 'Tucson.com' in url:
       import tucsonScraper
       name = getName(page)     #this is the getName for Tucson
       address = getAddress(page)
   elif 'NewYork.com' in url:
       import newyorkScraper
       name = getName(page)   #this is the getName for NewYork
       address = getAddress(page)
   return {'name':name, 'address':address}

Answer 1

It is probably more Pythonic to import everything at the top of the file. You can then reference the functions through their modules, which removes a lot of duplicated code. You may also run into issues with URL capitalization, so I would normalize that as well; urlparse can help with that. I would consider something like the following more Pythonic:

import tucsonScraper
import newyorkScraper

def universalNameAdressGrab(url):
    page = pullPage(url)
    scraper = None

    if 'Tucson.com' in url:
        scraper = tucsonScraper
    elif 'NewYork.com' in url:
        scraper = newyorkScraper
    else:
        raise Exception("No scraper found for url")

    return {'name': scraper.getName(page), 'address': scraper.getAddress(page)}
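
The urlparse suggestion above is not shown in the code, so here is a minimal sketch of how it might look. It assumes Python 3, where urlparse lives in urllib.parse (in Python 2 it was the urlparse module). The SCRAPERS dictionary, the findScraper helper, the handling of a leading "www.", and passing pullPage in as an argument are all assumptions made for illustration, and the two SimpleNamespace objects are hypothetical stand-ins for the real tucsonScraper and newyorkScraper modules.

```python
from types import SimpleNamespace
from urllib.parse import urlparse

# Hypothetical stand-ins for the tucsonScraper and newyorkScraper modules;
# real modules exposing getName(page) and getAddress(page) would go here.
tucsonScraper = SimpleNamespace(
    getName=lambda page: "name parsed by the Tucson scraper",
    getAddress=lambda page: "address parsed by the Tucson scraper",
)
newyorkScraper = SimpleNamespace(
    getName=lambda page: "name parsed by the NewYork scraper",
    getAddress=lambda page: "address parsed by the NewYork scraper",
)

# Assumed registry: lowercase hostname -> scraper that handles it.
SCRAPERS = {
    "tucson.com": tucsonScraper,
    "newyork.com": newyorkScraper,
}


def findScraper(url):
    """Return the scraper registered for the URL's host, or raise ValueError."""
    # urlparse(...).hostname is lowercased and excludes any port;
    # it is None when the URL has no scheme, hence the "or ''".
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    try:
        return SCRAPERS[host]
    except KeyError:
        raise ValueError("No scraper found for url: %s" % url) from None


def universalNameAdressGrab(url, pullPage):
    """Dispatch by host, then fetch the page with the supplied pullPage callable."""
    scraper = findScraper(url)  # fail before fetching if the host is unknown
    page = pullPage(url)
    return {"name": scraper.getName(page), "address": scraper.getAddress(page)}
```

With a registry like this, supporting another site means adding one dictionary entry rather than another elif branch, and matching on the parsed hostname avoids false matches on text that merely appears in the path or query string.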

Solution 1
1 Accepted 2015-07-20 16:54:25
