
Metaprogramming Python Script for e-mail Capture

How can I modify the code below to capture all e-mails instead of images:

import urllib2
import re
from os.path import basename
from urlparse import urlsplit

url = "URL WITH IMAGES"
urlContent = urllib2.urlopen(url).read()
# HTML image tag: <img src="url" alt="some_text"/>
imgUrls = re.findall('img .*?src="(.*?)"', urlContent)

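# download all images
for imgUrl in imgUrls:
    try:
        imgData = urllib2.urlopen(imgUrl).read()
        # use the last path component of the URL as the local file name
        fileName = basename(urlsplit(imgUrl)[2])
        with open(fileName, 'wb') as output:
            output.write(imgData)
    except Exception:
        # skip images that cannot be fetched or written
        pass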

I need to build a directory of e-mail addresses from an array of websites. I'm driving this from C++ on Unix: the C++ program calls the .py file once per website, and the output is appended to an existing file each time.

Reliably parsing or validating e-mail addresses requires a robust regex, and you can find several by searching online. Here is a simple one that extracts addresses from the page content:

emails = re.findall(r'([a-zA-Z0-9\.]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,3})', urlContent)

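This is only a rudimentary example. It misses addresses whose local part contains characters such as + or -, domains with subdomains or hyphens, and top-level domains longer than three letters, so for real use you need a more robust pattern.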
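As a rough sketch of what a somewhat broader pattern could look like, the snippet below extracts unique addresses from a string and appends them to a file, which matches the "append to an existing file on each call" workflow in the question. The pattern, the function names extract_emails and append_emails, and the file layout (one address per line) are all assumptions for illustration; the pattern is still far from a full RFC 5322 validator.

```python
import re

# Assumed pattern: allows ., _, %, + and - in the local part, any number of
# dot-separated domain labels, and a top-level domain of two or more letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def extract_emails(text):
    """Return the addresses found in text, without duplicates,
    in order of first appearance (hypothetical helper)."""
    seen = set()
    result = []
    for address in EMAIL_RE.findall(text):
        key = address.lower()  # compare case-insensitively when de-duplicating
        if key not in seen:
            seen.add(key)
            result.append(address)
    return result


def append_emails(path, emails):
    """Append one address per line to the file at path (hypothetical helper)."""
    with open(path, "a") as output:
        for address in emails:
            output.write(address + "\n")
```

With the script from the question, this would be called as extract_emails(urlContent) in place of the image regex, followed by append_emails with whatever output file the C++ caller expects. Note that under Python 3 the page content is returned as bytes and would need to be decoded to a string first.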
