Hi, I have a Python script that fetches a web page, searches for strings inside certain tags, and prints them. After it prints, my screen looks like this: textidontwant textiwanthere.com. How can I search for the .com and print a fixed number of characters before it, so that only textiwanthere.com shows up instead of all of it? Here is my code:
import urllib.request
import re
import os
url = "http://www.throwawaymail.com/"
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
sourcecode = urllib.request.urlopen(request).read()
output = sourcecode.decode("utf-8")
findemail = re.findall('>(.*?)</span>', str(output))
print(findemail)
os.system("pause")
I want to search "findemail" and print only phamepracl@throwam.com. The address is different every time, but its length is always the same. This is what my console shows:
['Toggle navigation', '', '', '', '', 'phamepracl@throwam.com']
Just print the last entry of the list:
print(findemail[-1])
You could also assign this value to findemail directly if you don't want the other stuff:
findemail = re.findall('>(.*?)</span>', str(output))[-1]
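One caveat with indexing by [-1]: if the pattern matches nothing (say the page layout changes), re.findall returns an empty list and [-1] raises an IndexError. Here is a minimal sketch of a guarded version. The helper name last_match and the sample HTML string are my own assumptions for illustration, not something from the real page:

```python
import re

def last_match(html):
    # Hypothetical helper: return the last '>...</span>' capture, or None if there is none.
    matches = re.findall('>(.*?)</span>', html)
    return matches[-1] if matches else None

# Made-up sample standing in for the downloaded page source.
sample = '<span>Toggle navigation</span><span id="email">phamepracl@throwam.com</span>'

print(last_match(sample))           # the last captured string
print(last_match('no spans here'))  # None instead of an IndexError
```

This keeps the same regular expression as the original script and only changes how the result is picked out.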
This worked for me:
import urllib.request
import re
import os
url = "http://www.throwawaymail.com/"
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
sourcecode = urllib.request.urlopen(request).read()
output = sourcecode.decode("utf-8")
findemail = re.findall('>(.*?)</span>', str(output))
print(findemail[-1])
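If you really want what the question literally asks for (find the .com and keep a fixed number of characters before it), str.find plus slicing can do it. This is only a rough sketch: the function name text_before_suffix is made up, and the width of 18 is an assumption taken from the example address phamepracl@throwam.com, which has 18 characters before the .com.

```python
def text_before_suffix(text, suffix='.com', width=18):
    # Hypothetical helper: keep `width` characters before `suffix`, plus the suffix itself.
    end = text.find(suffix)
    if end == -1:
        return None  # suffix not present in the text
    start = max(0, end - width)  # do not slice past the start of the string
    return text[start:end + len(suffix)]

# Made-up line in the shape described in the question.
line = 'textidontwant phamepracl@throwam.com'
print(text_before_suffix(line))
```

This only works as long as the length really stays the same, so picking the right list entry is usually the more robust choice.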
This is my solution:
for i in findemail:
    if i.find('.com') >= 0:
        print(i)
Output:
hudininona@throwam.com
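Another option is to match something that looks like an email address directly, so the position in the list and the length of the address no longer matter. A minimal sketch, assuming a deliberately simplified pattern (it is not a full email validator) and a made-up sample string in place of the real page:

```python
import re

# Simplified pattern, assumed for illustration: local part, '@', then a dotted domain.
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def find_emails(text):
    # Hypothetical helper: return every email-like substring found in the text.
    return EMAIL_PATTERN.findall(text)

# Made-up sample standing in for the downloaded page source.
sample = '<span>Toggle navigation</span><span>hudininona@throwam.com</span>'
print(find_emails(sample))
```

The same function also works on a plain line such as 'textidontwant phamepracl@throwam.com', since the leading word has no @ in it and is skipped.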