
Python reads data from webpages

I have a list of IP addresses. Is it possible to use Python to determine the country of each address by extracting the information from this website ( http://www.whatip.com/ip-lookup )? For example: IPlist = ["100.43.90.10","125.7.8.9.9"]

I understand I can build the lookup URL by appending the IP address to the base URL. For the first address above, I want to get "United States". My code so far is below.

Here is a screenshot showing where "United States" appears on the page: [image]

    import urllib.request
    with urllib.request.urlopen('http://www.whatip.com/ip/100.43.90.10') as response:
        html = response.read()
        print (html)
        text = html.decode()                

        start = text.find("<td>Country:</td>")

I checked, and "Country" appears only once in the page source. I understand that I need to find the index of "Country" and then print "United States", but I am stuck. Could anyone please tell me how to do it? Many thanks!

You can use this site: http://whatismyipaddress.com/ip/

All you need is a short Python script that uses the urllib3 library, which handles the HTTP connections. Set up a list of IP addresses and loop through them, each time appending the IP address to the URL above. Make an HTTP request with urllib3; once the response arrives, its .data property holds the response body. Then use a simple regex to locate the country field and grab the country name.

Just go through the urllib3 documentation, which is short, and you're done!

P.S. I did something similar a month ago.
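The steps above can be sketched roughly as follows. This is a minimal sketch rather than a tested scraper: it uses the standard library's urllib.request (as in the question) instead of urllib3, and it assumes the page contains a "<td>Country:</td>" cell immediately followed by a value cell, which should be checked against the real page source. The names extract_country, lookup_countries and fetch_with_urllib are made up for illustration.

```python
import re
from urllib.request import urlopen

# Assumed markup: "<td>Country:</td>" followed by a <td> holding the value.
# The real page may differ; adjust the pattern after inspecting its source.
COUNTRY_RE = re.compile(
    r"<td>\s*Country:\s*</td>\s*<td[^>]*>(.*?)</td>",
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_country(html_text):
    """Return the country name found in html_text, or None if absent."""
    match = COUNTRY_RE.search(html_text)
    if match is None:
        return None
    # Drop any nested tags (e.g. a flag image) and surrounding whitespace.
    return TAG_RE.sub("", match.group(1)).strip() or None


def fetch_with_urllib(ip, base_url="http://www.whatip.com/ip/"):
    """Download the lookup page for one IP and return it as text."""
    with urlopen(base_url + ip) as response:
        return response.read().decode("utf-8", errors="replace")


def lookup_countries(ip_list, fetch=fetch_with_urllib):
    """Map each IP to its country; fetch(ip) must return the page HTML."""
    return {ip: extract_country(fetch(ip)) for ip in ip_list}
```

Calling lookup_countries(IPlist) would then request one page per address. Passing fetch as a parameter keeps the parsing testable without network access.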

I would suggest using one of the many REST APIs available for IP geolocation.

This doesn't require you to install any new modules or scrape any web pages. The request returns a JSON object, which the built-in json module can parse straight into a Python dictionary.

I had a quick play with nekudo and it appears to work well:

    import json
    from http import client

    # Open a connection to the API host
    conn = client.HTTPConnection("geoip.nekudo.com")

    # Make the request and extract the data
    conn.request("GET", "/api/172.217.3.110/full")
    json_data = conn.getresponse().read().decode()

    # Convert the JSON to a Python object
    data = json.loads(json_data)

data is now a Python dictionary containing all the information you need:

    >>> data['registered_country']['names']['en']
    'United States'

    >>> data['location']
    {'latitude': 37.4192, 'metro_code': 807, 'time_zone': 'America/Los_Angeles', 'longitude': -122.0574}
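If you are looping over many addresses, it may be worth wrapping the dictionary lookup so that a missing key does not raise KeyError. A minimal sketch, assuming some responses might lack the 'registered_country' entry (I have not verified when the API omits it); country_name is a made-up helper name:

```python
def country_name(data, lang="en"):
    """Return the country name from a parsed response, or None if missing."""
    # Fall back to empty dicts so absent keys yield None instead of KeyError.
    names = data.get("registered_country", {}).get("names", {})
    return names.get(lang)
```

With the response shown above, country_name(data) gives the same 'United States' value as the direct lookup.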

I find it almost always easier to use an API than to screen-scrape a web page. Here is one solution using ip-api.com:

    import requests
    import json

    IPlist = ["100.43.90.10","125.7.8.9.9"]

    # Build one batch query asking only for the country of each address
    request = json.dumps([{'query': ip, 'fields': 'country'} for ip in IPlist])
    response = requests.post('http://ip-api.com/batch', data=request).json()

    # Pair each IP with its result; failed lookups have no 'country' key
    print('\n'.join('{}: {}'.format(ip, data.get('country', 'Unknown'))
                    for ip, data in zip(IPlist, response)))
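Note that the second entry in IPlist, "125.7.8.9.9", is not a well-formed IP address. One option is to filter such entries out before sending the batch. A small sketch using the standard library's ipaddress module; split_valid is a made-up helper name:

```python
import ipaddress


def split_valid(ip_list):
    """Split ip_list into (valid, invalid) address strings."""
    valid, invalid = [], []
    for ip in ip_list:
        try:
            # Raises ValueError for anything that is not IPv4 or IPv6.
            ipaddress.ip_address(ip)
        except ValueError:
            invalid.append(ip)
        else:
            valid.append(ip)
    return valid, invalid


IPlist = ["100.43.90.10", "125.7.8.9.9"]
valid, invalid = split_valid(IPlist)
```

Only the valid list would then be passed to the batch request, and the invalid entries can be reported separately.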


 