I am running the code below to scrape from Google, but I get this error when I run it from the terminal:
  File "Coordinate-Scraper.py", line 33, in <module>
    for loc in locations_array:
TypeError: iteration over a 0-d array
I keep trying to figure out what may be causing it. Any ideas? I have added in a part of the code to account for Google breaking after several hundred observations, but it still refuses to run.
import pandas as pd
import numpy as np
import requests
import csv
import sys
import json
from bs4 import BeautifulSoup
Locations_file1 = 'Locations_file1.csv'
Locations_Sheet = 'Sheet1'
Col_name = 'Locations'
try:
    if Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_excel(Locations_file1, sheetname=Locations_Sheet)
        locations_array = np.asarray(locations_df[Col_name])
    elif Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_csv(Locations_file1)
        locations_array = locations_df[Col_name]
except:
    locations_array = np.asarray(Locations_file1)
features = ['Location', 'Latitude', 'Longitude']
Complete_df = pd.DataFrame(columns=features)
c = 0
for loc in locations_array:
    if c < 223:
        c+=1
        continue
    desired_location = loc
    search_url = 'https://www.google.com/search?q='
    url = search_url + str(desired_location.replace(' ','+'))
    r = requests.get(url)
    # print(json.dumps(r.text))
    content = r.text
    # content = r.content
    soup = BeautifulSoup(content, features="html.parser")
    body = soup.find('body')
    # print(body)
    # break
    map_class = body.find('a',href=lambda href: href and "maps" in href)
    # map_class = body.find('a',,{'class' :'VGHMXd'})
    # print(map_class)
    map_url = map_class.get('href')
    r_map = requests.get(map_url)
    content_map = r_map.text
    soup_map = BeautifulSoup(content_map, features="html.parser")
    head = soup_map.find('head')
    url_long_lat = head.find_all('meta')[8].get('content')
    Lat, Long = url_long_lat[url_long_lat.find('center=')+len('center='):url_long_lat.rfind('&zoom')].split('%2C')
    location_info = pd.DataFrame([[desired_location,Lat,Long]])
    location_info.columns = features
    Complete_df = Complete_df.append(location_info, ignore_index = True)
print(Complete_df)
Complete_df.to_csv('Locations_Latitude_Longitude.csv')
You have two issues there.
First, I believe your if statement, if Locations_file1.split(".")[1] == 'csv':, should be checking for 'xlsx' or another spreadsheet format.
Second, you get that error because an exception is raised in your try block. In the except block, you convert the file name Locations_file1 into a NumPy array, so locations_array ends up holding only the string value of Locations_file1. If you print locations_array, you will get Locations_file1.csv on your console.
Interestingly, locations_array is an object whose __str__() function returns the value of the string, but the variable is a 0-dimensional array. It wraps the whole string as a single scalar with shape (), so there is no axis to iterate over.
In simple terms
strs = "Hello world"
arr = np.asarray(strs)
print(arr.__str__())
print(arr)
print(arr.__repr__())
for i in arr:
    print(i)
gives the following output:
Hello world
Hello world
array('Hello world', dtype='<U11')
Traceback (most recent call last):
File "/home/user/tests.py", line 73, in <module>
for i in arr:
TypeError: iteration over a 0-d array
Process finished with exit code 1
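To make the 0-d behaviour more concrete, here is a minimal sketch that inspects the array built from the file name string in the question. It only illustrates the shape of the object. Using np.atleast_1d makes the array iterable, but the loop would then only see the file name, not the real locations, so it is not a fix for the scraper.

```python
import numpy as np

# A plain string (not a list of strings) becomes a 0-dimensional array
arr = np.asarray("Locations_file1.csv")

print(arr.ndim)    # number of dimensions
print(arr.shape)   # empty tuple: there is no axis to iterate over
print(arr.item())  # the single scalar value the array wraps

# np.atleast_1d promotes the 0-d array to shape (1,), which can be iterated
items = [str(x) for x in np.atleast_1d(arr)]
print(items)
```

Checking arr.ndim before the loop is a quick way to confirm whether the fallback in the except block was taken.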
Update:
To address the issue in your try: ... except: block: first of all, you are catching every possible exception there and ignoring it, assuming that if reading the file didn't work, then it must already have been read. But nothing in your code indicates that the data was read when the try block fails. I would suggest changing your except block to:
except Exception as e:
    print(e)
This would give you an idea of what went wrong. I dare to assume two possible cases: a problem with the pandas.DataFrame itself, or a problem with Col_name, which carries the value "Locations" in your code.
I believe the error is in this part of the code:
try:
    if Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_excel(Locations_file1, sheetname=Locations_Sheet)
        locations_array = np.asarray(locations_df[Col_name])
    elif Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_csv(Locations_file1)
        locations_array = locations_df[Col_name]
except:
    locations_array = np.asarray(Locations_file1)
Your if and elif statements are both triggered on the same condition, Locations_file1.split(".")[1] == 'csv'. There is no else statement, so if Locations_file1.split(".")[1] does not equal 'csv', locations_array is never populated. You may be expecting the except part of the code to catch all other cases, but it may not be triggered, because it is theoretically possible for neither condition to be true without any exception being thrown. Another error may be hiding behind this one, but I think that's the source here.