TypeError: iteration over a 0-d array with scraper?

I am running the code below to scrape from Google, but I get this error when I run it from the terminal:

 File "Coordinate-Scraper.py", line 33, in <module>
    for loc in locations_array:
TypeError: iteration over a 0-d array

I keep trying to figure out what may be causing it. Any ideas? I have added in a part of the code to account for Google breaking after several hundred observations, but it still refuses to run.

import pandas as pd
import numpy as np
import requests
import csv
import sys
import json

from bs4 import BeautifulSoup

Locations_file1 = 'Locations_file1.csv'
Locations_Sheet = 'Sheet1'
Col_name = 'Locations'

try:
    if Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_excel(Locations_file1, sheetname=Locations_Sheet)
        locations_array = np.asarray(locations_df[Col_name])
    elif Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_csv(Locations_file1)
        locations_array = locations_df[Col_name]
except:
    locations_array = np.asarray(Locations_file1)
    
features = ['Location', 'Latitude', 'Longitude']
Complete_df = pd.DataFrame(columns=features)
c = 0
for loc in locations_array:

    if c < 223:
        c+=1
        continue
    desired_location = loc
    search_url = 'https://www.google.com/search?q='
    url = search_url + str(desired_location.replace(' ','+'))
    r = requests.get(url)

    # print(json.dumps(r.text))
    content = r.text
    # content = r.content

    soup = BeautifulSoup(content, features="html.parser")
    body = soup.find('body')
    # print(body)
    # break
    map_class = body.find('a',href=lambda href: href and "maps" in href)
    # map_class = body.find('a',,{'class' :'VGHMXd'})
    # print(map_class)
    map_url = map_class.get('href')
    r_map = requests.get(map_url)
    content_map = r_map.text

    soup_map = BeautifulSoup(content_map, features="html.parser")
    head = soup_map.find('head')
    url_long_lat = head.find_all('meta')[8].get('content')
    Lat, Long = url_long_lat[url_long_lat.find('center=')+len('center='):url_long_lat.rfind('&zoom')].split('%2C')
    location_info = pd.DataFrame([[desired_location,Lat,Long]])
    location_info.columns = features
    Complete_df = Complete_df.append(location_info, ignore_index = True)
    print(Complete_df)
    Complete_df.to_csv('Locations_Latitude_Longitude.csv')
           

You have 2 issues there.

  1. I believe your if statement: if Locations_file1.split(".")[1] == 'csv': should be checking for 'xlsx' or another spreadsheet format.

  2. You get that error because an exception is raised in your try block. In the except block, you convert the file name Locations_file1 into a NumPy array, so locations_array holds only the string stored in Locations_file1. If you print locations_array, you will see Locations_file1.csv on your console.

Interestingly, locations_array is an object whose __str__() method returns the value of the string, but the variable is a 0-dimensional array. It holds a single scalar value and has no axis to iterate over.

In simple terms

import numpy as np

strs = "Hello world"

arr = np.asarray(strs)
print(arr.__str__())
print(arr)
print(arr.__repr__())
for i in arr:
    print(i)

gives the following output:

Hello world
Hello world
array('Hello world', dtype='<U11')
Traceback (most recent call last):
  File "/home/user/tests.py", line 73, in <module>
    for i in arr:
TypeError: iteration over a 0-d array

Process finished with exit code 1
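
As a small sketch of the same point, the array's ndim and shape attributes show why the loop fails, and np.atleast_1d shows one way to make a 0-d array iterable. The variable name items is only for illustration.

```python
import numpy as np

arr = np.asarray("Hello world")   # a plain string becomes a 0-d array
print(arr.ndim, arr.shape)        # 0 ()
print(arr.item())                 # Hello world

# np.atleast_1d wraps the scalar in a 1-element array, which can be iterated
items = [str(x) for x in np.atleast_1d(arr)]
print(items)                      # ['Hello world']
```

Note that in the question's code this would only silence the TypeError: the loop would then run once over the file name rather than over the locations, so the real fix is still to find out why reading the file fails.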

Update:

Now, to address the issue in your try:...except: block: first of all, you are catching every possible exception there and ignoring it, on the assumption that if reading the file didn't work, the data must already have been read. But nothing in your code makes that true when the read fails. I would suggest changing your except block:

except Exception as e:
    print(e)

This would give you an idea of what went wrong. I would guess one of two cases:

  1. Your file is bad, and reading it into a pandas.DataFrame fails.
  2. Your file, and therefore your DataFrame, does not contain the column named by Col_name, which has the value "Locations" in your code.
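
The two cases above can be told apart with a sketch like the following. The helper name load_locations is hypothetical, and an in-memory CSV stands in for Locations_file1.csv, so this is an illustration under those assumptions rather than a drop-in fix.

```python
import io
import pandas as pd

def load_locations(source, col_name="Locations"):
    """Hypothetical helper: read a CSV and return one column as a list."""
    try:
        df = pd.read_csv(source)
    except (OSError, pd.errors.ParserError, pd.errors.EmptyDataError) as e:
        # Case 1: the file is missing, unreadable, or malformed
        raise RuntimeError(f"could not read {source!r}: {e}") from e
    if col_name not in df.columns:
        # Case 2: the file was read, but the expected column is absent
        raise KeyError(f"column {col_name!r} not found; got {list(df.columns)}")
    return df[col_name].tolist()

# In-memory stand-in for Locations_file1.csv (assumed layout)
sample = io.StringIO("Locations\nParis\nBerlin\n")
print(load_locations(sample))  # ['Paris', 'Berlin']
```

Raising with a specific message, instead of falling back to a default value, means the script stops at the real cause rather than at the later for loop.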

I believe the error is in this part of the code:

try:
    if Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_excel(Locations_file1, sheetname=Locations_Sheet)
        locations_array = np.asarray(locations_df[Col_name])
    elif Locations_file1.split(".")[1] == 'csv':
        locations_df = pd.read_csv(Locations_file1)
        locations_array = locations_df[Col_name]
except:
    locations_array = np.asarray(Locations_file1)

Your if and elif statements both test the same condition, Locations_file1.split(".")[1] == 'csv', so the elif branch can never run. There is also no else statement, so if Locations_file1.split(".")[1] does not equal 'csv', locations_array is not populated. You may be expecting the except part of the code to catch all other cases, but it may not be triggered, because it's possible to have a case where none of your conditions is true and no exception is thrown. Another error may be hiding behind this one, but I think that's the source here.
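
A minimal sketch of the branching with distinct conditions and an explicit else is shown below. The helper name pick_reader and the set of accepted extensions are assumptions for illustration; it only returns the name of the pandas reader so it can be run without any data files.

```python
import os

def pick_reader(path):
    """Hypothetical helper: name the pandas reader that suits the file extension."""
    ext = os.path.splitext(path)[1].lower()   # e.g. '.csv' or '.xlsx'
    if ext in (".xls", ".xlsx"):
        return "read_excel"
    elif ext == ".csv":
        return "read_csv"
    else:
        # Fail loudly instead of leaving locations_array undefined
        raise ValueError(f"unsupported file extension: {ext!r}")

print(pick_reader("Locations_file1.csv"))   # read_csv
print(pick_reader("Locations_file1.xlsx"))  # read_excel
```

os.path.splitext is used here instead of split(".")[1] because it takes the text after the last dot, so file names containing more than one dot are still handled.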
