
Getting an Amazon product ASIN programmatically

I'm trying to programmatically retrieve the ASINs for more than 500 books.

Example: Catch-22 by Joseph Heller. Amazon URL: https://www.amazon.com/Catch-22-Joseph-Heller/dp/3866155239

I can get the product numbers manually by searching for each product in a browser, but that's not efficient. I would like to use an API, or wget/curl as a last resort, but I'm hitting some stumbling blocks.

The Amazon API is not exactly the easiest to use (I've been banging my head against the wall trying to get the request signature hash right in Python, to no avail).

Then I thought googler might be another option; however, after 15 requests (even with time.sleep(30) between them) Google locks me out for a few hours, even though the requests come from multiple IP addresses.

How about Bing? Its API doesn't return any Amazon results, which is really odd.

I tried writing my own Google parser with wget, but then I would have to import all of that into BeautifulSoup and reparse it, and my sed and awk skills leave a lot to be desired.

Basically: has anyone come across an easier way of obtaining the ASIN for a product programmatically?

https://isbndb.com/ charges for its API :(

So I went the Google web-scraping route:

from urllib.parse import quote_plus
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import time


def first_google_result(query):
  # Return the target URL of the first Google hit for query, or '' if none is found.
  url = 'https://www.google.com/search?q=' + quote_plus(query)
  print(url)

  request = Request(url)
  request.add_header('User-Agent', 'Mozilla/5.0')

  with urlopen(request) as f:
    data = f.read()

  page_soup = BeautifulSoup(data, 'html.parser')
  # The h3.r selector depends on Google's result markup and may need updating.
  for heading in page_soup.find_all('h3', {'class': 'r'}):
    for link in heading.find_all('a', href=True):
      # Result links look like /url?q=<target>&...; keep only <target>.
      parts = link['href'].split('=')
      if len(parts) > 1:
        return parts[1].split('&')[0]
  return ''


def get_amazon_link(book_title):
  return first_google_result('amazon novel ' + book_title)


def get_wiki_link(book_title):
  return first_google_result('wiki novel ' + book_title)


# booklist holds one title per line; each output line is "<title>@<link>".
with open('booklist') as b, open('amazonbookslinks', 'w') as a, open('wikibooklinks', 'w') as w:
  for line in b:
    book = line.strip()
    if not book:
      continue
    a.write(book + '@' + get_amazon_link(book) + '\n')
    time.sleep(5)
    w.write(book + '@' + get_wiki_link(book) + '\n')
    time.sleep(5)

Not pretty, but it worked :)
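
The links collected this way still have to be reduced to the ASIN itself. A minimal sketch, assuming the ASIN is the 10-character alphanumeric token that follows /dp/ (as in the Catch-22 URL from the question) or /gp/product/; the name extract_asin and the pattern are illustrative, not part of any Amazon API:

```python
import re

# Assumption: ASINs are 10 uppercase letters/digits right after /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r'/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)')


def extract_asin(url):
    """Return the ASIN embedded in an Amazon product URL, or None if there is none."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_asin('https://www.amazon.com/Catch-22-Joseph-Heller/dp/3866155239'))
# prints 3866155239
```

Links that do not match (for example, a search hit that is not a product page) come back as None, so they can be flagged for a manual look instead of being written out as bogus ASINs.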

According to Amazon's customer service page:

https://www.amazon.co.uk/gp/help/customer/display.html?nodeId=898182

ASIN stands for Amazon Standard Identification Number. Almost every product on our site has its own ASIN, a unique code we use to identify it. For books, the ASIN is the same as the ISBN number, but for all other products a new ASIN is created when the item is uploaded to our catalogue.

This means that for the book 'Catch-22', the ASIN in the URL, 3866155239, is also its ISBN-10.

I suggest using a website like https://isbndb.com/ to look up the ISBNs of your books; for books, that gives you the ASINs you are looking for. It also offers a REST API, which is documented at https://isbndb.com/apidocs .
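
If the lookup returns ISBN-13s rather than ISBN-10s, the ISBN-10 form can be derived locally. A rough sketch using the standard ISBN check-digit rules; the function names are made up for illustration, and it assumes (per the quote above) that the ISBN-10 is what Amazon uses as the book's ASIN:

```python
def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check character for the first nine digits."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    check = (11 - total % 11) % 11
    return 'X' if check == 10 else str(check)


def is_valid_isbn10(isbn):
    """Return True if isbn is a well-formed ISBN-10 with a correct check character."""
    isbn = isbn.replace('-', '').upper()
    if len(isbn) != 10 or not isbn[:9].isdigit():
        return False
    return isbn10_check_digit(isbn[:9]) == isbn[9]


def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed ISBN-13 to ISBN-10; return None if that is not possible.

    The ISBN-13 check digit itself is not verified here.
    """
    digits = isbn13.replace('-', '')
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith('978'):
        return None
    core = digits[3:12]
    return core + isbn10_check_digit(core)


print(is_valid_isbn10('3866155239'))      # prints True
print(isbn13_to_isbn10('9783866155237'))  # prints 3866155239
```

Only ISBN-13s with the 978 prefix have an ISBN-10 equivalent, which is why anything else returns None.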
