
Amazon Getting Product ASIN programmatically

I'm trying to programmatically retrieve ASINs for more than 500 books.

Example: Catch-22 by Joseph Heller. Amazon URL: https://www.amazon.com/Catch-22-Joseph-Heller/dp/3866155239
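In a URL of this shape, the ten-character code after /dp/ is the ASIN, so once a product link is known the ASIN can be pulled out with a regular expression. This is a minimal sketch: the function name extract_asin is hypothetical, and handling the /gp/product/ form as well is an assumption about other Amazon URL layouts, not something taken from the question.

```python
import re

# Ten uppercase letters/digits following /dp/ or /gp/product/ (assumed URL layouts).
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url):
    """Return the ASIN embedded in an Amazon product URL, or None if absent."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_asin("https://www.amazon.com/Catch-22-Joseph-Heller/dp/3866155239"))
```

Returning None instead of raising keeps a batch run over hundreds of links going when one link is not a product page.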

I can get the product numbers manually by searching for each product in a browser, but that's not efficient. I would like to use an API, or wget/curl in the worst case, but I'm hitting some stumbling blocks.

The Amazon API is not exactly the easiest to use (I've been banging my head against the wall trying to get the request signature hash correct in Python, to no avail).
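For reference, here is a rough sketch of how a request signature of this kind is typically computed. It assumes the older query-string signing scheme: an HMAC-SHA256 over "GET", the host, the path, and the sorted, percent-encoded parameters, then Base64-encoded. The question does not say which API version was used, and newer versions sign requests differently, so the host, path, and the helper name sign_query are all assumptions for illustration.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_query(params, secret_key,
               host="webservices.amazon.com", path="/onca/xml"):
    """Return a signed query string (host and path defaults are assumptions)."""
    # Canonical query: parameters sorted by name, strictly percent-encoded.
    canonical = "&".join(
        "{}={}".format(quote(str(k), safe="-_.~"), quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items())
    )
    string_to_sign = "\n".join(["GET", host, path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # The signature itself must also be percent-encoded (+, / and = occur in Base64).
    return canonical + "&Signature=" + quote(signature, safe="-_.~")


if __name__ == "__main__":
    # Illustrative parameter values only; a real request needs real credentials.
    print(sign_query({"Operation": "ItemSearch",
                      "Keywords": "Catch-22",
                      "Timestamp": "2020-01-01T00:00:00Z"},
                     "my-secret-key"))
```

The usual causes of a mismatched signature with this style of scheme are unsorted parameters and encoding a space as + instead of %20, which is why the sketch sorts explicitly and uses quote rather than quote_plus.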

Then I thought googler might be another option. However, after 15 requests Google locks me out for a few hours, even with time.sleep(30) between requests and even when the requests come from multiple IP addresses.

How about Bing? Well, they don't show any Amazon results via the API, which is really odd.

I tried writing my own Google parser with wget, but then I would have to import all of that into BeautifulSoup and reparse it, and my sed and awk skills leave a lot to be desired.

Basically: has anyone come across an easier way of obtaining the ASIN for a product programmatically?

https://isbndb.com/ charges for the API :(

So...

Went the Google web scrape route:

import time
from urllib.parse import quote_plus
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def get_first_google_link(prefix, book_title):
    """Return the first Google result link for '<prefix> novel <title>', or None."""
    # quote_plus escapes spaces and punctuation in the title.
    url = 'https://www.google.com/search?q=' + quote_plus(prefix + ' novel ' + book_title)
    print(url)

    request = Request(url)
    request.add_header('User-Agent', 'Mozilla/5.0')

    with urlopen(request) as f:
        # Parse the raw response body rather than str() of a list of lines.
        page_soup = BeautifulSoup(f.read(), 'html.parser')

    # Result titles were <h3 class="r"> when this was written; Google's markup may differ now.
    for heading in page_soup.find_all('h3', {'class': 'r'}):
        for anchor in heading.find_all('a', href=True):
            # The href is expected to carry the target after the first '=' and before the first '&'.
            parts = anchor['href'].split('=')
            if len(parts) > 1:
                return parts[1].split('&')[0]
    return None


def get_amazon_link(book_title):
    return get_first_google_link('amazon', book_title)


def get_wiki_link(book_title):
    return get_first_google_link('wiki', book_title)


# 'booklist' holds one title per line; each output line is '<title>@<link>'.
with open('booklist') as b, \
        open('amazonbookslinks', 'w') as a, \
        open('wikibooklinks', 'w') as w:
    for line in b:
        book = line.strip()  # drop the trailing newline so it does not end up in the output
        if not book:
            continue

        amazon_result = get_amazon_link(book)
        a.write(book + '@' + (amazon_result or '') + '\n')  # empty link when nothing was found
        time.sleep(5)

        wiki_result = get_wiki_link(book)
        w.write(book + '@' + (wiki_result or '') + '\n')
        time.sleep(5)

Not pretty, but it worked :)
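The script above recovers the target link from Google's redirect-style href by splitting on = and &. A slightly sturdier variant, sketched here with the standard library's URL parser, reads the q parameter by name so it does not depend on parameter order. The helper name target_from_redirect is hypothetical, and the /url?q=... shape of the href (including the sa=U parameter in the demo) is an assumption inferred from that splitting logic.

```python
from urllib.parse import parse_qs, urlsplit


def target_from_redirect(href):
    """Return the value of the 'q' parameter of a /url?q=... style href, or None."""
    query = parse_qs(urlsplit(href).query)
    values = query.get("q")
    return values[0] if values else None


if __name__ == "__main__":
    href = "/url?q=https://www.amazon.com/Catch-22-Joseph-Heller/dp/3866155239&sa=U"
    print(target_from_redirect(href))
```

parse_qs also undoes percent-encoding in the target, which the manual split does not.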

According to Amazon's customer service page:

https://www.amazon.co.uk/gp/help/customer/display.html?nodeId=898182

ASIN stands for Amazon Standard Identification Number. Almost every product on our site has its own ASIN, a unique code we use to identify it. For books, the ASIN is the same as the ISBN number, but for all other products a new ASIN is created when the item is uploaded to our catalogue.

This means that for the book 'Catch-22', the ASIN is simply its ISBN-10, 3866155239, which is the number at the end of the URL in the question.
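Since the ASIN of a book matches its ten-digit ISBN, a book list that only has ISBN-13s can be converted offline: drop the 978 prefix and recompute the ISBN-10 check digit. The sketch below does that; the function names are hypothetical, and ISBN-13s starting with 979 are rejected because they have no ISBN-10 equivalent. The ISBN-13 in the demo is not from the original post; it was derived from the example's ISBN-10 by the reverse calculation.

```python
def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check character for the first nine digits."""
    # Weights run from 10 down to 2; the check digit makes the total divisible by 11.
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed ISBN-13 to ISBN-10; return None if not convertible."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None
    core = digits[3:12]  # the nine digits between the prefix and the old check digit
    return core + isbn10_check_digit(core)


if __name__ == "__main__":
    print(isbn13_to_isbn10("978-3-86615-523-7"))
```

Whether the resulting ISBN-10 is the ASIN of the edition Amazon actually lists still has to be checked against the site, since one title can have several editions.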

I suggest that you use a website like https://isbndb.com/ to find the ISBNs for your books, which will automatically give you the ASINs you are looking for. It also comes with a REST API, which you can read about at https://isbndb.com/apidocs.


 