Read column of CSV and search each entry in Google

I have a list of articles, all from one producer, that does not include the producer's URL for each article. I would like to add the matching URL to each article.

For now I only have this:

import csv

included_cols = [1]  # indices of the columns to keep
content = list()

# Text mode with newline="" is what the csv module expects on Python 3
with open("preisliste.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=";", quotechar="|")
    for row in reader:
        content.append(list(row[i] for i in included_cols))

print(content)

The output looks like this:

[['Artikelbezeichnung'], ['VM-2N'], ['VM-8H'], ['VM-16H'], ['VM-24HC'], ['VM-4HC'], ['VM-400HDCP'], ['VM-4HN'], ['VM-3HN'], ['VM-214DT/220V'], ['VM-212DT/220V'], ['VM-8HN'],...]

Now I would like to search for each string (for example "VM-2N") in Google and save the URL of a hit in a new column.

Is something like this possible?

Each of your Google searches would give you multiple results, so it is not clear how you would want to add this as an extra column.
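
One way to resolve the ambiguity is to keep only the first hit per article, so that each row gains exactly one URL. The following is a minimal sketch of that idea, not a definitive solution. It works on an in-memory stand-in for preisliste.csv, and the `first_hit` dictionary and the `add_url_column` helper are hypothetical names introduced for illustration; how the dictionary gets filled is left open here.

```python
import csv
import io

# Stand-in for preisliste.csv; the "Nr" column is invented for this example.
sample = "Nr;Artikelbezeichnung\n1;VM-2N\n2;VM-8H\n"

# Hypothetical lookup table: search term -> URL of its first hit.
first_hit = {
    "VM-2N": "https://www.kramerav.com/Product/VM-2N",
    "VM-8H": "https://www.kramerav.com/product/VM-8H",
}


def add_url_column(text, urls, name_col=1):
    """Return the CSV text with a URL column appended to every row."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar="|")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", quotechar="|", lineterminator="\n")
    header = next(reader)
    writer.writerow(header + ["URL"])
    for row in reader:
        # Articles without a known hit get an empty cell.
        writer.writerow(row + [urls.get(row[name_col], "")])
    return out.getvalue()


result = add_url_column(sample, first_hit)
print(result)
```

If several hits per article are needed instead, writing one output row per hit keeps the file rectangular.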

The following shows how you could extract the first page of results for each of your search terms. It uses BeautifulSoup to parse the returned HTML and writes the search term, the heading, and the URL of each result to a file:

from bs4 import BeautifulSoup
import csv
import requests
import urllib.parse

# Read the search terms from the second column, skipping the header row
with open("preisliste.csv", "r", newline="") as f_input:
    csv_reader = csv.reader(f_input, delimiter=";", quotechar="|")
    header = next(csv_reader)
    items = [row[1] for row in csv_reader]

with open("results.csv", "w", newline="") as f_output:
    csv_writer = csv.writer(f_output, delimiter=";")

    for item in items:
        search_url = "https://www.google.de/search?&q={}".format(urllib.parse.quote_plus(item, safe='/'))
        google_request = requests.get(search_url)
        soup = BeautifulSoup(google_request.content, "html.parser")

        # Each hit is expected as an <h3 class="r"> wrapping a link;
        # this selector depends on Google's result markup
        for r in soup.find_all('h3', class_='r'):
            if r.find('a', href=True):
                # [7:] drops the leading "/url?q=" from the href
                csv_writer.writerow([item, r.a.text, r.a['href'][7:]])

This would give you an output file starting something like:

VM-2N;VM-2N - Kramer Electronics;https://www.kramerav.com/Product/VM-2N&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFggUMAA&usg=AOvVaw0jAAJ88F8a7I3lxDu_MN5q
VM-2N;KRAMER VM-2N DISTRIBUTION AMPLIFIER AV, 1x2, 1x CVBS ...;https://www.canford.co.uk/Products/90-401_KRAMER-VM-2N-DISTRIBUTION-AMPLIFIER-AV-1x2-1x-CVBS-BNC-2x-audio-RCAphono230V-AC-50Hz&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFggaMAE&usg=AOvVaw0llPJeJ9wO6fxPJVLlfxGu
VM-2N;Kramer VM-2N 1x2 Audio/Video Distribution Amplifier VM-2N B&H;https://www.bhphotovideo.com/c/product/262082-REG/Kramer_VM_2N_VM_2N_1x2_Audio_Video_Distribution.html&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFgggMAI&usg=AOvVaw05q42rkuyyZX2DS__UY2Sv
VM-2N;Kramer VM-2N Amplifier - ProAV;https://www.proav.co.uk/vm-2n-amplifier&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFggmMAM&usg=AOvVaw2KoAR4OJhRT6wsIn5uZZEq
VM-2N;Kramer VM-2N - 1:2 Video Audio Distribution Amplifier - Ivojo;http://www.ivojo.co.uk/component.php%3Fpid%3DKramer_VM-2N&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFggrMAQ&usg=AOvVaw2A-Jjg0pxmKXz_TuOOExbz
VM-2N;Kramer VM-2N (VM2N) 1:2 Composite Video & Stereo Audio ...;https://www.tnpbroadcast.co.uk/kramer-vm-2n-vm2n-1-2-composite-video-stereo-audio-distribution-amplifier-p215&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFggwMAU&usg=AOvVaw1a8ST4dljVr326BsO4wmRa
VM-2N;Amazon.com: Kramer Electronics VM-2N 1:2 Composite/SDI Video ...;https://www.amazon.com/Kramer-Electronics-VM-2N-Composite-Distribution/dp/B001N4DFWM&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFgg1MAY&usg=AOvVaw2F2NZ7RoZvKN-ITCgld6l3
VM-2N;Kramer VM-2N distribution amplifier - VM-2N - Audio/Video Switch ...;https://www.cdw.com/shop/products/Kramer-VM-2N-distribution-amplifier/2539932.aspx&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFgg6MAc&usg=AOvVaw3gDb60PFLKCXHkTEoTym8u
VM-2N;Kramer VM2N 1x2 Composite SDI distribution Amp with audio | Full ...;http://www.fullcompass.com/prod/052437-Kramer-VM2N&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFgg_MAg&usg=AOvVaw0616aM6qPT5inaKTcs7xca
VM-2N;Kramer VM-2N 1:2 Composite Video & Stereo Audio Distribution ...;https://www.vartotechnologies.com/1_2_Composite_Video_Stereo_Audio_Dist_Amp_p/vm-2n.htm&sa=U&ved=0ahUKEwj8ruWbld_XAhXLyKQKHRwpBaoQFghEMAk&usg=AOvVaw1k3yzXHmkznJlmKED7K5-4
VM-8H;VM-8H - Kramer Electronics;https://www.kramerav.com/product/VM-8H&sa=U&ved=0ahUKEwjtsf-bld_XAhUQzqQKHbzzDT8QFggUMAA&usg=AOvVaw1uxXnx96PtvD0nXDdu9QTJ
VM-8H;1:8 HDMI Distribution Amplifier;https://k.kramerav.com/downloads/pdf/product/1/VM-8H.pdf&sa=U&ved=0ahUKEwjtsf-bld_XAhUQzqQKHbzzDT8QFggaMAE&usg=AOvVaw0IVNeBYAHxlIg_uVMrBZ2i
VM-8H;VM-8H (previously VM-8HDMI);https://k.kramerav.com/downloads/pdf/product/1/VM-8H%2520(previously%2520VM-8HDMI).pdf&sa=U&ved=0ahUKEwjtsf-bld_XAhUQzqQKHbzzDT8QFggfMAI&usg=AOvVaw089BRCWL-44VnWIXtd1YiZ
VM-8H;Kramer VM-8H 1:8 HDMI Distribution Amplifier VM-8H-NV B&H Photo;https://www.bhphotovideo.com/c/product/904931-REG/kramer_vm_8h_nv_1_8_hdmi_distribution.html&sa=U&ved=0ahUKEwjtsf-bld_XAhUQzqQKHbzzDT8QFggkMAM&usg=AOvVaw3jGZ4zLUW-Ac8IjFYNiM3z
VM-8H;Kramer VM-8H - 1:8 HDMI 1.4 Distribution Amplifier - Ivojo;http://www.ivojo.co.uk/component.php%3Fpid%3DKramer_VM-8H&sa=U&ved=0ahUKEwjtsf-bld_XAhUQzqQKHbzzDT8QFggrMAQ&usg=AOvVaw37wIQ71tX7EHk0D9ryy11f
VM-8H;Amazon.com: Kramer Electronics HDMI Splitter VM-8H: Electronics;https://www.amazon.com/Kramer-Electronics-HDMI-Splitter-VM-8H/dp/B0052VEB1G&sa=U&ved=0ahUKEwjtsf-bld_XAhUQzqQKHbzzDT8QFggxMAU&usg=AOvVaw0AV3RCtqSezGdt9LiCp0iU
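
The URLs above still carry Google's trailing parameters (`&sa=U&ved=...`) and some are percent-encoded (for example `%3Fpid%3D`). As a rough sketch, assuming each href has the form `/url?q=<target>&sa=...` (which is what the `[7:]` slice implies), the standard library can pull out the decoded target. The helper name `target_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def target_url(href):
    """Return the decoded value of the q parameter, or href unchanged if absent."""
    query = parse_qs(urlsplit(href).query)
    return query.get("q", [href])[0]


# Sample href shaped like the redirect links assumed above (tracking value shortened).
href = "/url?q=http://www.ivojo.co.uk/component.php%3Fpid%3DKramer_VM-2N&sa=U&ved=0ahUKEwj8"
print(target_url(href))
```

Using `target_url(r.a['href'])` in place of `r.a['href'][7:]` would write the cleaned URL to the file instead.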

Note that you should not be accessing Google in this manner. You should instead look into the API.
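
The answer does not say which API it means. One candidate is Google's Custom Search JSON API, which takes an API key (`key`), a search engine ID (`cx`), and the query (`q`), and returns JSON whose `items` list holds the hits. The sketch below rests on that assumption; the function names are hypothetical, the key and engine ID are placeholders you would have to supply, and the standard library is used in place of requests to keep the example dependency-free.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id):
    """Return the request URL for one search term."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def first_link(response):
    """Return the URL of the first hit in a parsed JSON response, or ''."""
    items = response.get("items", [])
    return items[0].get("link", "") if items else ""


def search_first_link(query, api_key, engine_id):
    """Query the API (needs network access and valid credentials)."""
    url = build_search_url(query, api_key, engine_id)
    with urllib.request.urlopen(url) as reply:
        return first_link(json.load(reply))
```

Calling `search_first_link("VM-2N", api_key, engine_id)` for each article would give one URL per row. Check the API's quota and terms before running it over a full price list.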
