Python print csv column value before output of each result without repeating
I have a Python script that imports a list of URLs from a CSV named list.csv, scrapes them, and outputs any anchor text and href links found on each URL from the CSV:
(FYI, the URLs in the CSV are all in column A.)
from requests_html import HTMLSession
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import pandas
import csv

contents = []
with open('list.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents

for url in contents:
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")
    for link in soup.find_all('a'):
        if len(link.text)>0:
            print(url, link.text, '-', link.get('href'))
The output looks something like this, where https://www.example.com/csv-url-one/ and https://www.example.com/csv-url-two/ are the URLs in column A of the CSV:
['https://www.example.com/csv-url-one/'] Creative - https://www.example.com/creative/
['https://www.example.com/csv-url-one/'] Web Design - https://www.example.com/web-design/
['https://www.example.com/csv-url-two/'] PPC - https://www.example.com/ppc/
['https://www.example.com/csv-url-two/'] SEO - https://www.example.com/seo/
The issue is that I want the output to look more like this, i.e. not print the URL from the CSV before every result, AND have a break after the results for each line of the CSV:
['https://www.example.com/csv-url-one/']
Creative - https://www.example.com/creative/
Web Design - https://www.example.com/web-design/
['https://www.example.com/csv-url-two/']
PPC - https://www.example.com/ppc/
SEO - https://www.example.com/seo/
Is this possible?
Thanks.
Does the following solve your problem?
for url in contents:
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")
    print('\n','********',', '.join(url),'********','\n')
    for link in soup.find_all('a'):
        if len(link.text)>0:
            print(link.text, '-', link.get('href'))
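The `', '.join(url)` in the header line is needed because each row produced by `csv.reader` is a list, which is also why the original output shows the URL inside `['...']`. A minimal sketch of one alternative, assuming the URLs really are all in column A: collect the first cell of each row as a plain string while reading the file. The helper name `read_column_a` and the in-memory sample data are hypothetical, used here only so the snippet runs without a real list.csv.

```python
import csv
import io


def read_column_a(csv_file):
    """Return the first cell of every non-empty row as a plain string."""
    return [
        row[0].strip()
        for row in csv.reader(csv_file)
        if row and row[0].strip()  # skip blank lines and empty first cells
    ]


# Stand-in for open('list.csv', newline=''); the contents are made up.
sample = io.StringIO(
    "https://www.example.com/csv-url-one/\n"
    "\n"
    "https://www.example.com/csv-url-two/\n"
)

urls = read_column_a(sample)
for url in urls:
    print(url)  # prints the bare URL, without list brackets
```

With a flat list of strings, the scraping loop could call `urlopen(url)` directly instead of `urlopen(url[0])`.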
Yes, it is possible.
Just add \n at the end of the print call. \n is the special newline character.
for url in contents:
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")
    for link in soup.find_all('a'):
        if len(link.text)>0:
            print(url, '\n', link.text, '-', link.get('href'), '\n')
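To add a separator between URLs, add a \n before printing each URL.
If you only want to print URLs that have valid links, i.e. links that pass the if len(link.text)>0: check, use the for loop to save the valid links to a list, and print the URL and its links only if that list is not empty.
Try this: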
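One detail worth knowing when passing '\n' to print as its own argument: print joins its arguments with a space by default, so the text after the newline starts with a stray space. The sketch below captures the output of two calls to show the difference. The `capture` helper and the shortened sample strings are hypothetical, added only for illustration.

```python
import io
from contextlib import redirect_stdout


def capture(*args, **kwargs):
    """Return exactly what print(*args, **kwargs) would write to stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(*args, **kwargs)
    return buf.getvalue()


# '\n' as a separate argument: the default sep=' ' is inserted on both
# sides of it, so the second line begins with a space.
with_sep = capture("URL", "\n", "Creative", "-", "/creative/")

# Newline embedded in a single string: no stray spaces.
clean = capture("URL\nCreative - /creative/")

print(repr(with_sep))  # 'URL \n Creative - /creative/\n'
print(repr(clean))     # 'URL\nCreative - /creative/\n'
```

A bare `print()` call is another way to emit a blank line without touching the separator.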
for url in contents:
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")
    valid_links = []
    for link in soup.find_all('a'):
        if len(link.text)>0:
            valid_links.append(link)  # keep the tag itself so .text and .get('href') still work
    if len(valid_links):
        print('\n', url)
        for item in valid_links:
            print(item.text, '-', item.get('href'))
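The grouping idea above (collect the valid links first, then print the URL once only if there is something to show) can also be separated from the scraping itself. The following is a rough sketch, not a drop-in replacement: `format_grouped` and `links_by_url` are hypothetical names, and the sample data stands in for what the BeautifulSoup loop would collect as (anchor text, href) pairs.

```python
def format_grouped(links_by_url):
    """Return a report with each URL shown once, followed by its links."""
    blocks = []
    for url, links in links_by_url.items():
        # Keep only links with non-empty anchor text, mirroring the
        # len(link.text) > 0 check in the thread.
        valid = [(text, href) for text, href in links if text]
        if not valid:
            continue  # skip URLs that have no valid links
        lines = [url] + [f"{text} - {href}" for text, href in valid]
        blocks.append("\n".join(lines))
    # A blank line separates the block for one CSV row from the next.
    return "\n\n".join(blocks)


# Made-up data in the shape the scraping loop could produce.
sample = {
    "https://www.example.com/csv-url-one/": [
        ("Creative", "https://www.example.com/creative/"),
        ("Web Design", "https://www.example.com/web-design/"),
    ],
    "https://www.example.com/csv-url-two/": [
        ("", "https://www.example.com/empty/"),
        ("PPC", "https://www.example.com/ppc/"),
    ],
}

print(format_grouped(sample))
```

Building the whole string before printing keeps the blank-line logic in one place, and the same function can be reused to write the report to a file.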