Pull data using regex and insert into a .csv file
So I am using regex to pull data from a webpage. Done. Now I am trying to insert this data into a .csv file. No problem, right?
I am having trouble getting my data out of the loops I created so that I can insert it into the .csv file. It looks like the best way to conquer this is to create a list, and somehow insert the data into the list and write the data into the csv file. But how can I do that with my current setup?
import re
import sqlite3 as lite
import mysql.connector
import urllib.request
from bs4 import BeautifulSoup
import csv
#We're pulling info on socks from e-commerce site Aliexpress
url="https://www.aliexpress.com/premium/socks.html?SearchText=socks&ltype=wholesale&d=y&tc=ppc&blanktest=0&initiative_id=SB_20171202125044&origin=y&catId=0&isViewCP=y"
req = urllib.request.urlopen(url)
soup = BeautifulSoup(req, "html.parser")
div = soup.find_all("div", attrs={"class":"item"})
for item in div:
    title_pattern = '<img alt="(.*?)\"'
    comp = re.compile(title_pattern)
    href = re.findall(comp, str(item))
    for x in href:
        print(x)
    price_pattern = 'itemprop="price">(.*?)<'
    comp = re.compile(price_pattern)
    href = re.findall(comp, str(item))
    for x in href:
        print(x)
    seller_pattern = '<a class="store j-p4plog".*?>(.*?)<'
    comp = re.compile(seller_pattern)
    href = re.findall(comp, str(item))
    for x in href:
        print(x)
    orders_pattern = '<em title="Total Orders">.*?<'
    comp = re.compile(orders_pattern)
    href = re.findall(comp, str(item))
    for x in href:
        print(x[32:-1])
    feedback_pattern = '<a class="rate-num j-p4plog".*?>(.*)<'
    comp = re.compile(feedback_pattern)
    href = re.findall(comp, str(item))
    for x in href:
        print(x)
# Creation and insertion of CSV file
# csvfile = "aliexpress.csv"
# csv = open(csvfile, "w")
# columnTitleRow = "Title,Price,Seller,Orders,Feedback,Pair"
# csv.write(columnTitleRow)
#
# for stuff in div:
#     title =
#     price =
#     seller =
#     orders =
#     feedback =
#     row = title + "," + price + "," + seller + "," + orders + "," + feedback + "," + "\n"
#     csv.write(row)
I want to be able to print these lists row by row.
It looks like the best way to conquer this is to create a list, and somehow insert the data into the list and write the data into the csv file. But how can I do that with my current setup?
Yes, you're right. Replace your print statements with appends to a list:
data = []
for item in div:
    title_pattern = '<img alt="(.*?)\"'
    comp = re.compile(title_pattern)
    href = re.findall(comp, str(item))
    for x in href:
        data.append(x)
    price_pattern = 'itemprop="price">(.*?)<'
    comp = re.compile(price_pattern)
    href = re.findall(comp, str(item))
    for x in href:
        data.append(x)
And then later:
with open("aliexpress.csv", "w", newline="") as f:
    csv.writer(f).writerow(data)
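One thing to watch for: appending every match to a single flat list puts all products into one long row. A minimal sketch of keeping one list per product is below. The sample HTML strings and the first_match helper are made up for illustration (the real page markup may differ), and only the title and price patterns are shown.

```python
import re

# Hypothetical stand-ins for str(item); the real markup on the page may differ.
sample_items = [
    '<div class="item"><img alt="Cotton Socks" src="a.jpg"/>'
    '<span itemprop="price">US $1.50</span></div>',
    '<div class="item"><img alt="Wool Socks" src="b.jpg"/>'
    '<span itemprop="price">US $3.20</span></div>',
]

# Compile each pattern once, outside the loop.
title_re = re.compile(r'<img alt="(.*?)"')
price_re = re.compile(r'itemprop="price">(.*?)<')


def first_match(pattern, text):
    """Return the first captured group, or an empty string if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else ""


# One inner list per product, so each product becomes one CSV row later.
rows = []
for html in sample_items:
    rows.append([first_match(title_re, html), first_match(price_re, html)])

print(rows)
```

Each inner list can then be passed to writerow on its own, or the whole rows list to writerows.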
From what I remember, writerow takes a list and not a rendered CSV string anyway. That's the whole point: it takes the raw data, escapes it properly, and adds the commas for you.
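To illustrate the escaping, here is a small sketch that writes one row into an in-memory buffer instead of a file. The field values are invented for the example; the point is that a comma or a double quote inside a field does not break the row.

```python
import csv
import io

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
writer = csv.writer(buffer)

# Invented values: one contains a comma, one contains double quotes.
writer.writerow(["Socks, 5 pairs", "US $1.50", 'Size "M"'])

output = buffer.getvalue()
print(output)  # "Socks, 5 pairs",US $1.50,"Size ""M"""
```

Building the same line by hand with string concatenation would have split the first field into two columns.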
Edit: As explained in the comment, I misremembered the interface to csv writer. writerow takes a list, not write. Updated.
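For completeness, a rough sketch of writing a header plus several rows to a file might look like the following. The rows are hypothetical placeholder data, the header uses the first five column names from the question, and the file goes to a temporary directory so that the example does not overwrite anything.

```python
import csv
import os
import tempfile

header = ["Title", "Price", "Seller", "Orders", "Feedback"]

# Hypothetical rows; in the real script these would come from the scraping loop.
rows = [
    ["Cotton Socks", "US $1.50", "Sock Store", "120", "35"],
    ["Wool Socks", "US $3.20", "Warm Feet", "48", "12"],
]

path = os.path.join(tempfile.mkdtemp(), "aliexpress.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)   # one list -> one row
    writer.writerows(rows)    # list of lists -> many rows

# Read the file back to check what was written.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back)
```

Opening the file in a with block also makes sure it is closed and flushed once writing is done.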