
Python: No Traceback when Scraping Data into Excel Spreadsheet

I'm an inexperienced coder working in Python. I wrote a script to automate a process in which certain information is scraped from a webpage, then copied and pasted into a new Excel spreadsheet. I've written and executed the code, but the Excel spreadsheet I designated to receive the data is completely empty. Worst of all, there is no traceback error. Would you help me find the problem in my code? And how do you generally solve your own problems when no traceback error is provided?

import xlsxwriter, urllib.request, string


def main():

    #gets the URL for the expert page
    open_sesame = urllib.request.urlopen('https://aries.case.com.pl/main_odczyt.php?strona=eksperci')
    #reads the expert page
    readpage = open_sesame.read()
    #opens up a new file in excel
    workbook = xlsxwriter.Workbook('expert_book.xlsx')
    #adds worksheet to file
    worksheet = workbook.add_worksheet()

    #initializing the variable used to move names and dates
    #in the excel spreadsheet
    boxcoA = ""
    boxcoB = ""
    #initializing expert attribute variables and lists
    expert_name = ""
    url_ticker = 0
    name_ticker = 0
    raw_list = []
    url_list = []
    name_list = []
    date_list = []
    #this loop goes through and finds all the lines
    #that contain the expert URL and name and saves them to raw_list::
    #raw_list loop
    for i in readpage:
        i = str(i)
        if i.startswith('<tr><td align=left><a href='):
            raw_list += i

    #this loop goes through the lines in raw list and extracts
    #the name of the expert, saving it to a list::
    #name_list loop
    for n in raw_list:
        name_snip = n.split('target=_blank>','</a></td><')[1]
        name_list += name_snip
    #this loop fills a list with the dates the profiles were last updated::
    #date_list
    for p in raw_list:
        url_snipoff = p[28:]
        url_snip = url_snipoff.split('"')[0]
        url_list += url_snip
        expert_url = 'https://aries.case.com.pl/'+url_list[url_ticker]
        open_expert = urllib2.openurl(expert_url)
        read_expert = open_expert.read()
        for i in read_expert:
            if i.startswith('<p align=left><small>Last update:'):
                update = i.split('Last update:','</small>')[1]
        open_expert.close()
        date_list += update

    #now that we have a list of expert names and a list of profile update dates
    #we can work on populating the excel spreadsheet


    #this operation will iterate just as long as the list is long
    #meaning that it will populate the excel spreadsheet
    #with all of our names and dates that we wanted
    for z in raw_list:
        boxcoA = string('A',z)
        boxcoB = string('B',z)
        worksheet.write(boxcoA, name_list[z])
        worksheet.write(boxcoB, date_list[z])
    workbook.close()
    print('Operation Complete')


main()
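For reference, the extraction the script is attempting can be sketched without the network or xlsxwriter dependencies. This is a minimal sketch over a made-up HTML snippet shaped like the table rows the script looks for; the names and URLs below are illustrative, not taken from the real page:

```python
# Illustrative HTML, shaped like the rows the question's code filters for
# (these names and hrefs are made up, not scraped from the real site).
sample_page = (
    '<tr><td align=left><a href="ekspert.php?id=1" target=_blank>Jan Kowalski</a></td></tr>\n'
    '<tr><td align=left><a href="ekspert.php?id=2" target=_blank>Anna Nowak</a></td></tr>\n'
)

# Work line by line (splitlines), not character by character.
rows = [line for line in sample_page.splitlines()
        if line.startswith('<tr><td align=left><a href=')]

# str.split takes a single separator; to cut text out from between two
# markers, split twice rather than passing two separators at once.
names = [row.split('target=_blank>')[1].split('</a>')[0] for row in rows]
urls = [row.split('href="')[1].split('"')[0] for row in rows]

print(names)  # ['Jan Kowalski', 'Anna Nowak']
print(urls)   # ['ekspert.php?id=1', 'ekspert.php?id=2']
```

Once lists like these are populated, each `worksheet.write` call has real data to place in the spreadsheet.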

The lack of a traceback only means your code raises no exceptions. It does not mean your code is logically correct.

I would look for logic errors by adding print statements, or by using a debugger such as pdb or pudb.

One problem I notice with your code is that the first loop seems to presume that i is a line, whereas it is actually a character. You might find splitlines() more useful.
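To illustrate the point above, here is a minimal sketch where a short byte string stands in for the page the question downloads:

```python
# urlopen(...).read() returns bytes; iterating bytes yields integers,
# and iterating a decoded str yields single characters - never lines.
page = b'<tr><td align=left><a href=x>A</a></td>\n<tr>other row</tr>'

first_items = [str(i) for i in page[:3]]
print(first_items)  # ['60', '116', '114'] - byte values of '<', 't', 'r'

# A one-character string can never start with a long prefix, so the
# question's raw_list stays empty and nothing is ever written.
matches = [c for c in page.decode() if c.startswith('<tr><td align=left>')]
print(len(matches))  # 0

# splitlines() yields whole lines, which startswith() can actually test.
lines = [l for l in page.decode().splitlines()
         if l.startswith('<tr><td align=left>')]
print(len(lines))  # 1
```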

If there is no traceback then there is no error.

Most likely something has gone wrong with your scraping/parsing code and your raw_list or other arrays aren't populated.

Try printing out the data that should be written to the worksheet in the last loop, to see if there is any data to be written.

If you aren't writing data to the worksheet then it will be empty.
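A quick way to apply that advice is to print the list lengths just before the write loop. This sketch reuses the question's variable names; the empty lists mirror what the script actually produces when parsing silently fails:

```python
# Stand-ins for the question's lists after the parsing loops found nothing.
raw_list, name_list, date_list = [], [], []

# Checkpoint before the write loop: if these are all 0, the spreadsheet
# will be empty no matter what the write loop does.
print(len(raw_list), len(name_list), len(date_list))  # 0 0 0

rows_written = 0
for z in range(len(raw_list)):  # body never runs when raw_list is empty
    rows_written += 1

print(rows_written)  # 0 - exactly the "empty spreadsheet, no error" symptom
```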
