
Unable to execute python web scraping script successfully after user submits a form on a website built with Flask from the second time onwards

Using Flask and Python, I have a website running on localhost that allows the user to select a specific month to download a report for. Based on the selected month, I then import my web scraping file, which retrieves the data from another website (it requires a login). My web scraping script uses Mechanize.

Here is the portion of code where my web scraping file (webscrape.py) is imported after the download button is clicked (the selection is done on office.html):

@app.route('/office/', methods=['GET','POST'])
def office():
    form=reportDownload()
    if request.method=='POST':
        import webscrape
        return render_template('office.html', success=True)
    elif request.method=='GET':
        return render_template('office.html', form=form)

In the render_template call, success=True is passed as an argument so that office.html displays a success message; otherwise (on a GET request) it displays the form for user selection. Here is my script for office.html:

{% extends "layout.html" %}
{% block content %}
  <h2>Office</h2>
  {% if success %}
    <p>Report was downloaded successfully!</p>
  {% else %}
    <form action="{{ url_for('office') }}" method="POST">
      <table width="70%" align="center" cellpadding="20">
      <tr>
        <td align="right"><p>Download report for: </p></td>
        <td align="center"><p>Location</p>
                  {{form.location}}</td>
        <td align="center"><p>Month</p> 
                             {{form.month}}  </td>
        <td align="center"><p>Year</p>   
                             {{form.year}}  </td>
      </tr>
      <tr>
        <td></td>
        <td></td>
        <td></td>
        <td align="center">{{form.submit}} </td>
      </tr>
    </table>
   </form>
   {% endif %}
{% endblock %}

The problem I have is with further downloads: after downloading for the first time, I go back to the office page and download a report again. On the second try, the success message is displayed but nothing gets downloaded.

In my web scraping script, using mechanize and cookiejar, I have these few lines of code at the beginning:

  import mechanize
  import cookielib

  br = mechanize.Browser()
  cj = cookielib.LWPCookieJar()
  br.set_cookiejar(cj)

and I proceed with the web scraping.

When I run the web scraping file in my Terminal (or command prompt), the script executes without any problems, even the second or third time. So I think the problem may lie in the website's code.

Any suggestions will be appreciated! I have tried different ways of resolving the problem, such as using a return redirect instead, or trying to clear the cookies in the cookie jar. None has worked so far, or I may be using these methods wrongly.

Thank you in advance!

Once your Flask app has started, it imports each package only once. That means that when it hits import webscrape for the second time, it says "well, I already imported that earlier, so no need to take further action…" and moves on to the next line, rendering the template without actually running the script.
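This import-runs-once behaviour is easy to demonstrate outside Flask. The sketch below (demo_scraper is a throwaway module created just for the demo) shows that a module's top-level code runs on the first import, is skipped on the second because the module is cached in sys.modules, and runs again only if you explicitly reload it:

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module whose top-level code has a visible side effect
# (it appends to a list stashed on the sys module).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo_scraper.py"), "w") as f:
    f.write("import sys\nsys._demo_calls.append('scraped')\n")

sys._demo_calls = []
sys.path.insert(0, tmpdir)

import demo_scraper             # first import: the module body runs
import demo_scraper             # already in sys.modules: nothing runs again
assert sys._demo_calls == ['scraped']

importlib.reload(demo_scraper)  # reload forces the module body to run again
assert sys._demo_calls == ['scraped', 'scraped']
```

You could call importlib.reload in the view as a workaround, but wrapping the scraping logic in a function or class (as below) is the cleaner fix.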

In that sense, import in Python is not the same as require in other languages (such as PHP; if anything, it is closer to PHP's require_once).

The solution would be to make your scraper an object (a class) and instantiate it each time you need it. Then you move the import to the top of the file, and inside the if request.method=='POST' branch you just create a new instance of your web scraper.
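A minimal, self-contained sketch of that refactor, using only the standard library (http.cookiejar stands in for mechanize's cookielib, and the names ReportScraper and download are illustrative, not from the original webscrape.py):

```python
import http.cookiejar


class ReportScraper:
    """Holds fresh browser state (a new cookie jar) for every instance."""

    def __init__(self):
        self.cookie_jar = http.cookiejar.LWPCookieJar()

    def download(self, month, year):
        # A real implementation would log in with mechanize and fetch the
        # report; here it only records which report would be fetched.
        return "report-{}-{}".format(month, year)


# Instead of `import webscrape` inside the view, instantiate per request:
first = ReportScraper()
second = ReportScraper()
assert first.cookie_jar is not second.cookie_jar  # no shared session state
result = second.download("01", "2020")
```

In the Flask view, the POST branch would then read something like `scraper = ReportScraper()` followed by `scraper.download(...)`, so every form submission gets a fresh browser session rather than relying on import-time side effects.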
