How to get POST and GET parameters from a web page in Python

I want to get all GET and POST parameters from a web page. Let's say there is some web page. I can get all the links from this page. But if the page accepts input parameters (GET and POST), how can I get them? My algorithm is like this:

find all strings of this type in the web page: <form method="GET">...</form>;
then for each result found:
     get the <input> fields and construct a request
     then save it somewhere
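The algorithm above can be sketched with only the Python 3 standard library (`html.parser` and `urllib.parse`). This is a minimal sketch under stated assumptions, not a complete form parser: `FormExtractor` and `build_requests` are hypothetical names, only each field's `value` attribute is used as its default (so `<textarea>` text and `<select>` options are ignored), and saving the result is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


class FormExtractor(HTMLParser):
    """Collect each <form> with its action, method and named fields."""

    FIELD_TAGS = ("input", "textarea", "select")

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the <form> being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                # A form without a method attribute is submitted with GET
                "method": (attrs.get("method") or "get").upper(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif self._current is not None and tag in self.FIELD_TAGS:
            name = attrs.get("name")
            if name:  # fields without a name are never submitted
                # Simplification: only the value attribute is used
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_requests(page_url, html):
    """Turn every form on a page into a (method, url, body) tuple."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for form in parser.forms:
        # Resolve a relative (or empty) action against the page URL
        target = urljoin(page_url, form["action"])
        encoded = urlencode(form["fields"])
        if form["method"] == "GET":
            # GET: the parameters replace the query string of the action URL
            url = urlunsplit(urlsplit(target)._replace(query=encoded))
            found.append(("GET", url, None))
        else:
            # POST: the parameters go in an x-www-form-urlencoded body
            found.append(("POST", target, encoded))
    return found
```

Each tuple describes one request the page can make, so the list can be written to a file or database for later analysis.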

My purpose is to write a crawler that gets all links and GET and POST parameters from a web site and then saves them somewhere for further analysis. My algorithm is simple, so I want to know: is there any other way to do this in Python? Can you recommend any Python libraries?
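GET parameters also show up in the query strings of ordinary links, not only in forms. A minimal standard-library sketch of that part of the crawler might extract every `<a href>`, split its query string with `urllib.parse.parse_qs`, and append the records to a JSON Lines file. `LinkExtractor`, `link_records` and `save_jsonl` are hypothetical names, and the crawl loop itself (fetching pages, following links, staying on one site) is deliberately left out.

```python
import json
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a link target
                self.links.append(href)


def link_records(page_url, html):
    """Return one record per link: absolute URL plus its GET parameters."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    records = []
    for href in parser.links:
        url = urljoin(page_url, href)
        records.append({
            "url": url,
            # parse_qs maps each parameter name to a list of its values
            "get_params": parse_qs(urlsplit(url).query, keep_blank_values=True),
        })
    return records


def save_jsonl(records, path):
    """Append records to a JSON Lines file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
```

JSON Lines is an assumption chosen for simplicity: appending one object per line lets a long crawl save results incrementally.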

How about something like this to get you started? It pulls out the forms and their input attributes:

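import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

url = 'http://stackoverflow.com/questions/10614974/how-to-get-post-and-get-parameters-from-web-page-in-python'
s = urllib2.urlopen(url).read()
soup = BeautifulSoup(s)

# Every <form> on the page; its action and method say where and how it submits
forms = soup.findAll('form')
for form in forms:
  # .get() avoids a KeyError when a form has no action or method attribute
  print 'form action: %s (%s)' % (form.get('action'), form.get('method'))
  # The <input> fields are the parameters the form sends
  inputs = form.findAll('input')
  for field in inputs:
    print "  -> %s" % (field.attrs)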

Output (for this page):

form action: /search (get)
  -> [(u'autocomplete', u'off'), (u'name', u'q'), (u'class', u'textbox'), (u'placeholder', u'search'), (u'tabindex', u'1'), (u'type', u'text'), (u'maxlength', u'140'), (u'size', u'28'), (u'value', u'')]
form action: /questions/10614974/answer/submit (post)
  -> [(u'id', u'fkey'), (u'name', u'fkey'), (u'type', u'hidden'), (u'value', u'923d3d8b45bbca57cbf0b126b2eb9342')]
  -> [(u'id', u'author'), (u'name', u'author'), (u'type', u'text')]
  -> [(u'id', u'display-name'), (u'name', u'display-name'), (u'type', u'text'), (u'size', u'30'), (u'maxlength', u'30'), (u'value', u''), (u'tabindex', u'105')]
  -> [(u'id', u'm-address'), (u'name', u'm-address'), (u'type', u'text'), (u'size', u'40'), (u'maxlength', u'100'), (u'value', u''), (u'tabindex', u'106')]
  -> [(u'id', u'home-page'), (u'name', u'home-page'), (u'type', u'text'), (u'size', u'40'), (u'maxlength', u'200'), (u'value', u''), (u'tabindex', u'107')]
  -> [(u'id', u'submit-button'), (u'type', u'submit'), (u'value', u'Post Your Answer'), (u'tabindex', u'110')]
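The code above is Python 2 with BeautifulSoup 3, neither of which is maintained any more. A rough Python 3 equivalent using the `bs4` package (installed with `pip install beautifulsoup4`) might look like the sketch below. `fetch`, `describe_forms` and `print_forms` are hypothetical helper names; unlike the original, this version also picks up `<textarea>` and `<select>` fields and treats a missing `method` as GET. Note that in bs4 `Tag.attrs` is a dict, so the printed output will not match the tuple lists shown above.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url):
    """Download a page as bytes (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def describe_forms(html):
    """Return (action, method, fields) for each <form> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for form in soup.find_all("form"):
        # Tag.get() returns None instead of raising KeyError when absent
        action = form.get("action")
        # A form without a method attribute is submitted with GET
        method = (form.get("method") or "get").lower()
        # In bs4, Tag.attrs is a dict, not a list of tuples as in BS3
        fields = [dict(tag.attrs)
                  for tag in form.find_all(["input", "textarea", "select"])]
        result.append((action, method, fields))
    return result


def print_forms(html):
    """Print forms in roughly the same layout as the Python 2 answer."""
    for action, method, fields in describe_forms(html):
        print("form action: %s (%s)" % (action, method))
        for attrs in fields:
            print("  -> %s" % attrs)
```

For example, `print_forms(fetch('http://stackoverflow.com/questions/10614974/how-to-get-post-and-get-parameters-from-web-page-in-python'))` prints each form followed by its fields.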

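Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.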

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM