Python requests: encode the URL

I am trying to search Google via URL parameters. It works when I search for one word, but as soon as I add a space it breaks. I know there is a way to encode the URL.

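import urllib.request
from urllib.parse import urlencode, quote_plus
from fake_useragent import UserAgent
import time
import requests
from bs4 import BeautifulSoup

# request_headers is defined elsewhere in the original script (not shown);
# a minimal placeholder is assumed here
request_headers = {'User-Agent': 'Mozilla/5.0'}

keyword = "host free"
url = "https://www.google.co.il/search?q=%s" % (keyword)  # the space is inserted unencoded
print(url)

thepage = urllib.request.Request(url, headers=request_headers)
page = urllib.request.urlopen(thepage)

# Continue...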

Traceback:

https://www.google.co.il/search?q=host free
Traceback (most recent call last):
  File "C:\Users\Maor Ben Lulu\Desktop\Maor\Python\google\Google_Bot_new.py", line 42, in <module>
    page = urllib.request.urlopen(thepage)
  File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

Also, when I search in Hebrew, it says:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 14-18: ordinal not in range(128)

You can encode the URL with urllib.parse.quote, but the requests module handles all such cases for you, and you can use it as follows:

import requests
base_url = 'https://www.google.co.il/search'
res = requests.get(base_url, params={'q': 'host free'})  # query parameter and value in dict format to be passed as params kwarg

As you can see above, you can pass the query parameters as a dict through the params keyword argument, and requests encodes them into the URL for you.
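If you prefer to stay with urllib as in the question, urllib.parse can encode the query before the URL is passed to urllib.request.Request. A minimal sketch; the Hebrew string is only a placeholder query:

```python
from urllib.parse import quote, quote_plus, urlencode

keyword = "host free"

# quote() percent-encodes the space as %20
print(quote(keyword))  # host%20free

# urlencode() builds "key=value" pairs and runs quote_plus() on each value,
# so the space becomes "+"
url = "https://www.google.co.il/search?" + urlencode({"q": keyword})
print(url)  # https://www.google.co.il/search?q=host+free

# Non-ASCII text such as Hebrew is encoded to UTF-8 bytes first and then
# percent-encoded, so the final URL is pure ASCII
hebrew = quote_plus("שלום עולם")
print(hebrew)  # %D7%A9%D7%9C%D7%95%D7%9D+%D7%A2%D7%95%D7%9C%D7%9D
```

The resulting url can then be passed to urllib.request.Request(url, headers=request_headers) exactly as in the question, which avoids both the 400 Bad Request (raw space) and the UnicodeEncodeError (raw non-ASCII characters).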

The requests library can do it for you, as Gahan mentioned. Pass the query params and headers as dictionaries to requests.get():

import requests
from bs4 import BeautifulSoup

headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582',
    # other headers (if needed)
}

params = {
    'q': 'how to create minecraft server',  # query (spaces are encoded by requests)
    'gl': 'us',                             # country to search from (United States in this case)
    'hl': 'en',                             # language
    # other params (if needed)
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')
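To check what requests will actually send without contacting Google, you can prepare the request and inspect its URL. A minimal sketch reusing the parameter names from above; the Hebrew value is only a placeholder:

```python
import requests

params = {
    "q": "host free",  # the space will be encoded as "+"
    "hl": "en",
}

# Build the request without sending it and look at the final, encoded URL
prepared = requests.Request("GET", "https://www.google.com/search", params=params).prepare()
print(prepared.url)  # https://www.google.com/search?q=host+free&hl=en

# Non-ASCII values are UTF-8 encoded and then percent-encoded as well
prepared_he = requests.Request("GET", "https://www.google.co.il/search", params={"q": "שלום"}).prepare()
print(prepared_he.url)  # https://www.google.co.il/search?q=%D7%A9%D7%9C%D7%95%D7%9D
```

This is the same encoding requests.get() applies before sending, so printing prepared.url is a quick way to debug a query that Google rejects.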

Code and example in the online IDE:

from bs4 import BeautifulSoup
import requests
import lxml  # parser backend used by BeautifulSoup below; must be installed

headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

params = {
    'q': 'how to create minecraft server',
    'gl': 'us',
    'hl': 'en',
}

html = requests.get('https://www.google.com/search', headers=headers, params=params).text
soup = BeautifulSoup(html, 'lxml')

# .tF2Cxc and .yuRUbf are Google's CSS class names at the time of writing;
# Google changes them over time, so these selectors may need updating
for result in soup.select('.tF2Cxc'):
    title = result.select_one('.yuRUbf').text
    link = result.select_one('.yuRUbf a')['href']
    print(title, link, sep='\n')

---------
'''
How to Setup a Minecraft: Java Edition Server – Home
https://help.minecraft.net/hc/en-us/articles/360058525452-How-to-Setup-a-Minecraft-Java-Edition-Server
Minecraft Server Download
https://www.minecraft.net/en-us/download/server
Setting Up Your Own Minecraft Server - iD Tech
https://www.idtech.com/blog/creating-minecraft-server
Tutorials/Setting up a server - Minecraft Wiki
https://minecraft.fandom.com/wiki/Tutorials/Setting_up_a_server
# other results
'''

Alternatively, you can achieve the same thing by using the Google Organic Results API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't have to spend time figuring out such things, or how to bypass blocks from Google if passing a user-agent in the request headers turns out not to be enough.

Instead, you only need to pass the desired parameters (params) and iterate over the structured JSON to get the data you want.

Example code to integrate:

import os
from serpapi import GoogleSearch  # provided by the google-search-results package

params = {
    "engine": "google",
    "q": "how to create minecraft server",
    "hl": "en",
    "gl": "us",
    "api_key": os.getenv("API_KEY"),  # SerpApi key read from an environment variable
}

search = GoogleSearch(params)
results = search.get_dict()

# scrapes the first page of Google results
for result in results["organic_results"]:
    print(result['title'])
    print(result['link'])


---------
'''
How to Setup a Minecraft: Java Edition Server – Home
https://help.minecraft.net/hc/en-us/articles/360058525452-How-to-Setup-a-Minecraft-Java-Edition-Server
Minecraft Server Download
https://www.minecraft.net/en-us/download/server
Setting Up Your Own Minecraft Server - iD Tech
https://www.idtech.com/blog/creating-minecraft-server
Tutorials/Setting up a server - Minecraft Wiki
https://minecraft.fandom.com/wiki/Tutorials/Setting_up_a_server
# other results
'''

Disclaimer: I work for SerpApi.

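Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM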