How to retrieve text from a response made with Requests in Python

I am trying to search inside the response of a request (I am using Requests with Python). I get the response and check its type, which is unicode.

I want to retrieve a specific link that is located between two other strings. I have tried different approaches found online, such as:

  • result = re.search('Currently: <a ', s)
  • url_file = response.find('Currently: <a ', beg=0, end=len(response))

I also tried to convert the unicode string to a normal string:

  • s = unicodedata.normalize(response, title).encode('ascii','ignore')

I get an error.

EDITED

For example:

This works:

    s = 'asdf=5;iwantthis123jasd'
    result = re.search('asdf=5;(.*)123jasd', s)
    print result.group(1)

This doesn't work (it raises an error):

    s = 'Currently: <a '
    result = re.search(r.text, s)
    print result.group(1)

You can access the decoded body of the response through the text attribute of the response object.

    import re
    import requests
    res = requests.get("http://google.com")
    re.search('pattern', res.text)

Then use a regular expression to search (re.search) or match (re.match) against the entire response text.
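As a minimal sketch of that idea, the snippet below pulls the link that follows a marker string such as "Currently:". The question does not show the real page, so SAMPLE_HTML is a made-up stand-in for res.text, and the helper name extract_link_after is hypothetical. It also assumes the href value is double-quoted.

```python
import re

# Hypothetical stand-in for res.text; the real page markup is not shown in the question.
SAMPLE_HTML = 'junk Currently: <a href="http://example.com/file.zip">file</a> more'


def extract_link_after(marker, html):
    """Return the href of the first <a> tag that follows `marker`, or None."""
    # re.escape keeps any special characters in the marker literal.
    pattern = re.escape(marker) + r'\s*<a\s[^>]*?href="([^"]+)"'
    match = re.search(pattern, html)
    # re.search returns None when nothing matches, so guard before group().
    return match.group(1) if match else None


link = extract_link_after('Currently:', SAMPLE_HTML)
print(link)
```

For anything beyond a quick one-off extraction, an HTML parser is usually more robust than a regular expression, since attribute order and quoting can vary.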

You are using re.search incorrectly. The first argument of the function is the pattern and the second is the string to search:

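    import re
    import requests

    # Pattern: an <a> tag with class gb1 and any href value
    s = '<a class=gb1 href=[^>]+>'
    r = requests.get('https://www.google.com/?q=python')
    # Pattern first, then the string to search
    result = re.search(s, r.text)

    print(result.group(0))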

If you just need a list of all matches, you can use re.findall(s, r.text).
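A short sketch of how re.search and re.findall differ, run against a made-up string instead of a live response (the html value below is an assumption, not real Google markup). The capturing group around the href value is also an addition for illustration; the pattern above has no group.

```python
import re

# Hypothetical stand-in for r.text
html = '<a class=gb1 href=/images>Images</a> <a class=gb1 href=/maps>Maps</a>'

# Same pattern as above, with a capturing group added around the href value.
pattern = r'<a class=gb1 href=([^>]+)>'

# re.search returns only the first match, or None when nothing matches.
first = re.search(pattern, html)
first_href = first.group(1) if first else None

# With exactly one capturing group, re.findall returns the captured parts as a list.
all_hrefs = re.findall(pattern, html)

print(first_href)
print(all_hrefs)
```

Checking the result of re.search before calling group() avoids the AttributeError that occurs when the pattern is not found.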
