
Python requests: logging in to Paytm to scrape merchant data

I am trying to use the Python requests library to log in to Paytm and scrape data about Paytm merchants from its website, using the Paytm Nearby page. I haven't been able to get it working, and I have several questions: do I need to log in, or will the cookies do the job for me? Do I need to send a GET request before the POST request?

Code I used:

import requests

payload = {"distance": 10, "endLimit": 20, "latitude": 26.8467088, "longitude": 80.9461592,
           "searchFilter": [{"filterType": "SERVICE", "value": "PAYMENT_POINT"}],
           "sortBy": {"DISTANCE_WISE_SORT": "ASC"}, "startLimit": 20, "channel": "web", "version": 2}
pp = {"method": "get", "channel": "web", "version": 2}
h = {'Content-type': 'application/json',
     'Accept': 'application/json, text/plain, */*',
     'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
     'Accept-Encoding': 'gzip, deflate, br',
     'Accept-Language': 'en-US,en;q=0.9',
     'Connection': 'keep-alive',
     # 'Content-Length': '222',
     }

url = 'https://paytm.com/v1/api/getnearbysellers?child_site_id=1&site_id=1'

initial_url = 'https://paytm.com/nearby'

with requests.Session() as session:
    initial_response = session.get(initial_url)

    response = session.post(url, headers=h, data=payload)

print(response.text)

Response I got:

{"error":"invalid json","code":400}
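The `invalid json` error is most likely caused by `data=payload`. When `data` is a dict, requests form-encodes it (`distance=10&endLimit=20&...`, with nested values such as `searchFilter` flattened to their keys), while the explicit `Content-type: application/json` header still tells the server to expect JSON. Passing `json=payload` makes requests serialise the dict to JSON and set the content type itself. The sketch below shows the difference offline by preparing both requests without sending them:

```python
import json
import requests

URL = "https://paytm.com/v1/api/getnearbysellers?child_site_id=1&site_id=1"
payload = {"distance": 10, "endLimit": 20, "latitude": 26.8467088, "longitude": 80.9461592,
           "searchFilter": [{"filterType": "SERVICE", "value": "PAYMENT_POINT"}],
           "sortBy": {"DISTANCE_WISE_SORT": "ASC"}, "startLimit": 20, "channel": "web", "version": 2}

# data=<dict>: requests form-encodes the dict (nested dicts/lists are reduced
# to their keys), but the explicit header still claims the body is JSON.
form_req = requests.Request("POST", URL, headers={"Content-type": "application/json"},
                            data=payload).prepare()

# json=<dict>: requests serialises the dict to JSON and sets
# Content-Type: application/json itself.
json_req = requests.Request("POST", URL, json=payload).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)
```

Applied to the question's code, the POST would become `session.post(url, headers=h, json=payload)` (or `data=json.dumps(payload)`). Whether the API then also requires the cookies and `X-XSRF-TOKEN` header that the browser sends is a separate issue.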

What I captured with Chrome's network monitoring tool:

General:

Request URL: https://paytm.com/v1/api/getnearbysellers?child_site_id=1&site_id=1
Request Method: POST
Status Code: 200 OK
Remote Address: 13.251.31.44:443
Referrer Policy: no-referrer-when-downgrade

Response Headers:
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Type: application/json;charset=UTF-8
Date: Mon, 03 Dec 2018 08:07:10 GMT
Server: openresty
set-cookie: JSESSIONID=66BC9F5E7F355200029AF2316C4E546B; Path=/; secure; HttpOnlyservice/; HttpOnly
strict-transport-security: max-age=31536000
Vary: Accept-Encoding
vary: Accept-Encoding
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-paytm-srv-id: pawslmktshopapp22
X-Powered-By: Express
x-server-time: 1543824430.635
x-xss-protection: 1; mode=block
Content-Length: 2606
Connection: keep-alive

Request Headers:
POST /v1/api/getnearbysellers?child_site_id=1&site_id=1 HTTP/1.1
Host: paytm.com
Connection: keep-alive
Content-Length: 222
Accept: application/json, text/plain, */*
Origin: https://paytm.com
X-XSRF-TOKEN: LZAiRpku-aptONqCm7Ellab6SEeDAnqZRvuM
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Content-type: application/json
Referer: https://paytm.com/nearby
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: referrer=; secure=true; returning_usr=1; _ga=GA1.2.938904971.1543763520; _gid=GA1.2.1467864151.1543763520; tvc_vid=11543763520802; referrer=; _vwo_uuid_v2=D4CB7C24D54981ADADF33AED06E04E853|54a0ddcfd4724496a7f9ce8ab7cded2a; _vis_opt_s=1%7C; _vis_opt_test_cookie=1; connect.sid=s%3A__KFlRc2nU_bImjL6OajuBL9cUygnrBi.h1hRCkhz0p5%2FAcoVmArbMXYlUtiv87HJnIueXnr3gSA; _gcl_au=1.1.1598196666.1543771833; _parsely_session={%22sid%22:1%2C%22surl%22:%22https://blog.paytm.com/10-things-you-didnt-know-you-could-do-on-paytm-62a1b200faa6?gi=a3bf13aea45%22%2C%22sref%22:%22https://www.google.co.in/%22%2C%22sts%22:1543823267357%2C%22slts%22:0}; _parsely_visitor={%22id%22:%22pid=a45f5e3ede4fdfe39b8544a91a36ad17%22%2C%22session_count%22:1%2C%22last_session_ts%22:1543823267357}; X-MW-TOKEN=0ac55a86-4b9e-4eaf-8188-71a8c4a705be; X-DM-TOKEN=7f417f58-b33f-4580-a6e3-348f8a324d5c; market-onboard.sid=s%3A8oCGyIrKg-0SsuWTyZ0e6UY3kf89fgnk.SvkZbJub6a%2BG9dZlKMf0rBP80VXFejnCkQiuEdCa27E; AWSELB=97B3358B1C150AC96AC74F39ED34D289809132006F1D0627F111BA7DAB6F4B4A64D171E96C5345A5110C0ECD2E0D82F0BD18BA748DF362808AF3F805565A609A67DC7BF11D; queenoftarts=pawslmktshopapp22; _gat_UA-36768858-14=1; XSRF-TOKEN=LZAiRpku-aptONqCm7Ellab6SEeDAnqZRvuM; _gat=1; _dc_gtm_UA-36768858-14=1; JSESSIONID=9C42AB42BC1AF0240183C02730E3754F
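In the captured request, the `X-XSRF-TOKEN` header carries exactly the same value as the `XSRF-TOKEN` cookie (`LZAiRpku-aptONqCm7Ellab6SEeDAnqZRvuM`). This matches the double-submit convention used by AngularJS's `$http` service. A `requests.Session` stores and resends cookies automatically, but it never copies a cookie into a header, so that step has to be done by hand. A minimal sketch (the helper name `copy_xsrf_cookie_to_header` is hypothetical):

```python
import requests


def copy_xsrf_cookie_to_header(session: requests.Session,
                               cookie_name: str = "XSRF-TOKEN",
                               header_name: str = "X-XSRF-TOKEN") -> bool:
    """Mirror the XSRF cookie into a session-wide header, as the browser does.

    Returns True if the cookie was found and copied, False otherwise.
    """
    token = None
    # Iterate over the jar instead of calling cookies.get(): get() raises
    # CookieConflictError if several cookies share the name on different
    # domains or paths. With duplicates, the last one seen wins.
    for cookie in session.cookies:
        if cookie.name == cookie_name:
            token = cookie.value
    if token is None:
        return False
    session.headers[header_name] = token
    return True
```

Call it after a `session.get("https://paytm.com/nearby")` that sets the cookie (an assumption based on the capture above), and before the POST to `getnearbysellers`.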


Query String Parameters:

child_site_id=1&site_id=1


Request Payload:
{"distance":10,"endLimit":20,"latitude":26.8467088,"longitude":80.9461592,"searchFilter":[{"filterType":"SERVICE","value":"PAYMENT_POINT"}],"sortBy":{"DISTANCE_WISE_SORT":"ASC"},"startLimit":20,"channel":"web","version":2}

Response:
{"requestGuid":null,"orderId":null,"status":"SUCCESS","statusCode":"SS_0001","statusMessage":"Request Successfully fullfilled.","response":[{"cashPointsDetail":{"terminalId":8507407,"terminalType":"User","businessName":"","contactPerson":["Vijay Kumar"],"address":["1","","Aasayana Aasayana"],"state":"Uttar Pradesh","city":"Lucknow","category":"Retail And Shopping","subCategory":"Books","location":{"lat":26.846227,"lon":80.946663},"contactNo":["8587932548"],"displayName":"The Book Service","fax":null,"startTime":"10:00","endTime":"20:00","saturdayStartTime":null,"saturdayEndTime":null,"rating":null,"monday":null,"tuesday":null,"wednesday":null,"thurday":null,"friday":null,"saturday":null,"sunday":null,"landMark":null,"pinCode":"226012","servicesOffered":["PAYMENT_POINT"],"terminalCode":"6283495","establishmentDate":null,"emailId":null,"logoUrl":"https://assetscdn.paytm.com/images/catalog/pg/Retail%20&%20Shopping.jpg","tagLine":null,"storeId":null,"merchantId":"6283495","address1":"1","address2":"","address3":"Aasayana Aasayana","editAble":true},"currentCashPointStatus":"open","isFavorite":false,"distanceFromLocation":0.07760059905591597,"offerText":null,"dealUrl":null},.........
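Once a request succeeds, the JSON above can be reduced to the fields of interest. A sketch, assuming every entry in `response` has the same shape as the captured one; the helper name `summarise_sellers` and the choice of fields are illustrative:

```python
from typing import Any, Dict, List, Optional, Tuple


def summarise_sellers(data: Dict[str, Any]) -> List[Tuple[str, str, Optional[float]]]:
    """Return (displayName, city, distanceFromLocation) for each seller in a
    getnearbysellers response. Field names are taken from the captured response."""
    # A failed call (e.g. {"error":"invalid json","code":400}) has no "status".
    if data.get("status") != "SUCCESS":
        raise ValueError(f"request failed: {data}")
    rows = []
    for item in data.get("response") or []:
        detail = item.get("cashPointsDetail") or {}
        rows.append((detail.get("displayName", ""),
                     detail.get("city", ""),
                     item.get("distanceFromLocation")))
    return rows
```

Typical use: `rows = summarise_sellers(response.json())`.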

I am not sure what I am doing wrong and how to proceed with this problem. Any assistance is appreciated!

Different websites handle login in different ways, so learning to use Fiddler to inspect the structure of the requests is very useful. The code below imitates the browser's requests. However, I can't make it work completely because I don't have an account (and could not register one).

from requests import Session
from bs4 import BeautifulSoup
import re

url = 'https://paytm.com/v1/api/getnearbysellers?child_site_id=1&site_id=1'

initial_url = 'https://paytm.com/nearby'

auth_code_url = r"https://accounts.paytm.com/oauth2/authorize?theme=mp-web&redirect_uri=https%3A%2F%2Fpaytm.com%2Fv1%2Fapi%2Fauthresponse&is_verification_excluded=false&client_id=paytm-web-secure&type=web_server&scope=paytm&response_type=code"
login_ = "https://accounts.paytm.com/oauth2/authorize?client_id=paytm-web-secure&scope=paytm&response_type=code&redirect_uri=https://paytm.com/v1/api/authresponse&theme=mp-web&state=null&is_verification_excluded=false&isSignup=true"

with Session() as sess:
    #sess.verify = False
    # update() keeps requests' default headers instead of replacing them all
    sess.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/69.0.3494.0 Safari/537.36"})

    # Extract the authState value from the ng-init attribute of the #main-container div
    resp = sess.get(auth_code_url)
    auth_code = BeautifulSoup(resp.content, "lxml").find("div", {"id": "main-container"}).get("ng-init")
    auth_code = re.findall(r"(?<=authState.{4})(.*?)(?=\'|\")", auth_code)[0]
    print(auth_code)

    # Login form fields, copied from the browser's request
    login_data = {
        "fakeusernameremembered": "",
        "fakepasswordremembered": "",
        "username": "xxxx",  # your username
        "password": "xxxx",  # your password
        "AUTH_STATE": auth_code,
    }
    login = sess.post(login_, data=login_data)
    print(login.text)

    payload = {"distance": 10, "endLimit": 20, "latitude": 26.8467088, "longitude": 80.9461592,
               "searchFilter": [{"filterType": "SERVICE", "value": "PAYMENT_POINT"}],
               "sortBy": {"DISTANCE_WISE_SORT": "ASC"}, "startLimit": 20, "channel": "web", "version": 2}

    # Read the XSRF-TOKEN cookie set by the nearby page and echo it in the
    # X-XSRF-TOKEN header, as the browser does
    resp = sess.get(initial_url)
    print(resp.cookies['XSRF-TOKEN'])
    sess.headers['X-XSRF-TOKEN'] = resp.cookies['XSRF-TOKEN']
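The code above stops after setting the `X-XSRF-TOKEN` header and never calls the API. A minimal sketch of that remaining step, assuming the login and cookie steps succeeded. It sends the payload with `json=` rather than `data=`, and adds the `Origin` and `Referer` values seen in the browser capture. The helper name `build_nearby_request` is hypothetical. It returns a prepared request, so session headers and cookies are merged in but nothing is sent yet:

```python
import requests

NEARBY_PAGE = "https://paytm.com/nearby"
API_URL = "https://paytm.com/v1/api/getnearbysellers"


def build_nearby_request(sess: requests.Session, payload: dict) -> requests.PreparedRequest:
    """Prepare the getnearbysellers POST the way the browser sends it.

    Session.prepare_request merges in the session's headers (including any
    X-XSRF-TOKEN set earlier) and its cookies.
    """
    req = requests.Request(
        "POST",
        API_URL,
        params={"child_site_id": 1, "site_id": 1},  # query string from the capture
        json=payload,  # JSON body; Content-Type becomes application/json
        headers={
            "Accept": "application/json, text/plain, */*",
            "Origin": "https://paytm.com",
            "Referer": NEARBY_PAGE,
        },
    )
    return sess.prepare_request(req)
```

Inside the `with Session() as sess:` block above, this would be used as `resp = sess.send(build_nearby_request(sess, payload))`, followed by `print(resp.json())`.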
