
How to extract table data from a website only AFTER inputting data?

There is a website that doesn't take queries in the URL (they are hidden). It has an input field with an HTML id; once you enter a value and click submit, you get a single-row table.

Is it possible to enter input values in a loop and get the table data by web scraping with Python and BeautifulSoup or Flask (not Selenium)?

link

Click on "Know your class & section".

```python
import requests
from bs4 import BeautifulSoup

# Set the URL you want to webscrape from
url = 'https://www.pesuacademy.com/Academy'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
#results = soup.find(id="knowClsSectionModalLoginId")
#R = soup.find(id="knowClsSectionModalTableDate")
try:
    # Read the current value of the login input field, if any
    a = soup.find('input', {'id': 'knowClsSectionModalLoginId'}).get('value')
    print(a, '\n')
except AttributeError:
    # The input field was not found in the static HTML
    pass
```

I assume you are referring to "Know your Class & Section". This is a form, submitted as an ajax POST call with the `loginId`.

You can give all the ids in the list `loginids`. The script loops through them, gets all the data, and saves it to a csv file.

import requests
from bs4 import BeautifulSoup
import pandas as pd

loginids = ["PES1201900004"]

payload = {
    "loginId": ""
}

headers = {
    "content-type": "application/x-www-form-urlencoded"
}
url = "https://pesuacademy.com/Academy/getStudentClassInfo"

columns = ['PRN', 'SRN', 'Name', 'Class', 'Section', 'Cycle', 'Department', 'Branch', 'Institute Name']

data = []

for logins in loginids:
    payload["loginId"] = logins

    # POST the form data the ajax call sends and parse the returned table row
    res = requests.post(url, data=payload, headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    data.append([i.get_text(strip=True) for i in soup.find("table").find("tbody").find_all("td")])

df = pd.DataFrame(data, columns=columns)
df.to_csv("data.csv", index=False)
print(df)

Output:

             PRN SRN            Name Class Section Cycle Department  Branch Institute Name
0  PES1201900004  NA  AKSHAYA RAMESH                  NA             B ARCH
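One hedged refinement, not part of the original answer: if an id in `loginids` returns no result, `soup.find("table")` is `None` and the script raises an `AttributeError`. The parsing step can be factored into a helper that tolerates that case; `parse_row` is a name chosen here for illustration, tested against a minimal stand-in response body rather than the live endpoint.

```python
from bs4 import BeautifulSoup

def parse_row(html):
    """Return the cell texts of the single-row result table, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None or table.find("tbody") is None:
        return None
    return [td.get_text(strip=True) for td in table.find("tbody").find_all("td")]

# Minimal stand-in for a successful response body:
ok = "<table><tbody><tr><td>PES1201900004</td><td>NA</td></tr></tbody></table>"
print(parse_row(ok))                   # ['PES1201900004', 'NA']
print(parse_row("<p>no result</p>"))   # None
```

In the loop you would then append only when `parse_row(res.text)` is not `None`, so one bad id doesn't abort the whole run.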

Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit the original source.