
How to read data from another website using an Angular website with an ASP.NET C# backend

I am trying to read data from another website using an Angular website with an ASP.NET C# backend. The other website requires the login steps below, and I need to read the data after logging in.

The authentication consists of the following pages:

  1. On the first page I have to enter the username.

https://i.stack.imgur.com/PgbLo.png

  2. On the second page I have to enter the password and then click login.

https://i.stack.imgur.com/xAr4E.png

  3. After login, the site redirects to the home page, which contains a person's details.

https://i.stack.imgur.com/2I4UX.png

I need to display the information from the third page (name, mobile, and address) in my Angular website.

The third page's URL looks like: www.xyz.com/1

The digit 1 at the end of the URL is the id of the record; the third page shows the information for that id.

I found some C# code for downloading a page, but how do I manage the login?

System.Net.WebClient wc = new System.Net.WebClient();
byte[] raw = wc.DownloadData("http://www.yoursite.com/resource/file.htm");

string webData = System.Text.Encoding.UTF8.GetString(raw);

Another way:

System.Net.WebClient wc = new System.Net.WebClient();
string webData = wc.DownloadString("http://www.yoursite.com/resource/file.htm");

I would suggest first checking whether the site you want to get the data from has a public API. It's easier to call, and there is a chance the site has a "no bots" policy, in which case your IP could get banned.

That being said, if you want to automatically log in to the site and then get the data, you'll need to use something like Selenium (I'm not an expert on it, but here is a link to some documentation):

https://www.javatpoint.com/selenium-csharp

Selenium is usually used for automated testing, but it also lets you simulate user interaction with a site (finding the input fields and typing values into them).
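As a rough illustration of that kind of simulated interaction, here is a minimal sketch using the Selenium WebDriver C# bindings (NuGet packages Selenium.WebDriver and Selenium.Support). The class and method names are made up for this example, and the field names `username` and `password` and the submit-button selector are assumptions, since the real page markup is not shown in the question. Use the browser dev tools to find the right selectors.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class SeleniumLoginSketch
{
    // Builds the details URL for an id, e.g. https://www.xyz.com/1
    public static string BuildDetailsUrl(string baseUrl, int id) =>
        $"{baseUrl.TrimEnd('/')}/{id}";

    // Walks through the two-step login and returns the HTML of the details page.
    // All selectors below are hypothetical placeholders.
    public static string LoginAndGetDetailsHtml(
        string loginUrl, string baseUrl, int id, string userName, string password)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless=new"); // run Chrome without a visible window

        using IWebDriver driver = new ChromeDriver(options);
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));

        // Step 1: username page
        driver.Navigate().GoToUrl(loginUrl);
        wait.Until(d => d.FindElement(By.Name("username"))).SendKeys(userName);
        driver.FindElement(By.CssSelector("button[type='submit']")).Click();

        // Step 2: password page
        wait.Until(d => d.FindElement(By.Name("password"))).SendKeys(password);
        driver.FindElement(By.CssSelector("button[type='submit']")).Click();

        // Assumes a successful login navigates away from the password page.
        wait.Until(d => d.FindElements(By.Name("password")).Count == 0);

        // Step 3: open the details page for the requested id and grab its HTML
        driver.Navigate().GoToUrl(BuildDetailsUrl(baseUrl, id));
        return driver.PageSource;
    }
}
```

The returned HTML string can then be handed to an HTML parser to pick out the individual fields.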

Now, for getting the information: I have used HtmlAgilityPack in the past. It is a C# library that lets you look for specific elements in a page using XPath, which is easy to learn.

https://html-agility-pack.net/
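To give an idea of what the XPath lookup might look like, here is a small sketch with HtmlAgilityPack (NuGet package HtmlAgilityPack). It assumes the details page marks the three values with the ids `name`, `mobile`, and `address`. That is a guess, because the real markup is not shown, so the XPath expressions will need adjusting. `PersonDetails` and `DetailsParser` are names made up for this example.

```csharp
using System;
using HtmlAgilityPack;

// Simple container for the three values the question asks for.
public record PersonDetails(string Name, string Mobile, string Address);

public static class DetailsParser
{
    // Parses an HTML string (no network access) and extracts the fields.
    public static PersonDetails Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Returns the trimmed, entity-decoded text of the first match,
        // or an empty string when the XPath matches nothing.
        string Text(string xpath) =>
            HtmlEntity.DeEntitize(
                doc.DocumentNode.SelectSingleNode(xpath)?.InnerText ?? "").Trim();

        // Hypothetical ids; replace with the XPath that fits the real page.
        return new PersonDetails(
            Text("//*[@id='name']"),
            Text("//*[@id='mobile']"),
            Text("//*[@id='address']"));
    }
}
```

Keeping the parsing in a function that takes a plain string makes it easy to try out against a saved copy of the page before wiring it to a live download.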

The HtmlAgilityPack site has some great documentation. After you get the results from the site, you can expose them through an API that you consume from your Angular app.

As Facundo Gallardo said, the best way to get data from a website is to use the site's API, if it has one. If not, there isn't much you can do, unfortunately.

Using a web browser simulator like Selenium could be an option if you were building a desktop application, but for a server it is far too resource intensive. If you were to use it (and I am not even sure you can in a server environment), your server would have to open a new browser, such as Chrome, for every user and every time you wanted to scrape data from the other website.

Any other solution that doesn't require you to open a full browser can only scrape static data that requires no authentication to be seen.

You can use a Python script to do this. Write a Python script that automates the login, then run that script from C# to complete the process.

Here are two links for reference that may be helpful.

submitting a webform using python

running a python script in c#
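For the C# side of that approach, a minimal sketch could start the script with System.Diagnostics.Process and read whatever it prints. It assumes a `python` executable is on the PATH and that the script writes its result to standard output. The script name used in the note below is hypothetical, and `PythonRunner` is a name made up for this example.

```csharp
using System;
using System.Diagnostics;

public static class PythonRunner
{
    // Builds the process settings for: python <scriptPath> <args...>
    public static ProcessStartInfo BuildStartInfo(string scriptPath, params string[] args)
    {
        var info = new ProcessStartInfo
        {
            FileName = "python",            // assumption: python is on the PATH
            RedirectStandardOutput = true,  // capture what the script prints
            RedirectStandardError = true,
            UseShellExecute = false,        // required for redirection
            CreateNoWindow = true
        };
        info.ArgumentList.Add(scriptPath);
        foreach (var arg in args) info.ArgumentList.Add(arg);
        return info;
    }

    // Runs the script and returns its standard output as a string.
    public static string Run(string scriptPath, params string[] args)
    {
        using var process = Process.Start(BuildStartInfo(scriptPath, args))
            ?? throw new InvalidOperationException("Could not start python.");

        // Read stderr asynchronously so a full error buffer cannot block stdout.
        var errorTask = process.StandardError.ReadToEndAsync();
        string output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();

        if (process.ExitCode != 0)
            throw new InvalidOperationException($"Script failed: {errorTask.Result}");
        return output;
    }
}
```

A call such as `PythonRunner.Run("login_and_scrape.py", "1")` would then return the script's output for id 1, for example as JSON that the backend passes on to the Angular app.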

To automate a web UI, you will want to use something like https://www.puppeteersharp.com/ .

Most things that you can do manually in the browser can be done using Puppeteer!

Given a login screen like this:

<!DOCTYPE html>
<html>
<body>
<form action="/action_page.php">
  <label for="name">name:</label><br>
  <input type="text" id="name" name="name"><br>
  <label for="password">password:</label><br>
  <input type="password" id="password" name="password"><br><br>
  <input type="submit" value="Submit">
</form>
</body>
</html>

The .NET code for the login challenge would look like this:

using PuppeteerSharp;

//Setup puppeteer things
using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = false });
await using var page = await browser.NewPageAsync();

//Using https://jsfiddle.net/hwu6bkj0/ as demo web form
await page.GoToAsync("https://jsfiddle.net/hwu6bkj0/");

//Wait for your input element to appear/get loaded. Use the F12/Dev tools of your browser to find out the right query
await Task.Delay(1000);

// JsFiddle specific - getting result iframe
var frameElement = await page.QuerySelectorAsync("iframe[name='result']");
var frame = await frameElement.ContentFrameAsync();
var frameContent = await frame.GetContentAsync();

// Setting values to the input elements
var username = await frame.QuerySelectorAsync("input[name='name']");
await username.TypeAsync("testUserName");

var password = await frame.QuerySelectorAsync("input[name='password']");
await password.TypeAsync("pa$$word");

var submitButton = await frame.QuerySelectorAsync("input[type='submit']");
await Task.Delay(1000);
await submitButton.ClickAsync();

await Task.Delay(5000);

In a final version, Headless should be set to true and the Task.Delay calls should be removed.
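One possible way to drop the fixed delays is to wait for the elements and the navigation explicitly. The sketch below is a variation on the code above, not a tested replacement. It assumes a recent PuppeteerSharp version that exposes the `IPage` interface and a login form served directly on the page (no jsFiddle iframe). The class name, method name, and URL parameters are made up for this example.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

public static class PuppeteerLoginSketch
{
    // Fills in the login form, waits for the resulting navigation,
    // then returns the HTML of the details page.
    public static async Task<string> LoginAndReadAsync(
        IPage page, string loginUrl, string detailsUrl, string userName, string password)
    {
        await page.GoToAsync(loginUrl);

        // WaitForSelectorAsync resolves once the element exists in the DOM,
        // so no Task.Delay is needed before typing.
        var nameInput = await page.WaitForSelectorAsync("input[name='name']");
        await nameInput.TypeAsync(userName);

        var passwordInput = await page.WaitForSelectorAsync("input[name='password']");
        await passwordInput.TypeAsync(password);

        // Start listening for the navigation before clicking, then await it.
        var navigation = page.WaitForNavigationAsync();
        await page.ClickAsync("input[type='submit']");
        await navigation;

        // The session cookies stay on the page, so the details URL can be opened directly.
        await page.GoToAsync(detailsUrl);
        return await page.GetContentAsync();
    }
}
```

With the `page` object from the setup code above, this would be called as `await PuppeteerLoginSketch.LoginAndReadAsync(page, loginUrl, detailsUrl, userName, password)`.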

Any subsequent forms should work similarly.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
