Parse javascript value from HTML response in C#

I'm building an application that uses HttpClient in .NET 4.5 to send a GET request to a webpage (which isn't mine), and I receive this response in the Content:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:widget="http://www.netvibes.com/ns/">
<head>  
<meta http-equiv="content-type" 
      content="text/html;charset=utf-8" />
<script type="text/javascript">var NREUMQ=NREUMQ||[];NREUMQ.push(["mark","firstbyte",new Date().getTime()]);</script><title>Site</title>

<script type="text/javascript">
var HOST_DOMAIN = 'http://www.site.com/';
var ID = '2261443944';
var BASE_URL = 'https://base.site.com';
</script>

  </head>
    <body >
    </body>
</html>

What I would like to do is parse the values of HOST_DOMAIN, ID and BASE_URL from the JavaScript in the head section using .NET 4.5 libraries, but I can't figure out how. Any ideas?

Use a regular expression that captures the URL in a group.

  using System.Text.RegularExpressions;

  // Group 1 captures everything between the single quotes.
  string pattern = @"var HOST_DOMAIN\s*=\s*'([^']+)';";
  Match match = Regex.Match(html, pattern);
  if (match.Success)
      return match.Groups[1].Value;

Explanation: The parentheses define a capturing group in the regular expression; the text it captures is stored in the Groups property of the Match (Groups[0] is the whole match, Groups[1] is the first group).
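The same capturing-group idea can pull all three variables out in one pass. This is a minimal sketch, assuming every value of interest is a single-quoted string assigned as `var NAME = '...';` (as in the response above) and that `html` is the response body, e.g. from `response.Content.ReadAsStringAsync()`. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JsVarParser
{
    // Matches assignments such as: var HOST_DOMAIN = 'http://www.site.com/';
    // Group "name" captures the identifier, group "value" the text between the single quotes.
    private static readonly Regex VarPattern = new Regex(
        @"\bvar\s+(?<name>[A-Za-z_$][\w$]*)\s*=\s*'(?<value>[^']*)'\s*;");

    // Returns every single-quoted "var NAME = '...';" assignment found in the HTML.
    // Unquoted assignments (like the NREUMQ line) are ignored.
    public static Dictionary<string, string> ExtractStringVars(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in VarPattern.Matches(html))
        {
            // If a name appears more than once, the last assignment wins.
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

For the response in the question, `JsVarParser.ExtractStringVars(html)["ID"]` gives "2261443944". Iterating the MatchCollection with `foreach (Match m in ...)` also avoids needing LINQ, since in .NET 4.5 MatchCollection only implements the non-generic IEnumerable.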

Possible problem: this does not work if the URL contains escaped apostrophes (\').
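If escaped apostrophes can occur, the value group can be widened to accept a backslash followed by any character. This is a minimal sketch under the assumption that only \' and \\ need unescaping; other JavaScript escapes such as \n are left untouched. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class EscapedQuoteParser
{
    // The value group accepts any character except ' and \, or a backslash followed by
    // any character, so an escaped \' does not end the string early.
    private static readonly Regex Pattern = new Regex(
        @"var HOST_DOMAIN\s*=\s*'(?<value>(?:[^'\\]|\\.)*)'\s*;");

    // Returns the unescaped value, or null if the variable is not found.
    public static string ExtractHostDomain(string html)
    {
        Match m = Pattern.Match(html);
        if (!m.Success)
            return null;
        // Assumption of this sketch: only \' and \\ are unescaped; other escapes stay as-is.
        return Regex.Replace(m.Groups["value"].Value, @"\\(['\\])", "$1");
    }
}
```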

To validate a host name, use this regex:

  var ValidHostnameRegex = @"^(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])$";

Reference: Stack Overflow, Hostname Regex
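Since HOST_DOMAIN and BASE_URL are full URLs rather than bare host names, one way to apply the hostname pattern is to take the host out with System.Uri first. This is a minimal sketch; the class and method names (HostCheck, IsValidHostname, HasValidHost) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostCheck
{
    // The hostname pattern above, as a C# verbatim string so \. and \- reach the regex engine.
    private const string ValidHostnameRegex =
        @"^(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])$";

    // Checks a bare host name such as "www.site.com".
    public static bool IsValidHostname(string host)
    {
        return Regex.IsMatch(host, ValidHostnameRegex);
    }

    // Parses a full URL, then checks only its host part against the regex.
    public static bool HasValidHost(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return false;
        return IsValidHostname(uri.Host);
    }
}
```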

Then, for the numeric ID, use a number regex (see Reg Ex Number).

For how to use a regex in JavaScript, see How to use regex in js.
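For the numeric ID in C#, a digit-only capture plus a parse step is enough. This is a minimal sketch, assuming the ID is always quoted as in the response above; the class and method names are hypothetical. Note that 2261443944 is larger than int.MaxValue (2147483647), so it has to be parsed as a long.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class IdParser
{
    // [0-9] is used instead of \d because .NET's \d also matches non-ASCII Unicode digits.
    private static readonly Regex IdPattern = new Regex(@"var ID\s*=\s*'([0-9]+)'\s*;");

    // Returns the ID as a long, or null if it is missing or does not fit in a long.
    public static long? ExtractId(string html)
    {
        Match m = IdPattern.Match(html);
        long id;
        if (m.Success && long.TryParse(m.Groups[1].Value, NumberStyles.None,
                                       CultureInfo.InvariantCulture, out id))
            return id;
        return null;
    }
}
```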

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM