
How can I escape quotes in a string?

For example, I have this:

<a href="/Forums2008/forumPage.aspx?forumId=393" title="מזג האוויר">מזג האוויר</a>

What I want to extract is, first, forumId=393; then just the 393; then the link; and finally the name. The name is in Hebrew (it means "the weather"), so it looks a bit messy here; it should be:

מזג האוויר

I could use either IndexOf and Substring or HtmlAgilityPack. I prefer HtmlAgilityPack for getting all of these values, though a regex might also be a good way.

In the end I should get these four strings:

  1. forumId=393

  2. 393

  3. מזג האוויר

  4. /Forums2008/forumPage.aspx?forumId=393
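Since a regex was mentioned as an option, here is a minimal regex sketch that produces those four strings from a single anchor. It assumes the markup looks like the example: a double-quoted href that contains forumId=<digits>, and link text with no nested tags. ParseForumLink is a hypothetical helper name, not part of any library. Regular expressions are brittle on arbitrary HTML, so for a whole page an HTML parser such as HtmlAgilityPack remains the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: matches one <a> tag whose double-quoted href contains
// forumId=<digits> and whose link text contains no other tags.
// Returns null when the tag does not have that shape.
static (string Pair, string Id, string Name, string Href)? ParseForumLink(string anchorHtml)
{
    Match m = Regex.Match(
        anchorHtml,
        @"<a\s[^>]*href=""(?<href>[^""]*?(?<pair>forumId=(?<id>\d+))[^""]*)""[^>]*>(?<name>[^<]*)</a>",
        RegexOptions.IgnoreCase);
    if (!m.Success)
        return null;
    return (m.Groups["pair"].Value, m.Groups["id"].Value,
            m.Groups["name"].Value, m.Groups["href"].Value);
}

string html = "<a href=\"/Forums2008/forumPage.aspx?forumId=393\" title=\"מזג האוויר\">מזג האוויר</a>";
if (ParseForumLink(html) is { } link)
{
    Console.WriteLine(link.Pair);  // forumId=393
    Console.WriteLine(link.Id);    // 393
    Console.WriteLine(link.Href);  // /Forums2008/forumPage.aspx?forumId=393
    Console.WriteLine(link.Name);  // the Hebrew forum name (rendering depends on the console encoding)
}
```

Named capture groups keep the four values apart. The href group is built around the pair group, so the link, forumId=393 and 393 all come from one match.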

Here is what I have tried so far, and it is not even close to my goal: once with HtmlAgilityPack, and once by downloading the HTML, saving it as a file and then parsing it. For the parsing I thought of using IndexOf and Substring, but I'm not sure how:

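using System.IO;
using System.Net;
using HtmlAgilityPack;

// First attempt: load the page with HtmlAgilityPack (Qhw is not shown in the question)
HtmlAgilityPack.HtmlDocument doc =
    Qhw.Load("http://www.tapuz.co.il/forums/forumslistnew.asp");
parseIds(doc);

// Second attempt: download the page to a file and scan it line by line
using (WebClient webclient = new WebClient())
{
    webclient.DownloadFile("http://www.tapuz.co.il/forums/forumslistnew.asp",
                           @"c:\testhtml\mainforums.html");
}

string[] lines = File.ReadAllLines(@"c:\testhtml\mainforums.html");
foreach (string line in lines)
{
    if (line.Contains("href") && line.Contains("forumId=") && !wholeids.Contains(line))
    {
        string tg1 = "href=\""; // a quote inside a regular string literal is escaped as \"
        wholeids.Add(line);
    }
}

// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        idsnumbers.Add(link.InnerText);
    }
}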

idsnumbers is a global List variable.
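Writing href=" as a C# string is the quoting problem from the title. Inside a regular string literal a double quote is escaped as \", and inside a verbatim @"..." literal it is doubled as "". Writing "href="" without the @ does not compile. The sketch below shows both forms and then takes the IndexOf/Substring route that the question was unsure about. Between is a hypothetical helper, and the id extraction assumes forumId is the last parameter in the href, as it is in the example.

```csharp
using System;

// Hypothetical helper: returns the text between the first occurrence of start
// and the next occurrence of end after it, or null if either marker is missing.
static string? Between(string text, string start, string end)
{
    int from = text.IndexOf(start, StringComparison.Ordinal);
    if (from < 0)
        return null;
    from += start.Length;
    int to = text.IndexOf(end, from, StringComparison.Ordinal);
    return to < 0 ? null : text.Substring(from, to - from);
}

// The marker href=" written both ways: \" in a regular literal, "" in a verbatim literal.
string regularLiteral = "href=\"";
string verbatimLiteral = @"href=""";
Console.WriteLine(regularLiteral == verbatimLiteral);  // True

string line = "<a href=\"/Forums2008/forumPage.aspx?forumId=393\" title=\"מזג האוויר\">מזג האוויר</a>";
string? href = Between(line, regularLiteral, "\"");   // /Forums2008/forumPage.aspx?forumId=393
string? id = Between(line, "forumId=", "\"");          // 393, assuming forumId is the last query parameter
string? pair = id == null ? null : "forumId=" + id;   // forumId=393
string? name = Between(line, ">", "</a>");             // the link text, here the Hebrew forum name
Console.WriteLine(href);
Console.WriteLine(pair);
```

Using StringComparison.Ordinal avoids culture-sensitive matching, which is what you want when looking for fixed markup markers.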

I would use HtmlAgilityPack, Uri.TryCreate and ParseQueryString:

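using System;
using System.Linq;

string html = @"<a href=""/Forums2008/forumPage.aspx?forumId=393"" title=""מזג האוויר"">מזג האוויר</a>";
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var anchor = htmlDoc.DocumentNode.Descendants("a").FirstOrDefault();
if (anchor != null)
{
    string name = anchor.InnerText;                      // מזג האוויר
    string href = anchor.GetAttributeValue("href", "");  // empty string if the attribute is missing
    int queryStart = href.IndexOf('?');                  // -1 if there is no query string
    Uri uri;
    if (queryStart >= 0 && Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out uri))
    {
        // uri.Query throws for a relative uri, so cut the query out of href by hand
        // (from '?' up to an optional '#fragment')
        var queryString = href.Substring(queryStart).Split('#')[0];
        var queryKeyValues = System.Web.HttpUtility.ParseQueryString(queryString);
        string forumId = queryKeyValues["forumId"];      // 393
        string forumIdPair = "forumId=" + forumId;       // forumId=393
    }
}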

You could also resolve a relative URI against a dummy absolute base URI. This avoids the string methods because uri.Query is then available:

if (Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out uri))
{
    // Resolve a relative uri against an arbitrary base; only the query part is used afterwards
    if (!uri.IsAbsoluteUri)
        uri = new Uri(new Uri("http://www.google.com/"), uri);
    var queryKeyValues = System.Web.HttpUtility.ParseQueryString(uri.Query);
    string forumId = queryKeyValues["forumId"];
}
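Both snippets in this answer rely on System.Web.HttpUtility.ParseQueryString, which needs the System.Web assembly. If you'd rather not take that dependency, you can parse the query string by hand. The following is a minimal sketch under that assumption. ParseQuery is a hypothetical helper, not a framework API, and it does not try to match ParseQueryString exactly. For example, a repeated key simply keeps its last value here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a small stand-in for HttpUtility.ParseQueryString.
// Keeps only the text after '?' (up to an optional '#fragment'), splits it on '&' and '=',
// and percent-decodes keys and values, treating '+' as a space as in form encoding.
// Keys are compared case-insensitively; a repeated key keeps its last value.
static Dictionary<string, string> ParseQuery(string href)
{
    var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    int q = href.IndexOf('?');
    if (q < 0)
        return result;
    string query = href.Substring(q + 1).Split('#')[0];
    foreach (string part in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
    {
        int eq = part.IndexOf('=');
        string key = eq < 0 ? part : part.Substring(0, eq);
        string value = eq < 0 ? "" : part.Substring(eq + 1);
        result[Uri.UnescapeDataString(key.Replace('+', ' '))] =
            Uri.UnescapeDataString(value.Replace('+', ' '));
    }
    return result;
}

var queryValues = ParseQuery("/Forums2008/forumPage.aspx?forumId=393");
if (queryValues.TryGetValue("forumId", out string forumId))
{
    Console.WriteLine(forumId);               // 393
    Console.WriteLine("forumId=" + forumId);  // forumId=393
}
```

Because it works on the raw href string, this sketch does not need a Uri object at all. That sidesteps the relative-versus-absolute question entirely.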
