I want to create a PowerShell script to get information from a website. I am trying to find the first occurrence of the following HTML tag on the page:
<div class="dDoNo gsrt"><span data-dobid="hdw">Text I want to find</span></div>
I am using the following PowerShell code without success; it gives me no output:
$WebResponse = Invoke-WebRequest "https://www.google.co.in/search?hl=en&q=define+Text"
($WebResponse.ParsedHtml.GetElementsByTagName(‘div’) | Where {
$_.ClassName -eq ‘dDoNo’
}).InnerText
To be more precise: I am trying to get the definition of a word by scraping the HTML from Google, and I am using this class as a base: googleDictionaryAPI class
For one thing, you need to call GetElementsByTagName() on the DocumentElement child node of ParsedHtml, otherwise you don't get any results at all. Also, the class string "dDoNo gsrt" does not equal "dDoNo", so you need to test if the value contains the class name "dDoNo".
Change
($WebResponse.ParsedHtml.GetElementsByTagName(‘div’) | Where {
$_.ClassName -eq ‘dDoNo’
}).InnerText
to
($WebResponse.ParsedHtml.DocumentElement.GetElementsByTagName('div') | Where {
$_.ClassName -match '\bdDoNo\b'
}).InnerText
and the code should do what you want.
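To see why the comparison operator matters here, the following is a minimal offline sketch that uses only the class string from the question. The variable names are made up for illustration, and the whitespace-split variant is an alternative I am adding, not something the original code used.

```powershell
# Sample class attribute value, taken from the HTML in the question.
$className = 'dDoNo gsrt'

# Exact comparison fails because the attribute holds two class names.
$exact = $className -eq 'dDoNo'

# Word-boundary regex matches 'dDoNo' as a whole word inside the string.
$regex = $className -match '\bdDoNo\b'

# Alternative: split on whitespace and test membership (case-sensitive).
$token = ($className -split '\s+') -ccontains 'dDoNo'

"exact=$exact regex=$regex token=$token"
```

Keep in mind that \b also treats a hyphen as a word boundary, so a class such as "dDoNo-extra" would still match the regex. The split-based test is stricter if that ever becomes a problem.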
Note that using typographic quotes (‘ ’) in code is not recommended. While they work most of the time, I did encounter situations where they caused things to break in interesting ways. Use plain quotes (') instead.
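If code was pasted from a web page or a word processor, the typographic quotes can be normalized before running it. This is only a rough sketch: the sample string and variable names are invented for the example, and it covers just the common single and double curly quotes.

```powershell
# Build a sample line containing typographic quotes (U+2018 / U+2019)
# without typing them literally into this script.
$open   = [char]0x2018
$close  = [char]0x2019
$pasted = "GetElementsByTagName(${open}div${close})"

# Replace curly single and double quotes with their plain ASCII forms.
$clean = $pasted -replace '[\u2018\u2019]', "'" -replace '[\u201C\u201D]', '"'

$clean
```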
Thanks to @Ansgar for pointing me to the correct solution.
The main problem was that the response I got from Invoke-WebRequest was different from the one I got from a browser. The solution was to set a UserAgent when making the request:
$WebResponse = (Invoke-WebRequest -Uri "https://www.google.co.in/search?hl=en&q=define+Text" -UserAgent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36")
($WebResponse.ParsedHtml.DocumentElement.GetElementsByTagName('div') | Where {
$_.ClassName -match '\bdDoNo\b'
}).InnerText
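Since the question asks for the first occurrence only, piping the filtered elements to Select-Object -First 1 before reading InnerText should limit the result to one element. As far as I know, ParsedHtml is only available in Windows PowerShell and not in PowerShell 6 and later, so below is a rough fallback sketch that pulls the first match out of the raw HTML with a regex. Get-FirstDefinition is a hypothetical helper name, the sample markup is the snippet from the question, and regex-based HTML parsing is fragile, so treat it as an illustration rather than a robust parser.

```powershell
# Hypothetical helper: returns the text of the first <span data-dobid="hdw">.
function Get-FirstDefinition {
    param([Parameter(Mandatory)][string]$Html)

    # Lazy match up to the first closing </span>; (?s) lets . span newlines.
    $pattern = '(?s)<span[^>]*\bdata-dobid="hdw"[^>]*>(.*?)</span>'
    $m = [regex]::Match($Html, $pattern)
    if ($m.Success) {
        # Decode entities such as &amp; in the captured text.
        [System.Net.WebUtility]::HtmlDecode($m.Groups[1].Value).Trim()
    }
}

# Sample markup from the question, repeated to show that only the first hit is returned.
$sample = '<div class="dDoNo gsrt"><span data-dobid="hdw">Text I want to find</span></div>' +
          '<div class="dDoNo gsrt"><span data-dobid="hdw">Second match</span></div>'

Get-FirstDefinition -Html $sample
```

With a live response this would be called as Get-FirstDefinition -Html $WebResponse.Content. The class and attribute names come from the markup in the question and may change whenever Google changes its page layout.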