Parse and scrape a web page


The Invoke-WebRequest cmdlet sends HTTP, HTTPS, FTP, and FILE requests to a web page or web service. It parses the response and returns collections of forms, links, images, and other significant HTML elements.

This cmdlet was introduced in Windows PowerShell 3.0.

Run at the prompt, Invoke-WebRequest shows formatted output of various properties of the web response. Like most cmdlets, Invoke-WebRequest returns an object, so the result can be stored in a variable.


PS C:\> $WebResponse = Invoke-WebRequest ""
PS C:\> $WebResponse

Because the result is an object, we can read its properties, such as StatusCode, Content, Links, and Images, directly from the output.
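
Since the response is an object, we can pick out just the properties we care about instead of printing everything. The following is a minimal sketch, not part of the original walkthrough: Get-ResponseSummary is a hypothetical helper name, and it assumes the response exposes StatusCode, Content, Links, and Images.

```powershell
# Get-ResponseSummary is a hypothetical helper name, not a built-in cmdlet.
function Get-ResponseSummary {
    param(
        [Parameter(Mandatory)]
        $Response
    )
    [pscustomobject]@{
        StatusCode    = $Response.StatusCode
        ContentLength = $Response.Content.Length
        # Where-Object drops nulls, so an absent collection counts as zero.
        LinkCount     = @($Response.Links  | Where-Object { $_ }).Count
        ImageCount    = @($Response.Images | Where-Object { $_ }).Count
    }
}

# Usage, assuming $WebResponse holds the result of Invoke-WebRequest:
#   Get-ResponseSummary -Response $WebResponse
```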

To see only the content of the web page, use $webResponse.Content
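
For an HTML page, Content is the raw markup as a string, so ordinary string tools work on it. As a rough sketch, the hypothetical helper Get-PageTitle below pulls the page title out of that string with a regular expression. A regex is a blunt instrument for HTML and is only meant to illustrate working with Content.

```powershell
# Get-PageTitle is a hypothetical helper name, not a built-in cmdlet.
function Get-PageTitle {
    param(
        [Parameter(Mandatory)]
        [string]$Html
    )
    # (?is): ignore case, and let '.' match newlines inside the title.
    $match = [regex]::Match($Html, '(?is)<title[^>]*>(.*?)</title>')
    if ($match.Success) {
        # Collapse runs of whitespace and trim the result.
        return ($match.Groups[1].Value -replace '\s+', ' ').Trim()
    }
    return $null
}

# Usage, assuming $webResponse holds the result of Invoke-WebRequest:
#   Get-PageTitle -Html $webResponse.Content
```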

Get links from a web page: $webResponse.Links.href
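
The href values come back exactly as written in the page, so they are often relative, duplicated, or not real page links at all (anchors, mailto: addresses). The sketch below shows one way to turn them into a clean list of absolute URLs. Resolve-PageLink is a hypothetical helper name, the base URL is whatever page you requested, and [uri]::new assumes PowerShell 5 or later.

```powershell
# Resolve-PageLink is a hypothetical helper name, not a built-in cmdlet.
function Resolve-PageLink {
    param(
        [Parameter(Mandatory)]
        [uri]$BaseUri,
        [string[]]$Href
    )
    $Href |
        # Skip empty values, in-page anchors, and non-HTTP pseudo links.
        Where-Object { $_ -and $_ -notmatch '^(#|javascript:|mailto:)' } |
        # System.Uri resolves relative paths against the base address.
        ForEach-Object { ([uri]::new($BaseUri, $_)).AbsoluteUri } |
        Sort-Object -Unique
}

# Usage, with https://example.com/ as a placeholder for the requested page:
#   Resolve-PageLink -BaseUri 'https://example.com/' -Href $webResponse.Links.href
```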

Get images from a web page: $webResponse.Images.src

PS C:\> $webResponse.Images.src
PS C:\>
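
When a page does return image sources, a common next step is to download them. The sketch below only prepares that step: the hypothetical helper Get-ImageFileName resolves a src value against the page address and derives a local file name from it. The download loop in the trailing comment is an assumption about how you might use it, with https://example.com/ as a placeholder.

```powershell
# Get-ImageFileName is a hypothetical helper name, not a built-in cmdlet.
function Get-ImageFileName {
    param(
        [Parameter(Mandatory)]
        [uri]$BaseUri,
        [Parameter(Mandatory)]
        [string]$Src
    )
    $absolute = [uri]::new($BaseUri, $Src)
    [pscustomobject]@{
        Url      = $absolute.AbsoluteUri
        # LocalPath excludes the query string, so 'logo.png?v=2' yields 'logo.png'.
        FileName = [System.IO.Path]::GetFileName($absolute.LocalPath)
    }
}

# Usage, assuming $webResponse holds the result of Invoke-WebRequest:
#   $webResponse.Images.src | Where-Object { $_ } | ForEach-Object {
#       $image = Get-ImageFileName -BaseUri 'https://example.com/' -Src $_
#       Invoke-WebRequest -Uri $image.Url -OutFile $image.FileName
#   }
```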

Getting Forms from a webpage:

PS C:\> $webResponse.Forms

Id Method Action Fields
-- ------ ------ ------
searchform get {[s, ]}

We can even submit web forms with PowerShell, for example to log in to a website.
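
Submitting a form usually means three things: copy the form's fields, fill in our own values, and send them to the form's action URL in the same web session. The following is a hedged sketch of that flow, not a definitive implementation. New-FormBody and Submit-WebForm are hypothetical helper names, the sketch assumes the first form on the page is the one we want, and it relies on the Forms property, which Windows PowerShell populates but which may not be available on PowerShell 6 and later.

```powershell
# Both function names are hypothetical helpers, not built-in cmdlets.

# Copy a form's fields into a new hashtable and apply the caller's values.
function New-FormBody {
    param(
        [Parameter(Mandatory)]
        [System.Collections.IDictionary]$Fields,
        [hashtable]$Values = @{}
    )
    $body = @{}
    foreach ($key in $Fields.Keys) { $body[$key] = $Fields[$key] }
    foreach ($key in $Values.Keys) { $body[$key] = $Values[$key] }
    $body
}

# Fetch a page, fill in its first form, and submit it in the same session.
function Submit-WebForm {
    param(
        [Parameter(Mandatory)]
        [uri]$PageUri,
        [Parameter(Mandatory)]
        [hashtable]$Values
    )
    # -SessionVariable keeps cookies so a login survives later requests.
    $page = Invoke-WebRequest -Uri $PageUri -SessionVariable session
    $form = $page.Forms[0]

    # Assumption: a form without an explicit method is sent as GET.
    $method = if ($form.Method) { $form.Method } else { 'Get' }
    $target = [uri]::new($PageUri, $form.Action)
    $body   = New-FormBody -Fields $form.Fields -Values $Values

    Invoke-WebRequest -Uri $target -Method $method -Body $body -WebSession $session
}

# Usage with the search form shown above (field 's'), placeholder URL:
#   Submit-WebForm -PageUri 'https://example.com/' -Values @{ s = 'powershell' }
```

For a login form, the keys in -Values would be the site's own user name and password field names, which you can read from $webResponse.Forms[0].Fields first.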

