Parse and scrape a web page

Invoke-WebRequest:

The Invoke-WebRequest cmdlet sends HTTP, HTTPS, FTP, and FILE requests to a web page or web service. It parses the response and returns collections of forms, links, images, and other significant HTML elements.

This cmdlet was introduced in Windows PowerShell 3.0.

Like most cmdlets, Invoke-WebRequest returns an object. When you display that object, PowerShell shows a formatted view of the properties of the web response.

Example:

PS C:\> $WebResponse = Invoke-WebRequest "http://winadmin.org"
PS C:\> $WebResponse

Because the result is an object, we can read its individual properties directly instead of parsing the displayed text.

If you want to see only the content of the web page, use $webResponse.Content
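
Since Content is just a string of HTML, we can pull simple values out of it ourselves. Below is a minimal sketch that reads the page title with a regular expression. Get-PageTitle is a hypothetical helper name, not something Invoke-WebRequest provides, and a regular expression is only a rough way to read HTML.

```powershell
# Hypothetical helper (not part of Invoke-WebRequest): extract the <title>
# text from an HTML string such as $webResponse.Content.
function Get-PageTitle {
    param([string] $Html)

    # Singleline lets '.' match newlines in case the title spans lines.
    $match = [regex]::Match($Html, '<title[^>]*>(.*?)</title>', 'IgnoreCase, Singleline')
    if ($match.Success) {
        # Decode entities such as &amp; and trim surrounding whitespace.
        [System.Net.WebUtility]::HtmlDecode($match.Groups[1].Value).Trim()
    }
}

# Usage, assuming $webResponse holds the response fetched earlier:
# Get-PageTitle -Html $webResponse.Content
```

The function returns nothing when the page has no title element.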

Get links from a web page: $webResponse.Links.href
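
The href values come back exactly as they are written in the page, so some of them may be relative or repeated. Below is a minimal sketch that turns them into a unique list of absolute links. Resolve-PageLink is a hypothetical helper name, and it assumes we pass in the URL the page was fetched from.

```powershell
# Hypothetical helper (not part of Invoke-WebRequest): resolve href values
# against the page URL and return unique absolute http/https links.
function Resolve-PageLink {
    param(
        [string]   $BaseUrl,
        [string[]] $Href
    )

    $base = [System.Uri]$BaseUrl
    $Href |
        Where-Object { $_ } |                               # skip empty href values
        ForEach-Object { [System.Uri]::new($base, $_) } |   # relative -> absolute
        Where-Object { $_.Scheme -in 'http', 'https' } |    # drop mailto: and similar
        ForEach-Object { $_.AbsoluteUri } |
        Sort-Object -Unique
}

# Usage, assuming $webResponse was fetched from http://winadmin.org:
# Resolve-PageLink -BaseUrl 'http://winadmin.org' -Href $webResponse.Links.href
```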

Get images from a web page: $webResponse.Images.src

PS C:\> $webResponse.Images.src
http://winadmin.org/wp-content/uploads/2014/10/logo.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow1.png
http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow2.png
http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow3.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga1.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga2.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga3.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga4.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/image/2017-03/Opening%20Powershell.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1737_ThePowerShe1.png
http://winadmin.org/wp-content/uploads/image/2017-03/adding%20windows%20powershell%20ISE%20feature.GIF
https://winadmin.org/wp-content/uploads/2017/05/Opening-Powershell_thumb.png
https://winadmin.org/wp-content/uploads/2017/05/image_thumb.png
https://winadmin.org/wp-content/uploads/2017/05/image_thumb-1.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/2017/01/011117_0945_VPShellResd1.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup1.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup2.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup3.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup4.png
http://winadmin.org/wp-content/uploads/2016/01/013116_0637_DeleteWindo7.png
PS C:\>
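
Once we have the image URLs, Invoke-WebRequest can also download them by using its -OutFile parameter. The sketch below is one possible way to do that. Get-ImageFileName and Save-PageImage are hypothetical helper names, the target folder is an assumption, and the code assumes every src value is an absolute URL, as in the output above.

```powershell
# Hypothetical helpers, not part of Invoke-WebRequest.

# Derive a local file name from an image URL: the query string is ignored
# and escapes such as %20 are decoded.
function Get-ImageFileName {
    param([string] $Url)
    $path = ([System.Uri]$Url).AbsolutePath
    [System.Uri]::UnescapeDataString([System.IO.Path]::GetFileName($path))
}

# Download each distinct image URL into $Folder.
# Images that share a file name overwrite each other in this sketch.
function Save-PageImage {
    param(
        [string[]] $Src,
        [string]   $Folder = 'images'   # assumed default folder name
    )
    New-Item -ItemType Directory -Path $Folder -Force | Out-Null
    foreach ($url in ($Src | Sort-Object -Unique)) {
        $target = Join-Path $Folder (Get-ImageFileName $url)
        Invoke-WebRequest -Uri $url -OutFile $target
    }
}

# Usage, assuming $webResponse holds the response fetched earlier:
# Save-PageImage -Src $webResponse.Images.src -Folder 'C:\Temp\winadmin-images'
```

Sort-Object -Unique removes repeated URLs, such as the gravatar image that appears several times in the list, so each file is downloaded only once.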

Getting Forms from a webpage:

PS C:\> $webResponse.Forms


Id         Method Action               Fields
--         ------ ------               ------
searchform get    http://winadmin.org/ {[s, ]}

We can even fill in and submit web forms using PowerShell, which can also be used to log in to a site.
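
As a rough sketch of how a form submission could look, the helper below fills in the fields of a form object and builds the parameters for a second Invoke-WebRequest call. New-FormRequest is a hypothetical helper name and the search term is made up for illustration. Note that the Forms property is available in Windows PowerShell but not on the response object returned by PowerShell 6 and later, and that the sketch assumes the form's Action is an absolute URL, as in the output above.

```powershell
# Hypothetical helper (not part of Invoke-WebRequest): fill in form fields
# and return a parameter table that can be splatted to Invoke-WebRequest.
function New-FormRequest {
    param(
        $Form,                 # a form object such as $webResponse.Forms[0]
        [hashtable] $Values    # field name -> value to submit
    )
    foreach ($name in $Values.Keys) {
        $Form.Fields[$name] = $Values[$name]
    }
    @{
        Uri    = $Form.Action   # assumed to be an absolute URL
        Method = $Form.Method
        Body   = $Form.Fields
    }
}

# Usage in Windows PowerShell; 'powershell' is an example search term:
# $webResponse = Invoke-WebRequest 'http://winadmin.org' -SessionVariable session
# $request     = New-FormRequest -Form $webResponse.Forms[0] -Values @{ s = 'powershell' }
# $result      = Invoke-WebRequest @request -WebSession $session
```

The -SessionVariable and -WebSession parameters carry cookies from the first request to the second, which is what a login form would need.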