Parse and scrape a web page

Invoke-WebRequest:

The Invoke-WebRequest cmdlet sends HTTP, HTTPS, FTP, and FILE requests to a web page or web service. It parses the response and returns collections of forms, links, images, and other significant HTML elements.

This cmdlet was introduced in Windows PowerShell 3.0.

When run at the prompt, Invoke-WebRequest shows formatted output of the main properties of the web response. Like most cmdlets, Invoke-WebRequest returns an object, so the result can be stored in a variable and inspected.

Example:

PS C:\> $WebResponse = Invoke-WebRequest "http://winadmin.org"
PS C:\> $WebResponse


As the result is an object, we can see its properties in the output and read each of them individually.

If you want to see only the content of the web page, use $webResponse.Content
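Since Content is an ordinary string, the usual string tools apply to it. As a minimal sketch, assuming the page has a single title element, the helper below pulls the page title out of the HTML. The name Get-HtmlTitle is made up for this example and is not part of PowerShell, and a regular expression is only a rough way to read HTML.

```powershell
# Hypothetical helper (not a built-in cmdlet): extracts the <title> text from an HTML string.
function Get-HtmlTitle {
    param(
        [string] $Html   # e.g. $webResponse.Content
    )
    # (?is) = ignore case and let . match newlines; .*? stops at the first </title>
    if ($Html -match '(?is)<title[^>]*>(.*?)</title>') {
        # $Matches[1] holds the text captured between the tags
        $Matches[1].Trim()
    }
}
```

It could be called as `Get-HtmlTitle -Html $webResponse.Content`.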

Get links from a web page: $webResponse.Links.href
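The raw list of href values usually contains duplicates, in-page anchors and relative paths. A small sketch of narrowing it down, assuming $webResponse holds the response from the earlier example; the helper name Select-LinkHref and the default pattern are illustrative choices, not something the cmdlet provides:

```powershell
# Hypothetical helper (not a built-in cmdlet): returns the distinct href values matching a pattern.
function Select-LinkHref {
    param(
        [object[]] $Links = @(),           # e.g. $webResponse.Links
        [string]   $Pattern = '^https?://' # by default keep absolute HTTP(S) URLs only
    )
    $Links |
        Where-Object { $_.href -match $Pattern } |  # drops anchors, mailto: and relative paths
        ForEach-Object { $_.href } |
        Select-Object -Unique                       # keeps the first occurrence of each URL
}
```

For example, `Select-LinkHref -Links $webResponse.Links -Pattern 'winadmin\.org'` would keep only the links that point back to the same site.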


Get images from a web page: $webResponse.Images.src

PS C:\> $webResponse.Images.src
http://winadmin.org/wp-content/uploads/2014/10/logo.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow1.png
http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow2.png
http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow3.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga1.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga2.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga3.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga4.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/image/2017-03/Opening%20Powershell.png
http://winadmin.org/wp-content/uploads/2017/03/031617_1737_ThePowerShe1.png
http://winadmin.org/wp-content/uploads/image/2017-03/adding%20windows%20powershell%20ISE%20feature.GIF
https://winadmin.org/wp-content/uploads/2017/05/Opening-Powershell_thumb.png
https://winadmin.org/wp-content/uploads/2017/05/image_thumb.png
https://winadmin.org/wp-content/uploads/2017/05/image_thumb-1.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/2017/01/011117_0945_VPShellResd1.png
http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup1.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup2.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup3.png
http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup4.png
http://winadmin.org/wp-content/uploads/2016/01/013116_0637_DeleteWindo7.png
PS C:\>
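The listing above contains the same gravatar URL several times. If the goal is to download the images, a sketch like the following could remove the repeats and work out a local file name for each URL. The helper name Get-ImageDownloadPlan and the Destination parameter are assumptions made for this example, and relative URLs are simply skipped.

```powershell
# Hypothetical helper (not a built-in cmdlet): maps each distinct absolute image URL to a local path.
function Get-ImageDownloadPlan {
    param(
        [string[]] $Src = @(),        # e.g. $webResponse.Images.src
        [string]   $Destination = '.' # assumed target folder
    )
    $Src |
        Where-Object { $_ -match '^https?://' } |  # this sketch skips relative URLs
        Select-Object -Unique |                     # the same URL can appear several times
        ForEach-Object {
            $uri  = [System.Uri] $_
            # LocalPath is the decoded path without the query string
            $name = [System.IO.Path]::GetFileName($uri.LocalPath)
            [pscustomobject]@{
                Url  = $_
                Path = Join-Path -Path $Destination -ChildPath $name
            }
        }
}
```

Each entry could then be fetched with `Invoke-WebRequest -Uri $item.Url -OutFile $item.Path`. Two different URLs that end in the same file name would map to the same path, which this sketch does not handle.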

Getting Forms from a webpage:

PS C:\> $webResponse.Forms


Id         Method Action               Fields
--         ------ ------               ------
searchform get    http://winadmin.org/ {[s, ]}

We can even submit web forms using PowerShell, which can be used to log in to a site.
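As a rough sketch of how a submission could be put together, the helper below takes one parsed form, fills in field values and returns a parameter table that can be splatted onto Invoke-WebRequest. The name New-FormRequest and its parameters are made up for this example. It assumes the form's Action is an absolute URL, as in the searchform output above.

```powershell
# Hypothetical helper (not a built-in cmdlet): builds Invoke-WebRequest parameters from a parsed form.
function New-FormRequest {
    param(
        [Parameter(Mandatory)] $Form,  # one entry of $webResponse.Forms
        [hashtable] $Values = @{}      # field name -> value to send
    )
    # Start from the fields the page already defines (hidden inputs, defaults).
    $body = @{}
    foreach ($key in $Form.Fields.Keys) { $body[$key] = $Form.Fields[$key] }
    # Overlay the values supplied by the caller.
    foreach ($key in $Values.Keys) { $body[$key] = $Values[$key] }

    # Keys match Invoke-WebRequest parameter names so the table can be splatted.
    @{
        Uri    = $Form.Action   # assumed to be an absolute URL
        Method = $Form.Method
        Body   = $body
    }
}
```

With the search form shown above, this could be used as `$form = $webResponse.Forms | Where-Object { $_.Id -eq 'searchform' }`, then `$request = New-FormRequest -Form $form -Values @{ s = 'powershell' }`, and finally `Invoke-WebRequest @request`. For a login form, the first request would typically be made with -SessionVariable and the submission with -WebSession so that cookies carry over. Note that the Forms property is available in Windows PowerShell; the response object returned by PowerShell 6 and later does not include it.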

