Parse and scrape a web page

    Invoke-WebRequest:

    The Invoke-WebRequest cmdlet sends HTTP, HTTPS, FTP, and FILE requests to a web page or web service. It parses the response and returns collections of forms, links, images, and other significant HTML elements.

    This cmdlet was introduced in Windows PowerShell 3.0.

    Displaying the result of Invoke-WebRequest shows formatted output of the various properties of the web response. Like most cmdlets, Invoke-WebRequest returns an object rather than plain text.

    Example:

    PS C:\> $WebResponse = Invoke-WebRequest "http://winadmin.org"
    PS C:\> $WebResponse

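    Because the result is an object, the output lists its properties (such as StatusCode, StatusDescription, Content, Headers, Links, and Images), and each property can be read on its own.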
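
    As a quick illustration of treating the response as an object, the sketch below reads a few of those properties. Get-ResponseSummary is a hypothetical helper, not a built-in cmdlet. It only assumes an object with StatusCode, StatusDescription, and Content properties, so it can also be tried with a stand-in object instead of a live request.

```powershell
# Minimal sketch: pick a few properties out of an Invoke-WebRequest response.
# Get-ResponseSummary is a HYPOTHETICAL helper, not part of PowerShell.
function Get-ResponseSummary {
    param(
        # Any object exposing StatusCode, StatusDescription and Content,
        # for example the object returned by Invoke-WebRequest.
        [Parameter(Mandatory)] $Response
    )
    [pscustomobject]@{
        StatusCode        = $Response.StatusCode
        StatusDescription = $Response.StatusDescription
        ContentLength     = $Response.Content.Length   # characters of HTML received
    }
}

# Usage (requires network access):
# $WebResponse = Invoke-WebRequest "http://winadmin.org"
# $WebResponse | Get-Member -MemberType Property   # list every property of the response
# Get-ResponseSummary $WebResponse
```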

    To see only the HTML content of the web page, use $webResponse.Content.
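
    Because Content is an ordinary string of HTML, normal string and regex operations work on it. The sketch below assumes you only need something simple, such as the page title. Get-HtmlTitle is a hypothetical helper. A regular expression is a quick shortcut here, not a substitute for a real HTML parser.

```powershell
# Minimal sketch: extract the <title> text from the raw HTML in .Content.
# Get-HtmlTitle is a HYPOTHETICAL helper; the regex is a shortcut, not an HTML parser.
function Get-HtmlTitle {
    param([Parameter(Mandatory)][string] $Html)
    # (?i) = case-insensitive, (?s) = '.' also matches newlines
    if ($Html -match '(?is)<title[^>]*>(.*?)</title>') {
        return $Matches[1].Trim()
    }
    return $null   # no <title> element found
}

# Usage (requires network access):
# $webResponse = Invoke-WebRequest "http://winadmin.org"
# Get-HtmlTitle $webResponse.Content
```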

    Get links from a web page: $webResponse.Links.href
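
    The href values come back exactly as they are written in the page. Some may therefore be relative (such as /about/) or in-page anchors (such as #top). The sketch below assumes you want absolute, de-duplicated URLs. Resolve-PageLink is a hypothetical helper built on System.Uri. It uses the [System.Uri]::new syntax, which needs PowerShell 5.0 or later.

```powershell
# Minimal sketch: turn the raw href values into absolute, unique URLs.
# Resolve-PageLink is a HYPOTHETICAL helper built on System.Uri.
function Resolve-PageLink {
    param(
        [Parameter(Mandatory)][string] $BaseUrl,   # URL of the page the links came from
        [string[]] $Href                            # e.g. $webResponse.Links.href
    )
    $base = [System.Uri] $BaseUrl
    $Href |
        Where-Object { $_ -and -not $_.StartsWith('#') } |              # skip empty hrefs and in-page anchors
        ForEach-Object { [System.Uri]::new($base, $_).AbsoluteUri } |  # resolve relative links against the page
        Select-Object -Unique                                           # drop repeated links, keep first order
}

# Usage (requires network access):
# $webResponse = Invoke-WebRequest "http://winadmin.org"
# Resolve-PageLink -BaseUrl "http://winadmin.org" -Href $webResponse.Links.href
```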

    Get images from a web page: $webResponse.Images.src

    PS C:\> $webResponse.Images.src
    http://winadmin.org/wp-content/uploads/2014/10/logo.png
    http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
    http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow1.png
    http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow2.png
    http://winadmin.org/wp-content/uploads/2017/05/050117_1757_BasicsofPow3.png
    http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
    http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga1.png
    http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga2.png
    http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga3.png
    http://winadmin.org/wp-content/uploads/2017/03/031617_1825_Installinga4.png
    http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
    http://winadmin.org/wp-content/uploads/image/2017-03/Opening%20Powershell.png
    http://winadmin.org/wp-content/uploads/2017/03/031617_1737_ThePowerShe1.png
    http://winadmin.org/wp-content/uploads/image/2017-03/adding%20windows%20powershell%20ISE%20feature.GIF
    https://winadmin.org/wp-content/uploads/2017/05/Opening-Powershell_thumb.png
    https://winadmin.org/wp-content/uploads/2017/05/image_thumb.png
    https://winadmin.org/wp-content/uploads/2017/05/image_thumb-1.png
    http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
    http://winadmin.org/wp-content/uploads/2017/01/011117_0945_VPShellResd1.png
    http://0.gravatar.com/avatar/9c8fef70f5b6853e3152a593e8243d2d?s=20&d=mm&r=g
    http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup.png
    http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup1.png
    http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup2.png
    http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup3.png
    http://winadmin.org/wp-content/uploads/image/2016-01/disk%20cleanup4.png
    http://winadmin.org/wp-content/uploads/2016/01/013116_0637_DeleteWindo7.png
    PS C:\>
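
    The list above repeats the same gravatar URL several times. The sketch below de-duplicates the image URLs and picks a local file name for each one. It assumes that the last segment of the URL path is an acceptable file name. Get-ImageFileName is a hypothetical helper. The download loop in the comments uses the standard -OutFile parameter of Invoke-WebRequest.

```powershell
# Minimal sketch: de-duplicate image URLs and derive a local file name for each.
# Get-ImageFileName is a HYPOTHETICAL helper. Two different URLs ending in the
# same file name would map to the same local name; this sketch does not handle that.
function Get-ImageFileName {
    param([string[]] $Url)   # e.g. $webResponse.Images.src
    $Url |
        Select-Object -Unique |
        ForEach-Object {
            $path = ([System.Uri] $_).AbsolutePath   # path only, without the ?query part
            [pscustomobject]@{
                Url      = $_
                # Last path segment, with escapes such as %20 decoded
                FileName = [System.Uri]::UnescapeDataString([System.IO.Path]::GetFileName($path))
            }
        }
}

# Usage (requires network access); saves each image into the current folder:
# $webResponse = Invoke-WebRequest "http://winadmin.org"
# Get-ImageFileName $webResponse.Images.src | ForEach-Object {
#     Invoke-WebRequest -Uri $_.Url -OutFile (Join-Path $PWD $_.FileName)
# }
```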

    Get forms from a web page:

    PS C:\> $webResponse.Forms
     
    Id                            Method                        Action                        Fields
    --                            ------                        ------                        ------
    searchform                    get                           http://winadmin.org/          {[s, ]}
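
    The search form above uses the GET method. Submitting it therefore means requesting its Action URL with the field values in the query string. This sketch assumes Windows PowerShell 5.1, because PowerShell 6 and later return a basic response object without a Forms property. It also assumes that 's' is the search box, as the Fields column suggests. New-FormQueryUri is a hypothetical helper that only builds the URL, so you can see what is being requested. The commented lines let Invoke-WebRequest send the form itself; for a GET request, a dictionary passed to -Body is added to the URI as query parameters.

```powershell
# Minimal sketch: build the URL that a GET form submission requests.
# New-FormQueryUri is a HYPOTHETICAL helper; field names and values are URL-encoded.
function New-FormQueryUri {
    param(
        [Parameter(Mandatory)][string] $Action,     # the form's Action URL
        [System.Collections.IDictionary] $Field     # e.g. $form.Fields
    )
    $pairs = foreach ($key in $Field.Keys) {
        $k = [System.Uri]::EscapeDataString([string] $key)
        $v = [System.Uri]::EscapeDataString([string] $Field[$key])
        "$k=$v"
    }
    if (-not $pairs) { return $Action }             # nothing to append
    $separator = if ($Action.Contains('?')) { '&' } else { '?' }
    $Action + $separator + ($pairs -join '&')
}

# Usage (requires network access and Windows PowerShell 5.1):
# $webResponse = Invoke-WebRequest "http://winadmin.org"
# $form = $webResponse.Forms | Where-Object Id -eq 'searchform'
# $form.Fields['s'] = 'PowerShell'                  # fill in the search box
# New-FormQueryUri -Action $form.Action -Field $form.Fields
# $results = Invoke-WebRequest -Uri $form.Action -Method Get -Body $form.Fields
```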

    We can even fill in and submit web forms with PowerShell, for example to log in to a website.
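
    Logging in usually means posting a form and then keeping the cookies the site returns. The sketch below follows that pattern under several assumptions:

    - Windows PowerShell 5.1 is in use, since it provides the Forms property.
    - The login form is the first form on the page.
    - The form expects a POST.

    Set-LoginField and Invoke-FormLogin are hypothetical helpers. The field names 'username' and 'password' are placeholders; replace them with the names you find in $form.Fields. -SessionVariable and -WebSession are the Invoke-WebRequest parameters that carry the cookies between requests.

```powershell
# Minimal sketch of a form-based login. Set-LoginField and Invoke-FormLogin are
# HYPOTHETICAL helpers; the default field names are ASSUMED placeholders.
function Set-LoginField {
    param(
        [System.Collections.IDictionary] $Field,             # e.g. $form.Fields
        [Parameter(Mandatory)][string] $UserField,
        [Parameter(Mandatory)][string] $PasswordField,
        [Parameter(Mandatory)][pscredential] $Credential
    )
    # Copy every existing field so hidden inputs (e.g. anti-forgery tokens) are sent too.
    $body = @{}
    if ($Field) {
        foreach ($key in $Field.Keys) { $body[$key] = $Field[$key] }
    }
    $body[$UserField]     = $Credential.UserName
    $body[$PasswordField] = $Credential.GetNetworkCredential().Password
    $body
}

function Invoke-FormLogin {
    param(
        [Parameter(Mandatory)][string] $LoginUrl,
        [Parameter(Mandatory)][pscredential] $Credential,
        [string] $UserField = 'username',       # ASSUMED field name
        [string] $PasswordField = 'password'    # ASSUMED field name
    )
    # -SessionVariable creates $session, which keeps the cookies set by the site.
    $page = Invoke-WebRequest -Uri $LoginUrl -SessionVariable session
    $form = $page.Forms[0]                     # ASSUMES the login form is the first form
    $fieldArgs = @{
        Field         = $form.Fields
        UserField     = $UserField
        PasswordField = $PasswordField
        Credential    = $Credential
    }
    $body = Set-LoginField @fieldArgs
    # A form without an Action posts back to the same page; a relative Action
    # is resolved against the login page URL.
    $action = if ($form.Action) { $form.Action } else { $LoginUrl }
    $target = [System.Uri]::new([System.Uri] $LoginUrl, $action)
    $null = Invoke-WebRequest -Uri $target -WebSession $session -Method Post -Body $body
    $session                                   # reuse for later, logged-in requests
}

# Usage (requires network access and Windows PowerShell 5.1; the URLs are placeholders):
# $session = Invoke-FormLogin -LoginUrl 'https://example.com/login' -Credential (Get-Credential)
# Invoke-WebRequest 'https://example.com/account' -WebSession $session
```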

    © 2020 WinAdmin.org, All Rights Reserved.
