Tuesday, January 24, 2006

ASP.NET Screen Scraping

ASP.NET and the .NET Framework make it unbelievably easy to retrieve web content (that is, whole web pages) from remote servers. You might have various reasons to retrieve remote web content; for example, you might want to get the latest news headlines from popular news sites and link to them from your website.

To accomplish screen scraping in classic ASP, we had to resort to COM objects like AspHttp, ASPTear and Microsoft.XMLHTTP. The good news is that the .NET Framework has built-in classes that make retrieving remote web content easy.

We are going to use two .NET classes from the System.Net namespace, WebRequest and WebResponse, to get the remote web page content.

Here is how ASP.NET screen scraping works. First we call the WebRequest.Create method with the URL of the page we want; it returns a request object (an HttpWebRequest for http:// addresses) through which we request the page. We can request either a static page (.htm, .html, .txt, etc.) or a dynamic page (.asp, .aspx, .php, .pl, etc.). The type of page we request is not important, because the remote server runs any server-side code first and we receive only what a browser would receive (usually HTML), not the page's source code.
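WebRequest itself is an abstract class, so WebRequest.Create acts as a factory: it looks at the URL's scheme and returns a matching subclass. The short VB.NET sketch below only constructs request objects (nothing is sent over the network until GetResponse is called) and reports which concrete type was chosen. The module and function names (RequestFactoryDemo, DescribeRequest) are hypothetical, for illustration only.

```vbnet
Imports System
Imports System.Net

Module RequestFactoryDemo
    ' Hypothetical helper: returns the name of the concrete request class
    ' that WebRequest.Create picks for a URL. No network traffic happens
    ' here; the request object is only constructed, never sent.
    Function DescribeRequest(url As String) As String
        Dim oRequest As WebRequest = WebRequest.Create(url)
        Return oRequest.GetType().Name
    End Function
End Module
```

For example, DescribeRequest("http://www.aspdev.org/asp.net/") returns "HttpWebRequest", while a file:/// URL yields "FileWebRequest". Declaring the variable as WebRequest keeps the scraping code independent of the protocol.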

After we have created the request, we call its GetResponse method, which sends the request to the remote server and returns a WebResponse object containing the server's reply.

Once we have the response in our WebResponse object, we use the System.IO.Stream class (which provides a generic view of a sequence of bytes) and the System.IO.StreamReader class to read the response as text. Stream handles raw byte input and output, while StreamReader reads characters from a byte stream in a particular encoding.
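The full example hard-codes Encoding.UTF8, which will garble pages that a server sends in another character set. One possible approach, sketched below under the assumption that the response is an HttpWebResponse (as it is for http:// and https:// URLs), is to read the charset the server reports and fall back to UTF-8 when none is given. EncodingFromCharset is a hypothetical helper, not a framework method.

```vbnet
Imports System
Imports System.Text

Module EncodingHelper
    ' Hypothetical helper: turns a charset name, such as the value of
    ' HttpWebResponse.CharacterSet, into an Encoding for StreamReader.
    ' Falls back to UTF-8 when the name is missing or not recognized.
    Function EncodingFromCharset(charset As String) As Encoding
        If String.IsNullOrEmpty(charset) Then
            Return Encoding.UTF8
        End If
        Try
            ' Some servers quote the value, e.g. charset="utf-8", so strip quotes.
            Return Encoding.GetEncoding(charset.Trim().Trim(""""c))
        Catch ex As ArgumentException
            ' Unknown or unsupported charset name: use the UTF-8 default.
            Return Encoding.UTF8
        End Try
    End Function
End Module
```

In Page_Load, the result could replace Encoding.UTF8 as the second argument of the StreamReader constructor, after casting oResponse to HttpWebResponse to read its CharacterSet property.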

In our example below, we just print the response in the browser window with Response.Write, but you can parse this content and use only the parts that you need.
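As a starting point for that parsing step, the sketch below pulls the address and text of each link out of scraped HTML with a regular expression, the kind of thing you might do to collect news headlines. ExtractLinks and its pattern are illustrative assumptions, not the author's method; a regular expression is only a rough substitute for a real HTML parser and will miss unusual markup (unquoted href values, for instance).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ScrapeParser
    ' Matches <a ... href="..." ...>text</a> with a single- or double-quoted href.
    ' Group 1 captures the URL, group 2 the (possibly tagged) link text.
    Private ReadOnly LinkPattern As New Regex("<a\s[^>]*?href\s*=\s*[""']([^""']+)[""'][^>]*>(.*?)</a>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Hypothetical helper: returns (url, text) pairs for every link in the HTML.
    Function ExtractLinks(html As String) As List(Of KeyValuePair(Of String, String))
        Dim links As New List(Of KeyValuePair(Of String, String))
        For Each m As Match In LinkPattern.Matches(html)
            ' Remove tags nested inside the link text (e.g. <b>), then trim whitespace.
            Dim text As String = Regex.Replace(m.Groups(2).Value, "<[^>]+>", "").Trim()
            links.Add(New KeyValuePair(Of String, String)(m.Groups(1).Value, text))
        Next
        Return links
    End Function
End Module
```

Calling ExtractLinks on the string returned by oStreamReader.ReadToEnd(), instead of writing it out directly, would give you a list you can filter and render as your own headline links.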

Here is a full working example of ASP.NET screen scraping, written in VB.NET:

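<%@ Page Language="VB" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Text" %>

<script runat="server">
Sub Page_Load(Sender As Object, E As EventArgs)

    ' Build the request for the remote page.
    Dim oRequest As WebRequest = WebRequest.Create("http://www.aspdev.org/asp.net/")

    ' Send the request; the Using blocks close the response and reader even if an error occurs.
    Using oResponse As WebResponse = oRequest.GetResponse()
        Dim oStream As Stream = oResponse.GetResponseStream()

        ' Decode the raw bytes of the response as UTF-8 text.
        Using oStreamReader As New StreamReader(oStream, Encoding.UTF8)
            ' Write the scraped HTML into this page's output.
            Response.Write(oStreamReader.ReadToEnd())
        End Using
    End Using

End Sub
</script>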
