Aug 29 2006

Screenscraping with HttpWebRequest

Posted by admin under ASP.NET articles

You probably already know how to read another web page from your own page, even one located on another site or server. So while I will show you the code for it, I also want to show you a very practical use of it.

Background

As you might have seen, I have created a new layout for this site, ASPCode.net. I am using the beta version of KBMentor, but that's not what I want to talk about here. For reasons I won't get into, I wanted to create the site in the directory aspcode.net/articles/, so I created a vdir (its own application) in the articles directory and then installed my KBMentor CMS there.

Problem

One of the functions of KBMentor is to produce a Google Sitemap out of all the articles and pages. I wanted to test it out, but ran into a big problem. The page to submit to Google Sitemap is http://www.aspcode.net/articles/googl...map.ashx, but Google wouldn't accept it, simply because it is not located in the root of my domain. Moving the ashx page to the root wouldn't help either, because another application is installed in that location.

Solution

Now to the solution. I added a new ashx page in the root of the domain, and all it does is "screenscrape" the ashx file in the articles directory.



 
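		// Needs "using System.Net;" and "using System.Web;" at the top of the file
		public void ProcessRequest (HttpContext context)
		{
			// Screenscrape: fetch the sitemap produced by the articles application
			CookieContainer CC = new CookieContainer();

			HttpWebRequest Req = (HttpWebRequest) WebRequest.Create("http://www.aspcode.net/articles/googlesitemap.ashx");
			Req.CookieContainer = CC;

			string sTxt;
			// The using blocks close the response and reader even if reading fails
			using (WebResponse webResponse = Req.GetResponse())
			using (System.IO.StreamReader reader = new System.IO.StreamReader(
				webResponse.GetResponseStream(), System.Text.Encoding.UTF8))
			{
				// Sitemap files are UTF-8, so decode as UTF-8 instead of
				// the server's ANSI code page (Encoding.Default)
				sTxt = reader.ReadToEnd();
			}

			// Send the XML out again, unchanged, from the root of the domain
			context.Response.ContentType = "text/xml";
			context.Response.ContentEncoding = System.Text.Encoding.UTF8;
			context.Response.Write(sTxt);
		}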
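
The method above is shown without the class around it. Here is a minimal sketch of what the complete handler might look like. The class name SitemapProxy, the 10 second timeout and the 502 status on failure are my own assumptions for illustration, not part of the original code. This variant also copies the raw bytes through instead of decoding them to a string, which is a swapped-in technique and avoids having to guess the encoding at all.

```csharp
using System.IO;
using System.Net;
using System.Web;

// Hypothetical class name; the article only shows the ProcessRequest method.
public class SitemapProxy : IHttpHandler
{
    // URL of the real sitemap handler, as given in the article.
    private const string SourceUrl = "http://www.aspcode.net/articles/googlesitemap.ashx";

    public void ProcessRequest(HttpContext context)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SourceUrl);
        request.Timeout = 10000; // assumed value, in milliseconds

        try
        {
            using (WebResponse response = request.GetResponse())
            using (Stream body = response.GetResponseStream())
            {
                context.Response.ContentType = "text/xml";
                // Pass the bytes through untouched so the original encoding survives.
                CopyStream(body, context.Response.OutputStream);
            }
        }
        catch (WebException)
        {
            // Assumption: report an upstream failure as 502 Bad Gateway.
            context.Response.StatusCode = 502;
        }
    }

    // Manual copy loop; works on framework versions that lack Stream.CopyTo.
    public static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }

    // The handler keeps no per-request state, so one instance can be reused.
    public bool IsReusable
    {
        get { return true; }
    }
}
```

To try it, the class would go into an .ashx file in the root of the domain, with a WebHandler directive at the top that names the class, for example `<%@ WebHandler Language="C#" Class="SitemapProxy" %>`.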
