Fixing garbled Chinese text when scraping web pages with HttpWebRequest and HtmlAgilityPack

2022-05-05

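I tried a lot of solutions that did not work. This one did, so thanks to the blogger who shared it. Here it is for everyone.

This article shows how to fix garbled Chinese text when fetching web pages with HttpWebRequest and HtmlAgilityPack. If you get garbled characters when retrieving a remote page with these two tools, it may be a useful reference.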

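The code is as follows:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Net;
    using System.Text;

    // Map the charset reported by the server to an Encoding.
    public Encoding GetEncoding(string characterSet)
    {
        // Charset names are case-insensitive, and the value may be null or empty.
        switch ((characterSet ?? "").Trim().ToLowerInvariant())
        {
            // On .NET Core / .NET 5+, gb2312 is only available after calling
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            case "gb2312": return Encoding.GetEncoding("gb2312");
            case "utf-8": return Encoding.UTF8;
            default: return Encoding.Default;
        }
    }

    public string HttpGet(string url)
    {
        string responsestr = "";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Accept = "*/*";
        req.Method = "GET";
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1";
        using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
        {
            // Decompress the body if the server compressed it.
            string contentEncoding = (response.ContentEncoding ?? "").ToLowerInvariant();
            Stream stream = response.GetResponseStream();
            if (contentEncoding.Contains("gzip"))
            {
                stream = new GZipStream(stream, CompressionMode.Decompress);
            }
            else if (contentEncoding.Contains("deflate"))
            {
                stream = new DeflateStream(stream, CompressionMode.Decompress);
            }

            // Decode with the charset the server declared; this is what avoids the garbled text.
            // Disposing the reader also disposes the underlying stream.
            using (StreamReader reader = new StreamReader(stream, GetEncoding(response.CharacterSet)))
            {
                responsestr = reader.ReadToEnd();
            }
        }
        return responsestr;
    }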
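
GetEncoding above only recognizes two charset names and depends on the server reporting one. Some pages declare their charset only in a <meta> tag, so as a possible extension, here is a minimal sketch that looks at the raw Content-Type value first, then at the <meta> declaration in the page bytes, and only then uses a fallback. CharsetSniffer, SniffCharset and Resolve are hypothetical names, not part of the original code or of .NET, and the regular expressions are a rough simplification of what browsers do.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CharsetSniffer
{
    // Matches charset=xxx in a Content-Type value such as "text/html; charset=gb2312".
    private static readonly Regex HeaderCharset = new Regex(
        @"charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Matches charset=xxx inside a <meta ...> tag.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Returns the charset name declared in a <meta> tag near the top of the page, or null.
    public static string SniffCharset(byte[] body)
    {
        if (body == null) return null;
        // The declaration itself is plain ASCII, so an ASCII decode of the
        // first few KB is enough to find it, whatever the real encoding is.
        int count = Math.Min(body.Length, 4096);
        string head = Encoding.ASCII.GetString(body, 0, count);
        Match m = MetaCharset.Match(head);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Picks an Encoding: Content-Type charset first, then <meta>, then the fallback.
    public static Encoding Resolve(string contentType, byte[] body, Encoding fallback)
    {
        Match header = HeaderCharset.Match(contentType ?? "");
        string[] candidates =
        {
            header.Success ? header.Groups[1].Value : null,
            SniffCharset(body)
        };
        foreach (string name in candidates)
        {
            if (string.IsNullOrEmpty(name)) continue;
            try { return Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unknown or unsupported name: try the next candidate */ }
        }
        return fallback;
    }
}
```

To combine this with HttpGet, one option would be to copy the decompressed response stream into a MemoryStream, call CharsetSniffer.Resolve(response.ContentType, bytes, Encoding.Default), and decode the bytes with GetString on the returned encoding instead of using a StreamReader.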

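Calling HttpGet returns the source of the page at the given URL. Once you have the source, you can parse the HTML with a very handy tool, HtmlAgilityPack. It does not matter if you are not good with regular expressions; this library is a lifesaver. The boss no longer has to worry about my regular expressions.

As for how to use it, there are plenty of detailed articles on cnblogs, so I will not repeat them here.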

The following scrapes the article list from the cnblogs home page:

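    using System.Text;
    using HtmlAgilityPack;

    string html = HttpGet("http://www.111cn.net/");
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Get the article list. SelectNodes returns null when nothing matches.
    var artlist = doc.DocumentNode.SelectNodes("//div[@class='post_item']");
    StringBuilder htm = new StringBuilder();
    if (artlist != null)
    {
        foreach (var item in artlist)
        {
            // Parse each item as its own document so that "//a" only searches inside it.
            HtmlDocument adoc = new HtmlDocument();
            adoc.LoadHtml(item.InnerHtml);
            var html_a = adoc.DocumentNode.SelectSingleNode("//a[@class='titlelnk']");
            if (html_a == null) continue; // skip items without a title link
            htm.AppendFormat("Title: {0}, link: {1}<br>", html_a.InnerText, html_a.GetAttributeValue("href", ""));
        }
    }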
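
The loop above loads each item's InnerHtml into a new HtmlDocument before searching for the link. In HtmlAgilityPack an XPath that starts with "//" searches from the document root even when called on a child node, so an alternative is to query the item node directly with a relative ".//" path. The sketch below shows this on an inline HTML string so it can be tried without network access. PostListParser and Parse are hypothetical names used for illustration, and the sample markup is made up to mimic the post_item / titlelnk structure used above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package HtmlAgilityPack

// Hypothetical helper, for illustration only.
public static class PostListParser
{
    // Returns (title, link) pairs for each <a class="titlelnk"> found
    // inside a <div class="post_item">.
    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var items = doc.DocumentNode.SelectNodes("//div[@class='post_item']");
        if (items == null) return result;

        foreach (HtmlNode item in items)
        {
            // ".//" keeps the search inside the current item; a plain "//"
            // would start again from the document root.
            HtmlNode link = item.SelectSingleNode(".//a[@class='titlelnk']");
            if (link == null) continue;
            result.Add(new KeyValuePair<string, string>(
                link.InnerText.Trim(), link.GetAttributeValue("href", "")));
        }
        return result;
    }
}
```

With this in place, the scraping code would reduce to calling PostListParser.Parse(HttpGet(url)) and formatting the returned pairs.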

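That's it. Pages containing Chinese now come through without any problem. I will not include screenshots of the output, and you can of course adjust the code to suit your needs; this article is mainly about fixing the garbled Chinese text.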

 

Original source: http://www.111cn.net/net/net/63856.htm

 

Reposted from: https://www.cnblogs.com/waiwai1015/p/4749566.html
