Introduction to fetching pages with httpClient, jsoup, okhttp, and junitHtml (unfinished)

2022-05-05

1. Purpose of this article

To record what I use day to day and share it with other readers.

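2. What httpClient, jsoup, okhttp, and junitHtml can do

They send HTTP requests and receive the responses. In other words, they can fetch web pages and scrape information from them, which makes them the building blocks of a crawler written in Java.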

3. Sending and receiving requests with httpClient, okhttp, and restTemplate

3.1 GET requests with httpClient

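// Assumes Apache HttpClient 4.x, commons-lang3, and JUnit 4 on the classpath.
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.lang3.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.CookieStore;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.impl.client.DefaultRedirectStrategy;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

// The class name is only a wrapper added so the methods compile.
public class HttpClientGetDemo {

    /** Demonstrates a GET request with httpClient. */
    @Test
    public void getHtmlPost() throws IOException {
        String url = "https://search.17k.com/search.xhtml";
        Map<String, Object> map = new HashMap<>();
        map.put("c.st", 0);
        map.put("c.q", "近战狂兵");

        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectTimeout(120000)
                .setSocketTimeout(60000)
                .setConnectionRequestTimeout(60000)
                .build();
        CookieStore cookieStore = new BasicCookieStore();
        HttpClientContext httpClientContext = HttpClientContext.create();

        // The caller owns this client and closes it after the last request.
        try (CloseableHttpClient httpclient = HttpClientBuilder.create()
                .setKeepAliveStrategy(new DefaultConnectionKeepAliveStrategy())
                .setRedirectStrategy(new DefaultRedirectStrategy())
                .setDefaultRequestConfig(requestConfig)
                .setDefaultCookieStore(cookieStore)
                .build()) {
            // doGet appends the query parameters itself, so pass the bare URL.
            String resGet = doGet(url, map, httpclient, httpClientContext);
            System.out.println(resGet);
        }
    }

    private String doGet(String url) {
        return doGet(url, null, null, null);
    }

    private String doGet(String url, CloseableHttpClient httpclient,
                         HttpClientContext httpClientContext) {
        return doGet(url, null, httpclient, httpClientContext);
    }

    /** Wraps a GET request. */
    private String doGet(String url, Map<String, Object> map, CloseableHttpClient httpclient,
                         HttpClientContext httpClientContext) {
        // Close the client here only if this method created it. A client passed
        // in by the caller stays open so it can be reused for later requests.
        boolean ownsClient = (httpclient == null);
        if (ownsClient) httpclient = HttpClients.createDefault();
        if (httpClientContext == null) httpClientContext = HttpClientContext.create();

        CookieStore storedCookies = httpClientContext.getCookieStore();
        if (storedCookies != null) {
            System.out.println("=== cookies already present in the context ===");
            System.out.println(storedCookies.getCookies());
        }

        String web = "";
        // The original discarded the result of setDoGetUrl; use it here.
        HttpGet httpget = new HttpGet(setDoGetUrl(url, map));
        // Request headers only. Content-Encoding, Server, Transfer-Encoding, and
        // Vary are response headers and do not belong on a request.
        httpget.setHeader("Connection", "close");
        httpget.setHeader("Accept", "text/html");

        // This step matters: with one argument it is a plain GET. Passing the
        // same httpClientContext lets several requests share one session,
        // provided the httpClient is not closed in between.
        try (CloseableHttpResponse response = httpclient.execute(httpget, httpClientContext)) {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                web = EntityUtils.toString(entity, "utf-8");
                CookieStore cookies = httpClientContext.getCookieStore();
                if (cookies != null) cookies.getCookies().forEach(System.out::println);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Release resources that this method created.
            if (ownsClient) {
                try {
                    httpclient.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return web;
    }

    /** Builds the GET query string by hand. */
    private static String setDoGetUrl(String url, Map<String, Object> map) {
        if (StringUtils.isBlank(url) || map == null || map.isEmpty()) return url;
        StringBuilder sb = new StringBuilder(url);
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            if (StringUtils.isBlank(entry.getKey()) || entry.getValue() == null) continue;
            // The first parameter starts the query with "?", later ones join with "&".
            char last = sb.charAt(sb.length() - 1);
            if (sb.indexOf("?") < 0) {
                sb.append('?');
            } else if (last != '?' && last != '&') {
                sb.append('&');
            }
            sb.append(encode(entry.getKey())).append('=')
              .append(encode(entry.getValue().toString()));
        }
        return sb.toString();
    }

    /** Percent-encodes a query key or value so non-ASCII text is sent safely. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}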

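The hand-written query-string step is easy to get subtly wrong (an existing "?" in the URL, unencoded Chinese keywords), so it may help to check it in isolation. The following is a minimal sketch that uses only the JDK; the class name QueryStringSketch, the method appendQuery, and the example.com URL are made up for illustration and are not part of httpClient.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any library.
final class QueryStringSketch {

    // Appends params to baseUrl as form-encoded query parameters.
    static String appendQuery(String baseUrl, Map<String, ?> params) {
        if (baseUrl == null || baseUrl.isEmpty() || params == null || params.isEmpty()) {
            return baseUrl;
        }
        StringBuilder sb = new StringBuilder(baseUrl);
        boolean hasQuery = baseUrl.indexOf('?') >= 0;
        for (Map.Entry<String, ?> e : params.entrySet()) {
            String key = e.getKey();
            if (key == null || key.trim().isEmpty() || e.getValue() == null) {
                continue; // skip entries that cannot form a key=value pair
            }
            char last = sb.charAt(sb.length() - 1);
            if (!hasQuery) {
                sb.append('?');
                hasQuery = true;
            } else if (last != '?' && last != '&') {
                sb.append('&');
            }
            sb.append(encode(key)).append('=').append(encode(String.valueOf(e.getValue())));
        }
        return sb.toString();
    }

    // URLEncoder applies form encoding: a space becomes "+", "&" becomes "%26".
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("c.st", 0);
        // The search keyword from the article, written with Unicode escapes.
        params.put("c.q", "\u8fd1\u6218\u72c2\u5175");
        System.out.println(appendQuery("https://example.com/search", params));
    }
}
```

A LinkedHashMap is used in the sketch so the parameters come out in a predictable order; with a plain HashMap the order of the pairs is not guaranteed.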
3.2 GET requests with restTemplate

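restTemplate does not reimplement the underlying HTTP transport. It uses the JDK's HttpURLConnection by default and can be configured to delegate to OkHttp, HttpClient, and others. Put simply, it is a wrapper around libraries such as these two.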
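The wrapper relationship described above can be modelled in a few lines. This is a toy sketch using only the JDK, not Spring's actual code: MiniTemplate and its nested Transport interface are hypothetical names, and the two "libraries" are fakes that only report which one handled the URL. In Spring itself the pluggable part is the ClientHttpRequestFactory that a RestTemplate is configured with.

```java
import java.util.regex.Matcher;

// Hypothetical stand-in for a template class. It contributes convenience only
// (URL-variable expansion here) and hands the actual request to a transport.
final class MiniTemplate {

    // Hypothetical stand-in for a pluggable HTTP library,
    // the role that OkHttp or HttpClient plays.
    interface Transport {
        String get(String url);
    }

    private final Transport transport;

    MiniTemplate(Transport transport) {
        this.transport = transport;
    }

    // Replaces each {placeholder} in order with the next variable, then delegates.
    String getForText(String urlTemplate, Object... vars) {
        String url = urlTemplate;
        for (Object v : vars) {
            url = url.replaceFirst("\\{[^}]*\\}", Matcher.quoteReplacement(String.valueOf(v)));
        }
        return transport.get(url);
    }

    public static void main(String[] args) {
        // Two fake transports; no network traffic is involved.
        Transport fakeOkHttp = url -> "okhttp handled " + url;
        Transport fakeHttpClient = url -> "httpclient handled " + url;

        String template = "https://example.com/book/{id}";
        System.out.println(new MiniTemplate(fakeOkHttp).getForText(template, 42));
        System.out.println(new MiniTemplate(fakeHttpClient).getForText(template, 42));
    }
}
```

The calling code is identical for both transports, which is the point of the design: swapping the underlying HTTP library is a configuration change and does not touch the code that issues requests.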

This one is very simple.

