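Preface
How do you read the cookies and session data that a server returns when using HttpClient 3.x or HttpClient 4.x? As sites have become more security-conscious, many of them now put important information, such as encryption keys, key page data, and redirect links, into the cookies or session returned by the server. Pulling that data out and processing it is therefore an essential skill for anyone writing a Java crawler with HttpClient. The rest of this post takes a brief look at how to obtain the cookies and session data returned by the server: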
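How to get the cookies and session data returned by the server with HttpClient 3.x

import org.apache.commons.httpclient.Cookie;

// httpClient is a 3.x HttpClient that has already executed a request
StringBuilder cookiessessionStr = new StringBuilder();
Cookie[] cookiesssession = httpClient.getState().getCookies();
for (Cookie c : cookiesssession) {
    // toString() returns the cookie in its external (header) form
    cookiessessionStr.append(c.toString()).append(";");
}
System.out.println("cookiessession:" + cookiessessionStr);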
How to get the cookies and session data returned by the server with HttpClient 4.x
import java.util.List;
import org.apache.http.cookie.Cookie;

// httpClient is a 4.x DefaultHttpClient; cookies is a String declared elsewhere
List<Cookie> cookiesget = httpClient.getCookieStore().getCookies();
for (Cookie co : cookiesget) {
    // Render each cookie as "name=value; " for reuse in a Cookie request header
    String newcooikes = co.getName() + "=" + co.getValue() + "; ";
    cookies += newcooikes;
    System.out.println(newcooikes);
}
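When cookies from several responses are appended to one string, as the loop above does with cookies += ..., the same cookie name can end up in the header more than once. Below is a rough sketch of one way to avoid that using only the JDK. The class SimpleCookieJar and its methods merge and toHeader are hypothetical names made up for this illustration; they are not part of HttpClient.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not an HttpClient class): keeps the latest value
// per cookie name and renders the set as a Cookie request-header value.
class SimpleCookieJar {
    private final Map<String, String> cookies = new LinkedHashMap<>();

    // Parse a "name=value; name2=value2" string and merge it in; later values win.
    void merge(String cookieString) {
        if (cookieString == null) {
            return;
        }
        for (String part : cookieString.split(";")) {
            String pair = part.trim();
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed fragments
            }
            cookies.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
    }

    // Join all cookies as "name=value; name2=value2" with no trailing separator.
    String toHeader() {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        SimpleCookieJar jar = new SimpleCookieJar();
        jar.merge("a=1; b=2; ");   // e.g. cookies collected after the first request
        jar.merge("a=3; c=4");     // a later response overwrites a and adds c
        System.out.println(jar.toHeader()); // prints a=3; b=2; c=4
    }
}
```

The string returned by toHeader() could then be passed to setHeader("Cookie", ...) in place of the hand-concatenated cookies string.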
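Here is an example of the request sent when logging in to Sina Weibo. The code is too long to post in full, so only part of it is shown.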
// g3, response2, sg1, newCookies2, pcid, captcha, su, servertime, nonce, rsakv and sp
// are declared earlier in the full program; StringRandomUtils is the author's own helper class.
String securl = "https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)";
g3 = new HttpPost(securl);
g3.setHeader("Accept", "application/json, text/javascript, */*; q=0.01");
g3.setHeader("X-Requested-With", "XMLHttpRequest");
g3.setHeader("Accept-Language", "zh-CN");
g3.setHeader("Pragma", "no-cache");
g3.setHeader("Connection", "Keep-Alive");
g3.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
g3.setHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
// Send back the cookies collected from the earlier responses
g3.setHeader("Cookie", cookies);

// Form body; the pagerefer and url values are already URL-encoded
String content = "entry=weibo&gateway=1&from=&savestate=7&qrcode_flag=false&useticket=1"
        + "&pagerefer=https%3A%2F%2Fpassport.weibo.com%2Fvisitor%2Fvisitor%3Fentry%3Dminiblog%26a%3Denter%26url%3Dhttps%253A%252F%252Fweibo.com%252F%26domain%3D.weibo.com%26ua%3Dphp-sso_sdk_client-0.6.23%26_rand%3D1509026650.1567"
        + "&pcid=" + pcid
        + "&door=" + captcha
        + "&vsnf=1"
        + "&su=" + URLEncoder.encode(su, "UTF-8")
        + "&service=miniblog"
        + "&servertime=" + servertime
        + "&nonce=" + nonce
        + "&pwencode=rsa2"
        + "&rsakv=" + rsakv
        + "&sp=" + sp
        + "&sr=1920*1080&encoding=UTF-8&prelt=200"
        + "&url=https%3A%2F%2Fweibo.com%2Fajaxlogin.php%3Fframelogin%3D1%26callback%3Dparent.sinaSSOController.feedBackUrlCallBack"
        + "&returntype=META";
StringEntity reqEntity = new StringEntity(content);
g3.setEntity(reqEntity);
response2 = httpClient.execute(g3); // httpClient.executeMethod(g1);
sg1 = EntityUtils.toString(response2.getEntity(), "GBK");
sg1 = StringRandomUtils.unicodeToString(sg1);

if (response2.getStatusLine().getStatusCode() == 200
        && sg1.indexOf("https://login.sina.com.cn/crossdomain2.php") != -1) {
    // Append the cookies set by this response to the running cookie string
    newCookies2 = StringRandomUtils.handleCookies(httpClient);
    cookies += newCookies2;
    // Cut the cross-domain URL out of the body: from its first occurrence up to the next double quote
    sg1 = sg1.substring(sg1.indexOf("https://login.sina.com.cn/crossdomain2.php"));
    String crossdomainurl = sg1.substring(0, sg1.indexOf("\""));
}
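The last step above cuts the cross-domain URL out of the response body with two substring calls. If the body contains the URL but no closing double quote after it, indexOf returns -1 and substring throws. The following is a minimal sketch of a more defensive version, using only the JDK. UrlExtractor and extract are hypothetical names introduced here for illustration, and the sample body in main is made up rather than a real Sina response.

```java
// Hypothetical helper for illustration: pulls a URL that starts with a known
// prefix out of a response body, stopping at the next double quote.
class UrlExtractor {

    // Returns the text from the first occurrence of prefix up to (but not
    // including) the next double quote. Returns null if prefix is absent,
    // and the rest of the body if no closing quote follows.
    static String extract(String body, String prefix) {
        int start = body.indexOf(prefix);
        if (start < 0) {
            return null;
        }
        int end = body.indexOf('"', start);
        return end < 0 ? body.substring(start) : body.substring(start, end);
    }

    public static void main(String[] args) {
        // Made-up sample body, only to show the call
        String body = "location.replace(\"https://login.sina.com.cn/crossdomain2.php?action=login\");";
        String url = extract(body, "https://login.sina.com.cn/crossdomain2.php");
        System.out.println(url); // prints https://login.sina.com.cn/crossdomain2.php?action=login
    }
}
```

Returning null for a missing prefix lets the caller combine the "is it there" check and the extraction into one call instead of searching the body twice.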
Note: the websites used in the code are examples for reference only and may not actually work, so please do not copy the code wholesale.