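Preface
With some free time on my hands, I have been studying popular websites. I recently noticed that Toutiao (今日头条) has a PC site, so I looked into the usual Java crawler operations against it: fetching the article list, logging in, publishing articles, and so on. Working through it, I cracked all of the hard parts, including the encryption, login, and posting. I'll start by sharing the login part; I hope you find it useful!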
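Steps and Code
There are several ways to log in: phone (SMS code) login, username/password login, QQ login, and WeChat login. The common ones are phone login and username/password login. When I tried username/password login, the site appeared to prompt me to use phone verification-code login instead, which is very unfriendly for crawler development. Whether long-standing accounts avoid this prompt, I don't know. So this post covers the phone login case. I don't recommend QQ or WeChat login; very few people can crack Tencent QQ's password-encryption algorithm, haha. All right, let's get started with login.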
Language: Java
Jar used: Java HttpClient 4.x core jar
IDE: MyEclipse 8
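The steps are as follows:
First, enter the phone number and click to send the verification code. A slider captcha may pop up at this point, or there may be no prompt at all.
Then, solve the slider captcha by hand and wait for the SMS verification code.
Next, enter the SMS verification code to log in.
Finally, once the code has been accepted, send a further request to https://sso.toutiao.com/quick_login/. This step matters: only then do you obtain all of the user's post-login cookies, and only then are you truly logged in.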
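Since the last step is all about collecting cookies, here is a minimal, dependency-free sketch of that idea: remember the name=value part of each Set-Cookie response header and replay it as one Cookie request header. The class SimpleCookieJar and the cookie name example_session are hypothetical and not part of the original program; the sketch ignores expiry, domain, and path. HttpClient 4.x also ships its own CookieStore, which may make hand-rolled handling like this unnecessary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: remembers name=value pairs taken from Set-Cookie headers. */
final class SimpleCookieJar {
    private final Map<String, String> cookies = new LinkedHashMap<String, String>();

    /** Stores the leading name=value pair of one Set-Cookie value; attributes are ignored. */
    void store(String setCookieValue) {
        String pair = setCookieValue.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            return; // malformed or nameless cookie: skip it
        }
        cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }

    /** Renders the stored cookies as the value of a Cookie request header. */
    String toHeader() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleCookieJar jar = new SimpleCookieJar();
        // Made-up header values, for illustration only.
        jar.store("sso_login_status=1; Path=/; Domain=.toutiao.com");
        jar.store("example_session=abc123; HttpOnly");
        System.out.println(jar.toHeader());
    }
}
```

The result of toHeader() could then be passed to setHeader("Cookie", ...) on later requests, in the same way the snippets in this post set their Cookie header by hand.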
Here is some of the core code; I hope you find it useful:
// Request the login page.
// httpClient, g1, sg1 and __tasessionId are assumed to be declared earlier in the full program.
String cookies = "__tasessionId=" + __tasessionId;
g1 = new HttpGet("https://sso.toutiao.com/login/");
g1.setHeader("Accept", "text/html, application/xhtml+xml, */*");
g1.setHeader("Accept-Language", "zh-CN");
g1.setHeader("Proxy-Connection", "Keep-Alive");
g1.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
g1.setHeader("Cookie", cookies);
HttpResponse response2 = httpClient.execute(g1);
// Read the response body as UTF-8 text.
sg1 = EntityUtils.toString(response2.getEntity(), "utf-8");
Simulating the site's user-activity log request:
// Report a "pageview" event for the SSO login page, as the browser would.
g1 = new HttpGet("https://www.toutiao.com/api/article/user_log/?c=sso_login&sid="
        + __tasessionId + "&type=pageview&t=" + System.currentTimeMillis());
g1.setHeader("Accept", "text/html, application/xhtml+xml, */*");
g1.setHeader("Accept-Language", "zh-CN");
g1.setHeader("Proxy-Connection", "Keep-Alive");
g1.setHeader("Referer", "https://sso.toutiao.com/login/");
g1.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
g1.setHeader("Cookie", "sso_login_status=0");
response2 = httpClient.execute(g1);
sg1 = EntityUtils.toString(response2.getEntity(), "utf-8");
System.out.println(sg1);
Code for requesting the SMS verification code for a phone number:
// phone is the mobile number; captcha is the captcha value obtained in the previous step.
g1 = new HttpGet("https://sso.toutiao.com/send_activation_code/?mobile="
        + phone + "&captcha=" + captcha + "&type=24");
g1.setHeader("Accept", "text/html, application/xhtml+xml, */*");
g1.setHeader("Accept-Language", "zh-CN");
g1.setHeader("Proxy-Connection", "Keep-Alive");
g1.setHeader("X-Requested-With", "XMLHttpRequest");
g1.setHeader("Referer", "https://sso.toutiao.com/login/");
g1.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
g1.setHeader("Cookie", "sso_login_status=0");
response2 = httpClient.execute(g1);
sg1 = EntityUtils.toString(response2.getEntity(), "utf-8");
// StringRandomUtils is the author's own helper; it turns Unicode escapes in the response into readable text.
sg1 = StringRandomUtils.unicodeToString(sg1);
System.out.println(sg1);
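The request above concatenates phone and captcha straight into the URL, which could break if a value contains characters such as & or a space. Here is a minimal sketch of building the same kind of query string with URL encoding, using only the JDK. QueryUrlBuilder is a hypothetical helper, and the phone number and captcha values are placeholders, not real data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: appends URL-encoded query parameters to a base URL. */
final class QueryUrlBuilder {
    static String encode(String value) {
        try {
            // Form-style encoding: a space becomes '+', reserved characters become %XX.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        // Start with '?' unless the base URL already carries a query string.
        char sep = baseUrl.indexOf('?') >= 0 ? '&' : '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("mobile", "13800000000"); // placeholder number
        params.put("captcha", "a b&c");      // placeholder with characters that need encoding
        params.put("type", "24");
        System.out.println(build("https://sso.toutiao.com/send_activation_code/", params));
        // prints https://sso.toutiao.com/send_activation_code/?mobile=13800000000&captcha=a+b%26c&type=24
    }
}
```

The resulting string can be handed to new HttpGet(...) in place of the hand-concatenated URL.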
Follow-up login request that obtains the user's cookies:
// ImgUtils.showPhoneMessage is the author's own helper; it returns the SMS code that was received.
String phonecode = ImgUtils.showPhoneMessage("toutiao", "", Settingid);
g3 = new HttpPost("https://sso.toutiao.com/quick_login/");
g3.setHeader("Accept", "text/javascript, text/html, application/xml, text/xml, */*");
g3.setHeader("X-CSRFToken", "undefined");
g3.setHeader("X-Requested-With", "XMLHttpRequest");
g3.setHeader("Accept-Language", "zh-CN");
g3.setHeader("Cache-Control", "no-cache");
g3.setHeader("Connection", "Keep-Alive");
g3.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
g3.setHeader("Content-Type", "application/x-www-form-urlencoded");
// Form fields for phone login; account and password stay empty.
List<NameValuePair> qparams = new ArrayList<NameValuePair>();
qparams.add(new BasicNameValuePair("mobile", phone));
qparams.add(new BasicNameValuePair("code", phonecode));
qparams.add(new BasicNameValuePair("account", ""));
qparams.add(new BasicNameValuePair("password", ""));
qparams.add(new BasicNameValuePair("captcha", captcha));
qparams.add(new BasicNameValuePair("is_30_days_no_login", "false"));
qparams.add(new BasicNameValuePair("service", "http://www.toutiao.com/"));
UrlEncodedFormEntity params = new UrlEncodedFormEntity(qparams, "UTF-8");
g3.setEntity(params);
response2 = httpClient.execute(g3);
sg1 = EntityUtils.toString(response2.getEntity(), "utf-8");
sg1 = StringRandomUtils.unicodeToString(sg1);
System.out.println(sg1);
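The snippets call StringRandomUtils.unicodeToString, a helper from the author's own code base that is not shown in this post. Assuming it simply turns \uXXXX escapes in the response body into the characters they stand for, a stand-in could look like the sketch below. UnicodeEscapes is a hypothetical name, and the sample JSON string is made up for illustration.

```java
/** Hypothetical stand-in for the author's helper: decodes backslash-u escapes in a string. */
final class UnicodeEscapes {
    /** Replaces each backslash-u escape followed by four hex digits with the character it names. */
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (isEscapeAt(input, i)) {
                // Parse the four hex digits after the backslash and the letter u.
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** True if a complete escape (backslash, u, four hex digits) starts at index i. */
    private static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up response body containing two escaped Chinese characters.
        System.out.println(decode("{\"message\":\"\\u6210\\u529f\"}"));
    }
}
```

Incomplete escapes are left untouched. The sketch does not treat a doubled backslash specially, so for strict JSON handling a real JSON parser would be the more robust choice.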
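Summary
Toutiao's login, done here via phone login, involves no particularly complex algorithm and is fairly easy. I'll share the code for publishing articles, fetching article lists, and so on in later posts. If you need the code, feel free to contact me.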