C#: Analyzing and Handling csrf-token When Scraping Website Data
Requirement
An airline's cargo tracking (waybill) lookup is a POST request. Simulating that POST from back-end code returned no page data. Inspecting the airline's website showed that it uses a CSRF protection mechanism and responds to such requests directly with a 40X error.
About CSRF
Readers can look this up on their own.
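Analyzing the site's HTTP request
Headers
[Screenshot: request headers]
Form Data
[Screenshot: form data]
The request headers contain cookie and x-csrf-token, and the form data contains _csrf (with the same value as the x-csrf-token header).

Inspecting the site's JavaScript source shows that _csrf is read from the head tag of the page.
My guess is that the cookie and x-csrf-token are valid for a limited time and that they work together to defend against CSRF attacks.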
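To make the observation about the head tag concrete, here is a minimal sketch of pulling the values out of the page. It assumes the head contains tags shaped like <meta name="_csrf" content="..."/> and <meta name="_csrf_header" content="..."/>; the exact tag names, the attribute order, and the class name CsrfMetaParser are illustrative assumptions, not taken from the airline's site.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: maps each _csrf* meta tag name to its content value.
public static class CsrfMetaParser
{
    // Assumes name comes before content and both attributes use double quotes.
    private static readonly Regex MetaRegex = new Regex(
        "<meta\\s+name=\"(?<name>_csrf[^\"]*)\"\\s+content=\"(?<content>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    public static Dictionary<string, string> Parse(string html)
    {
        var result = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(html))
        {
            return result; // nothing to parse
        }
        foreach (Match m in MetaRegex.Matches(html))
        {
            // A later tag with the same name overwrites an earlier one.
            result[m.Groups["name"].Value] = m.Groups["content"].Value;
        }
        return result;
    }
}
```

Returning a dictionary keyed by tag name avoids relying on the order in which the tags appear in the page.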
Solution
1. First request the airline's website to obtain the cookie and _csrf.
2. Then simulate the HTTP request in C#, adding these values to the headers and the form data respectively, and send it.
Code
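The two steps can be sketched as a small helper that prepares the header and form dictionaries for the POST. The header name X-CSRF-TOKEN and the form field _csrf come from the request analysis above; the class name CsrfRequestBuilder and any query field names passed in are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds the two dictionaries a POST helper would need.
public static class CsrfRequestBuilder
{
    // Headers: the session cookie plus the token in X-CSRF-TOKEN.
    public static Dictionary<string, string> BuildHeaders(string cookie, string token)
    {
        return new Dictionary<string, string>
        {
            { "Cookie", cookie },
            { "X-CSRF-TOKEN", token }
        };
    }

    // Form data: the caller's query fields plus the same token as _csrf.
    public static Dictionary<string, string> BuildForm(string token, IDictionary<string, string> query)
    {
        var form = new Dictionary<string, string>(query);
        form["_csrf"] = token;
        return form;
    }
}
```

The same token value goes into both places, matching what the browser sends.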
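using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class CSRFToken
{
    string cookie;       // cookie of the target site, sent with later requests
    List<string> csrfs;  // content values of the site's _csrf meta tags (token key and value)

    public CSRFToken(string url)
    {
        // Start with an empty list so getCsrf_token() never returns null.
        csrfs = new List<string>();
        // Skip empty URLs.
        if (!string.IsNullOrWhiteSpace(url))
        {
            try
            {
                // HttpHelper sets the request headers and derives the host from the URL.
                var _http = new HttpHelper(url);
                string cookie;
                // CreateGetHttpResponseForPC is not shown in this article;
                // CreateGetHttpResponse below has the same signature.
                string html = _http.CreateGetHttpResponseForPC(out cookie);
                this.cookie = cookie;
                // Find every <meta name="_csrf..." content="..."/> tag in the page.
                string headRegex = @"<meta name=""_csrf.*"" content="".*""/>";
                MatchCollection matches = Regex.Matches(html, headRegex);
                // Extract the content attribute value from each matched tag.
                Regex re = new Regex("(?<=content=\").*?(?=\")", RegexOptions.None);
                foreach (Match match in matches)
                {
                    MatchCollection mc = re.Matches(match.Value);
                    foreach (Match ma in mc)
                    {
                        csrfs.Add(ma.Value);
                    }
                }
            }
            catch (Exception)
            {
                // Best effort: on failure the cookie stays null and the token list stays empty.
            }
        }
    }

    public String getCookie()
    {
        return cookie;
    }

    public void setCookie(String cookie)
    {
        this.cookie = cookie;
    }

    public List<string> getCsrf_token()
    {
        return csrfs;
    }
}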
HttpHelper
// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Net; using System.Net.Security; using System.Text;
public string CreatePostHttpResponse(IDictionary<string, string> headers, IDictionary<string, string> parameters)
{
    // HTTPS request: set the certificate callback and TLS versions before creating the request.
    ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(CheckValidationResult);
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11;
    HttpWebRequest request = WebRequest.Create(_baseIPAddress) as HttpWebRequest;
    request.ProtocolVersion = HttpVersion.Version10;
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    // request.ContentType = "application/json";
    request.UserAgent = DefaultUserAgent;
    // Example values captured from the browser, kept for reference:
    //request.Headers.Add("X-CSRF-TOKEN", "bc0cc533-60cc-484a-952d-0b4c1a95672c");
    //request.Referer = "https://www.asianacargo.com/tracking/viewTraceAirWaybill.do";
    //request.Headers.Add("Origin", "https://www.asianacargo.com");
    //request.Headers.Add("Cookie", "JSESSIONID=HP21d2Dq5FoSlG4Fyw4slWwHb0-Sl1CG6jGtj7HE41e5f4aN_R1p!-435435446!117330181");
    //request.Host = "www.asianacargo.com";

    // Add the caller's headers, for example Cookie and X-CSRF-TOKEN.
    if (headers != null && headers.Count > 0)
    {
        foreach (string key in headers.Keys)
        {
            request.Headers.Add(key, headers[key]);
        }
    }

    // Write the form body if there is data to POST.
    if (parameters != null && parameters.Count > 0)
    {
        StringBuilder buffer = new StringBuilder();
        foreach (string key in parameters.Keys)
        {
            if (buffer.Length > 0)
            {
                buffer.Append('&');
            }
            // URL-encode names and values so characters such as & or = cannot break the body.
            buffer.Append(Uri.EscapeDataString(key));
            buffer.Append('=');
            buffer.Append(Uri.EscapeDataString(parameters[key] ?? ""));
        }
        byte[] data = Encoding.UTF8.GetBytes(buffer.ToString());
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(data, 0, data.Length);
        }
    }

    try
    {
        // Read the response body as UTF-8.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader readStream = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return readStream.ReadToEnd();
        }
    }
    catch (WebException)
    {
        // Error status codes (such as the 40X returned for a rejected token) end up here.
        return null;
    }
}
public string CreateGetHttpResponse(out string cookie)
{
    // HTTPS request: set the certificate callback and TLS versions before creating the request.
    ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(CheckValidationResult);
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11;
    HttpWebRequest request = WebRequest.Create(_baseIPAddress) as HttpWebRequest;
    request.ProtocolVersion = HttpVersion.Version10;
    request.Method = "GET";
    request.ContentType = "application/x-www-form-urlencoded";
    request.UserAgent = DefaultUserAgent;
    try
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Keep the raw Set-Cookie header so the caller can send it back later.
            cookie = response.Headers["Set-Cookie"];
            // Read the response body as UTF-8.
            using (StreamReader readStream = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                return readStream.ReadToEnd();
            }
        }
    }
    catch (WebException)
    {
        // The request failed: return no cookie and no page source.
        cookie = "";
        return null;
    }
}
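The GET helper above hands back the raw Set-Cookie header, which can carry attributes such as Path or HttpOnly alongside the name=value pair. That worked for the site in this article. As a hedged alternative for stricter servers, the sketch below keeps only the name=value pairs. CookieHeaderHelper is a hypothetical name, and the sketch assumes each Set-Cookie value is passed in separately; it does not try to split a comma-joined header.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns Set-Cookie values into a Cookie request header value.
public static class CookieHeaderHelper
{
    public static string ToCookieHeader(IEnumerable<string> setCookieValues)
    {
        var pairs = new List<string>();
        foreach (string value in setCookieValues)
        {
            if (string.IsNullOrWhiteSpace(value))
            {
                continue; // skip empty entries
            }
            // The segment before the first ';' is the name=value pair;
            // everything after it is attributes meant for the client.
            string pair = value.Split(';')[0].Trim();
            if (pair.Contains("="))
            {
                pairs.Add(pair);
            }
        }
        return string.Join("; ", pairs);
    }
}
```

For example, passing "JSESSIONID=abc; Path=/; HttpOnly" yields "JSESSIONID=abc".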
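Crawler program
[Screenshot: crawler program]
Crawl result
[Screenshot: crawl result]
Browser result
[Screenshot: result shown in the browser]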
Notes and conclusions
1. Different websites deliver the CSRF token in different ways. Whatever the mechanism, as long as the value reaches the front end, there is a way to obtain it.
2. HTTPS requirements can differ between sites. The airline cargo site tested here required TLS 1.2:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11; The server may also validate other parameters, such as UserAgent.
3. For this airline, the cookie and csrf token do not change for some time, so a real crawler can cache them and fetch new ones only when a request fails.
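The caching idea in note 3 might look like the sketch below. It assumes the POST step returns null on failure, as CreatePostHttpResponse does. CsrfSession and CachedCsrfClient are hypothetical names, and the two delegates stand in for the real fetch and POST calls so the sketch stays independent of the network. It is not thread-safe.

```csharp
using System;

// Hypothetical container for the values obtained from the site.
public sealed class CsrfSession
{
    public string Cookie { get; set; }
    public string Token { get; set; }
}

// Hypothetical client: reuses the cached session and refreshes it once on failure.
public sealed class CachedCsrfClient
{
    private readonly Func<CsrfSession> _fetchSession;   // e.g. wraps new CSRFToken(url)
    private readonly Func<CsrfSession, string> _post;   // returns null when the request fails
    private CsrfSession _cached;

    public CachedCsrfClient(Func<CsrfSession> fetchSession, Func<CsrfSession, string> post)
    {
        _fetchSession = fetchSession;
        _post = post;
    }

    public string Query()
    {
        if (_cached == null)
        {
            _cached = _fetchSession(); // first use: nothing cached yet
        }
        string result = _post(_cached);
        if (result == null)
        {
            // The cached values may have expired: fetch new ones and retry once.
            _cached = _fetchSession();
            result = _post(_cached);
        }
        return result;
    }
}
```

Retrying only once keeps a persistent failure, such as a blocked User-Agent, from turning into an endless loop of refetches.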