PHP屏蔽蜘蛛訪問代碼及常用搜索引擎的HTTP_USER_AGENT
PHP屏蔽蜘蛛訪問代碼代碼:

常用搜索引擎名與 HTTP_USER_AGENT對(duì)應(yīng)值
百度baiduspider
谷歌googlebot
搜狗sogou
騰訊SOSOsosospider
雅虎slurp
有道youdaobot
Bingbingbot
MSNmsnbot
Alexais_archiver
function is_crawler() {
$userAgent = strtolower($_SERVER['HTTP_USER_AGENT']);
$spiders = array(
'Googlebot', // Google 爬蟲
'Baiduspider', // 百度爬蟲
'Yahoo! Slurp', // 雅虎爬蟲
'YodaoBot', // 有道爬蟲
'msnbot' // Bing爬蟲
// 更多爬蟲關(guān)鍵字
);
foreach ($spiders as $spider) {
$spider = strtolower($spider);
if (strpos($userAgent, $spider) !== false) {
return true;
}
}
return false;
}
下面的php代碼附帶了更多的蜘蛛標(biāo)識(shí)
function isCrawler() {
echo $agent= strtolower($_SERVER['HTTP_USER_AGENT']);
if (!empty($agent)) {
$spiderSite= array(
"TencentTraveler",
"Baiduspider+",
"BaiduGame",
"Googlebot",
"msnbot",
"Sosospider+",
"Sogou web spider",
"ia_archiver",
"Yahoo! Slurp",
"YoudaoBot",
"Yahoo Slurp",
"MSNBot",
"Java (Often spam bot)",
"BaiDuSpider",
"Voila",
"Yandex bot",
"BSpider",
"twiceler",
"Sogou Spider",
"Speedy Spider",
"Google AdSense",
"Heritrix",
"Python-urllib",
"Alexa (IA Archiver)",
"Ask",
"Exabot",
"Custo",
"OutfoxBot/YodaoBot",
"yacy",
"SurveyBot",
"legs",
"lwp-trivial",
"Nutch",
"StackRambler",
"The web archive (IA Archiver)",
"Perl tool",
"MJ12bot",
"Netcraft",
"MSIECrawler",
"WGet tools",
"larbin",
"Fish search",
);
foreach($spiderSite as $val) {
$str = strtolower($val);
if (strpos($agent, $str) !== false) {
return true;
}
}
} else {
return false;
}
}
if (isCrawler()){
echo "你好蜘蛛精!";
}
else{
echo "你不是蜘蛛精??!";
}
使用PHP實(shí)現(xiàn)蜘蛛訪問日志統(tǒng)計(jì)
$useragent = addslashes(strtolower($_SERVER['HTTP_USER_AGENT']));
if (strpos($useragent, 'googlebot')!== false){$bot = 'Google';}
elseif (strpos($useragent,'mediapartners-google') !== false){$bot = 'Google Adsense';}
elseif (strpos($useragent,'baiduspider') !== false){$bot = 'Baidu';}
elseif (strpos($useragent,'sogou spider') !== false){$bot = 'Sogou';}
elseif (strpos($useragent,'sogou web') !== false){$bot = 'Sogou web';}
elseif (strpos($useragent,'sosospider') !== false){$bot = 'SOSO';}
elseif (strpos($useragent,'360spider') !== false){$bot = '360Spider';}
elseif (strpos($useragent,'yahoo') !== false){$bot = 'Yahoo';}
elseif (strpos($useragent,'msn') !== false){$bot = 'MSN';}
elseif (strpos($useragent,'msnbot') !== false){$bot = 'msnbot';}
elseif (strpos($useragent,'sohu') !== false){$bot = 'Sohu';}
elseif (strpos($useragent,'yodaoBot') !== false){$bot = 'Yodao';}
elseif (strpos($useragent,'twiceler') !== false){$bot = 'Twiceler';}
elseif (strpos($useragent,'ia_archiver') !== false){$bot = 'Alexa_';}
elseif (strpos($useragent,'iaarchiver') !== false){$bot = 'Alexa';}
elseif (strpos($useragent,'slurp') !== false){$bot = '雅虎';}
elseif (strpos($useragent,'bot') !== false){$bot = '其它蜘蛛';}
if(isset($bot)){
$fp = @fopen('bot.txt','a');
fwrite($fp,date('Y-m-d H:i:s')."\t".$_SERVER["REMOTE_ADDR"]."\t".$bot."\t".'http://'.$_SERVER['SERVER_NAME'].$_SERVER["REQUEST_URI"]."\r\n");
fclose($fp);
}
相關(guān)文章
php 無(wú)限級(jí)分類 獲取頂級(jí)分類ID
這篇文章主要介紹了php 無(wú)限級(jí)分類 獲取頂級(jí)分類ID的相關(guān)代碼,需要的朋友可以參考下2016-03-03
晉城吧對(duì)DiscuzX進(jìn)行的前端優(yōu)化要點(diǎn)
晉城吧的服務(wù)器在美國(guó),延遲相對(duì)國(guó)內(nèi)略微要高一些,所以優(yōu)化就顯得非常重要。2010-09-09
學(xué)習(xí)php設(shè)計(jì)模式 php實(shí)現(xiàn)原型模式(prototype)
這篇文章主要介紹了php設(shè)計(jì)模式中的原型模式,使用php實(shí)現(xiàn)原型模式,感興趣的小伙伴們可以參考一下2015-12-12
php常用字符串查找函數(shù)strstr()與strpos()實(shí)例分析
這篇文章主要介紹了php常用字符串查找函數(shù)strstr()與strpos(),結(jié)合具體實(shí)例形式分析了php字符串查找函數(shù)strstr()與strpos()的具體功能、用法、區(qū)別及相關(guān)操作注意事項(xiàng),需要的朋友可以參考下2019-06-06
php鏈?zhǔn)讲僮鞯膶?shí)現(xiàn)方式分析
這篇文章主要介紹了php鏈?zhǔn)讲僮鞯膶?shí)現(xiàn)方式,結(jié)合實(shí)例形式對(duì)比分析了常規(guī)調(diào)用與鏈?zhǔn)秸{(diào)用操作的相關(guān)實(shí)現(xiàn)技巧與操作注意事項(xiàng),需要的朋友可以參考下2019-08-08
PHP使用zlib擴(kuò)展實(shí)現(xiàn)GZIP壓縮輸出的方法詳解
這篇文章主要介紹了PHP使用zlib擴(kuò)展實(shí)現(xiàn)GZIP壓縮輸出的方法,結(jié)合實(shí)例形式詳細(xì)分析了php gzip配置及壓縮輸出的相關(guān)操作技巧,需要的朋友可以參考下2018-04-04

