临汾站长网 (http://www.0357zz.com) — a well-known webmaster news site covering webmaster news, startup experience, and site building.

Server Anti-Crawler Guide: Blocking Certain User Agents in Apache/Nginx/PHP

Published: 2019-02-21 06:55  Category: [External News]  Source: bcoder
Summary: how to block unwanted User Agents at the server level — in Apache via .htaccess or httpd.conf, in Nginx via an include file, and in PHP at the site entry point.

I. Apache

① Modify the .htaccess file

Edit the .htaccess file in the website root directory and add either of the following two snippets:

Working code (1):

  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} "(^$|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms)" [NC]
  RewriteRule ^(.*)$ - [F]

Working code (2):

  SetEnvIfNoCase ^User-Agent$ ".*(FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms)" BADBOT
  Order Allow,Deny
  Allow from all
  Deny from env=BADBOT
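Both variants rely on the same alternation pattern, and the [NC] flag (or SetEnvIfNoCase) makes the match case-insensitive. A quick offline way to sanity-check which User-Agent strings the list will catch is to evaluate the same pattern with Python's re module (an illustrative check only — the pattern is copied from the rules above, and the sample UA strings are made up):

```python
import re

# Alternation copied from the .htaccess rules above; ^$ catches empty UAs.
pattern = re.compile(
    r"(^$|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy"
    r"|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench"
    r"|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib"
    r"|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix"
    r"|EasouSpider|Ezooms)",
    re.IGNORECASE,  # mirrors the [NC] flag
)

def is_blocked(user_agent: str) -> bool:
    """True if the UA would match the blocklist (a substring search, like RewriteCond)."""
    return pattern.search(user_agent) is not None

print(is_blocked(""))                      # empty UA -> blocked
print(is_blocked("Python-urllib/3.9"))     # listed tool -> blocked
print(is_blocked("Mozilla/5.0 (Windows NT 10.0) Firefox/65.0"))  # normal browser -> allowed
```

Note that because the match is a substring search, broad tokens such as Java can also catch legitimate UAs that merely contain that word — trim the list to suit your site.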

② Modify the httpd.conf configuration file

Find the corresponding section, add or modify it as shown below, then restart Apache:

  DocumentRoot /home/wwwroot/xxx
  <Directory "/home/wwwroot/xxx">
      SetEnvIfNoCase User-Agent ".*(FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms)" BADBOT
      Order allow,deny
      Allow from all
      Deny from env=BADBOT
  </Directory>


II. Nginx

Go to the conf directory under the nginx installation directory and save the following as agent_deny.conf:

  cd /usr/local/nginx/conf
  vim agent_deny.conf

  # Block scraping tools such as Scrapy
  if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
      return 403;
  }
  # Block the listed User Agents, and empty User Agents
  if ($http_user_agent ~* "FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|^$") {
      return 403;
  }
  # Block request methods other than GET, HEAD and POST
  if ($request_method !~ ^(GET|HEAD|POST)$) {
      return 403;
  }
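Taken together, agent_deny.conf enforces three independent checks, any one of which returns 403. The decision logic can be sketched offline as a small Python function (an illustration of the same three rules, not nginx itself — `~*` in nginx is a case-insensitive regex match, mirrored here with re.IGNORECASE):

```python
import re

# Same alternations as the "if" rules above.
TOOL_UA = re.compile(r"(Scrapy|Curl|HttpClient)", re.IGNORECASE)
BAD_UA = re.compile(
    r"FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy"
    r"|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench"
    r"|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib"
    r"|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix"
    r"|EasouSpider|Ezooms|^$",
    re.IGNORECASE,
)

def decide(user_agent: str, method: str) -> int:
    """Return the HTTP status the three agent_deny.conf rules would produce."""
    if TOOL_UA.search(user_agent):
        return 403  # rule 1: scraping tools
    if BAD_UA.search(user_agent):
        return 403  # rule 2: listed UAs and empty UA (^$)
    if method not in ("GET", "HEAD", "POST"):
        return 403  # rule 3: unusual request methods
    return 200      # request passes all three checks

decide("curl/7.64.0", "GET")          # 403: "Curl" matches case-insensitively
decide("Mozilla/5.0 Firefox", "GET")  # 200: ordinary browser UA
decide("Mozilla/5.0 Firefox", "DELETE")  # 403: method not GET|HEAD|POST
```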

Then, in the relevant site configuration, insert the following line after location / {:

  include agent_deny.conf;

For example, the resulting configuration:

  [marsge@Mars_Server ~]$ cat /usr/local/nginx/conf/zhangge.conf
  location / {
          try_files $uri $uri/ /index.php?$args;
          # Insert this one new line here:
          include agent_deny.conf;
          rewrite ^/sitemap_360_sp.txt$ /sitemap_360_sp.php last;
          rewrite ^/sitemap_baidu_sp.xml$ /sitemap_baidu_sp.php last;
          rewrite ^/sitemap_m.xml$ /sitemap_m.php last;
  }

After saving, run the following command to gracefully reload nginx:

  /usr/local/nginx/sbin/nginx -s reload

III. PHP

Paste the following method into the site's entry file, index.php, right after the first <?php tag:
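The PHP snippet itself did not survive in this copy of the article. The idea it implements — reject requests whose User-Agent is empty or on a blocklist before the application runs, reading the UA from $_SERVER['HTTP_USER_AGENT'] — can be sketched as follows (shown in Python for illustration; the function name deny_bad_agents and the shortened blocklist are assumptions of this sketch, not the article's original code):

```python
import re

# Shortened, illustrative blocklist; the article's PHP version checks
# $_SERVER['HTTP_USER_AGENT'] against a similar list and exits with an error page.
BAD_UA = re.compile(
    r"FeedDemon|CrawlDaddy|ApacheBench|Python-urllib|HttpClient|MJ12bot"
    r"|EasouSpider|Ezooms",
    re.IGNORECASE,
)

def deny_bad_agents(user_agent: str) -> tuple[int, str]:
    """Entry-point check: return (status, body) before handing off to the app."""
    if not user_agent:
        return 403, "Requests with an empty User-Agent are refused."
    if BAD_UA.search(user_agent):
        return 403, "Scraping this site is not permitted."
    return 200, "OK"
```

Running this check first in the entry file means blocked clients never reach the rest of the application, which keeps the cost of serving them low.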

