The Most Complete List of Google's Crawlers (Spiders)

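A few days ago, I explained on this site's forum what the Mediapartners-Google crawler does, in the post "What Is the Mediapartners-Google Crawler?". Today, here is a list of all of Google's crawlers and what they are for, to help you troubleshoot problems caused by Google's crawlers.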

Google's crawlers fall into nine categories: API, advertising, image, news, video, web, feed, favicon, and page-transcoding crawlers. There are seventeen crawlers in total: APIs-Google, AdSense, AdsBot Mobile Web Android, AdsBot Mobile Web, AdsBot, Googlebot Image, Googlebot News, Googlebot Video, Googlebot (Desktop), Googlebot (Smartphone), Mobile AdSense, Mobile Apps Android, Feedfetcher, Google Read Aloud, Duplex on the web, Google Favicon, and Web Light.

Below are the user agents for each of these seventeen crawlers:

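UA stands for user agent, the identifier a client sends in the User-Agent header of its HTTP requests. For each crawler we list both its short token and its full string: below, "UA" means the short token (the name used in the User-agent line of robots.txt), and "User Agent" means the full string, as it appears in server logs.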
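For every crawler in this list, the short token appears somewhere inside the full string. As a minimal sketch (the function name identify_token is hypothetical, and the token list is copied from this article), a User-Agent taken from a server log can be mapped back to its token by checking the longer, more specific tokens first, so that AdsBot-Google-Mobile is not reported as plain AdsBot-Google. User-Agent strings are easy to fake, so this only labels a request; it does not prove the request came from Google.

```python
from typing import Optional

# Tokens taken from the list in this article (hypothetical helper data)
GOOGLE_TOKENS = [
    "APIs-Google", "Mediapartners-Google", "AdsBot-Google-Mobile-Apps",
    "AdsBot-Google-Mobile", "AdsBot-Google", "Googlebot-Image",
    "Googlebot-News", "Googlebot-Video", "Googlebot", "FeedFetcher-Google",
    "Google-Read-Aloud", "DuplexWeb-Google", "Google Favicon", "googleweblight",
]


def identify_token(user_agent: str) -> Optional[str]:
    """Return the most specific Google token found in a full User-Agent string."""
    ua = user_agent.lower()
    # Longer tokens first, so "AdsBot-Google-Mobile" wins over "AdsBot-Google"
    # and "Googlebot-Image" wins over "Googlebot"
    for token in sorted(GOOGLE_TOKENS, key=len, reverse=True):
        if token.lower() in ua:
            return token
    return None  # not one of the crawlers listed here


print(identify_token("Googlebot-Image/1.0"))  # Googlebot-Image
```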

APIs-Google

UA: APIs-Google
User Agent: APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html)

AdSense

UA: Mediapartners-Google
User Agent: Mediapartners-Google

AdsBot Mobile Web Android

UA: AdsBot-Google-Mobile
User Agent: Mozilla/5.0 (Linux; Android 5.0; SM-G920A) AppleWebKit (KHTML, like Gecko) Chrome Mobile Safari (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)

AdsBot Mobile Web

UA: AdsBot-Google-Mobile
User Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)

AdsBot

UA: AdsBot-Google
User Agent: AdsBot-Google (+http://www.google.com/adsbot.html)

Googlebot Image

UA: Googlebot-Image
UA: Googlebot
User Agent: Googlebot-Image/1.0

Googlebot News

UA: Googlebot-News
UA: Googlebot
User Agent: Googlebot-News

Googlebot Video

UA: Googlebot-Video
UA: Googlebot
User Agent: Googlebot-Video/1.0

Googlebot (Desktop)

UA: Googlebot
User Agent:
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
  or (rarely used):
- Googlebot/2.1 (+http://www.google.com/bot.html)

Googlebot (Smartphone)

UA: Googlebot
User Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Mobile AdSense

UA: Mediapartners-Google
User Agent: (Various mobile device types) (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)

Mobile Apps Android

UA: AdsBot-Google-Mobile-Apps
User Agent: AdsBot-Google-Mobile-Apps

Feedfetcher

UA: FeedFetcher-Google
Does not respect robots.txt rules
User Agent: FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)

Google Read Aloud

UA: Google-Read-Aloud
Does not respect robots.txt rules
Current agents:
- Desktop agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers)
- Mobile agent: Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers)
Former agent (deprecated): google-speakr

Duplex on the web

UA: DuplexWeb-Google
May ignore the * user-agent wildcard
User Agent: Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; DuplexWeb-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Mobile Safari/537.36

Google Favicon

UA: Google Favicon
For user-initiated requests, ignores robots.txt rules
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon

Web Light

UA: googleweblight
Does not respect robots.txt rules
User Agent: Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19

Chrome/W.X.Y.Z in the strings above stands for whatever Chrome version the crawler currently runs; this version changes over time as the crawler is updated.
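Because the Chrome version keeps changing, log filters for Googlebot's web strings should not hard-code it. A minimal sketch, relying only on the Googlebot/2.1 and Chrome/ fragments shown above (googlebot_chrome_version is a hypothetical helper, and the version number in the sample string is made up):

```python
import re
from typing import Optional

# Any dotted version after "Chrome/", e.g. the W.X.Y.Z placeholder above
CHROME_VERSION = re.compile(r"Chrome/(\d+(?:\.\d+)*)")


def googlebot_chrome_version(user_agent: str) -> Optional[str]:
    """Return the Chrome version in a Googlebot web UA, or None if there is none."""
    if "Googlebot/2.1" not in user_agent:
        return None  # not a Googlebot (Desktop/Smartphone) string
    match = CHROME_VERSION.search(user_agent)
    # The legacy "Mozilla/5.0 (compatible; Googlebot/2.1; ...)" form has no Chrome part
    return match.group(1) if match else None


sample = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.6099.199 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
print(googlebot_chrome_version(sample))  # 120.0.6099.199
```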

Here are two examples of how to use these tokens in robots.txt to control which pages each crawler may crawl:

User-agent: Googlebot
Disallow: /
User-agent: Mediapartners-Google
Disallow:

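The rules above stop Googlebot (Google Search) from crawling any page on the site, which generally keeps the site's content out of Google Search, while still letting the AdSense crawler (Mediapartners-Google) crawl it, so Google ads on the site keep working normally.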
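One way to sanity-check rules like these before publishing them is Python's standard urllib.robotparser module. A minimal sketch, assuming a hypothetical page on example.com. Note that robotparser picks the first group whose token is contained in the crawler name, a simpler rule than Google's; for this example both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# The first robots.txt example from this article
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
User-agent: Mediapartners-Google
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string instead of fetching

page = "https://example.com/articles/1.html"  # hypothetical URL
print(parser.can_fetch("Googlebot", page))             # False: "Disallow: /" blocks everything
print(parser.can_fetch("Mediapartners-Google", page))  # True: an empty Disallow allows everything
```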

User-agent: Googlebot
Disallow:
User-agent: Googlebot-Image
Disallow: /personal

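These rules let Googlebot crawl the whole site but stop the image crawler (Googlebot-Image) from crawling images under /personal. Disallow values are path prefixes, so /personal also matches paths such as /personal-photos/; use /personal/ to restrict only that directory.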
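Python's built-in urllib.robotparser is not a reliable check for this example: it applies the first group whose token is contained in the crawler name, so the Googlebot group would also capture Googlebot-Image. The sketch below instead follows the selection rules Google documents for its crawlers: only the group with the most specific matching token applies (Googlebot-Image falls back to the Googlebot group when it has no group of its own, matching the two UA entries listed for it above), and within that group the longest matching path wins, with Allow winning a tie. parse_groups and is_allowed are hypothetical helpers written for illustration; they ignore the * and $ path wildcards and other details a real parser must handle.

```python
from typing import Dict, List, Optional, Tuple

# The second robots.txt example from this article
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:
User-agent: Googlebot-Image
Disallow: /personal
"""

Rule = Tuple[bool, str]  # (is_allow_rule, path_prefix)


def parse_groups(text: str) -> Dict[str, List[Rule]]:
    """Map each lowercase user-agent token to the rules of its group."""
    groups: Dict[str, List[Rule]] = {}
    current: List[str] = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current = []  # consecutive User-agent lines share one group
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            if value:  # an empty Disallow places no restriction
                for agent in current:
                    groups[agent].append((field == "allow", value))
            last_was_agent = False
    return groups


def is_allowed(groups: Dict[str, List[Rule]], tokens: List[str], path: str) -> bool:
    """tokens: the crawler's tokens, most specific first (e.g. Googlebot-Image, Googlebot)."""
    rules: Optional[List[Rule]] = None
    for token in tokens + ["*"]:
        if token.lower() in groups:
            rules = groups[token.lower()]  # only the most specific group applies
            break
    if rules is None:
        return True  # no group matches this crawler
    # Longest matching prefix wins; on a tie, Allow beats Disallow
    best: Optional[Tuple[int, bool]] = None
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


groups = parse_groups(ROBOTS_TXT)
image_bot = ["Googlebot-Image", "Googlebot"]
print(is_allowed(groups, ["Googlebot"], "/personal/a.jpg"))     # True
print(is_allowed(groups, image_bot, "/personal/a.jpg"))         # False
print(is_allowed(groups, image_bot, "/personal-photos/b.jpg"))  # False (prefix match)
print(is_allowed(groups, image_bot, "/images/logo.png"))        # True
```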

To take effect, these rules must be placed in the robots.txt file at the root of the site.
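Crawlers look for robots.txt only at the root of the host, and a robots.txt file covers only the protocol, host, and port it is served from, so a subdomain needs its own file. A minimal sketch of where crawlers will look for the file that governs a given page (robots_txt_url is a hypothetical helper):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for page_url: same scheme, host and port, path /robots.txt."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus optional port); drop path, query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://iymark.com/articles/950.html"))  # https://iymark.com/robots.txt
```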

Reposted article. Original source: Google Search Central; compiled and published by 古哥.

If you repost this article, please credit the source: https://iymark.com/articles/950.html
