Crawler-Detect
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Top Related Projects
Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome :star:
The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.
Quick Overview
Crawler-Detect is a PHP library designed to detect bots, crawlers, and spiders accessing your website. It uses a comprehensive list of user agents and known crawler patterns to identify automated visitors, helping website owners distinguish between human and non-human traffic.
Pros
- Large database of known crawler patterns and user agents
- Regular updates to keep up with new bots and crawlers
- Easy integration into existing PHP projects
- Supports both procedural and object-oriented programming styles
Cons
- Limited to PHP environments
- May require frequent updates to maintain accuracy
- Potential for false positives with less common or custom user agents
- Performance impact on high-traffic websites due to pattern matching
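The pattern-matching cost noted above is typically kept manageable by the library's own design: the crawler patterns are merged into one large alternation that is compiled once and scanned once per request, rather than tested one pattern at a time. A minimal Python sketch of that idea (the patterns shown are illustrative, not the library's actual list):

```python
import re

# A few illustrative crawler patterns (CrawlerDetect ships many hundreds).
patterns = ["Googlebot", "bingbot", "Slurp", "facebookexternalhit"]

# Compiling one alternation up front keeps the per-request cost to a single
# regex scan instead of one match attempt per pattern.
crawler_re = re.compile("|".join(patterns), re.IGNORECASE)

def is_crawler(user_agent: str) -> bool:
    return crawler_re.search(user_agent) is not None

print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```

On a high-traffic site, the remaining cost is one linear scan of the user-agent string per request, which is usually negligible next to the rest of request handling.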
Code Examples
- Basic usage:
use Jaybizzle\CrawlerDetect\CrawlerDetect;
$CrawlerDetect = new CrawlerDetect;
if($CrawlerDetect->isCrawler()) {
// Handle crawler
} else {
// Handle human visitor
}
- Checking a specific user agent:
$userAgent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$CrawlerDetect = new CrawlerDetect;
if($CrawlerDetect->isCrawler($userAgent)) {
echo "This user agent is a crawler";
}
- Getting the matched crawler name:
$CrawlerDetect = new CrawlerDetect;
if($CrawlerDetect->isCrawler()) {
echo "Crawler detected: " . $CrawlerDetect->getMatches();
}
Getting Started
- Install via Composer:
composer require jaybizzle/crawler-detect
- Include in your PHP file:
require_once 'vendor/autoload.php';
use Jaybizzle\CrawlerDetect\CrawlerDetect;
$CrawlerDetect = new CrawlerDetect;
if($CrawlerDetect->isCrawler()) {
// Crawler detected
} else {
// Human visitor
}
Competitor Comparisons
crawler-user-agents: syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders
Pros of crawler-user-agents
- Lightweight and simple JSON-based approach
- Community-driven with frequent updates
- Easy integration into various programming languages
Cons of crawler-user-agents
- Limited to user agent strings only
- Lacks advanced detection methods
- May require additional processing for complex scenarios
Code Comparison
Crawler-Detect:
$CrawlerDetect = new Jaybizzle\CrawlerDetect\CrawlerDetect;
if($CrawlerDetect->isCrawler()) {
// Handle crawler
}
crawler-user-agents:
import json
with open('crawler-user-agents.json') as f:
crawlers = json.load(f)
if any(crawler['pattern'] in user_agent for crawler in crawlers):
# Handle crawler
Summary
Crawler-Detect offers a more comprehensive solution with advanced detection methods and regular expression matching. It provides a ready-to-use PHP library with built-in functionality for detecting various crawlers and bots.
crawler-user-agents, on the other hand, is a simpler, data-driven approach that focuses on maintaining an up-to-date list of crawler user agent strings. It's more flexible in terms of language integration but requires additional implementation for detection logic.
Choose Crawler-Detect for a robust, out-of-the-box PHP solution, or opt for crawler-user-agents if you need a lightweight, customizable approach across different programming languages.
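To make that trade-off concrete, here is a minimal sketch of the "additional implementation" crawler-user-agents leaves to you, using a tiny inline excerpt shaped like its JSON records (the real file holds hundreds of entries, with extra fields such as `instances`; the records below are illustrative):

```python
import re

# Inline excerpt in the shape of crawler-user-agents.json records.
crawlers = [
    {"pattern": "Googlebot", "url": "http://www.google.com/bot.html"},
    {"pattern": "bingbot", "url": "http://www.bing.com/bingbot.htm"},
    {"pattern": "Baiduspider", "url": "http://www.baidu.com/search/spider.html"},
]

def match_crawler(user_agent):
    """Return the first matching record, or None for (presumed) human traffic."""
    for crawler in crawlers:
        if re.search(crawler["pattern"], user_agent):
            return crawler
    return None

hit = match_crawler("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)")
print(hit["pattern"] if hit else "not a crawler")  # bingbot
```

Keeping the per-entry records around (rather than collapsing them into one regex) lets you report *which* crawler matched, at the cost of one match attempt per entry.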
device-detector: the Universal Device Detection library that parses any User Agent and detects the browser, operating system, device (desktop, tablet, mobile, TV, car, console, etc.), brand, and model
Pros of device-detector
- More comprehensive detection capabilities, including devices, operating systems, and browsers
- Regularly updated with a larger database of user agents
- Supports multiple programming languages through ports
Cons of device-detector
- Larger codebase and potentially higher resource usage
- More complex setup and integration process
- May be overkill for simple bot detection use cases
Code Comparison
Crawler-Detect:
$CrawlerDetect = new Jaybizzle\CrawlerDetect\CrawlerDetect;
if($CrawlerDetect->isCrawler()) {
// Handle crawler
}
device-detector:
$dd = new DeviceDetector($userAgent);
$dd->parse();
if ($dd->isBot()) {
// Handle bot
}
Both libraries offer straightforward usage, but device-detector provides more detailed information about the detected user agent. Crawler-Detect focuses solely on identifying crawlers, while device-detector offers broader device and browser detection capabilities.
device-detector's more extensive feature set comes at the cost of increased complexity and resource usage. For projects requiring only crawler detection, Crawler-Detect may be a more lightweight and focused solution. However, for applications needing comprehensive user agent analysis, device-detector's additional capabilities make it a more versatile choice.
README
About CrawlerDetect
CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent and http_from header. It can currently detect thousands of bots/spiders/crawlers.
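Because detection considers the http_from header as well as the user agent, the check can be pictured as scanning every request header that might carry a bot signature. A hedged Python sketch of that idea (the header names and patterns below are illustrative; the PHP class maintains its own lists):

```python
import re

# Candidate headers that may carry a user-agent-like value; an assumption
# modeled on CrawlerDetect checking more than just User-Agent.
UA_HEADERS = ["User-Agent", "X-Operamini-Phone-UA", "From"]

# A toy pattern list; the real class ships a far larger set.
crawler_re = re.compile(r"Googlebot|bingbot|Sosospider", re.IGNORECASE)

def is_crawler(headers):
    # Join every candidate header value and scan the result once.
    haystack = " ".join(headers.get(name, "") for name in UA_HEADERS)
    return crawler_re.search(haystack) is not None

print(is_crawler({"User-Agent": "Mozilla/5.0", "From": "googlebot(at)googlebot.com"}))  # True
```

Checking the From header catches bots (Googlebot among them) that identify themselves there even when the User-Agent alone looks ordinary.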
Installation
composer require jaybizzle/crawler-detect
Usage
use Jaybizzle\CrawlerDetect\CrawlerDetect;
$CrawlerDetect = new CrawlerDetect;
// Check the user agent of the current 'visitor'
if($CrawlerDetect->isCrawler()) {
// true if crawler user agent detected
}
// Pass a user agent as a string
if($CrawlerDetect->isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')) {
// true if crawler user agent detected
}
// Output the name of the bot that matched (if any)
echo $CrawlerDetect->getMatches();
Contributing
If you find a bot/spider/crawler user agent that CrawlerDetect fails to detect, please submit a pull request with the regex pattern added to the $data array in Fixtures/Crawlers.php, and add the failing user agent to tests/crawlers.txt.
Failing that, just create an issue with the user agent you have found, and we'll take it from there :)
Laravel Package
If you would like to use this with Laravel, please see Laravel-Crawler-Detect.
Symfony Bundle
To use this library with Symfony 2/3/4, check out the CrawlerDetectBundle.
YII2 Extension
To use this library with the YII2 framework, check out yii2-crawler-detect.
ES6 Library
To use this library with Node.js or any ES6-based application, check out es6-crawler-detect.
Python Library
To use this library in a Python project, check out crawlerdetect.
JVM Library (written in Java)
To use this library in a JVM project (including Java, Scala, Kotlin, etc.), check out CrawlerDetect.
.NET Library
To use this library in a .NET Standard (including .NET Core) project, check out NetCrawlerDetect.
Ruby Gem
To use this library with Ruby on Rails or any Ruby-based application, check out crawler_detect gem.
Go Module
To use this library with Go, check out the crawlerdetect module.
Parts of this class are based on the brilliant MobileDetect.