Common Web Attack Bots and Scanners — What They Look For and Why It Matters

Every public server gets scanned. Not occasionally — constantly. The moment an IP address is reachable on ports 80 and 443, automated scanners start knocking. Most people don't realize how much of their server's resources go toward politely responding to bots that will never be customers, readers, or legitimate users. This article covers what those bots are, what they're looking for, and why it matters — even when they don't find anything.

Config-file harvesters

These are the most common — and the ones most likely to cause performance problems. Bots sweep entire IP ranges looking for exposed configuration files:

GET /.env HTTP/1.1
GET /core/sql/database%2eenv HTTP/1.1
GET /config.yml HTTP/1.1
GET /aws/credentials HTTP/1.1
GET /wp-config.php HTTP/1.1
GET /src/config/%2eenv HTTP/1.1
GET /%2eenv%2elocal%2ejpg HTTP/1.1
GET /customer/config%2eyaml HTTP/1.1

.env files are the big prize: they often contain database passwords, API keys, and service credentials in plain text. But even when a bot never finds one, the damage is already underway. On a typical WordPress setup, a request for a file that doesn't exist falls through try_files to index.php. So every variant, encoded or not, that makes it past Nginx to PHP-FPM ties up a PHP worker, opens a database connection, and burns CPU. A single scanner can fire off 20–30 variants in under a minute. Multiply that by dozens of scanners a day and you're spending resources on requests that have no business touching your application layer.

Nginx can block these at the edge with deny all rules before PHP ever sees them. The encoded variants (%2e for a dot, %2f for a slash) take care of themselves as long as those rules live in location blocks. Nginx matches locations against the decoded, normalized URI, so /%2eenv is tested as /.env. A rule that inspects $request_uri sees the raw, still-encoded string and will miss them. If you see these probes in your Nginx error log, your rules are working. If you see them in your PHP-FPM access log, they're not.
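To see why matching against the decoded URI matters, here is a small Python model of the two approaches. It is a simplified sketch, not Nginx's code. normalize() only percent-decodes; Nginx also merges slashes and resolves dot segments. The CONFIG_FILE pattern is a hypothetical stand-in for whatever extensions your deny rules cover. The raw check stands in for a rule that tests $request_uri, and the decoded check stands in for a location block.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: ".env", ".yml" or ".yaml" followed by another dot or the end of the path.
CONFIG_FILE = re.compile(r"\.(env|ya?ml)(\.|$)", re.IGNORECASE)


def normalize(raw_path: str) -> str:
    """Percent-decode a request path, roughly as Nginx does before matching location blocks.

    Simplified: Nginx also merges repeated slashes and resolves "." and ".." segments.
    """
    return unquote(raw_path)


probes = [
    "/.env",
    "/core/sql/database%2eenv",
    "/src/config/%2eenv",
    "/%2eenv%2elocal%2ejpg",
    "/customer/config%2eyaml",
]

for raw in probes:
    raw_hit = CONFIG_FILE.search(raw) is not None                  # like testing $request_uri
    decoded_hit = CONFIG_FILE.search(normalize(raw)) is not None   # like a location block
    print(f"{raw:28} raw={raw_hit} decoded={decoded_hit}")
```

Only the unencoded /.env is caught by the raw check. Every variant is caught once the path is decoded.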

Vulnerability scanners

These bots aren't targeting you specifically. They're sweeping for known CVEs — vulnerabilities in WordPress plugins, PHP frameworks, and common web applications — hoping to find unpatched installations:

GET /wp-login.php HTTP/1.1
GET /xmlrpc.php HTTP/1.1
GET /cgi-bin/ HTTP/1.1
GET /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1
GET /wp-admin/setup-config.php HTTP/1.1
GET /wp-content/plugins/revslider/ HTTP/1.1

A scanner that hits /vendor/phpunit/ on a server running WordPress doesn't know or care what CMS you're using. It's checking everything. Scanners probe indiscriminately: they'll try PHPUnit endpoints on a Python server, WordPress paths on a static site, and Drupal paths on a Flask app. The bandwidth per request is negligible. The PHP-FPM worker tied up by each probe that reaches PHP is not.

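Credential-stuffing bots

These bots replay username and password pairs leaked in earlier breaches against common login endpoints: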

POST /wp-login.php HTTP/1.1
POST /admin/login HTTP/1.1
POST /user/login HTTP/1.1
POST /api/auth HTTP/1.1

They're betting on password reuse. Someone whose credentials leaked in a 2021 forum breach might still use the same password on their website, their email, and their hosting panel. These bots don't need to succeed often; a 0.1% hit rate across thousands of sites is profitable. For your server, though, every attempt is expensive whether it succeeds or not. Each one runs through the authentication path: form parsing, a database lookup for the account, and a deliberately slow password-hash check, plus session creation on the rare success. Rate-limiting the login endpoint at the Nginx level keeps a single bot from exhausting your PHP pool while legitimate users still get through.
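A rate limit like this works as a leaky bucket per client address. The Python sketch below is a simplified model of that idea, not Nginx's implementation. rate and burst mirror the rate= and burst= parameters of limit_req_zone and limit_req, and accepted burst requests are served immediately, as they would be with nodelay. Without nodelay, Nginx delays queued burst requests instead, and rejected requests get a 503 by default. The class and helper names are hypothetical.

```python
class LeakyBucket:
    """Simplified per-IP leaky bucket, modeled on the idea behind Nginx's limit_req.

    rate:  requests per second the bucket drains (like rate=1r/s)
    burst: extra requests allowed above the steady rate (like burst=5)
    Accepted burst requests are served immediately, as with nodelay.
    """

    def __init__(self, rate: float, burst: int = 0):
        self.rate = rate
        self.burst = burst
        # ip -> (excess, time of the last accepted request)
        self.state: dict[str, tuple[float, float]] = {}

    def allow(self, ip: str, now: float) -> bool:
        if ip not in self.state:
            self.state[ip] = (0.0, now)
            return True
        excess, last = self.state[ip]
        # Drain what leaked out since the last accepted request, then add this one.
        excess = max(excess - self.rate * (now - last) + 1.0, 0.0)
        if excess > self.burst:
            return False  # rejected requests do not update the bucket
        self.state[ip] = (excess, now)
        return True


def accepted(bucket: LeakyBucket, ip: str, times: list[float]) -> int:
    """Count how many requests at the given timestamps the bucket lets through."""
    return sum(bucket.allow(ip, t) for t in times)


# A bot firing 30 requests spread evenly across one second.
one_second = [i / 30 for i in range(30)]
print(accepted(LeakyBucket(rate=1), "203.0.113.7", one_second))           # 1
print(accepted(LeakyBucket(rate=1, burst=5), "203.0.113.7", one_second))  # 6
```

With a 1 r/s limit, a bot firing 30 requests in a second gets one through. Adding burst=5 admits a short spike (six requests here) without raising the steady rate.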

Directory brute-forcers

These bots try to find forgotten files, backups, and open directories that were never meant to be public:

GET /backup.zip HTTP/1.1
GET /db.sql HTTP/1.1
GET /old/ HTTP/1.1
GET /test/ HTTP/1.1
GET /dev/ HTTP/1.1
GET /staging/ HTTP/1.1
GET /.git/HEAD HTTP/1.1
GET /.svn/entries HTTP/1.1

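/.git/HEAD is worth calling out: if a .git directory is exposed, the bot can reconstruct your entire repository from the objects inside, including the commit history and any secrets that were ever committed. A location ~ /\. { deny all; } rule that denies hidden files stops the .git and .svn probes at the Nginx level before they touch your application. It does not catch /backup.zip, /db.sql, or directory probes like /old/; those need their own rules, such as one matching backup and dump extensions.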
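A quick way to check what a hidden-file rule does and doesn't cover is to run the probe paths above through the same regular expression in Python. Nginx uses PCRE for ~ locations, and these particular patterns behave the same in Python's re. BACKUP and HIDDEN_EXCEPT_WELL_KNOWN are hypothetical additions. The first covers archive and dump extensions. The second keeps /.well-known/ reachable, since a blanket /\. rule would also block Let's Encrypt HTTP-01 challenges under /.well-known/acme-challenge/.

```python
import re

# Same idea as an Nginx "location ~ /\." rule: any path segment that starts with a dot.
HIDDEN = re.compile(r"/\.")
# Hypothetical variant that leaves /.well-known/ reachable (for example, ACME HTTP-01 challenges).
HIDDEN_EXCEPT_WELL_KNOWN = re.compile(r"/\.(?!well-known/)")
# Hypothetical extension list for backups and dumps; adjust to what your site never serves.
BACKUP = re.compile(r"\.(zip|tar|gz|tgz|sql|bak|old|swp)$", re.IGNORECASE)

probes = ["/backup.zip", "/db.sql", "/old/", "/test/", "/dev/", "/staging/",
          "/.git/HEAD", "/.svn/entries"]

for path in probes:
    if HIDDEN.search(path):
        verdict = "blocked by the hidden-file rule"
    elif BACKUP.search(path):
        verdict = "needs the backup-extension rule"
    else:
        verdict = "not matched by either pattern"
    print(f"{path:16} {verdict}")
```

The hidden-file pattern stops the .git and .svn probes, and the archive and dump probes fall to the extension rule. Plain directory probes like /old/ match neither and get whatever your site normally returns for a missing path.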

API enumeration bots

These assume you're running a framework — Laravel, Symfony, Django REST, Express — and probe for default API endpoints:

GET /api/v1/users HTTP/1.1
GET /api/v1/admin HTTP/1.1
GET /api/auth/login HTTP/1.1
GET /graphql HTTP/1.1
GET /.well-known/openid-configuration HTTP/1.1

If your site doesn't have an API, these requests just return 404s. But a 404 from a CMS like WordPress still goes through the full PHP stack: the request hits index.php, WordPress boots, queries the database to determine that nothing matches, and renders the 404 template. One bot doing this 50 times a minute is trivial. Two hundred bots doing it at once can saturate the PHP pool and the database behind it. Catching these with a return 444 (Nginx's non-standard code for closing the connection without sending a response) or a rate limit before they reach PHP keeps your database free for actual visitors.

Why it matters even when they don't find anything

A failed attack is not harmless. It consumes resources that could be serving your actual visitors:

PHP-FPM workers: Each request that reaches PHP occupies a worker until its response is finished. A typical 2 GB VPS might run 4–8 PHP workers. Ten bots hitting uncached pages simultaneously can exhaust that pool, leaving everyone else queued behind them or getting 502 and 504 errors.
Database connections: WordPress and most CMS platforms open a database connection on every request, even for 404s. A brute-force scanner hitting 30 paths per second is opening 30 database connections per second.
Log volume: A busy day of bot traffic can produce gigabytes of access and error logs. That's disk I/O you didn't budget for, and log entries that bury the real problems when you're debugging at 2 AM.
Rate-limit pollution: If you rate-limit by IP, everyone who shares an address with a bot shares its limit. Mobile carriers (through carrier-grade NAT) and universities put many real users behind a single IPv4 address, and if you group IPv6 clients by prefix to stop a bot from rotating through its range, you throttle everyone else in that prefix too.
Noise masking real attacks: When every log is full of /.env 403s, a targeted attack that's actually trying to exploit a real vulnerability gets lost in the noise.
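The worker arithmetic can be made concrete with Little's law: the average number of requests in service equals the arrival rate times the time each request spends in service. The Python sketch plugs in the API-enumeration figures from earlier (200 bots at 50 requests a minute) and an 8-worker pool. The 0.2-second render time for a WordPress 404 is an assumption, so substitute what your own logs show.

```python
def busy_workers(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: average requests in service = arrival rate x time each spends in service."""
    return requests_per_second * seconds_per_request


POOL_SIZE = 8       # upper end of the 4-8 PHP workers on a 2 GB VPS
RENDER_TIME = 0.2   # ASSUMPTION: seconds for WordPress to boot, query the database, and render a 404

one_bot = busy_workers(50 / 60, RENDER_TIME)       # one bot, 50 requests a minute
swarm = busy_workers(200 * 50 / 60, RENDER_TIME)   # two hundred of them at once

print(f"one bot: {one_bot:.2f} workers busy on average")              # one bot: 0.17 workers busy on average
print(f"200 bots: {swarm:.1f} workers needed, pool has {POOL_SIZE}")  # 200 bots: 33.3 workers needed, pool has 8
```

Demand of roughly 33 busy workers against a pool of 8 means requests pile up in the PHP-FPM queue, and that backlog is what visitors experience as slow pages and gateway errors.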

How to spot them in your logs

A few telltale signs that the traffic you're seeing is automated:

Rapid-fire requests: 10+ requests in under a second from the same IP
Requests for files or frameworks you don't use
Encoded paths: %2e, %2f, %00
No Referer header, no cookies
Sequential or alphabetical path probing: /a/, /b/, /c/
Addresses from cloud or hosting-provider ranges, over IPv6 as well as IPv4: scanners run from data centers, and real visitors rarely do
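Several of these signals can be checked mechanically. The Python sketch below reads lines in Nginx's default combined log format and reports, per IP, rapid-fire bursts, encoded paths, and missing Referer headers. The threshold, the helper names, and the sample lines (which use documentation-range addresses) are illustrative assumptions. Because $time_local only has one-second resolution, "10+ requests in under a second" is approximated as 10+ requests sharing one timestamp. No single signal is conclusive; a missing Referer alone also describes plenty of direct visits.

```python
import re
from collections import Counter

# Nginx's default "combined" log format.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
ENCODED = re.compile(r"%(2e|2f|00)", re.IGNORECASE)
RAPID_FIRE = 10  # hypothetical threshold: requests from one IP sharing one timestamp second


def suspicious_ips(lines):
    """Return {ip: set of reasons} for every IP that shows at least one bot signal."""
    per_second = Counter()
    flags = {}
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip malformed lines (bots send plenty of those)
        ip = m["ip"]
        per_second[(ip, m["time"])] += 1
        reasons = flags.setdefault(ip, set())
        if ENCODED.search(m["path"]):
            reasons.add("encoded path")
        if m["referer"] == "-":
            reasons.add("no referer")
    for (ip, _), count in per_second.items():
        if count >= RAPID_FIRE:
            flags[ip].add("rapid-fire")
    return {ip: reasons for ip, reasons in flags.items() if reasons}


sample = [
    f'203.0.113.7 - - [10/May/2026:03:14:07 +0000] "GET {path} HTTP/1.1" 404 162 "-" "-"'
    for path in ["/.env", "/src/config/%2eenv", "/.git/HEAD"] * 4
] + ['198.51.100.20 - - [10/May/2026:03:14:07 +0000] "GET /blog/ HTTP/1.1" 200 5120 '
     '"https://example.com/" "Mozilla/5.0"']

for ip, reasons in suspicious_ips(sample).items():
    print(ip, sorted(reasons))  # 203.0.113.7 ['encoded path', 'no referer', 'rapid-fire']
```

On a real server you can pass an open file instead, for example open("/var/log/nginx/access.log") on Debian, since the function accepts any iterable of lines.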

What to do about it

The goal is not to stop scanning — you can't. The goal is to keep the scanning from reaching your application. A layered approach:

Nginx-level blocks: deny all on hidden files, backup files, config extensions, and known attack paths before the requests reach PHP
Rate limiting: Cap requests per IP at the Nginx level, so a bot that can make 30 requests per second becomes a bot that can make 1
fail2ban: Watch the Nginx error log for repeated "access forbidden by rule" entries (the 403s your deny rules produce) and ban repeat offenders at the firewall
Crowd-sourced blacklists: Lists from AbuseIPDB, Bitwire, and Spamhaus let the firewall drop known bad actors before they ever reach Nginx
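The counting a fail2ban jail does can be modeled in a few lines. This Python sketch is not fail2ban's code. maxretry, findtime, and bantime borrow the names of fail2ban's jail options, and the doubling ban length is just one simple way to illustrate incremental banning; fail2ban's own bantime.increment settings control the real formula. The Jail class is hypothetical, and timestamps are plain seconds.

```python
from collections import defaultdict, deque


class Jail:
    """Simplified model of a fail2ban-style jail with incremental banning.

    maxretry: failures allowed inside the findtime window before a ban
    findtime: window length in seconds
    bantime:  first ban length in seconds; doubles for each repeat offence
    """

    def __init__(self, maxretry: int = 5, findtime: float = 600, bantime: float = 3600):
        self.maxretry, self.findtime, self.bantime = maxretry, findtime, bantime
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.ban_count = defaultdict(int)   # ip -> how many times it has been banned
        self.banned_until = {}              # ip -> timestamp when the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip: str, now: float) -> float:
        """Register one forbidden request; return the ban length if it triggers a ban, else 0."""
        if self.is_banned(ip, now):
            return 0
        window = self.failures[ip]
        window.append(now)
        # Forget failures that have slid out of the findtime window.
        while window and window[0] <= now - self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            window.clear()
            length = self.bantime * 2 ** self.ban_count[ip]
            self.ban_count[ip] += 1
            self.banned_until[ip] = now + length
            return length
        return 0


demo = Jail(maxretry=5, findtime=600, bantime=3600)
print([demo.record_failure("203.0.113.7", t) for t in range(5)])  # [0, 0, 0, 0, 3600]
```

A scanner that trips the jail twice is banned for an hour, then two. A slow prober that stays under maxretry per findtime window is never banned by this jail.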

The full setup — Nginx forbidden-request jails, AbuseIPDB reporting, incremental banning, and daily blacklist imports — is covered in Fail2ban with nftables and Crowd-Sourced Blacklists.

Technical Audit Summary

This article catalogs real bot traffic patterns observed on a production server (config-file harvesters, vulnerability scanners, credential stuffers, directory brute-forcers, and API enumerators) with log excerpts and the resource impact behind each category.

Last Audit: May 2026

Environment: Debian Trixie (13)

Nginx: 1.30.2

PHP-FPM: 8.5.6

Scope: This article explains what the bots are, what they're after, and why even failed probes consume meaningful server resources. Mitigation is summarized in the closing section; the full fail2ban + nftables + crowd-sourced blacklist implementation lives in the fail2ban guide.