sqlmap architecture
A contributor-oriented map of how sqlmap is put together: the major components, how a run flows through them, and where to start looking for a given concern.
This is a map, not a spec. It describes the durable structure and data flow; for exact signatures, option names, and enumerable lists (tampers, DBMSes, options), the source is authoritative. When this document disagrees with the code, the code wins.
sqlmap runs on both Python 2.7 and 3.x; sources are kept pure-ASCII unless a literal
non-ASCII byte is unavoidable. Compatibility shims live in lib/core/compat.py and
thirdparty/six.
1. Entry points
| Entry | File | Purpose |
|---|---|---|
| CLI | sqlmap.py -> main() | the scanner. Applies runtime patches, parses options, runs a scan. |
| REST API | sqlmapapi.py | -s server / -c client wrappers around lib/utils/api.py. |
main() (sqlmap.py) does, in order: dirtyPatches() (lib/core/patch.py; monkey-patches the
stdlib for quirks/security), setPaths(), init() (option parsing + environment
setup), then dispatches to start() for a normal scan, or to the self-tests
(--smoke-test / --vuln-test / --api-test) in lib/core/testing.py.
2. Global state: conf and kb
Almost everything hangs off two process-global singletons defined in lib/core/data.py,
both AttribDict (attribute-accessible dicts; missing keys read back as None):
- conf - the resolved user configuration (options + derived settings). What the user asked for.
- kb ("knowledge base") - mutable runtime state discovered during a run (identified DBMS, injection points, page templates, caches, locks, counters).
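The attribute-dict behavior can be sketched as a small dict subclass. This illustrates the semantics described above (attribute access on top of a dict, missing keys reading back as None); it is not the project's own AttribDict, and the sample keys are illustrative.

```python
class AttribDict(dict):
    """Sketch: a dict whose keys are also readable/writable as attributes.

    Missing keys read back as None instead of raising, matching the
    behavior this document describes for conf and kb.
    """

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for dict keys.
        if name.startswith("__"):
            # Keep dunder probes (copy, pickle, ...) behaving like a normal object.
            raise AttributeError(name)
        return self.get(name)

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


# Two process-global singletons, in the spirit of lib/core/data.py
conf = AttribDict()  # what the user asked for
kb = AttribDict()    # what the run has discovered

conf.level = 3
kb.injections = []
```

With this, conf.level and conf["level"] are the same value, and an option that was never set (say conf.risk) simply reads as None.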
The configuration pipeline (paths relative to lib/):
- parse/cmdline.py - argparse definition of every CLI option.
- core/optiondict.py - option name -> type map (used for config-file/API coercion).
- core/defaults.py - default values.
- core/option.py - the heavy lifter: _setConfAttributes(), _setKnowledgeBaseAttributes(), _setHTTPHandlers() (installs the global urllib opener incl. keep-alive), DBMS/encoding setup, etc. Merges CLI + config file + defaults into conf/kb.
- core/settings.py - constants, version, regexes, thresholds. New constants go here.
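How a config-file string becomes a typed value, and how the three sources combine, can be sketched as below. The OPTION_TYPES and DEFAULTS excerpts and the coerce()/mergeOptions() helpers are hypothetical, and the precedence (explicit CLI value, then config file, then default) is an assumption of the sketch; core/option.py is authoritative.

```python
# Hypothetical excerpts standing in for core/optiondict.py (types) and core/defaults.py.
OPTION_TYPES = {"level": "integer", "risk": "integer", "threads": "integer",
                "batch": "boolean", "tamper": "string"}
DEFAULTS = {"level": 1, "risk": 1, "threads": 1, "batch": False, "tamper": None}


def coerce(name, raw):
    """Turn a config-file/API string into the type declared for the option."""
    kind = OPTION_TYPES[name]
    if kind == "integer":
        return int(raw)
    if kind == "boolean":
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return raw


def mergeOptions(cli, configFile):
    """Assumed precedence: explicit CLI value, then config file, then default.

    Here a CLI value of None means "not given on the command line".
    """
    conf = {}
    for name in OPTION_TYPES:
        if cli.get(name) is not None:
            conf[name] = cli[name]
        elif name in configFile:
            conf[name] = coerce(name, configFile[name])
        else:
            conf[name] = DEFAULTS[name]
    return conf
```

For example, mergeOptions({"level": 5}, {"level": "2", "risk": "3"}) keeps level 5 from the command line, takes risk 3 (coerced from the string) from the config file, and falls back to defaults for the rest.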
Identifiers in the codebase are camelCase.
3. Top-level layout
| Path | Responsibility |
|---|---|
| lib/core/ | conf/kb model, common helpers, settings, enums, dump, session, agent, option parsing |
| lib/controller/ | the scan orchestrator (controller.py), detection checks (checks.py), enumeration dispatch (action.py), DBMS handler selection (handler.py) |
| lib/request/ | HTTP layer: connect.py (sending), comparison.py (the true/false oracle), inject.py (value extraction), protocol handlers, response processing |
| lib/techniques/ | the exploitation engines: blind/inference.py, error/use.py, union/{test,use}.py, dns/ |
| lib/parse/ | parsing of inputs: CLI, config, HTTP request/log files, HTML, sitemap, and the XML payload/boundary loader (payloads.py) |
| lib/utils/ | feature modules: api.py (REST), hashdb.py (session), crawler.py, hash.py (cracking), har.py, brute.py, search.py, ... |
| lib/takeover/ | OS-level takeover: shells, file access, UDF, registry, Metasploit, xp_cmdshell |
| plugins/generic/ | DBMS-agnostic enumeration/fingerprint/filesystem/takeover base classes |
| plugins/dbms/<dbms>/ | per-DBMS subclasses + dialect (one dir per supported DBMS) |
| tamper/ | payload-mutation scripts (WAF bypass), one tamper() per file |
| data/xml/ | the data-driven engine: boundaries.xml, payloads/*.xml, queries.xml, errors.xml |
| data/ (other) | wordlists/common tables/columns (txt/), UDFs (udf/), stored procs (procs/), shells (shell/) |
| tests/ | stdlib-unittest suite (offline); see section 11 |
| thirdparty/ | vendored dependencies (six, bottle, chardet, ...) - no pip at runtime |
| extra/ | auxiliary tools (e.g. vulnserver used by --vuln-test) |
4. The scan lifecycle (lib/controller/controller.py: start())
For each target:
- Target setup - initTargetEnv() / setupTargetEnv() (lib/core/target.py): resolve URL/params, open the per-target output dir and session file (conf.hashDBFile), and resume anything already known (DBMS, injection points, cached values) from the session.
- Connection & profiling - lib/controller/checks.py: checkConnection(), checkWaf() (fills kb.identifiedWafs), checkStability() / dynamic-content detection (establishes kb.pageTemplate, kb.matchRatio).
- Heuristics - heuristicCheckSqlInjection() (cheap error-based hint).
- Detection - checkSqlInjection(place, parameter, value) per parameter, driven by the data engine (section 5). Confirmed points are appended to kb.injections.
- Fingerprint & handler - lib/controller/handler.py: setHandler() identifies the back-end DBMS and assigns conf.dbmsHandler, the object through which all enumeration is dispatched (section 7).
- Action - action() (lib/controller/action.py) routes the requested operation (--banner, --dbs, --tables, --dump, --sql-query, --os-shell, ...) to conf.dbmsHandler methods and feeds results to conf.dumper.
If nothing is injectable, sqlmap raises SqlmapNotVulnerableException carrying a dead-end
advisory: suggestions to adjust --level/--risk or --technique, or to try --text-only or
--tamper (the --tamper suggestion is made definitive when kb.identifiedWafs is set).
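Stripped of its details, the per-target flow is an ordered pipeline with one early exit. The sketch below models only that shape: the stage names, the scanTarget() helper, and the state dict standing in for kb are hypothetical, and the local exception class merely borrows the real one's name.

```python
class SqlmapNotVulnerableException(Exception):
    """Local stand-in for the exception named above."""


def scanTarget(target, stages):
    """Run one target through the lifecycle stages in order.

    `stages` maps a stage name to a callable taking a per-target state dict
    (a stand-in for the slice of kb that the stages share).
    """
    state = {"target": target, "injections": []}
    for name in ("setup", "profile", "heuristics", "detect"):
        stages[name](state)
    if not state["injections"]:
        # Dead end: surface the advisory instead of fingerprinting nothing.
        raise SqlmapNotVulnerableException(
            "no injectable parameter found; consider adjusting --level/--risk "
            "or --technique, or trying --text-only or --tamper")
    stages["fingerprint"](state)    # would choose the DBMS handler
    return stages["action"](state)  # would route the requested operation
```

The design point the sketch tries to capture is that fingerprinting and actions only ever run against confirmed injection points.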
5. The data-driven detection engine
Detection behavior lives in data, not code - data/xml/, loaded by
lib/parse/payloads.py (loadBoundaries(), loadPayloads()):
- boundaries.xml - injection boundaries: prefix/suffix pairs and the clause/where/parameter-type context they apply to (e.g. quote vs. numeric contexts).
- payloads/*.xml - the tests, one file per technique (boolean_blind, error_based, inline_query, stacked_queries, time_blind, union_query), each with the request template and the comparison/grep logic that decides success.
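To make the boundary/test split concrete, here is a sketch of how one boundary and one test template might combine into a request value. The dict shapes, the marker regex, the four-digit/four-letter random values, and the rule that every occurrence of the same marker gets the same value are assumptions of the sketch; the real forging in lib/core/agent.py also handles comments, placement and tamper scripts.

```python
import random
import re

# Hypothetical, simplified entries; the real XML carries more fields (level, clause, where, ...).
BOUNDARY = {"ptype": "single-quoted string", "prefix": "'", "suffix": " AND '[RANDSTR]'='[RANDSTR]"}
TEST = {"title": "AND boolean-based blind (sketch)", "payload": "AND [RANDNUM]=[RANDNUM]"}

MARKER = re.compile(r"\[RAND(?:NUM|STR)\d*\]")


def fillMarkers(text, rng):
    """Give each distinct marker one random value, used at every occurrence."""
    for marker in sorted(set(MARKER.findall(text))):  # sorted: deterministic for a seeded rng
        if marker.startswith("[RANDNUM"):
            value = str(rng.randint(1000, 9999))
        else:
            value = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(4))
        text = text.replace(marker, value)
    return text


def forge(originalValue, boundary, test, rng):
    """Prefix closes the original context, the test goes in, the suffix re-balances the query."""
    raw = "%s%s %s%s" % (originalValue, boundary["prefix"], test["payload"], boundary["suffix"])
    return fillMarkers(raw, rng)
```

For a parameter used as WHERE id='1', forge("1", BOUNDARY, TEST, random.Random(0)) yields a value shaped like 1' AND NNNN=NNNN AND 'xxxx'='xxxx, and the original query's own closing quote completes the last string literal.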
getSortedInjectionTests() (lib/core/common.py) orders the candidate tests by the
identified/likely DBMS, --level, and --risk. The agent (lib/core/agent.py)
forges the actual payload string - applying boundary prefix/suffix, the [RANDNUM]/
[DELIMITER]-style markers, comments, and tamper scripts. Requests go out via
lib/request/connect.py; the oracle lib/request/comparison.py decides true/false
by comparing the response against kb.pageTemplate (difflib ratio vs. kb.matchRatio,
plus titles/errors/HTTP-code signals).
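The ratio part of that decision can be illustrated with the standard library's difflib. The sketch treats kb.matchRatio as a plain threshold, which is a simplification: the real oracle also weighs titles, error strings and HTTP codes. pageRatio() and looksTrue() are hypothetical names.

```python
import difflib


def pageRatio(template, page):
    """Similarity in [0, 1] between the baseline page and a response body."""
    return difflib.SequenceMatcher(None, template, page).ratio()


def looksTrue(template, page, matchRatio):
    """Hypothetical oracle: 'true' if the response is at least as close to the
    baseline as the ratio measured on stable pages during profiling."""
    return pageRatio(template, page) >= matchRatio


template = "Welcome! Items: widget, gadget, gizmo. Rendered at 12:00:01"
truePage = template[:-1] + "7"  # same page, only the dynamic timestamp changed
falsePage = "No items."         # the condition flipped the result set away
```

A ratio rather than an exact comparison is what lets the oracle tolerate small dynamic fragments (like the timestamp above) while still separating the two outcomes.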
6. Exploitation techniques
Once a parameter is injectable, value extraction is dispatched by
lib/request/inject.py: getValue() to the matching engine in lib/techniques/:
| Technique | Engine | Mechanism |
|---|---|---|
| boolean-based blind | blind/inference.py: bisection() | binary-search each character via true/false oracle |
| time-based blind / stacked | blind/inference.py (time compare) | same bisection, oracle is a measured delay |
| error-based | error/use.py: errorUse() | parse the value straight out of a provoked DB error |
| UNION query | union/{test,use}.py | column-count detection then UNION SELECT extraction |
| inline query | (inline, via inject) | value embedded in the original query position |
| DNS exfiltration | dns/ | --dns-domain out-of-band channel |
bisection() is the hot loop: it caches the --charset table in
kb.cache.charsetAsciiTbl and respects kb.disableShiftTable, a deliberate runaway-guard
latch. Multi-threaded extraction is coordinated via kb.locks and
getCurrentThreadData() (lib/core/threads.py).
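The bisection idea, without threading, caching or the shift table, can be sketched as follows. It assumes an oracle answering "is the character's code point greater than n?"; for time-based blind the same search would run with an oracle backed by a measured delay. This bisection() is a simplified stand-in and extractValue() is hypothetical.

```python
import string


def bisection(isGreater, ordinals):
    """Recover one character by binary search over sorted candidate code points.

    isGreater(n) must answer the injected question "is ord(char) > n?".
    Returns the character, or None if it is not in the candidate set.
    """
    lo, hi = 0, len(ordinals) - 1
    while lo < hi:  # invariant: the answer's index, if any, lies in [lo, hi]
        mid = (lo + hi) // 2
        if isGreater(ordinals[mid]):
            lo = mid + 1
        else:
            hi = mid
    candidate = ordinals[lo]
    # Two confirming questions pin the value exactly: char > candidate - 1 and not char > candidate.
    if isGreater(candidate) or not isGreater(candidate - 1):
        return None
    return chr(candidate)


def extractValue(isGreaterAt, length, charset):
    """Extract `length` characters; isGreaterAt(position, n) is the per-position oracle."""
    ordinals = sorted(set(ord(c) for c in charset))
    chars = []
    for position in range(length):
        found = bisection(lambda n: isGreaterAt(position, n), ordinals)
        chars.append(found if found is not None else "?")
    return "".join(chars)
```

With the 94 printable non-space ASCII characters as candidates, this takes at most seven search questions plus two confirming ones per character, which is why the per-request cost of this loop dominates blind extraction.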
7. DBMS abstraction
Enumeration is DBMS-agnostic at the top and specialized underneath:
- plugins/generic/ - base classes for each concern: fingerprint.py, enumeration.py, databases.py, entries.py, users.py, filesystem.py, takeover.py, syntax.py, misc.py, search.py, custom.py, connector.py (direct DB connection for -d).
- plugins/dbms/<dbms>/ - one directory per supported DBMS, subclassing the generic pieces and supplying dialect specifics.
- data/xml/queries.xml - per-DBMS SQL query templates (banner, current user, table enumeration, casting, etc.) keyed by DBMS. The generic code asks for a query by name; the dialect comes from XML.
conf.dbmsHandler (set in handler.py) is the live object that action() calls into.
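The split between generic code and dialect can be sketched like this: the base class asks for a query by name, a subclass only declares its DBMS (or overrides where the dialect demands), and a selector picks the class once the DBMS is known. All class and method names, and the dict standing in for queries.xml, are hypothetical; the SQL strings are ordinary built-ins of each DBMS.

```python
# Hypothetical stand-in for data/xml/queries.xml: templates keyed by DBMS, then by query name.
QUERIES = {
    "MySQL": {"banner": "SELECT VERSION()", "currentUser": "SELECT CURRENT_USER()"},
    "Microsoft SQL Server": {"banner": "SELECT @@VERSION", "currentUser": "SELECT SYSTEM_USER"},
}


class GenericHandler(object):
    """DBMS-agnostic base: asks for queries by name, never spells out SQL itself."""

    dbms = None

    def __init__(self, runQuery):
        self.runQuery = runQuery  # sends SQL through the injection, returns the extracted value

    def getBanner(self):
        return self.runQuery(QUERIES[self.dbms]["banner"])

    def getCurrentUser(self):
        return self.runQuery(QUERIES[self.dbms]["currentUser"])


class MySQLHandler(GenericHandler):
    dbms = "MySQL"


class MSSQLHandler(GenericHandler):
    dbms = "Microsoft SQL Server"

    def getBanner(self):
        # Example dialect override: @@VERSION is multi-line, keep only the first line.
        return super(MSSQLHandler, self).getBanner().splitlines()[0]


def setHandler(identifiedDbms, runQuery):
    """Pick the handler class for the fingerprinted DBMS (stand-in for handler.py)."""
    handlers = dict((cls.dbms, cls) for cls in (MySQLHandler, MSSQLHandler))
    return handlers[identifiedDbms](runQuery)
```

Adding a DBMS in this shape means adding its query entries and a subclass; the calling code (the action() side) stays unchanged.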
8. Output and session
- Output - conf.dumper is a Dump instance (lib/core/dump.py): console tables plus per-table file export in CSV / HTML / SQLITE / JSONL (--dump-format). Logging is via logger (lib/core/log.py).
- Session / resume - each target gets a SQLite session file (<output>/<host>/session.sqlite). hashDBWrite() / hashDBRetrieve() (lib/core/common.py, backed by lib/utils/hashdb.py) cache injection points, fingerprint, and extracted values so a re-run resumes instead of re-testing (--flush-session discards the session; --fresh-queries ignores cached query results). A stale-session nudge fires on resume when the file is older than HASHDB_STALE_DAYS.
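A session store of this kind is essentially a key/value table in SQLite that outlives the process. The sketch below shows write, retrieve, flush and resume-by-reopening; the SessionStore class, the table layout, the JSON encoding and the target-scoped keys are assumptions, not the layout of lib/utils/hashdb.py.

```python
import json
import os
import sqlite3
import tempfile


class SessionStore(object):
    """Hypothetical SQLite key/value session store illustrating write/retrieve/resume."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS storage (id TEXT PRIMARY KEY, value TEXT)")

    @staticmethod
    def _key(target, name):
        # Scope each key to the target URL (an assumption of this sketch).
        return "%s|%s" % (target, name)

    def write(self, target, name, value):
        self.conn.execute("INSERT OR REPLACE INTO storage (id, value) VALUES (?, ?)",
                          (self._key(target, name), json.dumps(value)))
        self.conn.commit()

    def retrieve(self, target, name):
        row = self.conn.execute("SELECT value FROM storage WHERE id = ?",
                                (self._key(target, name),)).fetchone()
        return json.loads(row[0]) if row else None  # None means "not cached, ask the target"

    def flush(self):
        # Analogue of --flush-session: forget everything in this file.
        self.conn.execute("DELETE FROM storage")
        self.conn.commit()

    def close(self):
        self.conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "session.sqlite")
    first = SessionStore(path)
    first.write("http://target/?id=1", "dbms", "MySQL")
    first.close()
    second = SessionStore(path)  # a "re-run": opens the same file
    print(second.retrieve("http://target/?id=1", "dbms"))  # MySQL
    second.close()
```

An --fresh-queries style switch would correspond to skipping retrieve() for query results while still reading injection points and fingerprint.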
9. Request layer and tampering
lib/request/connect.py (Connect.getPage) is the single HTTP chokepoint. Around it:
protocol handlers (httpshandler, redirecthandler, chunkedhandler, rangehandler,
persistent connections via lib/request/keepalive.py), response processing (basic.py), and the
comparison oracle (comparison.py).
Tamper scripts (tamper/) mutate the payload just before sending to evade WAF/IPS.
Each file exposes a tamper(payload, **kwargs) and a __priority__; --tamper=a,b,c
chains them in priority order. They are payload-string transforms only (no engine
coupling), which is why they compose freely.
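Because tampers are plain string transforms, chaining them is function composition ordered by priority. In this sketch each tamper is an in-memory module exposing tamper() and __priority__; the two transforms are written for the example (loosely modeled on the ideas behind space2comment and equaltolike), and both the numeric priorities and the highest-first ordering are assumptions.

```python
import re
import types


def makeTamper(name, priority, function):
    """Build an in-memory module shaped like a tamper/ file: a tamper() plus __priority__."""
    module = types.ModuleType(name)
    module.tamper = function
    module.__priority__ = priority
    return module


def chainTampers(modules):
    """Order tampers by priority (highest first, an assumption here) and compose them."""
    ordered = sorted(modules, key=lambda module: module.__priority__, reverse=True)

    def apply(payload, **kwargs):
        for module in ordered:
            payload = module.tamper(payload, **kwargs)
        return payload

    return apply


# Two illustrative transforms, written for this sketch:
equalToLike = makeTamper("equaltolike", 100,
                         lambda payload, **kwargs: re.sub(r"\s*=\s*", " LIKE ", payload))
spaceToComment = makeTamper("space2comment", -10,
                            lambda payload, **kwargs: payload.replace(" ", "/**/"))

tamperPayload = chainTampers([spaceToComment, equalToLike])  # CLI order does not matter
```

Here tamperPayload("1 AND 1=1") gives 1/**/AND/**/1/**/LIKE/**/1. Order matters: applied the other way round, the result would be 1/**/AND/**/1 LIKE 1, with literal spaces left in.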
10. REST API and JSON report
lib/utils/api.py runs a Bottle server (sqlmapapi.py -s) that drives sqlmap scans as
subprocesses and exposes them over HTTP. Key pieces: DataStore/Task (task registry),
an IPC SQLite Database (the subprocess writes results/logs/errors back through
StdDbOut), and the route handlers (/task/*, /option/*, /scan/*, /version, ...).
The contract is documented in sqlmapapi.yaml (OpenAPI) and REST-API.md.
--report-json reuses the same assembly code (_assembleData / _sanitizeScanData)
that the /scan/<id>/data endpoint uses, so the CLI report and the API result can't
drift; RESTAPI_VERSION is the API contract version (major exposed as integer).
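The no-drift guarantee comes from both outputs calling one assembly function. The sketch shows that shape; assembleData(), sanitize(), the endpoint/report helpers and the {"type", "value"} record layout are hypothetical stand-ins for _assembleData / _sanitizeScanData.

```python
import json


def sanitize(value):
    """Make a value JSON-safe: bytes -> text, sets -> sorted lists, tuples -> lists."""
    if isinstance(value, bytes):
        return value.decode("utf-8", "replace")
    if isinstance(value, dict):
        return dict((str(key), sanitize(item)) for key, item in value.items())
    if isinstance(value, set):
        return sorted((sanitize(item) for item in value), key=repr)
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value


def assembleData(results):
    """The single assembly step both outputs share (hypothetical record shape)."""
    return {"success": True,
            "data": [{"type": kind, "value": sanitize(value)} for kind, value in results]}


def scanDataEndpoint(store, taskId):
    """What a /scan/<id>/data handler would return for one task."""
    return assembleData(store[taskId])


def reportJson(results):
    """What a --report-json writer would emit: the same assembleData() call."""
    return json.dumps(assembleData(results), sort_keys=True)
```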
11. Tests and self-tests
Two complementary layers:
- Offline unit/regression suite (tests/) - stdlib unittest only (no pytest/pip), green on py2 + py3. _testutils.py bootstraps global state and provides the property/fuzz harness (Rng, a cross-version-identical PRNG, and for_all). Run: python -B -m unittest discover -s tests -p "test_*.py" (-B matters: a cached .pyc makes the getFileType(__file__) doctest see binary).
- In-tree self-tests (lib/core/testing.py, hidden switches): --smoke-test (doctests + regex sanity over the whole tree), --vuln-test (end-to-end scans against the bundled extra/vulnserver), --api-test (live REST round-trip). The CI workflow (.github/workflows/tests.yml) runs all of these.
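A property test in the offline suite's style might look like the sketch below, using only unittest. LcgRng and checkProperty are hypothetical stand-ins playing the roles of Rng and for_all (whose real signatures may differ), and escapeQuotes() is an example function under test. A fixed linear congruential generator is one way to get a sequence that is identical across Python versions, since it depends on nothing but integer arithmetic.

```python
import unittest


class LcgRng(object):
    """Deterministic 32-bit linear congruential generator (pure integer arithmetic)."""

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next32(self):
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

    def randint(self, lo, hi):
        """Integer in [lo, hi]; uses the high bits, which are better mixed in an LCG."""
        return lo + (self.next32() >> 16) % (hi - lo + 1)

    def text(self, maxLen, alphabet="abc' \"=-"):
        length = self.randint(0, maxLen)
        return "".join(alphabet[self.randint(0, len(alphabet) - 1)] for _ in range(length))


def checkProperty(generate, holds, runs=200, seed=1):
    """Check `holds` on `runs` generated inputs; the seed makes any failure reproducible."""
    rng = LcgRng(seed)
    for run in range(runs):
        value = generate(rng)
        if not holds(value):
            raise AssertionError("property failed on run %d (seed %d): %r" % (run, seed, value))


def escapeQuotes(value):
    """Example function under test: double single quotes, SQL-string style."""
    return value.replace("'", "''")


class TestEscapeQuotes(unittest.TestCase):
    def test_roundTrip(self):
        checkProperty(lambda rng: rng.text(20),
                      lambda s: escapeQuotes(s).replace("''", "'") == s)

    def test_noLoneQuotes(self):
        # After escaping, removing every "''" pair must leave no single quote behind.
        checkProperty(lambda rng: rng.text(20),
                      lambda s: "'" not in escapeQuotes(s).replace("''", ""))
```

Saved under tests/ with a test_*.py name, a file like this would be picked up by the discover command given above.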
12. "Where do I start for ...?"
| I want to change... | Start in |
|---|---|
| a CLI option | lib/parse/cmdline.py (+ optiondict.py, defaults.py) |
| a constant/threshold | lib/core/settings.py |
| how injection is detected | data/xml/boundaries.xml + data/xml/payloads/*.xml, then lib/controller/checks.py |
| how a value is extracted | lib/request/inject.py + the relevant lib/techniques/ engine |
| the true/false decision | lib/request/comparison.py |
| a per-DBMS query/dialect | data/xml/queries.xml + plugins/dbms/<dbms>/ |
| enumeration behavior | plugins/generic/*.py |
| dump/output format | lib/core/dump.py |
| a WAF-bypass transform | add a file under tamper/ |
| the REST API surface | lib/utils/api.py (+ keep sqlmapapi.yaml in sync) |
| session/resume behavior | lib/utils/hashdb.py + hashDB* in lib/core/common.py |
| a stdlib monkey-patch / security shim | lib/core/patch.py |