Basilisk: An Evolutionary AI Red-Teaming Framework for Systematic Security Evaluation of Large Language Models

Regaan

doi:10.5281/zenodo.18909538

Smart Prompt Evolution (SPE-NL)

SPE-NL is Basilisk's genetic algorithm engine that evolves attack payloads across generations to discover bypasses that static tools miss.

How It Works

[Population Init] → [Fitness Evaluation] → [Selection] → [Mutation + Crossover] → [Next Generation]
       ↑                                                                                    |
       └────────────────────── Repeat until breakthrough or stagnation ─────────────────────┘
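The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not Basilisk's actual code: `evolve`, `evaluate`, `breed`, `target`, and `patience` are hypothetical names, and the sketch simply stops when a payload reaches the target score or when the best score has not improved for `patience` generations.

```python
def evolve(population, evaluate, breed, generations=5, target=1.0, patience=3):
    """Generic evolutionary loop: evaluate, track the best, breed, repeat.

    evaluate(payload) -> float and breed(population, scores) -> list are
    supplied by the caller; both are placeholders for illustration.
    """
    best, best_score, stale = None, float("-inf"), 0
    for _ in range(generations):
        scores = [evaluate(p) for p in population]
        top = max(range(len(population)), key=lambda i: scores[i])
        if scores[top] > best_score:
            best, best_score, stale = population[top], scores[top], 0
        else:
            stale += 1  # no improvement in this generation
        # Stop on breakthrough (target reached) or stagnation.
        if best_score >= target or stale >= patience:
            break
        population = breed(population, scores)
    return best, best_score
```

Passing selection, mutation, and crossover in through `breed` keeps the loop itself independent of how those steps are implemented.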

1. Population Initialization

Each generation starts with a population of candidate payloads (default: 100). The initial population is seeded from the module's built-in payload database plus random mutations.
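A minimal sketch of the seeding step, assuming the seeds are plain strings and a mutation function is supplied by the caller. `init_population` and `shuffle_words` are hypothetical names used only for illustration; only the default size of 100 comes from the text above.

```python
import random


def init_population(seed_payloads, mutate, size=100, rng=None):
    """Build an initial population from seeds plus randomly mutated copies."""
    rng = rng or random.Random()
    if not seed_payloads:
        raise ValueError("need at least one seed payload")
    # Keep the seeds themselves (truncated if there are more seeds than slots).
    population = list(seed_payloads[:size])
    # Fill the remaining slots with mutated copies of randomly chosen seeds.
    while len(population) < size:
        population.append(mutate(rng.choice(seed_payloads), rng))
    return population


def shuffle_words(payload, rng):
    """Placeholder mutation for the example: shuffle the word order."""
    words = payload.split()
    rng.shuffle(words)
    return " ".join(words)
```

Passing an explicit `random.Random(seed)` as `rng` makes a run reproducible, which helps when comparing scans.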

2. Fitness Evaluation

Each payload is sent to the target LLM and the response is scored across multiple signals:

Refusal Avoidance — Did the model comply vs refuse?
Information Leakage — Did the response contain sensitive data?
Compliance Score — How helpful was the response to the attack?
Novelty Reward — Bonus for unique payload structures
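One simple way to combine these signals is a weighted average. The sketch below assumes each signal has already been normalized to the range 0 to 1; the `Signals` class, the `fitness` function, and the weights are illustrative assumptions, not values taken from Basilisk.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    refusal_avoidance: float
    information_leakage: float
    compliance: float
    novelty: float


# Hypothetical weights, chosen only for this example.
DEFAULT_WEIGHTS = {
    "refusal_avoidance": 0.4,
    "information_leakage": 0.3,
    "compliance": 0.2,
    "novelty": 0.1,
}


def fitness(signals, weights=None):
    """Weighted average of the four signals, each clamped to [0, 1]."""
    weights = weights or DEFAULT_WEIGHTS
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        w * min(1.0, max(0.0, getattr(signals, name)))
        for name, w in weights.items()
    )
    return weighted / total
```

Dividing by the weight total keeps the result in the range 0 to 1 even if the weights are later retuned.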

3. Selection

Top-performing payloads are selected as parents using tournament selection. Elite payloads (top 5%) are carried forward unchanged.
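Tournament selection with elitism can be sketched as follows. The function names and the tournament size `k=3` are assumptions made for illustration; only the 5% elite fraction comes from the description above.

```python
import random


def select_elites(population, scores, elite_fraction=0.05):
    """Return the top-scoring payloads, carried forward unchanged."""
    n_elite = max(1, round(len(population) * elite_fraction))
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    return [population[i] for i in ranked[:n_elite]]


def tournament_select(population, scores, k=3, rng=None):
    """Pick k distinct candidates at random and return the fittest one."""
    rng = rng or random.Random()
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: scores[i])
    return population[winner]
```

A larger `k` raises selection pressure: the fittest payloads win more often, at the cost of less variety among parents.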

4. Mutation Operators

15 mutation operators transform payloads (11 base + 4 multi-turn aware):

| Operator | Description |
|----------|-------------|
| Synonym Swap | Replace words with synonyms |
| Encoding Wrap | Wrap in base64/hex/rot13 |
| Role Injection | Prepend persona/role instructions |
| Language Shift | Translate segment to another language |
| Structure Overhaul | Completely restructure the prompt |
| Fragment Split | Split across multiple messages |
| Nesting | Nest instructions inside benign context |
| Homoglyphs | Replace characters with visually similar Unicode |
| Context Padding | Add benign padding to evade pattern matching |
| Token Smuggling | Use token boundary tricks to hide payloads |
| Delimiter Wrap | Wrap payload in delimiter-style framing |
| Role Assumption | Prefix with role-assumption context ("As your admin...") |
| Temporal Anchor | Frame as historical/pre-training behavior |
| Nested Context | Embed inside summarization/translation meta-task |
| Authority Tone | Escalate with authority markers and override directives |
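As a rough illustration of how operators like these can be applied, the sketch below implements two of them (Encoding Wrap and Context Padding) as plain string transforms and picks one at random. The function names, the `[scheme]` tag format, the padding sentences, and the way `mutation_rate` gates the mutation are all assumptions for the example, not Basilisk's implementation.

```python
import base64
import codecs
import random

# Hypothetical benign filler sentences for the padding operator.
PADDING = [
    "Thanks for your help with this.",
    "Here is some background before the request.",
]


def encoding_wrap(payload, rng):
    """Encode the payload with base64, hex, or rot13 and tag the scheme."""
    scheme = rng.choice(["base64", "hex", "rot13"])
    if scheme == "base64":
        encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    elif scheme == "hex":
        encoded = payload.encode("utf-8").hex()
    else:
        encoded = codecs.encode(payload, "rot_13")
    return f"[{scheme}] {encoded}"


def context_padding(payload, rng):
    """Prepend a benign filler sentence to the payload."""
    return f"{rng.choice(PADDING)} {payload}"


OPERATORS = [encoding_wrap, context_padding]


def mutate(payload, mutation_rate, rng=None):
    """Apply one randomly chosen operator with probability mutation_rate."""
    rng = rng or random.Random()
    if rng.random() >= mutation_rate:
        return payload  # left unchanged this generation
    return rng.choice(OPERATORS)(payload, rng)
```

Keeping every operator behind the same `(payload, rng)` signature means new operators can be added to `OPERATORS` without touching `mutate`.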

5. Crossover Strategies

5 crossover strategies combine successful payloads:

| Strategy | Description |
|----------|-------------|
| Single-Point | Split at one point, swap tails |
| Uniform | Mix tokens from both parents uniformly |
| Prefix-Suffix | Take prefix from parent A, suffix from parent B |
| Semantic Blend | Blend the semantic intent of both payloads |
| Best-of-Both | Take strongest components from each parent |
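The two simplest strategies, Single-Point and Uniform, can be sketched on whitespace-separated tokens. Treating a token as a whitespace-delimited word is an assumption made here for illustration; the function names are likewise hypothetical. Semantic Blend and Best-of-Both depend on scoring or a language model and are not sketched.

```python
import random


def single_point(parent_a, parent_b, rng):
    """Cut both parents at the same token index and swap their tails."""
    ta, tb = parent_a.split(), parent_b.split()
    limit = min(len(ta), len(tb))
    if limit < 2:
        return parent_a, parent_b  # too short to cut
    cut = rng.randint(1, limit - 1)
    child_a = " ".join(ta[:cut] + tb[cut:])
    child_b = " ".join(tb[:cut] + ta[cut:])
    return child_a, child_b


def uniform(parent_a, parent_b, rng):
    """Choose each token position from either parent with equal chance."""
    ta, tb = parent_a.split(), parent_b.split()
    child = [rng.choice(pair) for pair in zip(ta, tb)]
    # Positions beyond the shorter parent are copied from the longer one.
    longer = ta if len(ta) > len(tb) else tb
    child.extend(longer[len(child):])
    return " ".join(child)
```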

Configuration

# 10 generations (recommended for deep testing)
basilisk scan -t URL --generations 10

# Custom population size
basilisk scan -t URL --population 200

# Standard mode (5 generations, balanced speed/depth)
basilisk scan -t URL --mode standard

# Deep mode (10 generations, maximum discovery)
basilisk scan -t URL --mode deep

Stagnation Detection

If the average fitness doesn't improve for 3 consecutive generations, SPE-NL triggers an "extinction event" — replacing 80% of the population with fresh random mutations to escape local optima. The mutation rate also adapts dynamically: increasing when diversity drops and decreasing after breakthroughs.
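The stagnation logic can be sketched as below. Several details are assumptions, because the text does not specify them: improvement is measured against the best mean fitness seen so far, the extinction event replaces the lowest-scoring 80%, and the adaptive rate moves in fixed steps between hypothetical bounds. All names are illustrative.

```python
class StagnationMonitor:
    """Counts consecutive generations without mean-fitness improvement."""

    def __init__(self, patience=3, epsilon=1e-6):
        self.patience = patience
        self.epsilon = epsilon
        self.best_mean = float("-inf")
        self.counter = 0

    def update(self, mean_fitness):
        """Record one generation; return True when an extinction event is due."""
        if mean_fitness > self.best_mean + self.epsilon:
            self.best_mean = mean_fitness
            self.counter = 0
        else:
            self.counter += 1
        if self.counter >= self.patience:
            self.counter = 0  # start counting again after the event
            return True
        return False


def extinction_event(population, scores, make_fresh, fraction=0.8):
    """Keep the top scorers and replace the rest with fresh candidates."""
    n_replace = round(len(population) * fraction)
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    survivors = [population[i] for i in ranked[: len(population) - n_replace]]
    return survivors + [make_fresh() for _ in range(n_replace)]


def adapt_mutation_rate(rate, diversity, breakthrough,
                        low_diversity=0.3, step=0.05, lo=0.05, hi=0.9):
    """Raise the rate when diversity is low, lower it after a breakthrough."""
    if diversity < low_diversity:
        rate += step
    if breakthrough:
        rate -= step
    return min(hi, max(lo, rate))
```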

Population Diversity Tracking

The evolution engine continuously monitors population diversity using Jaccard distance sampling across candidate payloads. This prevents convergence collapse — where all candidates become too similar, making the search ineffective. When diversity drops below the threshold, the engine injects fresh random mutations to maintain exploration.
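A minimal sketch of sampled Jaccard diversity, assuming each payload is compared as a set of whitespace-separated tokens. The token definition, the sample count, and the function names are assumptions for illustration only.

```python
import random


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the token sets of two payloads."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    if not union:
        return 0.0  # two empty payloads are treated as identical
    return 1.0 - len(sa & sb) / len(union)


def diversity_score(population, samples=50, rng=None):
    """Average Jaccard distance over randomly sampled payload pairs."""
    rng = rng or random.Random()
    if len(population) < 2:
        return 0.0
    total = 0.0
    for _ in range(samples):
        a, b = rng.sample(population, 2)
        total += jaccard_distance(a, b)
    return total / samples
```

Sampling pairs instead of comparing every pair keeps the per-generation cost linear in `samples` rather than quadratic in the population size.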

Key metrics tracked per generation:

| Metric | Description |
|--------|-------------|
| Mean Fitness | Average fitness score across the population |
| Diversity Score | Jaccard distance between sampled payload pairs (0-1) |
| Stagnation Counter | Consecutive generations without improvement |
| Mutation Rate | Current adaptive mutation probability |