Red Team AI with
Genetic Evolution
29 attack modules. OWASP LLM Top 10 coverage. A genetic algorithm that breeds prompt payloads across generations to find bypasses no static tool can.
$ basilisk scan -t https://api.target.com/chat
[*] Basilisk v0.1.0 — AI Red Teaming Framework
[*] Recon: Fingerprinting target model...
[+] Model: GPT-4 (OpenAI)
[*] Guardrails: 6/8 categories blocked
[*] SPE-NL: Evolving — Gen 1/5...
[!] System prompt extracted via role confusion
[!] Guardrail bypass via encoding + roleplay
[+] Gen 3: Breakthrough — novel injection
[+] 7 findings (2 Critical, 3 High, 2 Medium)
Offensive Capabilities
Built from the ground up for AI/LLM security testing.
Smart Prompt Evolution (SPE-NL)
Genetic algorithm that evolves attack payloads across generations. 10 mutation operators, 5 crossover strategies, and multi-signal fitness evaluation breed deadlier prompts every generation.
29 Attack Modules
Full OWASP LLM Top 10 coverage: prompt injection, system extraction, data exfiltration, tool abuse, guardrail bypass, DoS, multi-turn manipulation, and RAG attacks.
5-Module Reconnaissance
Fingerprint target models (GPT-4, Claude, Gemini, Llama), profile guardrails across 8 content categories, discover tools, measure context windows, and detect RAG pipelines.
OWASP LLM Top 10 Mapping
Every attack module maps directly to OWASP LLM categories: LLM01 (Injection), LLM03 (Training Data), LLM04 (DoS), LLM06 (Sensitive Info), LLM07/08 (Tool Abuse).
Universal Provider Support
Test any LLM via LiteLLM: OpenAI, Anthropic, Google, Azure, AWS Bedrock, Ollama, vLLM. Plus custom HTTP REST and WebSocket adapters for proprietary endpoints.
5 Report Formats
HTML with dark theme and conversation replay, SARIF 2.1.0 for CI/CD, JSON for automation, Markdown for docs, and PDF for client deliverables.
Electron Desktop App
Enterprise-grade GUI with real-time scan visualization, module browser, session replay, and one-click report export. Cross-platform: Windows, macOS, Linux.
CI/CD Integration
Drop Basilisk into GitHub Actions or GitLab CI with SARIF output. Fail pipelines on critical findings to shift AI security left in development.
Native C/Go Extensions
Performance-critical operations in compiled native code: fast payload encoding in C, concurrent fuzzing in Go, parallel pattern matching for high-throughput scanning.
See Basilisk In Action
Real attack scenarios demonstrating how Basilisk discovers AI vulnerabilities.
$ basilisk scan -t https://api.target.com/chat -p openai --mode quick
[*] Basilisk v0.1.0 — Quick Scan Mode
[*] Recon: Fingerprinting target...
[+] Model: GPT-4 | Context: 128K tokens
[*] Running top 50 payloads per module (no evolution)
[!] CRITICAL: Direct prompt injection succeeded
[!] HIGH: System prompt extracted via translation trick
[+] Scan complete in 47s. 4 findings.Up and running in 30 seconds
Install via pip, set your API key, scan. That's it.
$ pip install basilisk-ai
Installing... done ✓
$ export OPENAI_API_KEY="sk-..."
$ basilisk scan -t https://api.target.com/chat
[+] 7 findings (2 Critical, 3 High, 2 Medium)
Frequently Asked Questions
Common questions about using Basilisk for AI security testing.
Why You Need Automated AI Security
Large Language Models power critical applications — from customer service chatbots to financial advisors and healthcare assistants. These AI systems are vulnerable to prompt injection, system prompt extraction, data exfiltration, and guardrail bypass attacks.
Basilisk is the first open-source AI red teaming framework that uses genetic prompt evolution to discover these vulnerabilities. Unlike static testing tools, Basilisk's SPE-NL engine mutates attack payloads based on how the target responds — evolving increasingly effective attacks across generations.
Built for Security Professionals
Whether you're a penetration tester, bug bounty hunter, AI engineer, or security team lead, Basilisk integrates into your existing workflow. Export findings as SARIF for CI/CD, HTML for stakeholders, or JSON for automation.