Red teaming, prompt injection testing, and security assessments for AI products. We test your system prompts, content filters, and safety guardrails against real attack techniques. Then we document what we find and help you fix it.
Claude Code's system prompt is not validated for content integrity. A local MITM proxy replaces safety policies, refusal instructions, and behavioral guidelines with attacker-controlled profiles. The API accepts the modified prompt identically to the original.
210 runs across 7 harm categories. Default refusal: 100%. With injected profiles: 9.5%. Every prompt bypassed at least once. 15 of 21 achieved clean 5/5 compliance.
A single non-Latin Unicode character anywhere in the input causes the language detection layer to skip all toxicity checks. Zero-width spaces, Cyrillic homoglyphs, and mixed script all bypass with 100% reliability. The core detection engine is solid — the language gate in front of it is the weak point.
Two rounds of testing, two patches deployed. Developer pushed fixes to production same day both times.
Competitive AI red teaming challenge run by TRAILS and MATS with NSF funding. 40,000+ participants. Demonstrated PII exfiltration, authorization bypasses, and security constraint circumvention against frontier LLMs presumed hardened.
Vulnerability report with reproduction steps. Root cause analysis. Remediation recommendations. Retesting after patch. Public or private writeup — your call.
Independent AI security researcher. Published research on trusted channel injection in Claude Code. Built CCORAL (system prompt injection PoC), CDP-MCP (browser automation via Chrome DevTools Protocol). No CS degree. Started in retail, found a vulnerability in Claude Code, and documented it at publication quality.
We work with a small network of experienced bug bounty hunters and prompt injection specialists. Engagements scale to the scope.
Tell us what needs testing. We'll scope it and get back to you.
or email directly: cassius@redcore.zip