Cybercriminals Stopped Trying to Break AI’s Rules and Started Building Their Own

Researchers at Palo Alto Networks’ Unit 42 have documented what happens when cybercriminals stop attempting to circumvent the guardrails on legitimate AI systems and simply build their own without any. The two underground language models they analyzed are not experimental projects or proof-of-concept demonstrations. They are functional tools, trained on stolen code, leaked datasets, and malware samples, capable of writing convincing phishing content, generating attack scripts, assisting with malware development, and walking inexperienced attackers through operations they could not have executed independently. The researchers’ conclusion is direct: these systems are already part of the active cybercriminal toolkit.

The implications extend well beyond the technical details of how these models were constructed. What Unit 42 documented is a change in the capability distribution of the threat landscape, one that affects the organizations these attackers target, regardless of whether those organizations ever interact with AI tools themselves.

What These Systems Actually Do
The guardrails built into commercial AI systems (the refusals, the content policies, the safety training) exist because the developers of those systems made deliberate choices to constrain what the models would assist with. Those constraints are not inherent to the underlying technology. They are design decisions that can be made differently, and criminals building tools for criminal purposes have made them differently.

The underground models Unit 42 analyzed reflect the absence of constraint directly in their capabilities. Phishing email generation is among the most immediately applicable outputs. Commercial language models produce polished, grammatically correct prose, and so do these systems, but without the restriction against using that capability to impersonate financial institutions, craft urgency designed to override recipient judgment, or personalize content based on specific targets. The AI-generated phishing email that arrives in an employee’s inbox may be more convincing than anything a human attacker would have written, because the model can be steered toward persuasiveness without the attacker having to compose anything from scratch.

Malware development assistance extends the capability to attackers who would previously have been limited by their technical knowledge. Writing functional malicious code requires specific expertise. Walking someone through the steps required to deploy it requires patience and availability. These systems provide both on demand, without the gatekeeping that makes acquiring that knowledge through legitimate channels slow and visible. An attacker who knows what outcome they want but lacks the technical knowledge to achieve it independently can now close that gap through a conversation with a system designed to help them do exactly that.

Automation of reconnaissance and scripting addresses the labor-intensive parts of attack preparation that previously constrained how many targets an attacker could pursue simultaneously. Volume operations require volume tooling. These systems reduce the per-target effort that attack preparation requires, which means the same attacker can pursue more targets than they could before the tooling existed.

The Shift in Who Can Execute What
The detail that carries the most practical significance for organizations thinking about their threat exposure is what these tools do to the skill requirements for conducting a serious attack.

Cybersecurity has historically operated with an implicit assumption that attack sophistication correlates with attacker capability. Highly sophisticated attacks come from highly capable actors, which means the population of attackers who can threaten any given organization is bounded by the population of people with the relevant technical skills. That assumption has been eroding for years through the commercialization of attack tooling, and the emergence of purpose-built criminal AI models accelerates the erosion significantly.

An attacker who can access one of these systems does not need to know how to write malware. They do not need to know how to construct a phishing email that will survive scrutiny. They do not need to understand the specific vulnerability they are exploiting. They need access to the tool and a description of what they want to accomplish. The model handles the technical translation between intent and execution.

The practical consequence for organizations is that the threat population they face is no longer bounded by the population of technically skilled attackers. It now includes anyone who can pay for or obtain access to these systems, which is a substantially larger and more diverse group. Defenses calibrated to the sophistication level of attacks that required genuine technical expertise to mount need to account for a threat environment where that expertise requirement has been substantially reduced.

Where Existing Defenses Hold and Where They Do Not
The existence of criminal AI models does not make conventional security controls obsolete. It changes the profile of the attacks those controls need to handle, and understanding that distinction matters for assessing where investment is most warranted.

Multi-factor authentication remains among the highest-value controls available, specifically because of what it does when a phishing attack succeeds. AI-generated phishing content can be more convincing than typical human-written phishing content, which means a larger fraction of phishing attempts may succeed in obtaining credentials. MFA does not prevent credential theft. It prevents a stolen password from being sufficient on its own. An attacker who obtains a username and password through a successful phishing operation and then encounters MFA has not gained the access they sought. The attack succeeded at one stage and failed at the decisive stage. That outcome is worth a great deal.
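The point about a stolen password failing at the second stage can be illustrated with a minimal sketch. It assumes a time-based one-time password (TOTP, as specified in RFC 6238) as the second factor. The `login` function and its parameters are hypothetical names used for illustration, not part of any product or of the Unit 42 research.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) for a Unix timestamp."""
    counter = at_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation as defined for HOTP in RFC 4226.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def login(password_ok: bool, submitted_code: str, secret: bytes, now: int) -> bool:
    """Hypothetical login check: a correct password alone is never enough."""
    code_ok = hmac.compare_digest(submitted_code, totp(secret, now))
    return password_ok and code_ok
```

In this sketch, an attacker who phished the password but cannot produce the current code gets `False` from `login`, even though `password_ok` is `True`. A real deployment would also need rate limiting, replay protection, and tolerance for clock drift, all of which are omitted here.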

Email filtering and endpoint security that incorporate behavioral detection address a gap that signature-based detection leaves open. AI-generated malware and AI-generated phishing content can be varied continuously in ways that defeat detection based on known patterns. Behavioral detection, which identifies what code or content does rather than what it looks like, is more resilient against variation because the underlying behavior is harder to vary while preserving the attack’s effectiveness.
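The difference between matching what a sample looks like and matching what it does can be shown with a toy sketch. Everything here is an assumption made for illustration: the event names, the suspicious sequence, and the sample bytes are hypothetical, and real endpoint products use far richer telemetry and scoring.

```python
import hashlib
from typing import Iterable

# Signature approach: a set of hashes of samples seen before (hypothetical).
KNOWN_BAD_HASHES = {hashlib.sha256(b"variant-A payload").hexdigest()}

# Behavioral approach: an ordered pattern of actions (hypothetical event names).
SUSPICIOUS_SEQUENCE = ("enumerate_files", "encrypt_file", "delete_shadow_copies")


def signature_match(sample: bytes) -> bool:
    """Flag a sample only if its exact bytes have been seen before."""
    return hashlib.sha256(sample).hexdigest() in KNOWN_BAD_HASHES


def behavior_match(events: Iterable[str]) -> bool:
    """Flag a run if the suspicious steps occur in order, with any gaps."""
    remaining = iter(events)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in SUSPICIOUS_SEQUENCE)
```

A regenerated variant with different bytes slips past `signature_match`, but if it still enumerates files, encrypts them, and deletes shadow copies, `behavior_match` flags it. That is the resilience the paragraph above describes: the attacker can vary the form cheaply, but varying the behavior tends to cost the attack its effect.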

Software update discipline becomes more urgent in an environment where vulnerability scanning and exploitation can be automated at scale. Known vulnerabilities with available patches represent an exploitable population that automated tooling can identify and target systematically. The window between a vulnerability becoming known and patches being applied is the window during which automated exploitation is most effective. Shortening that window consistently across the organization’s systems reduces exposure to the highest-volume attack category these tools enable.
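One way to make the exposure window measurable is to track, per system, the days between a vulnerability becoming known and the patch being applied. The sketch below assumes a simple inventory that maps host names to a disclosure date and an optional patch date. The host names, the dates, and the 14-day threshold are hypothetical values chosen for illustration.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical inventory: host -> (disclosure date, patch date or None if unpatched).
Inventory = Dict[str, Tuple[date, Optional[date]]]


def exposure_days(disclosed: date, patched: Optional[date], today: date) -> int:
    """Days a system was (or still is) exposed to a known vulnerability."""
    end = patched if patched is not None else today
    return max((end - disclosed).days, 0)


def overdue(hosts: Inventory, today: date, max_days: int = 14) -> List[str]:
    """Unpatched hosts whose exposure already exceeds the allowed window."""
    return sorted(
        name
        for name, (disclosed, patched) in hosts.items()
        if patched is None and exposure_days(disclosed, None, today) > max_days
    )
```

Running `overdue` on a schedule gives a short list of systems that automated exploitation is most likely to find first. The right threshold depends on the organization and on the severity of the vulnerability.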

Phishing awareness training faces a specific challenge from AI-generated content that training programs need to address directly. The traditional signals employees are trained to recognize (grammatical errors, awkward phrasing, generic salutations) are the signals that AI-generated phishing content has been specifically optimized to eliminate. Training that focuses on those signals without accounting for the current quality of AI-generated content is preparing employees for the last generation of attack rather than the current one. The relevant update to phishing training is teaching employees to verify unexpected requests through channels independent of the message itself, regardless of how legitimate the message appears, because appearance is no longer a reliable signal.

The Structural Problem These Tools Represent
Unit 42’s findings document a specific set of tools, but the more significant finding is what those tools represent about the trajectory of criminal AI development. The resources and expertise required to build a functional language model, even one trained on stolen data and oriented toward malicious use, are not trivial. Criminal actors invested those resources because the return on that investment is clear. Tooling that reduces the skill requirement for attack execution while increasing the quality and volume of attacks is valuable to anyone in the business of conducting attacks.

That investment will continue. The two models Unit 42 analyzed will not be the last. They will be followed by more capable versions, by competing tools from other criminal developers, and by distribution through criminal marketplaces that make access easier for a wider range of actors. The trend line runs in one direction, and the organizations that need to defend against the attacks these tools enable do not have the option of waiting for the trend to reverse.

The appropriate response is not alarm, as though the threat had emerged without warning. Security researchers have been tracking the development of criminal AI tooling, and the Unit 42 findings are a documented confirmation of a trajectory that was already visible. The appropriate response is the same one that applies to supply chain risk, to AI browser vulnerabilities, and to any threat category that has crossed the line from theoretical to operational: treat it as a current condition that requires current defenses, apply the controls that reduce exposure in proportion to the risk they address, and build the awareness within the organization that makes employees an asset in threat detection rather than the primary attack surface.

Criminals built an AI that does what they need it to do. The organizations they are targeting now need defenses that account for what that capability actually means.