Not long ago, many companies restricted access to generative artificial intelligence (AI) chatbots at work, wary of security and productivity risks. Today, CISOs are optimistic that agentic AI can automate workflows and solve cybersecurity skills gaps.
While defenders experiment with AI to speed up operations, cyber attackers have been quick to adopt AI to enhance their own tradecraft, although it remains to be seen which group AI benefits most. Easy access to AI tools has improved the quality of Business Email Compromise (BEC) lures, deepfakes, and vishing, yet, as our intelligence analysts have observed, AI has delivered attackers efficiency gains with limited operational benefits. Campaigns still largely rely on proven phishing kits and phishing-as-a-service (PhaaS) platforms that are cheap, easy to use, and protect anonymity. (See Intel 471’s white paper Precision Deception: Rise of artificial intelligence-powered social engineering.)
Still, AI-enhanced threats are evolving. On August 27, AI chatbot developer Anthropic reported that a threat actor had used its agentic coding tool Claude Code in 17 data extortion attacks over three months. The actor allegedly used the tool to scan for vulnerable VPN endpoints, gain initial access, create custom malware, harvest credentials, exfiltrate data, and analyze stolen financial data to determine the ransom amount.
But using AI to support each task in an attack is not the same as AI enabling and automating the entire attack. SANS Fellow Rob Lee projects a future in which CEOs deploy AI without proper security and are overwhelmed by threats that use AI across the entire attack chain. In that scenario, organizations rush to hire AI threat intelligence analysts capable of converting new attack patterns into executable threat hunts in seconds, a process that today takes weeks before a hunt is executed and findings are handed off to incident response.
To help threat hunters use AI effectively and stay in control of critical security decisions, our world-class threat hunters outline what AI brings to the table today, what it doesn’t, how to use it, and possible pitfalls.
How AI can help your threat hunting teams
Our threat hunters used the “Targeted Hunting integrating Threat Intelligence” or “TaHiTI” framework to assess where AI is most useful in threat hunting.
TaHiTI, created by financial sector members of the Dutch Payments Association, emphasizes structured threat hunting, in which hunters have a clear idea of what they’re looking for before beginning a hunt for undetected threats. Our threat hunters prefer TaHiTI because it is a vendor-neutral methodology and flexible enough to cater to complex environments.

The three phases of TaHiTI are:
Initiate — Proactive and reactive hunt triggers include new intelligence about a threat actor’s tactics, techniques, and procedures (TTPs), other hunts, security monitoring alerts, incident response and red team activities, and MITRE ATT&CK TTPs.
Hunt — Hunt hypotheses are defined, refined, and enriched with contextual threat intelligence along the way. Data sources and analysis techniques, such as querying or clustering, are defined. After executing a hunt query, the returned data is analyzed, and the hypothesis is refined with additional queries and exclusions before validation.
Finalize — Hunt results and conclusions are documented in tactical, operational, and strategic reporting to prioritize future hunts and enhance repeatability. Reports include recommended mitigations to improve preventative measures, logging, and security monitoring, as well as processes such as vulnerability and configuration management. New hunt outputs are handed off to detection engineering, incident response, security monitoring, threat intelligence, vulnerability management, and others.
Intel 471 Threat Hunt Analysts’ Assessment
Note: This is a non-exhaustive analysis of machine learning (ML) toolkits, AI, and large language models (LLMs) applied to threat hunting scenarios. It also includes security tools that integrate AI out of the box. We selected ChatGPT as a reference model because it is the model our threat hunters have the most experience prompting and evaluating while developing behavioral hunts. The analysis does not aim to compare the strengths and weaknesses of different LLMs.
AI in the loop: Evaluating AI and ML for human-led threat hunting

Hypothesis Development
AI can’t yet develop a complete hypothesis but is useful as a sounding board that puts “AI in the loop” of human thought processes during hypothesis development. The team often uses AI to stress-test hypotheses, helping identify gaps in logic or determine when to investigate artifacts associated with a particular technique or tool.
While AI can help counter tunnel vision during the hunt, models that have been designed to be agreeable can subtly lead the threat hunter astray by validating a flawed hypothesis about tools and techniques that aren’t well documented.
To avoid the pitfalls of AI sycophancy, Intel 471 Senior Threat Hunt Analyst Scott Poley has found it’s best to ask questions that are constrained by facts.
“ChatGPT puts on a positive tone, so if you provide a concept that hasn't been validated before, it always seems to support the idea, reinforcing confirmation bias,” explains Poley. “But when pitching a hypothesis to it, it does a better job of being objective when you describe something and validate facts first.”
“Most people know how important the prompt is to get an appropriate and accurate answer. A way to prevent ChatGPT from taking me down a rabbit hole — which it has done to me countless times — is to take a more structured fact-finding approach before exploring the hunt hypothesis.”
One example is researching “reflective loading,” a technique used for executing a malicious payload in memory without writing the file to disk, hiding it from file scanning tools.
“If I was interested in reflective execution, which is common in Linux but can be leveraged in Windows, I might start by providing an example, like, I see this command 'cmd /c < text.file',” explains Poley.
He’d then ask:
- What is it doing?
- Are there examples of this documented in general?
- Do programs or apps use this for functionality?
- What are the pros and cons of doing this?
- Can you do this with other script-interpreting executables in Windows?
In Poley’s experience, ChatGPT accurately responds to these fact-finding questions. (The correct answer to the last question is that cmd, aka Command Prompt, is the only way of doing it on Windows, he notes.)
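To make this concrete, here is a minimal sketch of that fact-finding-first flow as a script, assuming the OpenAI Python SDK; the model name, system prompt, and overall structure are illustrative choices for demonstration, not a fixed recipe from our team.

```python
# A sketch of structured fact-finding before pitching a hypothesis.
# Assumes the OpenAI Python SDK; model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Nudge the model away from agreeable speculation.
messages = [{"role": "system", "content":
             "Answer factually. If something is undocumented or uncertain, "
             "say so instead of speculating."}]

fact_finding = [
    "I see this command: cmd /c < text.file. What is it doing?",
    "Are there examples of this documented in general?",
    "Do programs or apps use this for functionality?",
    "What are the pros and cons of doing this?",
    "Can you do this with other script-interpreting executables in Windows?",
]

for question in fact_finding:
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})

# Only once the facts are on the table does the hunt hypothesis go in.
messages.append({"role": "user", "content":
                 "Given only the facts established above, critique this "
                 "hunt hypothesis: <hypothesis goes here>"})
```

Keeping the full conversation history means later answers stay constrained by the facts already established, which is the behavior Poley describes.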
But when asking ChatGPT pointed questions, like “How do I hunt for X behavior for a given command?” he’s found it tends to validate some points and provide examples of other ways to hunt that are similar but wrong.
Data Scoping and Query Design
Using AI for query design and scoping has delivered mixed results for our threat hunters. When working through the thought process behind a possible query, threat hunters often need to understand the platform’s query language for filtering and pivoting in large datasets.
The models often fail to create useful queries due to gaps in their understanding or because they were trained on resources that no longer exist. However, AI tools can quickly point users toward helpful documentation and resources for a query language, saving time otherwise spent scrolling through Google results.
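One lightweight guardrail, sketched below under assumed field names, is to check an LLM-drafted query against the data source’s actual schema before running it, since hallucinated field names are a common failure mode:

```python
# A hypothetical pre-flight check for LLM-drafted hunt queries: verify
# that every field the draft references exists in the real schema.
import re

# Pulled from your SIEM's data dictionary; these names are assumptions.
KNOWN_FIELDS = {"process_name", "command_line", "parent_process_name",
                "user", "host", "timestamp"}

def unknown_fields(draft_query: str) -> set[str]:
    """Return field names referenced in the draft that don't exist."""
    referenced = set(re.findall(r"\b(\w+)\s*=", draft_query))
    return referenced - KNOWN_FIELDS

# An LLM draft using a field name that isn't in this environment:
draft = 'process_name="cmd.exe" AND commandline="*< *"'
print(unknown_fields(draft))  # {'commandline'} -> fix before running
```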
Threat Detection
AI is often touted for automating the creation of new threat detections, including for zero-day threats and fileless malware. Our threat hunters are skeptical of out-of-the-box AI solutions due to high false positives and the lack of control they offer practitioners.
“I get hives when I see claims about out-of-the-box AI threat detection content,” says Intel 471 Threat Hunt Analyst Thomas Kostura. “The variances in any given environment are there whether we want them or not. I'm excited about the possibilities of AI threat detection because we can help produce more detections faster. But I've also seen EDR detection tools, which are not necessarily generative AI but are AI-driven, generate too many false positives or detections that don’t make sense.”
Other industry experts have drawn similar conclusions about LLM-generated detection rules. Rather than relying on AI to lead incident response or trigger a hunt, a potentially more useful application of AI is helping threat hunters and detection engineers understand the telemetry that AI-based detection systems use to detect a novel threat.
“AI-based detection tools are interesting because they depend heavily on telemetry,” says Poley. “If you say you can detect zero days or fileless malware with AI, I think it's better to use AI to figure out what telemetry the solution uses to detect that. This way you can build detections and have a better understanding and more control over that.”
Knowing the variables inside an environment, sector, and geography also matters. A threat that targets an organization in the finance sector will have different relevance and impact in retail or pharmaceuticals. The contextual awareness of an out-of-the-box AI offering versus an AI solution molded to the environment results in very different pathways to head off a threat.
However, our threat hunters believe Retrieval-Augmented Generation (RAG) is a promising method for tailoring a pre-trained LLM to an organization’s specific policies, procedures, and data. RAG can improve the relevance of responses by ensuring the LLM references a knowledge source external to its training data before generating responses that are grounded in environment-specific data.
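The mechanics are straightforward, as the sketch below shows: retrieve the most relevant internal documents first, then ground the prompt in them. The documents, the embed() stub, and the prompt format are all illustrative; a real deployment would use an actual embedding model and vector store.

```python
# A minimal sketch of the RAG pattern described above: retrieve the most
# relevant internal documents, then ground the LLM prompt in them.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

# Organization-specific knowledge the base model was never trained on.
documents = [
    "Policy: remote access is allowed only via the corporate VPN.",
    "Runbook: suspected credential theft requires a forced password reset.",
    "Asset note: host FIN-DB-01 holds regulated financial data.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (cosine)."""
    q = embed(question)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1)
                              * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

question = "How should we respond to credential theft on FIN-DB-01?"
context = "\n".join(retrieve(question))
prompt = f"Using only this context:\n{context}\n\nAnswer: {question}"
# `prompt` then goes to the LLM, grounding its answer in environment data.
print(prompt)
```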
AI risks in threat detection:
- Lack of contextual awareness leads to high false positives.
- Poor log retention hinders detection of advanced techniques.
- Over-reliance on automation results in poor mitigation choices.
Data Analysis
A key inhibitor to using EDR tools for threat and anomaly detection is insufficient log retention. Intel 471 Senior Threat Hunt Analyst Lee Archinal sees this in customer engagements when helping threat hunting teams troubleshoot and improve contextual enrichment of hunts in their environments.
“Some EDRs don't go past 30, 60, 90 days, and if you're trying to train on data with a 30-day trend, that may not be enough to confidently identify anomalies. This can lead to high false positives. For example, if I only open up Word once a month, wouldn't that be considered an anomaly? EDRs need a lot of environmental tuning,” says Archinal.
“But if you're talking about SIEMs, the length of logging retention depends on your license, and you can start looking at larger sets of data using machine learning. Machine learning toolkits for major SIEMs do flag anomalous data, but they also flag data that is not anomalous. That's where the human part comes into play. Even though machine learning shows you results in the dataset, you still need to ask about results that were not flagged.”
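The sketch below illustrates that workflow with scikit-learn’s IsolationForest over simulated daily per-host event counts; the feature choice and contamination rate are assumptions, and the point is that flagged and unflagged results alike still get a human review.

```python
# A sketch of anomaly flagging on 90 days of per-host activity counts.
# Feature choice and contamination rate are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Daily (logon_count, process_count) pairs; mostly routine activity.
events = rng.normal(loc=[40, 300], scale=[5, 30], size=(90, 2))
events[75] = [40, 1200]  # one burst of process executions

model = IsolationForest(contamination=0.02, random_state=7)
labels = model.fit_predict(events)  # -1 means flagged as anomalous

print("Days flagged for review:", np.where(labels == -1)[0])
# The human part: review what was flagged AND sanity-check what wasn't.
# With short retention, rare-but-legitimate activity (opening Word once
# a month) looks anomalous, while low-and-slow activity can blend in.
```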
When pursuing an adversary, it’s worth considering the human on the other end. Whether activity traces back to a person, an asset, or an internal process, there’s usually a core behavior or responsibility involved. It’s the threat hunter’s role to build perspective and context while analyzing the datasets returned from hunt queries.
SIEM tools configured with detailed logging and best-practice data retention are critical for data aggregation and analysis. A baseline of normal activity, combined with detailed logging of host and network devices covering authentication attempts and command-line executions and arguments, enables threat hunters to analyze historical activity for behaviors, anomalies, and advanced TTPs, such as Living off the Land (LotL) tactics and binaries (LOLbins).
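As a concrete example of baselining for LotL behavior, a hunter might compare live process-creation telemetry against parent/child pairs already seen in the environment; the field names and pairs below are illustrative assumptions about that telemetry.

```python
# An illustrative baseline comparison for LotL behavior: flag parent/child
# process pairs never seen before in this environment.
baseline_pairs = {
    ("explorer.exe", "cmd.exe"),
    ("services.exe", "svchost.exe"),
    ("cmd.exe", "ipconfig.exe"),
}

live_events = [
    {"parent": "winword.exe", "child": "cmd.exe",
     "args": "cmd /c < notes.txt"},
    {"parent": "explorer.exe", "child": "cmd.exe", "args": "cmd /c dir"},
]

for event in live_events:
    pair = (event["parent"], event["child"])
    if pair not in baseline_pairs:
        # Office spawning a shell is a classic LotL lead worth a pivot.
        print(f"New pair {pair}: {event['args']}")
```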
Predictive Analytics and Malware
AI is strong in predictive analytics when applied to large-scale network data, and security controls that apply predictive analytics to malware and endpoint detection already do a good job. Our threat hunters see limited potential for AI to bring anything new to malware detection, with the exception of threats that are hard to build detections for.
“I don't see AI as filling that big of a gap in malware detection. There's been so much focus in this space already and so many tools that do a pretty good job. And these tools are mostly not getting defeated by malware — they're getting defeated by things you would never classify as malware to begin with, such as behaviors using LOLbins. This is why we threat hunt versus just relying on detections alone,” says Poley.
Enrichment
After running an initial hunt, our threat hunters always want to understand the results to determine if they’ve found something. Executing a hunt typically returns large data sets. At this point, threat hunters often ask themselves, ‘What do I do next?’ The answer usually involves building further pivots and exclusions into queries during the analysis of initial results to reduce noise and identify relevant artifacts.
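In practice that iteration often looks like the sketch below: a first pass of results, then exclusions added as known-benign noise is identified. The data, column names, and exclusions are invented for illustration.

```python
# An illustrative exclusion pass over initial hunt results with pandas:
# each iteration removes known-benign noise so what remains is worth a
# closer look.
import pandas as pd

results = pd.DataFrame({
    "host": ["WS01", "WS02", "SRV-BUILD", "WS03"],
    "command_line": ["cmd /c < setup.txt", "cmd /c < readme.txt",
                     "cmd /c < build.cfg", "cmd /c < odd.bin"],
    "user": ["alice", "bob", "svc_build", "carol"],
})

# Exclusions learned while reviewing the first pass of results.
results = results[results["host"] != "SRV-BUILD"]                  # CI noise
results = results[~results["command_line"].str.endswith(".txt")]   # installers

print(results)  # the remaining rows become the next pivot
```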
Once a hypothesis is formed, it’s not uncommon to focus too narrowly on one path or indicator. AI helps counter tunnel vision by providing broader context and external intelligence that may not be immediately obvious.
We’ve found AI useful for suggesting threat actor groups that use similar techniques and for identifying relevant MITRE ATT&CK techniques and subtechniques that align with or expand a hypothesis. It’s also been helpful for identifying gaps in logic, giving us an avenue to consider overlooked angles. It can add depth to the hunt and make it more generalized.
The team often uses AI to enrich hunts with cyber threat intelligence (CTI) context, for example, when tagging Intel 471 HUNTER hunt packages with threat actors observed using TTPs documented in another hunt package and when mapping TTPs to MITRE ATT&CK techniques, along with other contextual data such as targeted industries and geographies, severities, and motivations. This intelligence and context help teams prioritize hunts.
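A simplified sketch of how such tags can drive prioritization appears below; the package names, actor placeholder, and scoring weights are invented for illustration and are not the HUNTER platform’s actual schema.

```python
# A hypothetical shape for CTI-tagged hunt packages and a simple
# prioritization pass. Names, tags, and weights are illustrative.
hunt_packages = [
    {"name": "Reflective loading via cmd redirection",
     "techniques": ["T1059.003", "T1620"],
     "actors": ["example-actor"],  # observed using these TTPs elsewhere
     "industries": ["finance"], "severity": 4},
    {"name": "Suspicious VPN credential reuse",
     "techniques": ["T1078"],
     "actors": [], "industries": ["finance", "retail"], "severity": 3},
]

ORG_INDUSTRY = "finance"

def priority(pkg: dict) -> int:
    """Boost hunts tied to named actors and to our own industry."""
    return (pkg["severity"] + 2 * len(pkg["actors"])
            + (1 if ORG_INDUSTRY in pkg["industries"] else 0))

for pkg in sorted(hunt_packages, key=priority, reverse=True):
    print(priority(pkg), pkg["name"])
```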
Our hunt team looks for ways AI can streamline enrichment to help make human analysts informed decision-makers rather than merely performing routine tasks.
AI wins during enrichment:
- Expanding context around observed activity.
- Finding related actor groups or MITRE techniques.
- Preventing tunnel vision during investigations.
- Discovering adjacent hunting paths.
Reporting and Documentation
“No one likes to write documentation, but the better your documentation, the more valuable your work becomes to the organization,” says Poley.
Threat hunters need to summarize their work and explain the significance of findings on undetected threats, misconfigurations, and visibility gaps, along with their impact on organizational risk. They also need to communicate these findings to different audiences at different technical levels. Generative AI can help reduce the documentation and reporting load for both the writer and the reader.
A consistent reporting structure with an executive summary helps leadership know what to expect every time. AI can help here by writing the report in a consistent style and structure, which saves time and reduces cognitive load for the stakeholders who need to make quick, informed decisions about risk.
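Even without an LLM in the loop, a fixed skeleton goes a long way; the sketch below shows an illustrative report template (section names and example content are assumptions, not a prescribed Intel 471 format) that an AI assistant or the hunter can fill consistently every time.

```python
# An illustrative fixed report skeleton for consistent hunt reporting.
REPORT_TEMPLATE = """\
# Hunt Report: {title}

## Executive Summary
{summary}

## Hypothesis and Scope
{hypothesis}

## Findings and Impact
{findings}

## Recommended Mitigations
{mitigations}
"""

report = REPORT_TEMPLATE.format(
    title="Reflective loading via cmd redirection",
    summary="No active compromise found; two visibility gaps identified.",
    hypothesis="Adversaries execute payloads via 'cmd /c <' redirection.",
    findings="12 events reviewed; all traced to benign admin scripts.",
    mitigations="Enable command-line argument logging on the server VLAN.",
)
print(report)
```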
AI ‘wins’ in reporting:
- Consistent, structured reports with executive summaries.
- Reports tailored for different audiences.
- Clear communication for both technical and leadership readers that improves rapid decision-making.
Threat Hunting Automation and Hunt Playbooks
Hunts should be repeatable and run iteratively. AI can offer value in analyzing hunts retrospectively, such as comparing current and past hunt results to spot differences and trends, which can inform AI-suggested exclusions and enrichments that reduce noise and false positives. It can also help structure testable hypotheses and automate reporting.
“I think that retrospective view is valuable,” says Poley. “Let’s take a hunt that worked. If I run that hunt for a day, a week, or 30 days, you'll be able to view the results in time slices like that and run an analysis over the top to identify things that stood out based on the results in these time frames."
AI can also enhance a threat hunting playbook by learning from past incidents. AI can, for example, analyze data and add valuable context such as, “this activity only occurred once in the last 90 days,” helping analysts make more informed decisions. This enrichment enables better prioritization by providing insights into frequency, relevance, and historical patterns.
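A minimal sketch of that kind of frequency enrichment, using invented artifact history, might look like this:

```python
# A minimal frequency enrichment over retained hunt history: rare
# artifacts get prioritized. The history below is invented.
from collections import Counter
from datetime import date, timedelta

history = (
    [(date.today() - timedelta(days=2), "powershell.exe -enc <payload>")]
    + [(date.today() - timedelta(days=d), "cmd /c dir")
       for d in range(1, 60)]
)

counts = Counter(artifact for _, artifact in history)
for artifact, n in counts.items():
    note = "rare, prioritize" if n <= 2 else "routine"
    print(f"{artifact!r}: seen {n}x in retained history ({note})")
```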
Over time, this data can help train systems to suggest next steps based on what human analysts did in similar situations in the past.
Hunt outputs: Incident Response and Vulnerability Management
Automating incident response and validation is a challenge because the AI requires good training to gain contextual knowledge of the environment.
Over-reliance on out-of-the-box AI can lead to over-automation and poor outcomes in high-risk situations, such as automatically locking down a system. Automation should instead reflect human decision-making patterns and past actions taken.
AI’s effectiveness in vulnerability management again depends heavily on context within the organization and the external threat landscape. Defenders need to know their technology stack, what is vulnerable, how critical vulnerabilities are, and what assets need remediating. Prioritization also depends on how vulnerability management feeds into other processes, such as threat intelligence about CVE weaponization and whether engineering has detections or mitigations in place.
AI is useful for assessing the severity of a vulnerability and can help non-technical users better understand the risks to support prioritization. However, we’ve found it limited when making risk assessments or recommendations. AI may overestimate the risk of a vulnerability by failing to recognize that the affected asset is protected from the internet by proper network segmentation, multiple control devices, and firewalls configured to protect open ports.
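The sketch below shows the kind of exposure-aware adjustment a human analyst applies and an out-of-the-box model often misses; the factors and weights are illustrative, not a scoring standard.

```python
# A sketch of contextual risk adjustment: downgrade a raw severity score
# when the asset isn't actually reachable. Weights are illustrative.
def contextual_risk(cvss: float, internet_exposed: bool,
                    segmented: bool, ports_filtered: bool) -> float:
    risk = cvss
    if not internet_exposed:
        risk *= 0.5   # no direct external attack path
    if segmented:
        risk *= 0.7   # lateral movement is constrained
    if ports_filtered:
        risk *= 0.8   # firewall already blocks the vulnerable service
    return round(risk, 1)

# CVSS 9.8 on an internal, segmented, firewalled host:
print(contextual_risk(9.8, internet_exposed=False,
                      segmented=True, ports_filtered=True))  # 2.7
```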
AI wins in vulnerability management:
- Assessing vulnerability criticality relative to environment exposure.
- Prioritizing based on risk factors like network segmentation or asset role.
- Bridging gaps between technical teams and leadership by translating vulnerabilities into actionable insights.
Conclusion
It’s still early days for AI in the enterprise, but the pace of progress and growing adoption of AI in business processes make it highly likely that threat hunters will need to integrate AI into their threat hunting methodology. As AI is integrated into more business processes, it will also become important for threat hunters to understand how to leverage the telemetry and data that AI generates to build future hunts.
Our threat hunters believe TaHiTI is an ideal methodology for intelligence-driven threat hunting, one that sufficiently caters to the complexity of modern enterprise IT environments spanning on-premises and cloud infrastructure.
With AI tools significantly expanding the attack surface, threat hunters will likely need to harness AI to rapidly separate actionable intelligence from contextual noise. As the roles of threat intelligence analysts and threat hunt analysts increasingly converge, AI could play an invaluable role in identifying overlaps and connections in data that organizations already have in their security and data platforms. This combination of detailed logs and AI analysis has the potential to produce vastly improved intelligence products that enable fast, reliable predictive decision-making and drive better technical outcomes.