AI Search Visibility: Why Tracking Rankings No Longer Works

For years, SEO professionals have relied on a familiar concept: rankings. You track positions, monitor fluctuations, and optimise accordingly. It was never perfect, but it was predictable enough to guide strategy.

That model is now breaking down.

Recent research led by Rand Fishkin highlights a growing issue that many in the industry are only beginning to understand: brand visibility in AI-generated responses does not behave like traditional search rankings at all. Much of the time, it barely behaves like a measurable system.

And yet, businesses are already trying to build dashboards around it.

The Illusion of “Rankings” in AI Responses

At first glance, it seems logical to apply classic SEO thinking to AI search. If a chatbot like ChatGPT or Claude generates a list of recommended brands, surely those brands have some form of ranking, right?

Not quite.

Fishkin’s experiment involved 600 participants submitting identical queries across multiple AI systems, including Google’s AI-powered search features. The expectation might have been some level of consistency, perhaps with slight variations depending on the platform.

Instead, the results showed near-total fragmentation.

Two responses to the same query rarely matched, not just in the order of brands, but in which brands appeared at all. Lists varied in length, composition, and structure. In statistical terms, the probability of receiving two similar outputs was extremely low, close to negligible.

This is not a bug. It is a feature of how these systems work.

Why AI Search Is Fundamentally Different

To understand why tracking fails, you need to understand what AI systems are optimising for.

Traditional search engines retrieve and rank documents. AI systems generate answers.

That difference changes everything.

AI models:

  • Do not return fixed result sets
  • Do not operate on static rankings
  • Do not produce identical outputs for identical inputs

Instead, they generate responses based on probabilities, context, training data, and, increasingly, personalisation signals. Even subtle variations in phrasing, timing, or user context can produce entirely different outputs.
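To see this in practice, here is a minimal sketch of a repeated-query test using the OpenAI Python client as one example platform. The model name, prompt, and temperature are illustrative assumptions, not details taken from Fishkin’s study.

```python
# Minimal sketch: submit the same prompt several times and compare the outputs.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the environment.
# The model name and prompt are placeholders, not details from the research.
from openai import OpenAI

client = OpenAI()
prompt = "Recommend three project management tools for a small agency."

for run in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # standard sampling, deliberately not deterministic
    )
    print(f"--- Run {run + 1} ---")
    print(response.choices[0].message.content)
```

Run this a few times and the recommended brands, their order, and even the length of the list will typically differ from one response to the next.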

And that means there is no stable “position” to track.

The Personalisation Problem Nobody Talks About

One of the most underappreciated dimensions of AI search is personalisation.

Users interacting with AI systems are not simply entering search terms. Each session is a conversation, and conversations carry weight. Prior exchanges shape what comes next, inferred preferences narrow the response space, and the output shifts accordingly.

This creates a significant gap between:

  • What an API returns in a controlled test environment
  • What a real user sees in a live interaction

Most tracking tools depend on API queries. The problem is that APIs bear little resemblance to real user behaviour. There is no accumulated history, no inferred context, and no personalisation layer influencing what gets generated.

So even if repeated queries produced consistent results, you would still be measuring a simplified version of reality, one scrubbed of the conditions that actually determine what individual users receive.

And that simplification can be misleading.

A More Reliable Approach: Measuring Probabilistic Visibility

Despite the chaos, Fishkin’s research does point toward a more useful methodology.

Instead of trying to track a single query, the approach shifts toward aggregation and probability.

When the same prompt is submitted repeatedly, patterns begin to emerge. Certain brands appear more frequently than others. Not always in the same position, but with consistent presence.

This allows for a different kind of metric: not ranking, but frequency of mention.

For example, if a brand appears in 60 out of 100 generated responses for a given query, you can infer a level of visibility within that context.

It’s not precise. It’s not deterministic. But it is directional.

And in AI search, direction may be the most reliable signal available.
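As a rough illustration, here is a small Python sketch that computes exactly that kind of mention frequency across a batch of responses to a single prompt. The brand names and response texts are placeholders, and real matching would need to handle aliases, abbreviations, and fuzzier phrasing.

```python
# Sketch: frequency of mention across repeated responses to one prompt.
# `responses` would normally be collected by re-running the same prompt many times;
# the brands and sample texts below are placeholders.
from collections import Counter

def mention_rate(responses: list[str], brands: list[str]) -> dict[str, float]:
    """Return the share of responses in which each brand is mentioned."""
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for brand in brands:
            if brand.lower() in lowered:
                counts[brand] += 1
    return {brand: counts[brand] / len(responses) for brand in brands}

brands = ["Acme CRM", "ExampleSoft", "Widgetly"]  # hypothetical brands
responses = [
    "For most small teams, Acme CRM and Widgetly are solid choices...",
    "Popular options include ExampleSoft, Acme CRM and a few niche tools...",
    "You could look at Widgetly, or at open-source alternatives...",
]
for brand, rate in mention_rate(responses, brands).items():
    print(f"{brand}: mentioned in {rate:.0%} of responses")
```

A brand that appears in 60 of 100 collected responses simply shows up here as a 60% mention rate for that prompt.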

Expanding the Model: Topic-Level Visibility

The methodology becomes more powerful when expanded beyond a single query.

One prompt tells you almost nothing on its own. The more useful approach is to analyse clusters of prompts across a topic, which is also how users actually interact with AI systems. Nobody asks the same question verbatim a hundred times. They ask variations, come at the same subject from different angles, and rephrase based on what the previous answer gave them.

When you apply this kind of broad prompt analysis at scale, a clear pattern emerges. Some brands show up consistently across multiple subtopics, regardless of how the question is worded. Others surface occasionally and then disappear, present in some phrasings and absent in others. That inconsistency points to something real: weaker topical authority, thinner coverage, less reliable representation in the training data or retrieval systems behind the response.

In some cases, dominant brands achieve visibility rates between 50% and 70% across a wide range of prompts. That is not a ranking in any traditional sense. It is closer to topical dominance, where the brand has become the default reference point for a subject area and AI systems reflect that consistently.
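Here is one way that topic-level view might be assembled, again on placeholder data: responses are grouped by prompt variation, and a brand’s topic-level visibility is the share of all responses across the cluster in which it appears.

```python
# Sketch: topic-level visibility across a cluster of prompt variations.
# Prompts, brands, and responses are placeholders; in practice each prompt would be
# run many times, per platform, before aggregating.
from collections import Counter

def topic_visibility(responses_by_prompt: dict[str, list[str]],
                     brands: list[str]) -> dict[str, float]:
    """Share of all responses in the prompt cluster that mention each brand."""
    counts = Counter()
    total = 0
    for responses in responses_by_prompt.values():
        for text in responses:
            total += 1
            lowered = text.lower()
            for brand in brands:
                if brand.lower() in lowered:
                    counts[brand] += 1
    return {brand: counts[brand] / total for brand in brands}

brands = ["Acme CRM", "ExampleSoft"]  # hypothetical brands
cluster = {
    "best crm for a small agency": [
        "Acme CRM is a common pick for agencies...",
        "Teams often compare Acme CRM and ExampleSoft...",
    ],
    "which crm integrates with email marketing": [
        "ExampleSoft has strong integrations here...",
        "Acme CRM covers the basics...",
    ],
}
for brand, rate in topic_visibility(cluster, brands).items():
    print(f"{brand}: visible in {rate:.0%} of responses across the topic")
```

A brand sitting consistently in that 50–70% band across a broad prompt cluster is what the research describes as topical dominance; a brand that only surfaces for one or two phrasings shows up as a much lower and more volatile share.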

What This Means for SEO Strategy

The implications are significant.

Single-keyword optimisation starts to matter less. In AI environments, visibility is not tied to exact phrasing. It is built on semantic coverage and authority across a topic. A page that ranks for one specific query is not the same as a brand that appears consistently across dozens of variations on a theme, and the latter is what AI systems tend to surface.

Brand strength becomes a critical factor for the same reason. AI systems gravitate toward entities that are well-established, widely referenced, and contextually relevant across a subject area. This is not separate from broader search trends. It is an accelerated version of them, where the signals that have been growing in importance for years now carry even more weight.

Traditional tracking tools need to evolve to keep pace with this. Measuring success in AI search requires:

  • Large-scale prompt analysis
  • Cross-platform comparisons
  • Frequency-based visibility metrics
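A cross-platform comparison can reuse the same frequency-based approach and simply lay the results out side by side. In the sketch below, the platform names and visibility figures are placeholder inputs; in practice they would come from large batches of responses collected per platform for the same prompt cluster.

```python
# Sketch: compare frequency-based visibility for each brand across platforms.
# The figures are placeholder inputs, not measured data.
visibility = {
    "ChatGPT":   {"Acme CRM": 0.62, "ExampleSoft": 0.31},
    "Claude":    {"Acme CRM": 0.48, "ExampleSoft": 0.44},
    "Google AI": {"Acme CRM": 0.55, "ExampleSoft": 0.20},
}

brands = sorted({brand for rates in visibility.values() for brand in rates})

# One row per brand, one column per platform.
print("Brand".ljust(14) + "".join(platform.ljust(12) for platform in visibility))
for brand in brands:
    row = brand.ljust(14)
    for rates in visibility.values():
        row += f"{rates.get(brand, 0.0):.0%}".ljust(12)
    print(row)
```

Even a simple table like this makes gaps visible: a brand that is strong on one platform and nearly absent on another is a finding that single-platform rank tracking would never surface.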

Anything less risks producing incomplete or misleading insights, not because the data is wrong but because it is answering a question that AI search has already made obsolete.

The Strategic Shift: From Rankings to Presence

What we are witnessing is not just a technical change but a conceptual one.

SEO has long been built around the idea of positions. First page, top three, number one. These benchmarks shaped how success was defined and measured.

AI search disrupts that model.

There are no fixed positions. No stable rankings. Only varying degrees of presence within generated responses.

This shifts the focus from “Where do we rank?” to “How often are we included?”

It is a subtle difference, but it changes how strategies are built.

Final Thoughts: Embracing Uncertainty in AI Search

Fishkin’s study does not provide a perfect solution, and perhaps that is the point.

AI search is inherently probabilistic. It resists rigid measurement frameworks. Attempts to force it into traditional SEO models will likely fall short.

However, by embracing aggregation, topic-level analysis, and frequency-based metrics, it becomes possible to extract meaningful insights.

Not precise answers, but useful ones.

And in a landscape where even identical queries produce different results, that may be the most realistic goal.

SEO/GEO Strategy Support

If your current SEO strategy still relies on keyword rankings alone, it is time to expand your measurement framework.

Start exploring:

  • Topic-level visibility
  • Brand mention frequency in AI responses
  • Cross-platform presence

Because in 2026, success in search is no longer just about ranking. It is about being part of the answer.
