When a site fails to get indexed by AI platforms, many imagine a problem related to content, keywords, or meta tags. In reality, to unblock ChatGPT and Gemini and allow them to crawl your web pages, you often have to go deeper. Bots like GPTBot and Google-Extended (the token used by Google's AI systems, including Gemini and model training), and sometimes Googlebot itself, can in fact be blocked by the server before they can even read a single line of your website.
This guide takes you through the process step by step: from identifying the symptoms, to the tests to run, to the most common causes, to the actual case that happened to me in recent days. A sneaky problem, difficult to identify, but brilliantly solved with a little help from ChatGPT and, above all, thanks to the technical intervention of the provider, in my case Serverplan.
After reading this article, you will know how to tell whether AI bots are reading your site, why they can be blocked even when Google and human users see it without problems, which tests to run to diagnose the problem, what to ask technical support to fix the situation, and how to verify that everything really works.
How to tell if AI bots aren’t seeing your pages
Before talking about firewalls, DNS, or migrations, we need to understand one key thing: AI bots are not “normal users.” They don’t use Chrome, they don’t accept cookies, they don’t run JavaScript. They simply make HTTP requests with specific user-agents and receive a response from the server. If that response isn’t what you expect, the AI can never read or index the page.
Here are the most frequent symptoms:
- ChatGPT cannot read the page even though it is public and visible from the browser
- AIs respond using old or outdated sources from your site
- tools that simulate the GPTBot user-agent return timeouts, 403 errors, or blank pages
- Some content is never seen by models, even after days
For this reason, early testing is essential. The tests below let you see your site exactly as the bots see it.
Testing the page like a normal user
This is the basic check, which you can run from the command prompt. It tells you whether the page works correctly for ordinary human users.
curl -I https://tuodominio.it/pagina/
If you get:
HTTP/1.1 200 OK
it means that the server responds correctly: there are no PHP errors, strange redirects, loops, or missing permissions, and the page is accessible to anyone visiting it from a browser.
This test lets you rule out macro issues. Before thinking about AI bots, we need to be sure the page is reachable by anyone. If a real user sees the page but the bot doesn’t, the cause is the firewall or another lower-level security layer. This is a preliminary check: if it fails, the problem is not AI-specific; if it passes, move on to the next step.
Testing the page as GPTBot and Google-Extended
This is the most important test, because it simulates exactly how the AIs try to read your page. GPTBot and Google-Extended are not real users, they do not load JavaScript, do not accept cookies and do not bypass firewalls. They send a simple HTTP request and wait for a response. If the server blocks them, the AI will never know what’s on your page.
To check whether the ChatGPT bot can access your page, enter the following command at the command prompt:
curl -I -A "GPTBot" https://tuodominio.it/pagina/
Requests from Google’s AI systems (Gemini) arrive via Googlebot or via Google-Extended. To simulate a Gemini-related request, use this command:
curl -I -A "Google-Extended" https://tuodominio.it/pagina/
If with one of the two bots you get:
HTTP/1.1 200 OK
It means that, at least on the surface, the bot can access your page.
Otherwise, if you see:
- 403 Forbidden
- 503 Service Unavailable
- Timeout
- Connection reset
it means that the server is blocking that bot or that type of request.
If the page doesn’t respond to GPTBot or Google-Extended, neither ChatGPT nor Gemini can read it or use it as a source. In this situation the website is online and visible to all human users, so traditional SEO and Google keep working, yet the AI bots are rejected by the firewall, the WAF, or a server rule, and the AIs cannot index it. It is precisely this difference, invisible from the browser, that sends many marketers into crisis: everything seems to work, but the AIs see nothing.
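If you want to check several crawlers in one go, a minimal shell sketch like the one below prints just the HTTP status code per user-agent (curl is assumed; replace the URL with your own page; a timeout shows up as 000):
for UA in "GPTBot" "Google-Extended" "Googlebot"; do
  # -s silences progress, -o /dev/null discards the body, -w prints only the status code
  code=$(curl -s --max-time 15 -o /dev/null -w "%{http_code}" -A "$UA" "https://tuodominio.it/pagina/")
  echo "$UA -> $code"
done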
Download the page as AI really sees it
The third test is crucial. It’s not enough to know whether the bot can get in; you need to see the page exactly as ChatGPT or Gemini sees it.
Start with the command for ChatGPT:
curl -L -A "GPTBot" https://tuodominio.it/pagina/ -o pagina_gpt.html
then move on to the one for Google Gemini:
curl -L -A "Google-Extended" https://tuodominio.it/pagina/ -o pagina_gemini.html
These commands download the entire HTML page, follow any redirects, mimic the AI’s request, and save everything in an html file that you can open.
Now open the two generated files, pagina_gpt.html and pagina_gemini.html. Three scenarios are possible. In the first, the text is present. This is the best case. It means that:
- the server responds correctly to bots;
- GPTBot and Google-Extended manage to receive the full HTML;
- AIs can then read, index, and cite content.
In other words, from an AI perspective, your page is perfectly transparent.
In the second scenario, the file is empty or contains only minimal lines. This is the ultimate proof that AI sees your site as a blank sheet of paper.
The most common causes are:
- the content is generated only via JavaScript and is not available in server-side HTML;
- a cookie banner in autoblocking mode prevents bots from loading the page content;
- a plugin makes content appear only on the client side (and bots do not execute scripts);
- a firewall or WAF returns a “neutral” page instead of a formal error.
In summary, the page exists but bots can’t get the content. If this happens, ChatGPT, Gemini and the like will never be able to use it as a source.
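You can tell this scenario apart from a healthy download without opening the files. A quick sketch, where "a sentence from your page" is a placeholder for text you can actually see in the browser:
wc -c pagina_gpt.html pagina_gemini.html
# a file of only a few hundred bytes is usually an empty shell
grep -c "a sentence from your page" pagina_gpt.html
# 0 means your text never reached the bot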
In the third scenario, the file contains an error message or abnormal markup, for example:
- Access denied
- Blocked
- Suspicious request detected
- mod_security
- Firewall: request rejected
This scenario is different from the blank file. Here the server’s response clearly indicates that something is actively blocking bots.
This means:
- a firewall rule identifies the request as suspicious;
- mod_security or BitNinja are filtering the user-agent;
- a Web Application Firewall is repelling the bot;
- an anti-DDoS system interprets GPTBot or Google-Extended as anomalous traffic.
In practice, the content is there but the server refuses to hand it over to the bots.
In summary, in scenario 1 the AIs can read you, in scenario 2 the AIs see nothing, and in scenario 3 the AIs are actively blocked.
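The three scenarios can also be told apart automatically. Here is a minimal bash sketch for GPTBot; the 2000-byte threshold, the error keywords, and the temporary file path are assumptions you should tune for your own site:
UA="GPTBot"
URL="https://tuodominio.it/pagina/"
# download the page as the bot and capture the final status code
code=$(curl -s -L --max-time 20 -A "$UA" -o /tmp/pagina_test.html -w "%{http_code}" "$URL")
size=$(wc -c < /tmp/pagina_test.html)
if [ "$code" != "200" ] || grep -qiE "access denied|blocked|mod_security" /tmp/pagina_test.html; then
  echo "Scenario 3: actively blocked (status $code)"
elif [ "$size" -lt 2000 ]; then
  echo "Scenario 2: empty or minimal HTML ($size bytes)"
else
  echo "Scenario 1: readable ($size bytes)"
fi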
Extra test: asking the AIs directly to read the page
This is a test that anyone can run in seconds, even without technical skills. It’s the most intuitive proof, and it’s often the one that makes the light bulb go on: “ok, so the AI really can’t read my page”.
The test consists of asking ChatGPT or Google Gemini directly to analyze the indicated page. The prompt is this:
Analyze page X and give me 5 exact quotes that you can retrieve by scanning the page directly, without using external resources.
If the AI can read the page, it returns 4–5 sentences that match the text exactly. Maybe not 100% perfect, but you’ll recognize the content immediately.
If the AI can’t read the page, one of these things usually happens:
- the AI says that it cannot access the page;
- it makes up quotations that do not exist on the page;
- it summarizes inaccurate or out-of-context content;
- it claims that the page does not exist or is empty;
- it produces generic phrases not present in the HTML.
This is the clearest evidence that there is a blockage. If the AI doesn’t see the page, it can never index it or use it as a source.
ChatGPT and Gemini blocked? The most common causes
Here you have to be very honest. This problem does not affect a single provider; it can happen with any hosting, platform, or control panel: cPanel, Plesk, dedicated servers, VPS, cloud, managed WordPress, or custom solutions. Here are the most common causes.
Firewalls and WAFs blocking AI bot user-agents (GPTBot and Google’s AI bots)
Many hosting services use advanced protection systems (BitNinja, Imunify, ModSecurity, CloudLinux, Sucuri, Cloudflare WAF). These tools, built to block suspicious traffic, sometimes classify AI bots as “aggressive bots” or “anomalous scrapers.” The result: GPTBot and Google’s AI bots are blocked even though they are legitimate.
ASN or datacenter-based blocks
Some firewalls don’t block the bot by name but by where the request comes from. If the source data center matches patterns deemed risky, the request is automatically discarded. So yes, you can have a perfect website and the bots still won’t read it.
Overly aggressive mod_security rules
ModSecurity is a very useful but extremely sensitive firewall. Rare user agents, missing headers, or patterns it flags as suspicious can all result in a block.
Cookie banners in autoblocking mode
Many cookie banners (e.g. Iubenda in autoblocking mode) prevent unauthorized scripts from loading. If the page depends on JavaScript for its content, the AI bot sees a void.
DNS not yet propagated, bad IPs, or TTL too high
If you have recently migrated, some resolvers may still point to the old server, the SSL certificate may not have propagated, or the AI bot may hit the wrong IP.
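You can check propagation yourself with dig (shipped with the dnsutils or bind-utils package). Compare what your resolver returns with what a public resolver returns:
dig +short tuodominio.it A
dig +short tuodominio.it A @8.8.8.8
# if the two answers differ, propagation is still in progress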
Invalid SSL certificate or incomplete chain
Some AI bots reject content from websites with invalid, expired certificates, or incomplete SSL chains.
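To see the certificate the server actually presents, a minimal openssl sketch (the -servername flag matters on shared hosting, where many sites share one IP):
openssl s_client -connect tuodominio.it:443 -servername tuodominio.it </dev/null 2>/dev/null | openssl x509 -noout -dates -issuer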
Rendering based on JavaScript
AI bots don’t run JavaScript like a browser. If the page content is generated on the fly, the AI bot sees a blank page even if the browser shows it perfectly.
What to do to unlock ChatGPT and Gemini: the step-by-step procedure
Once you understand that AI bots can’t read your website, it’s time to take action. The good news is that the issues preventing ChatGPT or Gemini from accessing your pages are almost always relatively easy to fix. All you need is a little method and a few targeted checks.
Below is the recommended procedure for unblocking ChatGPT and Gemini. It is the same one I followed to solve my case and that allowed me to understand where the block was hidden.
Check if the content is available in HTML and not just via JavaScript
The first step is to make sure that the content of the page is really present in the source code. Many modern themes and plugins only load parts of the content via JavaScript. This isn’t a problem for human users, but it’s a disaster for AI bots, which don’t run scripts and don’t load dynamic elements.
How to check:
- open your page in the browser;
- right-click and choose View source;
- search within the HTML for the real text of the page.
If the content does not appear in the source, it means that it is generated on the client side. In this case, AI bots will see the page as blank. You need to consider using a theme or builder that produces more static HTML or enable server-side rendering features when available. I often use the Divi theme builder by Elegant Themes, which allows you to create websites that are perfectly readable by AI in a very intuitive way.
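The same source check can be run from the terminal. A minimal sketch, assuming you replace "an exact phrase" with a sentence you can read in the browser:
curl -s https://tuodominio.it/pagina/ | grep -o "an exact phrase"
# no output means the text is generated client-side and bots will not see it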
Check the firewall and server protections
Most AI bot blocks originate at the firewall or WAF level. These are protection systems that automatically filter out suspicious requests and often include rules against non-standard bots. GPTBot and Google-Extended, for many firewalls, fall exactly into this category.
To understand whether the firewall is the cause, follow these steps.
- Log in to your hosting panel and open the section dedicated to security;
- look for items such as firewall, WAF, bot protection, BitNinja, or mod_security.
Check blocked request logs
If you see user agents like GPTBot, Google-Extended, or empty requests that are classified as suspicious, you’re on the right track.
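If you have SSH access, you can also search the web server logs yourself. A hedged sketch, assuming an Apache-style access log (the path varies by host and control panel):
grep -E "GPTBot|Google-Extended" /var/log/apache2/access.log | tail -n 20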
Temporarily disable the offending rule. You don’t need to turn everything off. Most panels allow you to turn off individual protections for a few minutes.
Disable bot protection, automatic traffic filters, or specific mod_security rules.
Rerun the curl tests as GPTBot and Google-Extended. If you suddenly get a 200 OK and the HTML file is complete, the firewall was the cause.
The firewall can make the website perfect for human users but completely invisible to AIs. It’s an insidious problem because everything seems to work from the browser.
Getting help from hosting support is often the best choice, just like in my case with Serverplan. Their management was quick, precise and decisive.
Check the cookie banner
Another very frequent cause is cookie banners, especially those with an “autoblocking” mode. These tools can block scripts and content until the user gives consent. AI bots never give consent, so they are served a mutilated page.
What to check:
- whether the cookie banner loads essential scripts only after consent;
- if autoblocking mode is active;
- if some scripts are loaded only on the client side.
To figure out if the banner is the cause, turn it off for a few minutes and redo the GPTBot and Google-Extended tests. If the HTML content is complete again, you have found the problem.
Control plugins that generate dynamic content or that are only activated on the client side
Some plugins load significant portions of the page via JavaScript. This is very common in visual builders, animation plugins, customization systems, and social sharing scripts. If the content is not present in the HTML source, or if the file downloaded by the bot is empty, that plugin may be responsible.
Solution:
- temporarily disable plugins that manage dynamic content;
- repeat the test;
- re-enable plugins one by one to pinpoint the culprit.
Contact your hosting support to unblock ChatGPT and Gemini
Some issues cannot be solved from the WordPress panel. Sometimes the cause lies in advanced server rules, WAF configurations, or deep security systems like BitNinja.
If the tests continue to fail, the best option is to open a ticket asking:
- to check if GPTBot and Google-Extended are blocked;
- to add exceptions for these user agents;
- to check for any automated filters;
- to migrate the site to a server without certain overly strict protections (as happened to me).
An example of a ticket is as follows:
Good morning,
requests with user-agent "GPTBot" to https://yourdomain.com/page/ return 403/timeout.
With the browser, however, the page returns 200 OK.
Tests:
– curl -I https://yourdomain.com/page/ → 200 OK
– curl -I -A "GPTBot" https://yourdomain.com/page/ → 403
– content downloaded → blank page (attached)
I kindly ask you to check firewall/WAF, mod_security rules, or ASN blocks, and in case AI bots are being blocked, to proceed with whitelisting.
Thank you!
In my case, Serverplan’s support handled everything flawlessly, identifying the BitNinja problem and migrating the site to a server without that filter. The result was immediate: the AI bots started reading the site correctly again without having to change anything in the code or plugins.
How to check that everything is fixed
After making the changes, opening the website in the browser is not enough to know whether you have really unblocked ChatGPT and Gemini: human users access the page under normal conditions, AI bots do not. For this reason, you need to run tests that simulate exactly how ChatGPT, Gemini, and the other crawlers attempt to access the content.
The goal of the next commands is very simple: to put yourself in the shoes of the AIs and see what they see. If your tests show that the page is accessible, it means that bots can read and use it. If the tests fail, there is still something blocking it.
Below you will find the basic controls.
Check server response like an AI bot
This test checks whether the server accepts the request. It’s the lightest version, but it’s also the one that tells you right away if the bot is rejected.
GPTBot (ChatGPT):
curl -I -A "GPTBot" https://tuodominio.it/pagina/
Google-Extended (Gemini):
curl -I -A "Google-Extended" https://tuodominio.it/pagina/
Verify that the bot really manages to download the content
This test is more in-depth. It doesn’t just check if the server responds, but if it actually delivers the HTML. In practice, it simulates the full reading of the page by AIs.
GPTBot (ChatGPT):
curl -L -A "GPTBot" https://tuodominio.it/pagina/ -o pagina_gpt.html
Google-Extended (Gemini):
curl -L -A "Google-Extended" https://tuodominio.it/pagina/ -o pagina_gemini.html
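As a final confirmation, check that the downloaded files really contain your text. A quick sketch, where the quoted phrase is a placeholder for a sentence that appears on the page:
grep -c "a sentence from your page" pagina_gpt.html pagina_gemini.html
# a count of 0 means the bots still receive a page without your content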
How I managed to unblock ChatGPT and Gemini with the support of AI and Serverplan
In recent days, analyzing the traffic data on my website, I noticed a suspicious drop and sensed it was an AI indexing problem. I asked ChatGPT and Gemini to scan some specific pages and report the information they contained, but they were unable to do so. I tried disabling some WordPress plugins and, with a shiver down my spine, setting rules in the .htaccess file (don’t do it without backups, or if you don’t know exactly what you’re editing!), but I couldn’t solve the problem.
What to do? I tried to ask ChatGPT directly what was preventing it from reading the content on the page and the AI replied like this:
I checked the link you provided and encountered an internal error/timeout: the server returned “Internal Error” and I was unable to retrieve the content of the page.
I ran the verification tests described above:
curl -I → 200 OK
curl -I -A "GPTBot" → timeout
curl -L -A "GPTBot" → empty file
The problem was obvious. The bot was blocked at the firewall level, but it was not clear where.
I opened the ticket. Serverplan’s support was fast, precise, and transparent. They explained that the block came from the BitNinja proactive security system, which in some cases restricts access from specific IPs or ASN classes. This is a rare case and does not depend on the hosting itself but on the security configuration of the individual server. They proposed a clean and definitive solution: free migration of my hosting to a server where BitNinja was not active. Everything was planned, communicated, and carried out within the agreed timeframe.
After migration:
- new IP updated correctly;
- curl -I -A “GPTBot” → 200 OK;
- the downloaded page contains all the content;
- ChatGPT and Gemini can finally read the website.
The problem was solved brilliantly, and I can now get back to working on my website’s indexing in ChatGPT and Gemini without any problems.
Problem solving: the method I use to diagnose indexing anomalies
When a website suddenly stops being indexed or does not appear in AI systems such as ChatGPT and Gemini, it is not enough to guess the cause. You need a structured diagnosis process. This is the approach I systematically use on my projects to quickly identify the source of a problem and verify that the solution is really effective; it is the same one that, in this case, helped me solve a big indexing problem.
Symptom detection
First thing: distinguish a normal drop from an abnormality. In this case, the simultaneous absence from ChatGPT and Gemini indicated a technical, not algorithmic, cause.
Formulation of hypotheses
I have listed the possible causes:
- robots.txt
- server restrictions
- misconfigured security rules
- anti-bot protections
- DNS problems
- redirect or caching errors
Having a clear outline avoids following wrong leads.
Technical verification of exclusions
I checked robots.txt, headers, status code, accessibility from different user agents and the presence of anomalous configurations at the server level. This step eliminates 70% of the most common causes.
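Part of this verification is easy to script. To rule out a robots.txt exclusion, fetch the file and look for the AI tokens:
curl -s https://tuodominio.it/robots.txt | grep -i -B 1 -A 1 -E "GPTBot|Google-Extended"
If you want the crawlers in, the allow rules use the standard robots.txt syntax documented by OpenAI and Google:
User-agent: GPTBot
Allow: /
User-agent: Google-Extended
Allow: /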
Log analysis and cross-confirmations
I checked whether the AI bot user agents were actually reaching the site. The total absence of attempts confirmed that it was not a “crawling error” but a block.
Interaction with Technical Support
I opened a ticket with Serverplan, explaining the context and sharing the tests already carried out. Collaborating in a structured way with the assistance accelerates the diagnosis and allows the correct technical checks to be carried out immediately.
Solution validation
After the firewall rule was removed, I repeated all the tests, including access via the specific user agents and external tools. Confirmation of the fix came only when the AI bots resumed querying the site correctly.
Lesson Learned
- Anti-bot protections can block even the most advanced AI systems.
- A structured diagnosis method is needed to distinguish a technical error from an indexing problem.
- Collaborating with technical support in a documented manner dramatically reduces resolution time.
- In an ecosystem dominated by generative search, monitoring who accesses and who does not access the site is as crucial as monitoring traffic.
The new balance between SEO, AI and technical infrastructure
Being indexed by AI does not only mean publishing original, useful, and authoritative content. It also means ensuring that the content is truly accessible to the models that need to read it. SEO remains fundamental. Structured data remains critical. But on their own they are not enough.
The technical side today weighs as much as the editorial quality. A firewall that is too rigid, an aggressive WAF, a cookie banner in autoblocking mode or a plugin that generates content only via JavaScript can make a perfect page invisible. For the user, everything works, while for ChatGPT or Google Gemini, that same page is empty.
For this reason, the new web requires a more comprehensive approach. You need to know how to read the signals, do targeted tests, interpret the data and also intervene on the infrastructure level when necessary. Sometimes with the help of hosting assistance, sometimes with the help of the AIs themselves.
We are entering a phase where SEO, content, and technology must work together. And those who know how to integrate these skills will have a real competitive advantage in the world of GEO (Generative Engine Optimization).