Bugs can be tricky

#5023 (In Topic #1124)
Site director
Chris Graham is in the usergroup ‘Administrators’
This is going to be a very technical post. I'll keep it as simple as reasonably possible though. I thought it would be interesting to share, and writing this down also helps crystallise my thinking.

Today I had an issue with Google Search Console showing server errors for a client of ours, yet manual tests showed Google could access the site fine, and nothing was logged as an error on the server. The client's search ranking is unlikely to be affected by this, but just the possibility of that happening is something that would set off alarm bells for most people.

Here's what the problem was…

Months ago, there was a bug with our HTML minification on the client site (code to reduce the size of the HTML automatically, coming in v11), which led to some invalid URLs. Even though this was a very minor bug, only there for a short time, and long since fixed, Google remembered some of these invalid URLs.

Certain invalid URLs in Composr CMS can trigger a "hack-attack alert", because they look like a hacker is trying to compromise the system.
Enough of these alerts, and the connecting IP is banned. It's an important security technique we have which has blocked many malicious bots over the years from wasting system resources.
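
To make the mechanism concrete, here is a minimal sketch in Python. Composr itself is written in PHP and this is not its actual code; the class name, the method name, and the threshold of 5 are all hypothetical, since the post does not say how many alerts trigger a ban.

```python
from collections import defaultdict

# Hypothetical value: the real number of alerts before a ban is not stated.
ALERT_THRESHOLD = 5


class AlertTracker:
    """Counts suspicious requests per IP and bans an IP at the threshold."""

    def __init__(self, threshold=ALERT_THRESHOLD):
        self.threshold = threshold
        self.alerts = defaultdict(int)
        self.banned = set()

    def record_alert(self, ip):
        """Record one hack-attack alert; return True if the IP is now banned."""
        self.alerts[ip] += 1
        if self.alerts[ip] >= self.threshold:
            self.banned.add(ip)
        return ip in self.banned
```

A crawler that keeps revisiting a handful of remembered invalid URLs would cross such a threshold just as a malicious bot would, which is why a separate crawler whitelist is needed.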

But a banned IP address should not result in a server error (technically a '500' error); it should result in some kind of access denied error code. Well, the server in question is behind a firewall, so the IP addresses seen by the web server are those of the gateway router, not of the end user. At some point the web server remaps them to the real IP address, but only after server-level bans have been checked. This meant that bans were being enforced in Composr instead of at the server level. Composr's own bans are a secondary level of ban security that we have but that usually isn't needed. These bans were throwing Composr "Critical errors", which are implemented with the 500 error code. That's an oversight which I've now fixed.
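
The ordering issue can be sketched as follows, again in Python rather than Composr's PHP. This assumes the gateway passes the real address in an X-Forwarded-For style header, which the post does not actually state; the gateway address and function names are hypothetical. The two points illustrated are that the address must be remapped before any ban check, and that a ban should answer with an access denied status (403 here) rather than 500.

```python
import ipaddress

# Hypothetical address of the gateway router in front of the web server.
TRUSTED_GATEWAYS = {"10.0.0.1"}


def real_client_ip(peer_ip, forwarded_for):
    """Return the end user's IP, trusting the header only from our gateway."""
    if peer_ip in TRUSTED_GATEWAYS and forwarded_for:
        # The right-most entry is the one appended by our own gateway.
        candidate = forwarded_for.split(",")[-1].strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            pass  # Malformed header: fall back to the peer address.
    return peer_ip


def status_for_request(peer_ip, forwarded_for, banned_ips):
    """Remap the IP first, then check bans; a ban gives 403, never 500."""
    ip = real_client_ip(peer_ip, forwarded_for)
    return 403 if ip in banned_ips else 200
```

If the ban check ran against `peer_ip` directly, every visitor would appear to come from the gateway and a server-level ban could never match the real offender.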

But if Google rendered the page fine, how could it be banned? Well, only a single Google IP address was banned, and hence Google was only blocked a very small percentage of the time. From the point of view of looking at Google Search Console errors though, that's not something you'd pick up on (Google Search Console provides very limited information).

But we actually prevent Google from being banned automatically. We're not stupid enough to just ban any IP address that goes to malicious URLs, as that itself would be a vulnerability: it would allow someone to get a site kicked off Google. We check whether an IP belongs to Google by doing a reverse DNS check, looking to see if the resulting hostname is that of an important crawler (we can't trust the user-agent). On the particular client server involved though, reverse DNS results have a trailing dot, while on previously tested machines they did not. This threw off the check. (We can't just do a substring check as that would be vulnerable, but our check has been enhanced.)
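
A minimal Python sketch of a hostname check that tolerates the trailing dot without falling back to a substring match. This is an illustration, not Composr's actual PHP code; the function name and the domain list are assumptions, and the reverse DNS lookup itself (for example via `socket.gethostbyaddr`) is left out so that the example runs without network access.

```python
# Illustrative list only; the real whitelist is not given in this post.
CRAWLER_DOMAINS = ("googlebot.com", "google.com")


def is_trusted_crawler_host(hostname, domains=CRAWLER_DOMAINS):
    """Check a reverse DNS hostname against whitelisted crawler domains.

    A fully qualified name may arrive with a trailing dot
    ("crawl-1.googlebot.com."), so that is stripped before comparing.
    The match is on whole domain labels at the end of the name, so
    "googlebot.com.evil.example" and "evilgooglebot.com" do not pass.
    """
    host = hostname.strip().rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in domains)
```

Matching on a leading dot plus the domain is what separates this from a substring check: an attacker controls the reverse DNS of their own IP range, so they can put "googlebot.com" anywhere in a hostname except at the end of a zone they do not own.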

Further complicating things, the user agent logged for the ban was not Googlebot, it was the Google Adsense crawler. Why would Google Search Console be showing errors relating to Adsense? Well, Google task their machines with multiple things. So while we banned a machine acting as an Adsense crawler, that same machine also acted as a regular Google crawler.

I have now added automated tests for checking our bot detection via IP whitelisting and DNS whitelisting. And, I've made the DNS whitelisting configurable via an overridable text file for v11.
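
As a rough idea of what an overridable text-file whitelist could look like, here is a Python sketch. The file format (one domain per line, `#` for comments) and the function names are assumptions made for illustration; the post does not describe the real v11 format, and Composr itself is PHP.

```python
def parse_whitelist(text):
    """Parse one domain per line; blank lines and '#' comments are skipped."""
    domains = []
    for line in text.splitlines():
        # Drop any comment, then normalise case and a trailing dot.
        entry = line.split("#", 1)[0].strip().rstrip(".").lower()
        if entry:
            domains.append(entry)
    return domains


def load_domains(default_text, override_text=None):
    """Use the site's override file if one is supplied, else the default."""
    chosen = override_text if override_text is not None else default_text
    return parse_whitelist(chosen)
```

Keeping the parser as a pure function of the file's text makes it easy to cover with automated tests, including awkward inputs such as trailing dots and mixed case.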

To aid future debugging, I have added a Health Check, for checking (with the latest whitelists and code) whether any crawler IPs have been banned. This is now one of about 200 checks we run to make sure something isn't badly screwed up on a Composr website. That's way more checks than any human can reasonably stay on top of or even really know about, which is why I love this Health Check system (coming in v11, although a version of it is available as a v10 addon).
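
The IP-whitelist half of such a check might look like this in Python. It is a sketch under assumptions, not the actual Health Check: the function name is hypothetical and the range used in the usage example below is a documentation-only placeholder, not a real crawler range.

```python
import ipaddress


def banned_crawler_ips(banned_ips, crawler_ranges):
    """Return the banned IPs that fall inside any whitelisted crawler range."""
    networks = [ipaddress.ip_network(r) for r in crawler_ranges]
    flagged = []
    for ip in banned_ips:
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        if any(addr in net for net in networks):
            flagged.append(ip)
    return flagged
```

For example, `banned_crawler_ips(["192.0.2.10", "198.51.100.5"], ["192.0.2.0/24"])` flags only the first address. A non-empty result means a ban exists that the current whitelists say should never have happened.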

Here's a summary of the curve balls which made this so difficult to debug:
  1. Googlebot going to URLs that weren't even linked from anywhere
  2. 'Server errors' for what was actually a ban situation
  3. Google 'banned' but still able to access the site fine when tested from Google Search Console
  4. Google banned even though Google could not be banned
  5. Google search crawler banned even though it was the Google Adsense crawler that was banned

And this, ladies and gentlemen, is why sometimes I really struggle to get my billable hours in! About 5 hours of my day for something that I only found out about this morning (due to a Google email alert). I don't charge clients for Composr bugs, however esoteric they may be.



#5025
Standard member
ironfeather is in the usergroup ‘Well-settled’
Wow! Intense. Good work getting that solved.
