Why I replaced Knot with PowerDNS for BYOCDN

Background

I am building a BYOCDN (Bring Your Own CDN) management control plane service that lets users bring up their own compute nodes and have them act as a CDN for their own resources.

One of the key ingredients is the ability to route each request to the node nearest to the requester.

The current iteration of BYOCDN requires users to use my authoritative DNS service.

What was I trying to do?

I had a problem. My authoritative DNS service was designed around a hidden primary with public secondaries, and I had been using Knot DNS for both the primary and the secondaries to tie everything together.

My frontend service translates DNS records into BIND zone files, which are synced to my hidden primary and then transferred to the secondary servers via AXFR.
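The translation step amounts to rendering records into the RFC 1035 master file format that BIND popularised. Below is a minimal Python sketch of that idea, assuming a hypothetical `Record` shape and `render_zonefile` helper; it is not the actual frontend code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record shape as the frontend might store it.
    name: str      # owner name relative to $ORIGIN, "@" for the zone apex
    rtype: str     # e.g. "SOA", "NS", "A", "MX"
    content: str   # RDATA in zone-file presentation format
    ttl: int = 300


def render_zonefile(origin: str, records: list[Record], default_ttl: int = 300) -> str:
    """Render records as a BIND-style zone file (RFC 1035 master file format)."""
    lines = [f"$ORIGIN {origin}", f"$TTL {default_ttl}"]
    for r in records:
        # owner  TTL  class  type  rdata
        lines.append(f"{r.name}\t{r.ttl}\tIN\t{r.rtype}\t{r.content}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_zonefile("example.com.", [
        Record("@", "SOA", "ns1.example.com. hostmaster.example.com. 2024010101 3600 900 604800 300"),
        Record("@", "NS", "ns1.example.com."),
        Record("www", "A", "192.0.2.10", 60),
    ]))
```

One detail worth keeping in mind: secondaries only pull a fresh copy over AXFR when the SOA serial increases, so a renderer like this would presumably need to bump the serial on every sync.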

This works well for traditional DNS queries, but Knot cannot handle georouted queries without a plugin. Those plugins keep the georouting logic in a separate mechanism, so I would have needed to build another sync tool just to manage it.

I looked into expressing georouting with special private DNS record types, so that my existing zone files could remain the single source of truth, but Knot doesn't appear to support that.

Why didn't the obvious option work out?

I started looking for a replacement and found that PowerDNS's remote backend might let me support my intended architecture without significant changes or too many moving parts.

I also explored PowerDNS's own GeoIP backend, but it had the same issue as Knot's geoip plugin: I would need to write a specialised YAML file for each domain, which would quickly get unwieldy.

What did I choose instead?

After much deliberation, PowerDNS's remote backend appeared to fit the bill.

I built a remote backend service that is given every domain name that needs to be georouted along with all of its available IPs. When queried, it returns the IP of the node closest to the requester, using the Haversine formula to compute great-circle distance.
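The nearest-node selection can be sketched in a few lines of Python. This assumes each candidate IP has known node coordinates and that the requester has already been geolocated to a latitude and longitude (for example via a GeoIP database, which is outside this sketch); `Node`, `haversine_km` and `nearest_ip` are hypothetical names.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Node:
    ip: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point drift pushing sqrt(a) just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_ip(nodes: list[Node], client_lat: float, client_lon: float) -> str:
    """Return the IP of the node with the smallest great-circle distance to the client."""
    best = min(nodes, key=lambda n: haversine_km(client_lat, client_lon, n.lat, n.lon))
    return best.ip


if __name__ == "__main__":
    nodes = [
        Node("192.0.2.1", 1.35, 103.82),     # Singapore
        Node("192.0.2.2", 50.11, 8.68),      # Frankfurt
        Node("192.0.2.3", 40.71, -74.01),    # New York
    ]
    # A requester geolocated to Kuala Lumpur should be sent to Singapore.
    print(nearest_ip(nodes, 3.14, 101.69))
```

Haversine treats the Earth as a sphere, which is more than accurate enough for choosing between CDN nodes that are hundreds of kilometres apart.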

I then replaced all the secondary DNS servers across my anycast network with PowerDNS, configuring each one with `launch=remote,lmdb`.

Each server queries a remote backend running locally and, if that returns no answer, falls back to the original zone records distributed by my hidden primary.
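For reference, a `pdns.conf` along these lines expresses that setup. It is a sketch assuming PowerDNS Authoritative 4.5 or later (where `secondary` replaced `slave`) and the remote backend's HTTP connector; the URL, port and file path are placeholders. PowerDNS consults backends in the order they are listed in `launch`, which is what puts the remote backend ahead of LMDB.

```
# pdns.conf on each anycast secondary (sketch)
launch=remote,lmdb

# Local georouting service, reached over the remote backend's HTTP connector
# (URL and port are placeholders)
remote-connection-string=http:url=http://127.0.0.1:8080/dns

# LMDB backend holding the zones transferred from the hidden primary
lmdb-filename=/var/lib/powerdns/pdns.lmdb

# Act as a secondary and accept zone transfers (use slave=yes before 4.5)
secondary=yes
```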

One thing that was harder than expected

The remote backend had some nuances I did not expect. While the documentation is relatively clear, it was not obvious whether PowerDNS would fall through to the next backend when the remote backend did not return a result.

Only after some trial and error did I learn how PowerDNS handles fallthrough.

If your backend answers a query with anything other than `false`, PowerDNS accepts that answer in its entirety and does not query the next backend in line.
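In practice this means the `lookup` handler has to decide between two response shapes. The sketch below assumes the JSON shape used by the remote backend's pipe/unix connectors (`{"method": "lookup", "parameters": {"qname": ..., "qtype": ...}}`) and a hypothetical in-memory `GEOROUTED` table; check the remote backend documentation for the exact fields your PowerDNS version sends.

```python
# Hypothetical table of georouted names -> IP chosen for this requester.
# Keys are lowercase and without the trailing dot.
GEOROUTED = {"cdn.example.com": "192.0.2.1"}


def handle_lookup(params: dict) -> dict:
    """Answer a remote backend 'lookup' call.

    {"result": False} tells PowerDNS this backend has nothing for the name,
    so it moves on to the next backend in `launch` (LMDB here).
    Any list of records is treated as the complete answer for the name.
    """
    key = params["qname"].rstrip(".").lower()
    if key not in GEOROUTED:
        return {"result": False}  # fall through to the next backend
    return {"result": [
        {"qtype": "A", "qname": params["qname"], "content": GEOROUTED[key], "ttl": 60},
    ]}


if __name__ == "__main__":
    print(handle_lookup({"qname": "www.example.com.", "qtype": "ANY"}))
    print(handle_lookup({"qname": "cdn.example.com.", "qtype": "ANY"}))
```

Note that the second response is exactly the trap described next: because it is not `false`, PowerDNS treats that single `A` record as everything that exists at `cdn.example.com.`.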

This caused me some trouble at first, because PowerDNS would always query the remote backend with `ANY`, even when the client had only asked for an `A` record.

This means the remote backend has to return every record belonging to the queried name. As it only knows about the records that need to be georouted, I had to pull the rest of the records from my static BIND zone files and return them together with the georouted results.
If I did not return those other records, MX, CNAME and other record types would never be returned, which would cause a bunch of cascading issues.
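Merging the two sources for an `ANY` lookup might look like the sketch below. It assumes the static records have already been parsed out of the BIND zone files into a hypothetical `STATIC` dict keyed by owner name, and that only `A` records are georouted (`GEOROUTED_TYPES` is likewise an assumption); static records of a georouted type are dropped so the georouted answer replaces them rather than sitting alongside them.

```python
# Hypothetical static records parsed from the BIND zone files, keyed by owner
# name (lowercase, no trailing dot). Contents are in presentation format.
STATIC = {
    "cdn.example.com": [
        {"qtype": "A", "content": "198.51.100.7", "ttl": 300},
        {"qtype": "MX", "content": "10 mail.example.com.", "ttl": 300},
        {"qtype": "TXT", "content": '"v=spf1 -all"', "ttl": 300},
    ],
}

# Assumption: only A records are georouted in this sketch.
GEOROUTED_TYPES = {"A"}


def build_answer(qname: str, qtype: str, georouted_ip: str, static: dict) -> dict:
    """Combine the georouted A record with the name's other static records."""
    key = qname.rstrip(".").lower()
    # Keep every static record except the types the georouter answers for.
    records = [dict(r, qname=qname) for r in static.get(key, [])
               if r["qtype"] not in GEOROUTED_TYPES]
    records.append({"qtype": "A", "qname": qname, "content": georouted_ip, "ttl": 60})
    # PowerDNS normally asks for ANY; filter only if a specific type was requested.
    if qtype != "ANY":
        records = [r for r in records if r["qtype"] == qtype]
    return {"result": records}


if __name__ == "__main__":
    for rec in build_answer("cdn.example.com.", "ANY", "192.0.2.1", STATIC)["result"]:
        print(rec)
```

Returning the full set lets PowerDNS pick out whichever type the client actually asked for, so MX and TXT queries for a georouted name keep working.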

Thank you for reading all the way to the end. I've been working on writing more, and it took a lot for me to start writing again. If you liked what I wrote, please drop me a DM on Mastodon to encourage me. It would mean a lot to me.