Bot, Domains, SSO, Access, and Cleanup

05.08.26

I've been steadily chipping away at my TODO list. Although the code is currently in a rough state, it's been almost a couple of months since I pushed an update, so I figured it was time to write a quick blog post.

Most of the work has gone into the apps/demo code. I decided to avoid thinking about the proper place for code (e.g. does it belong in core) for now. Once the dust has settled, I hope it will be clearer what a proper core/library API will look like (or maybe it won't make sense at all!).

Access Control

lib.Guard provides an interface for checking capabilities for access control (e.g. can the caller write a Foo record in tenant X?). lib.Cap defines capabilities.

lib.Access contains the capabilities for a principal. The demo app loads capabilities early in the HTTP request (in api.IamMiddleware), builds an Access object, and stores it in the Context using lib.PutAccess. lib.ContextAccessGuard implements the Guard interface by looking up Access in the Context.

Access checks are often done at the storage layer, which I hope leads to a deeper level of protection. In the future, I might also add access checks at the api layer, so that requests can easily be denied early on before getting down to the storage layer.

One piece that needs more thinking is the case where the system needs access that the end user doesn't have, and so capabilities are granted to a specific scope of code. I've been calling this "system access" in my head, and there are a few TODOs scattered through the code with thoughts about how to do this in a maintainable way. For example, I've noticed that when there's a third layer of helper code involved (i.e. api -> helper -> storage), it becomes tricky to figure out exactly which capabilities the code needs. I wonder if there's a better way to make this easy to search/process/audit/etc. Maybe a static analysis tool?

Bots

I started to invent a concept for non-human (aka programmatic, service) users, which I'm calling "bots". You'd use a bot for a CI job or maybe a CLI tool, for example. Bots have API keys.

I haven't implemented it yet, but I've decided to use HMAC-style request signing as the auth mechanism. I like the idea that the secret key isn't transmitted on every request.
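Since none of this exists yet, the following is only a sketch of what HMAC-style signing might look like, not the planned design. The canonical string format (method, path, timestamp, and body hash joined by newlines), the choice of HMAC-SHA256, the skew window, and the function names are all assumptions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// canonical builds the string that gets signed. The field order and the
// newline separator are assumptions made for this sketch.
func canonical(method, path string, ts int64, body []byte) string {
	sum := sha256.Sum256(body)
	return strings.Join([]string{
		method,
		path,
		strconv.FormatInt(ts, 10),
		hex.EncodeToString(sum[:]),
	}, "\n")
}

// Sign returns a hex-encoded HMAC-SHA256 over the canonical request.
func Sign(secret []byte, method, path string, ts int64, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(canonical(method, path, ts, body)))
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify recomputes the signature and rejects timestamps that are more than
// maxSkew seconds away from now, which limits replay of captured requests.
func Verify(secret []byte, method, path string, ts int64, body []byte, sig string, now, maxSkew int64) bool {
	if d := now - ts; d > maxSkew || d < -maxSkew {
		return false
	}
	want := Sign(secret, method, path, ts, body)
	return hmac.Equal([]byte(want), []byte(sig)) // constant-time comparison
}

func main() {
	secret := []byte("bot-secret") // hypothetical API key secret
	body := []byte(`{"name":"foo"}`)
	sig := Sign(secret, "POST", "/api/foo", 1700000000, body)

	// Same request, checked 30 seconds later with a 5 minute window.
	fmt.Println(Verify(secret, "POST", "/api/foo", 1700000000, body, sig, 1700000030, 300))
	// Same signature replayed against a different path.
	fmt.Println(Verify(secret, "POST", "/api/bar", 1700000000, body, sig, 1700000030, 300))
}
```

Only the signature and timestamp travel with the request. The tradeoff is that the server needs access to the secret itself in order to recompute the signature.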

Lots more work to do here.

SSO and Domains

I worked a lot on single sign-on (SSO). I started by setting up the idea of "domains" (as in DNS), which can be created in a tenant and have DNS TXT record based verification.
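As a rough illustration of TXT-based verification: the tenant gets a token, publishes it in a TXT record on the domain, and the app looks the record up. The "demo-verification=" record format and the function names below are made up for this sketch. The lookup is passed in as a function so the check can be exercised without real DNS; net.LookupTXT from the standard library has the matching signature.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// TXTLookup returns the TXT records for a name. net.LookupTXT satisfies it.
type TXTLookup func(name string) ([]string, error)

// recordPrefix is a hypothetical format for the verification record.
const recordPrefix = "demo-verification="

// VerifyDomain reports whether domain publishes a TXT record carrying token.
func VerifyDomain(lookup TXTLookup, domain, token string) (bool, error) {
	records, err := lookup(domain)
	if err != nil {
		return false, err
	}
	want := recordPrefix + token
	for _, r := range records {
		if strings.TrimSpace(r) == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// A real check would pass the resolver-backed lookup.
	var real TXTLookup = net.LookupTXT
	_ = real

	// A fake lookup keeps this example independent of the network.
	fake := func(name string) ([]string, error) {
		if name == "atlas9.dev" {
			return []string{"v=spf1 -all", "demo-verification=abc123"}, nil
		}
		return nil, nil
	}
	fmt.Println(VerifyDomain(fake, "atlas9.dev", "abc123"))
	fmt.Println(VerifyDomain(fake, "atlas9.dev", "wrong-token"))
}
```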

I worked on the ability to store SSO config in the database, so that a user could set up their own SSO connection (i.e. self-service SSO), and then connect that to a verified domain. With all that, it should be possible for alex@atlas9.dev to automatically flow into SSO, for example.
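The routing step could look something like the sketch below: take the domain part of the email, and only send the user into SSO if that domain is verified and has a connection attached. The Domain struct, its fields, and the in-memory map standing in for the database are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Domain is a DNS domain claimed by a tenant. Field names are hypothetical.
type Domain struct {
	Tenant          string
	Verified        bool
	SSOConnectionID string // empty when no SSO connection is attached
}

// SSOForEmail returns the SSO connection for the email's domain, but only if
// that domain is verified and has a connection attached.
func SSOForEmail(email string, domains map[string]Domain) (string, bool) {
	at := strings.LastIndex(email, "@")
	if at < 1 || at == len(email)-1 {
		return "", false // no local part or no domain part
	}
	name := strings.ToLower(email[at+1:])
	d, ok := domains[name]
	if !ok || !d.Verified || d.SSOConnectionID == "" {
		return "", false
	}
	return d.SSOConnectionID, true
}

func main() {
	// The map stands in for a database lookup keyed by domain name.
	domains := map[string]Domain{
		"atlas9.dev":  {Tenant: "t1", Verified: true, SSOConnectionID: "okta-test"},
		"example.org": {Tenant: "t2", Verified: false, SSOConnectionID: "pending"},
	}
	fmt.Println(SSOForEmail("alex@atlas9.dev", domains))
	fmt.Println(SSOForEmail("sam@example.org", domains))
}
```

Requiring the domain to be verified matters here: otherwise any tenant could claim a domain it doesn't control and redirect that domain's users to its own identity provider.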

Lots more work to do here, of course, but I'm able to connect my test Okta instance to the demo app. Pretty cool!

Cleanup

I did lots of other cleanup. I moved a lot of code around, mostly breaking things into api, lib, and store buckets. I also rewrote the frontend to be completely client side (based on Mithril.js). The original dashboard and admin panel were split between JS and Go templates, and I think it's easier to reason about a server that only presents an API (so the API can be clearly inspected) and leaves the presentation to the client. For that reason, I settled on a pattern of api/api_foo.go and api/impl_foo.go, where the api_foo.go file defines the API types and routes, and the impl_foo.go file has all the implementation code. I like this so far; I think it makes looking at the API contract easier.
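Here's a condensed sketch of the api_foo.go / impl_foo.go split, squeezed into one file so it runs on its own. The comment banners mark where the file boundary would be. The FooAPI interface, the request and response types, and the /api/foo route are all invented for illustration; the real demo app may wire routes to implementations differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// ---- api/api_foo.go: the contract (types and routes) ----

type CreateFooRequest struct {
	Name string `json:"name"`
}

type CreateFooResponse struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// FooAPI is a hypothetical interface tying the routes to the implementation.
type FooAPI interface {
	CreateFoo(req CreateFooRequest) (CreateFooResponse, error)
}

// RegisterFooRoutes wires the Foo routes onto mux.
func RegisterFooRoutes(mux *http.ServeMux, impl FooAPI) {
	mux.HandleFunc("/api/foo", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var req CreateFooRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		resp, err := impl.CreateFoo(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})
}

// ---- api/impl_foo.go: the implementation ----

type fooImpl struct{}

func (fooImpl) CreateFoo(req CreateFooRequest) (CreateFooResponse, error) {
	if strings.TrimSpace(req.Name) == "" {
		return CreateFooResponse{}, errors.New("name is required")
	}
	return CreateFooResponse{ID: "foo-1", Name: req.Name}, nil
}

func main() {
	mux := http.NewServeMux()
	RegisterFooRoutes(mux, fooImpl{})

	// Exercise the route in-process, without opening a socket.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/api/foo", strings.NewReader(`{"name":"demo"}`))
	mux.ServeHTTP(rec, req)
	fmt.Println(rec.Code, strings.TrimSpace(rec.Body.String()))
}
```

Everything a client needs to know sits above the second banner, which is the part that makes the contract easy to read on its own.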

I also ditched the outbox task system for now and created a simpler tasks system. The outbox system is probably overengineered, having this event log that was fanned out into tasks. When I went to actually use it, I found it confusing and too complex. I expect there are more refinements to make here.

There's a lot of other cleanup in progress or on the TODO list.

Check out the demo app code at https://atlas9.dev/src/apps/demo/