Mt. Kolmogorov: A Complex Climb

A minimalist approach to code-generation

In the second part of this series, we will take a quick quasi-philosophical detour and think a bit more formally about the core essence of code-generation and abstractions. Then we will quickly recap what we have built so far to compose a minimal description for an enterprise data CRUD+Search app.

1 Kolmogorov Complexity & MDL

Kolmogorov complexity measures the complexity of a sequence of symbols as the length of the shortest computer program that can generate that sequence. It can be understood, loosely, as a measure of how efficiently a sequence can be described: the lower the Kolmogorov complexity, the simpler the sequence is to describe. Now let's think of that sequence as the source of our final generated program itself. My hypothesis is that most ordinary software, even with 100K+ LoC, has a much shorter description that can be used as a seed to generate the rest of the code. In a few bespoke cases, this minimal description of the larger program is just the project specs and C4/UML diagrams. Often, it is something much simpler than that. Borrowing from NLP and computational linguistics, we will call these seeds minimal descriptions, and we will use MDL (minimum description length) for their length.
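
To make the idea concrete, here is a minimal sketch (the names RepeatDescription and expand are hypothetical, chosen only for illustration). A 2,000-character sequence is fully recovered from a description a few dozen characters long. Strictly speaking, the size of the expand function should be counted too, so the description length here is only a rough upper bound on the sequence's Kolmogorov complexity.

```typescript
// A hypothetical two-field "description" of a long, regular sequence.
interface RepeatDescription {
    unit: string;
    times: number;
}

// The "program" that regenerates the full sequence from its description.
function expand(d: RepeatDescription): string {
    return d.unit.repeat(d.times);
}

const description: RepeatDescription = { unit: 'ab', times: 1000 };
const sequence = expand(description);

// Length of the serialized description vs. length of what it generates.
const descriptionLength = JSON.stringify(description).length;
console.log(sequence.length, descriptionLength);
```

A sequence with no regularity has no such shortcut: its shortest description is roughly the sequence itself.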

2 (Meta) Kolmogorov Complexity of our Apps

Minimum description length (MDL), as a principle, states that the best model for a dataset is the one that minimizes the combined length of the model description and the data encoded using that model. This means that the MDL principle seeks to find the simplest model that can adequately explain the data.
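
In its common two-part form, the principle can be written as below. The symbols are notation introduced here for illustration: $L(M)$ is the number of bits needed to describe a candidate model $M$, and $L(D \mid M)$ is the number of bits needed to encode the data $D$ with the help of that model.

```latex
\hat{M} = \arg\min_{M} \bigl[\, L(M) + L(D \mid M) \,\bigr]
```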

N.B. "Data" is code in the context of code-generators.

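MDL is thus essentially a practical application of the abstract concept of Kolmogorov complexity. Kolmogorov complexity is uncomputable in general, so on its own it supports at best ordinal comparisons (this description is shorter than that one), whereas description lengths measured under a fixed model class give concrete, cardinal numbers.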

In a data-driven application, the schemas for data at rest and data over the wire dictate the kinds of services that are possible. At the very least, the type signatures, if not the behavior, of our data-facing service methods are bound to adhere to our schema's constraints and types. Given the MDL principle, and given that schemas are nearly ubiquitous, we can pivot to our schema as the minimal-length seed for our large programs. In cases where the schema is implicit and needs to be inferred per collection, document, row or column, we would need the shape of some representative data records. This alone can make the seed description arbitrarily large, but together with the schema's constraints and guidance, the data representations can be a helpful addition to our minimal description for inferring data-shape-related interactions.
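
As a rough illustration of a schema acting as a seed, the sketch below expands a list of entity names into one route per entity and operation. Everything in it is an assumption made for the example: the entity list, the five operations, the path layout and the generateRoutes helper are hypothetical and not part of the generator built in this series.

```typescript
// Hypothetical seed: just the entity names found in a schema.
const seed = ['Patient', 'Provider', 'Appointment'] as const;

// Hypothetical fixed set of operations applied to every entity.
const operations = ['create', 'read', 'update', 'delete', 'search'] as const;

type Entity = (typeof seed)[number];
type Operation = (typeof operations)[number];

interface Route {
    entity: Entity;
    operation: Operation;
    path: string;
}

// Expands the seed into one route per (entity, operation) pair.
function generateRoutes(entities: readonly Entity[]): Route[] {
    const routes: Route[] = [];
    for (const entity of entities) {
        for (const operation of operations) {
            routes.push({
                entity,
                operation,
                path: `/${entity.toLowerCase()}/${operation}`,
            });
        }
    }
    return routes;
}

const routes = generateRoutes(seed);
console.log(routes.length, routes[0].path);
```

Adding a fourth entity grows the seed by one word and the generated output by five routes, which is the asymmetry the MDL argument relies on.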

3 Schema and API

Say we have a database table or collection called user_profile and we use it to update a user's preferences and profile information. What should the REST API in front of it look like? Now let's say that you have a table or collection about which you know nothing yet.

Can you still write the REST API for data create, update, read, search, evict or delete for that entity?

For a seasoned developer, across all major languages and frameworks, the answer is yes - in theory. Here's one example of what is possible with Prisma and its DMMF (from the last article) for a query controller:

export interface Query<NTT extends NTTKey> {
    findUnique: (
        ntt: NTT,
        args: FindUniqueArgType<NTT>,
    ) => ReturnType<DelegateType<NTT>['findUnique']>;

    findFirst: (
        ntt: NTT,
        args: FindFirstArgType<NTT>,
    ) => ReturnType<DelegateType<NTT>['findFirst']>;

    findMany: (
        ntt: NTT,
        args: FindManyArgType<NTT>,
    ) => ReturnType<DelegateType<NTT>['findMany']>;

    count: (
        ntt: NTT,
        args: CountArgType<NTT>,
    ) => ReturnType<DelegateType<NTT>['count']>;

    aggregate: (
        ntt: NTT,
        args: AggregateArgType<NTT>,
    ) => ReturnType<DelegateType<NTT>['aggregate']>;

    fields: (ntt: NTT) => FieldRefType<NTT>;

    groupBy: (
        ntt: NTT,
        args: GroupByArgType<NTT>,
    ) => ReturnType<DelegateType<NTT>['groupBy']>;
}

This interface is meant for a controller class exposing all of the ORM's query capabilities. It is not tied to any particular entity, which I am calling NTT here. Now a frontend dev can use Prisma's query syntax directly to compose their own queries. Should a frontend dev be unaware of Prisma? In that case, the backend dev can simply write a client SDK for the frontend engineers. The main advantage is that no matter how many entities there are, there can be just one controller. Alternatively, there can be as many CRUD controllers as there are entities, but we would need ts-morph for that. This will be the core theme for the next article in this series.

Can such a generalized query controller be implemented?

Yes. Fairly trivially it seems. Here's an example:

@Controller('query')
export class DBQueryController implements Query<NTTKey>
{

  @Put(':ntt/find/unique')
  findUnique<NTT extends NTTKey>(
    @Param('ntt') ntt: NTT,
    @Body() args: FindUniqueArgType<NTT>
  ): ReturnType<DelegateType<NTT>['findUnique']> {
    return getDelegate(ntt).findUnique(args);
  }

  @Put(':ntt/find/first')
  findFirst<NTT extends NTTKey>(
    @Param('ntt') ntt: NTT,
    @Body() args: FindFirstArgType<NTT>
  ): ReturnType<DelegateType<NTT>['findFirst']> {
    return getDelegate(ntt).findFirst(args);
  }

  @Put(':ntt/find/many')
  findMany<NTT extends NTTKey>(
    @Param('ntt') ntt: NTT,
    @Body() args: FindManyArgType<NTT>
  ): ReturnType<DelegateType<NTT>['findMany']> {
    return getDelegate(ntt).findMany(args);
  }

  @Put(':ntt/count')
  count<NTT extends NTTKey>(
    @Param('ntt') ntt: NTT,
    @Body() args: CountArgType<NTT>
  ): ReturnType<DelegateType<NTT>['count']> {
    return getDelegate(ntt).count(args);
  }

  @Put(':ntt/aggregate')
  aggregate<NTT extends NTTKey>(
    @Param('ntt') ntt: NTT,
    @Body() args: AggregateArgType<NTT>
  ): ReturnType<DelegateType<NTT>['aggregate']> {
    return getDelegate(ntt).aggregate(args);
  }

  @Put(':ntt/fields')
  fields<NTT extends NTTKey>(@Param('ntt') ntt: NTT): FieldRefType<NTT> {
    return getDelegate(ntt).fields;
  }

  @Put(':ntt/groupBy')
  groupBy<NTT extends NTTKey>(
    @Param('ntt') ntt: NTT,
    @Body() args: GroupByArgType<NTT>
  ): ReturnType<DelegateType<NTT>['groupBy']> {
    return getDelegate(ntt).groupBy(args);
  }
}
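
A client SDK for a controller like this could start from something as small as the sketch below. It only builds the request for the find/unique route and leaves the transport out. The two-entity NTTKey, the ArgsByEntity shapes, the QueryRequest type and the findUniqueRequest helper are hypothetical stand-ins. In the real project the key and argument types would come from the generated files.

```typescript
// Hypothetical stand-ins: the real NTTKey and argument types are generated from the schema.
type NTTKey = 'Patient' | 'Provider';

interface ArgsByEntity {
    Patient: { where: { id: string } };
    Provider: { where: { npi: string } };
}

interface QueryRequest {
    method: 'PUT';
    url: string;
    body: string;
}

// Builds the HTTP request for the find/unique route.
// The compiler checks that args match the entity named by ntt.
function findUniqueRequest<T extends NTTKey>(ntt: T, args: ArgsByEntity[T]): QueryRequest {
    return {
        method: 'PUT',
        url: `/query/${ntt}/find/unique`,
        body: JSON.stringify(args),
    };
}

const request = findUniqueRequest('Patient', { where: { id: 'p-1' } });
console.log(request.method, request.url);

// Rejected at compile time, because Patient has no npi filter in this sketch:
// findUniqueRequest('Patient', { where: { npi: '123' } });
```

Keeping the request builder pure makes it easy to plug in fetch or any other HTTP client on top, and easy to test without a running server.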

NTTKey was introduced in the first part of the series and is simply a union of literal types representing all our table names. The client side SDK can essentially wrap over the Query interface shown above to make isomorphic type-safety like tRPC possible. For aiding with generics and type-safety, getDelegate finds us the right Prisma entity delegate needed to dispatch the JSON query to the DB query engine. Here's an example implementation of getDelegate:

import { PrismaClient } from '@prisma/client';
import type { NTTKey } from './zen/entities-type';

export const prisma = new PrismaClient({
    errorFormat: 'pretty'
});

export const mapper = {
    Principal: prisma.principal,
    Account: prisma.account,
    Session: prisma.session,
    VerificationToken: prisma.verificationToken,
    Authenticator: prisma.authenticator,
    Payment: prisma.payment,
    Appointment: prisma.appointment,
    AppointmentType: prisma.appointmentType,
    Location: prisma.location,
    Patient: prisma.patient,
    Equipment: prisma.equipment,
    Service: prisma.service,
    Provider: prisma.provider,
    Form: prisma.form
} as const;

export type Delegate<T extends NTTKey> = typeof mapper[T];

export function getDelegate<T extends NTTKey>(key: T): Delegate<T> {
    const delegate = mapper[key];
    return delegate;
}

Notice how the Delegate type is constructed using our mapping object as a compile-time constant. This is a fairly advanced usage of TypeScript, and how I independently discovered this pattern will be the subject of an upcoming article in this same series. Meanwhile, I would like to show how something like FindUniqueArgType from the interface above is built using the same type-mapping technique. In this case we are mapping over an interface instead of a compile-time constant:

import type { Prisma } from "@prisma/client";
import type { NTTKey } from "./entities-type";

export interface FindUniqueArgs {
    Principal: Prisma.PrincipalFindUniqueArgs;
    Account: Prisma.AccountFindUniqueArgs;
    Session: Prisma.SessionFindUniqueArgs;
    VerificationToken: Prisma.VerificationTokenFindUniqueArgs;
    Authenticator: Prisma.AuthenticatorFindUniqueArgs;
    Payment: Prisma.PaymentFindUniqueArgs;
    Appointment: Prisma.AppointmentFindUniqueArgs;
    AppointmentType: Prisma.AppointmentTypeFindUniqueArgs;
    Location: Prisma.LocationFindUniqueArgs;
    Patient: Prisma.PatientFindUniqueArgs;
    Equipment: Prisma.EquipmentFindUniqueArgs;
    Service: Prisma.ServiceFindUniqueArgs;
    Provider: Prisma.ProviderFindUniqueArgs;
    Form: Prisma.FormFindUniqueArgs;
}

export type FindUniqueArgType<T extends NTTKey> = FindUniqueArgs[T];

This allows us to have a single generic type (FindUniqueArgType) in our interface for both the controller and the client-side SDK.
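
Both lookups can be tried in isolation with the self-contained sketch below. The two fake delegates and their argument shapes are hypothetical stand-ins for the Prisma delegates and the generated Prisma.*FindUniqueArgs types. Only the indexing pattern itself is taken from the code above.

```typescript
// Hypothetical delegates standing in for prisma.patient, prisma.provider, etc.
const mapper = {
    Patient: {
        findUnique: (args: { id: string }) => ({ kind: 'patient' as const, id: args.id }),
    },
    Provider: {
        findUnique: (args: { npi: string }) => ({ kind: 'provider' as const, npi: args.npi }),
    },
} as const;

type Key = keyof typeof mapper;

// Mapping over a compile-time constant: the delegate type is looked up by key.
type Delegate<T extends Key> = (typeof mapper)[T];

function getDelegate<T extends Key>(key: T): Delegate<T> {
    return mapper[key];
}

// Mapping over an interface: the argument type is looked up the same way.
interface FindUniqueArgs {
    Patient: { id: string };
    Provider: { npi: string };
}
type FindUniqueArgType<T extends Key> = FindUniqueArgs[T];

const patientArgs: FindUniqueArgType<'Patient'> = { id: 'p-1' };
const patient = getDelegate('Patient').findUnique(patientArgs);
console.log(patient.kind, patient.id);
```

Because the key is a literal type, the compiler narrows both the delegate and its argument type from the single string 'Patient'.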

How is this code generated and kept up to date with migrations to the database? Here's a bit of ts-morph before the next article full of ts-morph:

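import { Prisma } from '@prisma/client';
import { Project } from 'ts-morph';

// zenPath, essentialImports and the DMMF type are defined or imported elsewhere in the generator.
export function createTypeFile(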
    project: Project,
    fileName: string,
    interfaceName: string,
    typeName: string,
    properties: { name: string; type: string }[],
) {
    const typeFile = project.createSourceFile(`${zenPath}/${fileName}.ts`, '', {
        overwrite: true,
    });

    /**
    * Adds:
    * import type { Prisma } from "@prisma/client";
    * import type { NTTKey } from "./entities-type";
    */
    typeFile.addImportDeclarations(essentialImports);

    /**
    * Adds 
    * export interface FindUniqueArgs {...}
    */
    typeFile.addInterface({
        name: interfaceName,
        isExported: true,
        properties,
    });

    /**
     * Adds
     * export type FindUniqueArgType<T extends NTTKey> = FindUniqueArgs[T];      
     */
    typeFile.addTypeAlias({
        name: typeName,
        typeParameters: [`T extends NTTKey`],
        type: `${interfaceName}[T]`,
        isExported: true,
    });

    typeFile.saveSync();
}

export const ModelNames: string[] = Prisma.dmmf.datamodel.models.map(
    (m: DMMF.Model) => m.name,
);

export function createProperties(suffix: string) {
    // Adds interface properties to the interface e.g. 
    // Patient: Prisma.PatientFindUniqueArgs;
    return ModelNames.map((ModelName) => ({
        name: ModelName,
        type: `Prisma.${ModelName}${suffix}`,
    }));
}

function createArgs(project: Project) {
    createTypeFile(
        project,
        'findUniqueArgs',
        'FindUniqueArgs',
        'FindUniqueArgType',
        createProperties('FindUniqueArgs'),
    );
    // Other interfaces go here in a similar way
}

export function initiateQb(): void {
    const project = new Project({
        tsConfigFilePath: './tsconfig.json',
        skipAddingFilesFromTsConfig: true,
    });

    createEntities(project);
    createEntitiesType(project);
    createArgs(project);
    // other generators go here

    project.saveSync();
}
initiateQb();
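
To see what this generator emits without installing ts-morph or Prisma, the sketch below renders the same file as a plain string. It is a simplified stand-in and not the method used above: renderArgsFile is a hypothetical helper, and the model names are passed in by hand instead of being read from Prisma.dmmf.

```typescript
// Hypothetical dependency-free renderer; the article's version uses ts-morph instead.
function renderArgsFile(
    modelNames: string[],
    suffix: string,
    interfaceName: string,
    typeName: string,
): string {
    // One interface property per model, e.g. "Patient: Prisma.PatientFindUniqueArgs;"
    const properties = modelNames.map((name) => `    ${name}: Prisma.${name}${suffix};`);
    return [
        'import type { Prisma } from "@prisma/client";',
        'import type { NTTKey } from "./entities-type";',
        '',
        `export interface ${interfaceName} {`,
        ...properties,
        '}',
        '',
        `export type ${typeName}<T extends NTTKey> = ${interfaceName}[T];`,
        '',
    ].join('\n');
}

const source = renderArgsFile(
    ['Patient', 'Provider'],
    'FindUniqueArgs',
    'FindUniqueArgs',
    'FindUniqueArgType',
);
console.log(source);
```

String templates are fine for a file this small. An AST-based tool such as ts-morph becomes more attractive once generated files need to be merged with, or inserted into, existing source.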

In all this, we have used information from the schema's AST only. Kolmogorov complexity is an idea "so abstract that it could just as well have been a prayer" - borrowing from the Foundation series.

So far we have generated a generic controller and seen a glimpse of how code can generate more code. In the next installment in the series, we will generate a separate query controller and a mutation controller for each entity. This will help us later create a CQRS system. Having a separate controller per entity will also allow us to apply various auth(n/z) guards and administrative policies for each API endpoint. Such guards and policy-enforcers can also be created by the code-generator itself as we will soon see in future installments in this same series.