sqlgg: SQL Guided (code) Generator

ygrek.org / p / sqlgg

sqlgg: SQL Guided (code) Generator

SQL query parser and binding code generator for C#, C++, Java, OCaml.

Problem

Writing database layer code is usually tedious and error-prone, due to the mix of different languages. SQL queries constructed dynamically need to bind external data (from application), and the resulting rowset must be decomposed into application native data. The impedance mismatch between database layer and application code, as well as the absence of static (compile-time) checking, frequently leads to runtime failures (and sometimes even security holes). One can factor out all common database communication code, hide the database under some application-specific abstraction, but one always needs to manually specify correspondence between SQL query binding slots (or resulting rowset columns) and code variables. This mapping should be updated manually every time SQL query is modified.

Solution

sqlgg is an SQL query parser and code generator which ensures that application code and database queries are in sync. It analyzes SQL query and determines the set of input parameters (values for INSERT, run-time substitution parameters) and the set of resulting columns (for SELECT). Then it generates the code in host language, mapping query parameters on function arguments with corresponding native data types. So basically you provide an SQL query and generator creates a function which takes the set of typed parameters as required to fill slots in a query. Generated code binds provided parameters into query and executes it. The code for SELECT statements additionally invokes a strongly-typed callback function for every row of the resulting rowset. The most fruitful consequence of such approach is that the host language compiler will itself check that functions generated from SQL queries will be called correctly (i.e. all parameters bound with correct types). So if you modify the query and forget to update the code – the compiler will point on erroneous parts.

Complete example

Suppose we have the database of some money relationships for the group of people. We use the following SQL tables and queries:

-- @create_person
CREATE TABLE person (id INTEGER PRIMARY KEY AUTOINCREMENT,name TEXT,surname TEXT);
-- @add_person
INSERT INTO person (name,surname) VALUES;

-- @create_money
CREATE TABLE money (src INTEGER, dst INTEGER, amount INTEGER);
-- @add_money
INSERT INTO money VALUES;

-- @calc_total
SELECT name || ' ' || surname AS fullname, SUM(amount) as total FROM person JOIN money ON src = id GROUP BY id;
-- @list_donors
SELECT DISTINCT surname FROM person JOIN money ON src = id AND dst = (SELECT id FROM person WHERE surname LIKE ?) LIMIT ?;

DROP TABLE IF EXISTS person;
DROP TABLE IF EXISTS money;
Generate the binding C++ code (boilerplate omitted, only function prototypes shown):

(corresponding C# C++ Java OCaml XML)

// DO NOT EDIT MANUALLY
//
// generated by sqlgg 0.2.2+ (f5b76ac7) on 2009-05-30T19:22Z
// visit http://ygrek.org/p/sqlgg/

#pragma once

template <class Traits>
struct sqlgg
{
  struct create_person
  {
    create_person(typename Traits::connection db);
    bool operator()();
  }; // struct create_person

  struct add_person
  {
    add_person(typename Traits::connection db);
    bool operator()(typename Traits::Text const& name, typename Traits::Text const& surname);
  }; // struct add_person

  struct create_money
  {
    create_money(typename Traits::connection db);
    bool operator()();
  }; // struct create_money

  struct add_money
  {
    add_money(typename Traits::connection db);
    bool operator()(typename Traits::Int const& src, typename Traits::Int const& dst, typename Traits::Int const& amount);
  }; // struct add_money

  struct calc_total
  {
    calc_total(typename Traits::connection db);
    template<class T>
    bool operator()(T result);
  }; // struct calc_total

  struct list_donors
  {
    list_donors(typename Traits::connection db);
    template<class T>
    bool operator()(typename Traits::Text const& _0, typename Traits::Int const& _1, T result);
  }; // struct list_donors

  struct drop_person
  {
    drop_person(typename Traits::connection db);
    bool operator()();
  }; // struct drop_person

  struct drop_money
  {
    drop_money(typename Traits::connection db);
    bool operator()();
  }; // struct drop_money

  create_person create_person;
  add_person add_person;
  create_money create_money;
  add_money add_money;
  calc_total calc_total;
  list_donors list_donors;
  drop_person drop_person;
  drop_money drop_money;

  sqlgg(typename Traits::connection db) : create_person(db), add_person(db), create_money(db), add_money(db), calc_total(db), list_donors(db), drop_person(db), drop_money(db)
  {
  }
}; // struct sqlgg

Things to note above:

  1. Syntax. All queries are written in plain SQL. Every statement should be terminated with semicolon. There is no need to write (?,?) after VALUES in INSERT statement. The annotation [sqlgg] name=function_name (or simply @function_name) before the query specifies the name of the generated function (this annotation is optional but very convenient).
  2. The generated code is parametrized by database-specific class Traits. It specifies the correspondence between SQL and native types, provides types for database connection and implements actual code to execute statements. Traits should be implemented once for every specific database API.
  3. Each statement has a corresponding struct which takes connection object as a constructor argument. This allows to reuse prepared statements several times (if database supports this feature).
  4. add_money() function call operator takes three data parameters – the values to INSERT into table (note the names and types).
  5. calc_total() returns data via result parameter, which is a callback that gets executed for each row of the resulting rowset, with statically typed parameters binding each column of a row. In this case the parameters are fullname of type Traits::Text and total of type Traits::Int.
  6. The types of parameters for list_donors were inferred correctly (limit is Int and surname is Text.
  7. Statements of arbitrary depth across many tables are supported.
  8. Statements are checked for correctness as far as generator is concerned, so it will detect syntax errors, non-existent columns in expressions, mismatched columns in compound statements, ambiguous column names etc.
Then manually-written C++ code boils down to (without error-checking):

(corresponding C# C++ Java OCaml)

#include "../impl/sqlite3_traits.hpp" // sqlite3 traits
#include "demo_cxx_gen.hpp" // generated
#include <iostream>
#include <vector>

using namespace std;

typedef sqlgg<sqlite3_traits> gen_t;
typedef long long int64;

struct output_transfers
{
  void operator()(std::string const& fullname, int total)
  {
     cout << fullname << " = " << total << endl;
  }
};

struct output_donors
{
  void operator()(std::string const& surname)
  {
     cout << surname << endl;
  }
};

int main()
{
  sqlite3* db = NULL;
  sqlite3_open(":memory:", &db);

  gen_t gen(db);

  // create tables
  gen.create_person();
  gen.create_money();

  // add all person records
  gen.add_person("John","Black");
  int64 john = sqlite3_last_insert_rowid(db);
  gen.add_person("Ivan","Petrov");
  int64 ivan = sqlite3_last_insert_rowid(db);
  gen.add_person("Sancho","Alvares");
  int64 sancho = sqlite3_last_insert_rowid(db);

  // add money relations
  gen.add_money(john,ivan,200);
  gen.add_money(john,sancho,100);
  gen.add_money(john,sancho,250);
  gen.add_money(sancho,ivan,300);

  // summarize by person
  cout << "Total transfers:" << endl;
  gen.calc_total(output_transfers());

  // list donors
  cout << "Donors:" << endl;
  gen.list_donors("petrov",100,output_donors());

  // properly close database
  sqlite3_close(db);

  return 0;
}

The code is straightforward and free of most database-specific details. More importantly it is statically checked by the compiler – suppose we change the database schema and add one more field to money table (e.g. timestamp of transfer). Compilation rightfully fails:

demo_cxx.cpp: In function `int main()':
demo_cxx.cpp:29: error: no matching function for call to
`sqlgg<sqlite3_traits>::add_money(sqlite3*&, int64&, int64&, int)'
demo_cxx_gen.hpp:75: note: candidates are: static bool sqlgg<Traits>::add_money(typename
Traits::connection, const typename Traits::Int&, const typename Traits::Int&, const typename
Traits::Int&, const typename Traits::Int&) [with Traits = sqlite3_traits]

Details

Distinguishing feature of sqlgg is that it starts off with actual SQL queries, not object models or SQL table descriptions. Ideally, it generates {meta,template,abstract,generic} code which is later specialized for the actual database/environment by the means of the target language. The generated code doesn’t require any special runtime support and doesn’t impose any restrictions on the application.

The main idea is that the generator should take care only of semantic binding between SQL and code sides, being as unobtrusive as possible. So the choice of the specific database and API is a programmer’s choice. Similarly, queries to the database are expressed in plain SQL, so that the generator can be easily plugged in any existing project – just move all SQL statements used in the code to a separate file and feed it to generator. The generated code can be easily inspected just as any other code in the project, and you can even edit it manually if such need arises (though it is highly not recommended). In order to keep code and database layer in sync the generator must be invoked every time SQL queries are modified.

For now the status of this project is works for me . It is evolving and there is plenty of room for improvement. The generated code may occasionally change in incompatible way so it is recommended to keep generated code in the project tree, to prevent unexpected breakage. NB ocaml sqlite3 and mysql db traits are available as sqlgg.sqlite3 and sqlgg.mysql packages respectively. For other languages will need to keep db traits in the project. Feedback/suggestions on output code structure/naming best fitting the target language/platform are very welcome.

This project was started when I found myself editing existing code with tons of C++ wrappers for SQL queries, each binding several parameters and decomposing results. I used it successfully for C++ desktop applications with sqlite3 engine and C#/Mono server-side component with mysql database and OCaml projects with sqlite3 and mysql too.

Sqlgg understands the subset of the standard SQL language with arbitrary database-specific extensions. The goal is to cover as much SQL as possible in order to accept most of the real-world SQL statements unmodified. It will not (currently) warn on incompatible syntax (e.g. from two different sql dialects) used in the same project (but it may do so in the future).

Available output languages:

Download

Install from OPAM with:

opam install sqlgg

Or download latest release and build manually from ygrek.org or github.

See changes history.

Source repository at github or repo.or.cz.


2020-05-22

ygrek at autistici dot org