Everything is a table
September 29th, 2020
In its usage, SQL shares more in common with bash scripts than most other languages. It’s more mortar than brick, used to smooth and adhere chaotic data for human or machine consumption. I suspect this is why analytics SQL feels so odd inside IDEs and git flows. It’d be unthinkable to build on top of an unpredictability shifting API. But that’s what most analysts…and sysadmins…do.
Speaking with career analysts, they tend to care more about schemas than the SQL. Schema objects, and in particular table-like objects, are shareable and durable and therefore serve a much better “building block” than SQL code. More important than the SQL editor is improving the schema explorer and tightly integrating it with the editor (ability to “Goto” tables from the code editor).
Schema table explorer
- Comments. Viewable for the table and columns, updateable from the UI
- Graph of usage over time (highlightable to view queries in that period)
- Treemap of “usage adjacent” tables
- Schema objects that reference the table (views, procedures)
- Relevant RBAC privileges
You can do this with a parse library and the data catalog. You definitely want to use the parse library for run queries (because we know them to be valid and the parse library to be up-to-date), whereas you still want a separate system for autocomplete and token highlighting.
The primary innovation around “bash scripts” are Dockerfile. The starting point of Docker images lets you reason about the OS and therefore make reuseable scripts. For the same reason, I had the hypothesis you could build a “Docker of data” by standardizing SQL schemas and thereby enabling an ecosystem of analysis and data tools. Docker is valuable because every company has at least two environments: development and production. Standardization is useful. However, most companies only have a single analytics database at any given moment. It’s kind of like Dropbox, pre-iPhone. Nice-to have, not must have. Switching costs matter a lot with nice-to-haves. There still might be something here, but I suspect it must start with collaboration and not standarization. Snowflake’s “sharehouse” or data marketplace might be the actual correct entrypoint.