Model Standards
Last updated
Our general model and style standards are derived from the best practices guide put forth by dbt Labs:
All code submitted in a PR should be formatted with the dbt autoformatter, available as an extension on the VS Code extension marketplace.
The primary requirements for a Flipside dbt model (each explained in greater detail below):
Blockchain data that has been indexed and piped into Snowflake lands under the chainwalkers schema. This includes all core data characterizing what occurs on a chain (e.g. blocks, transactions, function calls, state changes). Chainwalkers 2.0 decodes log data but performs no other transformation. The only column added to this data is _inserted_timestamp, which is required for all blockchains for efficiency and data-integrity reasons.
Any ingested data should be in the bronze layer. This layer is a view on the source; as such, it is the only layer where {{ source() }} should be called, and no transformations should happen in bronze.
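A minimal sketch of what a bronze model might look like under these rules — a straight view on the source with no transformations. The source name, table name, and file path here are hypothetical, not Flipside's actual configuration:

```sql
-- models/bronze/bronze_blocks.sql (hypothetical path and names)
-- Bronze: a plain view on the source; the only layer that calls source(),
-- and no transformations are applied.
{{ config(materialized = 'view') }}

select *
from {{ source('chainwalkers', 'blocks') }}
```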
Filtered, cleaned, and transformed data. Silver models should de-duplicate bronze, as chainwalkers may re-ingest a block at a later date. The models in this layer are incremental, to reduce compute time and cost.
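The de-duplication and incremental pattern described above might be sketched as follows. The model, column, and key names are illustrative assumptions; the actual silver models may use different keys and filters:

```sql
-- models/silver/silver_blocks.sql (hypothetical names)
-- Silver: incremental load from bronze, de-duplicated in case
-- chainwalkers re-ingests a block at a later date.
{{ config(
    materialized = 'incremental',
    unique_key = 'block_id'
) }}

select
    block_id,
    block_timestamp,
    tx_count,
    _inserted_timestamp
from {{ ref('bronze_blocks') }}

{% if is_incremental() %}
-- only pick up rows newer than what is already in this table
where _inserted_timestamp >= (
    select max(_inserted_timestamp) from {{ this }}
)
{% endif %}

-- keep only the latest ingestion of each block
qualify row_number() over (
    partition by block_id
    order by _inserted_timestamp desc
) = 1
```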
Fact and dimension tables that are fully decoded (where possible), additive (no replacing data), de-duplicated, and not joined to other tables. We can expose a silver table in the public-facing app as a gold view.
Can be a combination of data sources.
Curated models exposed to the public front-end.
Fact and dimension tables should never be combined in tables, i.e., no joining transactions and labels. They can be combined in views, where the logic is done in the underlying queries.
Views that ease the burden for analysts. Our wide array of data users includes some who are new to SQL, new to analytics, and/or new to blockchain technology. Incorporating off-chain metadata, and even aggregations (e.g. daily summaries of key transactions), can help new users understand the ecosystem as a whole.
This also eases the burden on the underlying systems, and facilitates “Easy” bounty programs.
Facilitates learning for new users.
Examples include: labels, transfers, event logs, swaps, liquidity provision/removal, daily balances of holders, prices and decimals, daily transfer summary.
Models that accelerate analytics for a particular protocol (e.g. compound, uniswapv3, anchor, mirror schemas).
All views should have a prefix indicating what type of data is within (based on the star schema).
All names should_be_snake_case. Abbreviations should be avoided when possible; where used, they should match an ISO standard (e.g., currency or country codes).
Facts (fact_)
A fact table contains measurements, metrics, and facts: anything that is measurable, summed, averaged, or tracked over time.
Dimensions (dim_)
A dimension table is a companion to the fact table, containing descriptive attributes used to constrain queries.
Convenience Views (ez_)
Convenience views can combine facts and dimensions for the reasons above. These should exist only as views in the gold layer, where curation does more of the “lift” for the view.
SQL models do not dance alone! Each sql file should have an accompanying yml file of the same name. This is the model properties file; we use it primarily to document the model and its columns, and to test the data in the model at the column level.
The model properties allow us to test the model output to ensure the data flowing through meets expectations. Some common generic tests we use are:
unique
not_null
accepted_values
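A model properties file wiring up those generic tests might look like the sketch below. The model and column names are hypothetical, and the accepted values are illustrative only:

```yaml
# models/silver/silver_blocks.yml (hypothetical names)
version: 2

models:
  - name: silver_blocks
    description: De-duplicated blocks, incrementally loaded from bronze.
    columns:
      - name: block_id
        description: Unique identifier of the block.
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['confirmed', 'pending']
```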
There are also packages with utility functions that enhance the available tests to plug into a model, or you can write custom tests that apply to the model.
The yml file is also where tables and columns are documented. Descriptions should be clear and concise so users understand what data the model contains. Because a column may be used across several models, we define it once with a doc block in a markdown file, rather than in each individual model properties file.
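As a sketch of the doc-block pattern described above (names hypothetical), the column is defined once in a markdown file and then referenced from any model properties file that needs it:

```markdown
<!-- models/docs/core.md (hypothetical path) -->
{% docs block_id %}
Unique identifier of the block, as reported by the chain.
{% enddocs %}
```

```yaml
# referenced from any model's yml file
columns:
  - name: block_id
    description: '{{ doc("block_id") }}'
```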
dbt Models can be configured to run using one of a number of strategies. The two most common materializations in our models are incremental and view.
True to the name, incremental models load data based on some incremental filter. In silver, just about every model is built incrementally, based on _inserted_timestamp.
Views persist as SQL transformations without storing data the way a table does. Every core (gold) model is a view on a silver incremental model. This is done to:
drop any internal columns that need not be exposed to end users (like _inserted_timestamp)
rename the model to follow the star schema
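Putting those two points together, a gold view on a silver model might look like the sketch below. The file path and column names are assumptions for illustration:

```sql
-- models/gold/fact_blocks.sql (hypothetical names)
-- Gold: a view on the silver incremental model that drops internal
-- columns (e.g. _inserted_timestamp) and follows star-schema naming.
{{ config(materialized = 'view') }}

select
    block_id,
    block_timestamp,
    tx_count
from {{ ref('silver_blocks') }}
```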
When building a model, check how columns are already named in the blockchain's data tables. Within a database, one model should not refer to tx_id while another refers to tx_hash. A more comprehensive naming standard will be published soon™️.