Flipside's blockchain data is optimized for performance through strategic clustering and indexing.
Understanding how tables are organized will help you write faster, more cost-effective queries.
How Flipside clusters data
All Flipside tables are clustered on block_timestamp (or equivalent time field).
What this means:
Queries filtering by time are extremely fast
Data is physically organized by timestamp on disk
Snowflake can skip irrelevant micro-partitions
Always filter by time for optimal performance.
Understanding timestamp fields
Flipside tables include multiple timestamp fields:
block_timestamp
The timestamp when the block was produced on the blockchain.
Use for:
Time-series analysis
Historical queries
Filtering by when events occurred on-chain
Clustered: ✅ Yes (primary clustering key)
-- ✅ GOOD: Clustered query
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= '2024-01-01'
  AND block_timestamp < '2024-02-01';
_inserted_timestamp
The timestamp when the row was first inserted into Flipside's database.
Use for:
Identifying newly added data
Incremental data loads
Data pipeline orchestration
Clustered: ❌ No
-- Use for incremental loads
SELECT *
FROM ethereum.core.fact_transactions
WHERE _inserted_timestamp >= DATEADD(hour, -1, CURRENT_TIMESTAMP());
_modified_timestamp
The timestamp when the row was last modified (for updates or corrections).
Use for:
Detecting data corrections
Change data capture (CDC)
Audit trails
Clustered: ❌ No
Performance tip: Always include a block_timestamp filter, even when filtering on
_inserted_timestamp or _modified_timestamp, so Snowflake can still prune micro-partitions via the clustering key.
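As a sketch of that tip, an incremental query can pair the unclustered filter with a generous block_timestamp bound (the 7-day window is an assumption; widen it if your pipeline must catch older corrections):

```sql
-- ✅ Combine both filters: block_timestamp prunes partitions, _inserted_timestamp selects new rows
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 7                              -- clustered: enables pruning
  AND _inserted_timestamp >= DATEADD(hour, -1, CURRENT_TIMESTAMP());   -- unclustered: new rows only
```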
Query optimization best practices
1. Always filter by time
Good - uses clustering (the bad version simply omits the time filter):
SELECT
  block_number,
  tx_hash,
  from_address,
  to_address
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 7
  AND to_address = '0x...'
LIMIT 1000;
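For contrast, a sketch of the no-time-filter version of the same query (the '0x...' placeholder follows the document's own convention):

```sql
-- ❌ Bad: no time filter, so every micro-partition must be scanned
SELECT
  block_number,
  tx_hash,
  from_address,
  to_address
FROM ethereum.core.fact_transactions
WHERE to_address = '0x...'
LIMIT 1000;
```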
Why it matters:
Without time filter: Scans entire table (billions of rows)
With time filter: Scans only relevant partitions (millions of rows)
Result: 10-100x faster queries, lower costs
2. Use specific time ranges
Narrow time ranges = faster queries.
-- ✅ Best: Specific range
WHERE block_timestamp >= '2024-01-01'
  AND block_timestamp < '2024-01-08'
-- ⚠️ OK: Last N days
WHERE block_timestamp >= CURRENT_DATE - 30
-- ❌ Avoid: Open-ended queries
WHERE block_timestamp >= '2020-01-01'
3. Limit result sets
Use LIMIT to constrain output:
-- For exploration
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 1
LIMIT 1000;
-- For production queries
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 30
  AND success = TRUE
ORDER BY block_timestamp DESC
LIMIT 10000;
4. Filter early, aggregate late
Push filters down before joins or aggregations:
Good - filter first (the bad version filters only after aggregating):
WITH recent_txs AS (
  SELECT *
  FROM ethereum.core.fact_transactions
  WHERE block_timestamp >= CURRENT_DATE - 7
    AND success = TRUE
)
SELECT
  DATE_TRUNC('day', block_timestamp) AS date,
  COUNT(*) AS tx_count
FROM recent_txs
GROUP BY 1;
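For contrast, a sketch of the filter-after-aggregation anti-pattern: the GROUP BY runs over the full table before any rows are discarded.

```sql
-- ❌ Bad: aggregates the entire table, then throws most of it away
SELECT date, tx_count
FROM (
  SELECT
    DATE_TRUNC('day', block_timestamp) AS date,
    COUNT(*) AS tx_count
  FROM ethereum.core.fact_transactions
  GROUP BY 1
)
WHERE date >= CURRENT_DATE - 7;
```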
5. Use appropriate join strategies
When joining multiple tables, ensure both have time filters:
SELECT
  t.tx_hash,
  t.from_address,
  s.amount_usd
FROM ethereum.core.fact_transactions t
JOIN ethereum.defi.ez_dex_swaps s
  ON t.tx_hash = s.tx_hash
WHERE t.block_timestamp >= CURRENT_DATE - 7 -- Filter on both tables
  AND s.block_timestamp >= CURRENT_DATE - 7
LIMIT 10000;
6. Factor complex queries into CTEs
For queries with multiple steps, pull shared logic into a CTE. (Note: Snowflake does not support PostgreSQL's AS MATERIALIZED hint; it decides CTE materialization automatically.)
WITH base_data AS (
  SELECT
    block_timestamp,
    from_address,
    to_address,
    eth_value
  FROM ethereum.core.fact_transactions
  WHERE block_timestamp >= CURRENT_DATE - 30
    AND eth_value > 0
)
SELECT
  from_address,
  COUNT(*) AS tx_count,
  SUM(eth_value) AS total_eth
FROM base_data
GROUP BY 1
HAVING SUM(eth_value) > 100
ORDER BY total_eth DESC;
Common query patterns
Time-series aggregations
-- Daily transaction counts
SELECT
  DATE_TRUNC('day', block_timestamp) AS date,
  COUNT(*) AS tx_count,
  COUNT(DISTINCT from_address) AS unique_senders
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 30
GROUP BY 1
ORDER BY 1;
Address-specific queries
-- All activity for an address
SELECT
  block_timestamp,
  tx_hash,
  from_address,
  to_address,
  eth_value
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 90
  AND (from_address = LOWER('0x...')
    OR to_address = LOWER('0x...'))
ORDER BY block_timestamp DESC;
Case sensitivity: Ethereum addresses are case-insensitive, but Flipside stores them in lowercase, so a
mixed-case (EIP-55 checksummed) input will not match as a string. Wrap input addresses in LOWER() to ensure matches.
Token transfer analysis
-- USDC transfers over $10K
SELECT
  block_timestamp,
  from_address,
  to_address,
  amount,
  amount_usd
FROM ethereum.core.ez_token_transfers
WHERE block_timestamp >= CURRENT_DATE - 7
  AND contract_address = LOWER('0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48') -- USDC
  AND amount_usd > 10000
ORDER BY amount_usd DESC
LIMIT 1000;
DEX swap analysis (Premium)
-- Top Uniswap swaps by USD value
SELECT
  block_timestamp,
  tx_hash,
  trader,
  token_in,
  token_out,
  amount_in_usd,
  amount_out_usd
FROM ethereum.defi.ez_dex_swaps
WHERE block_timestamp >= CURRENT_DATE - 1
  AND platform = 'uniswap-v3'
  AND amount_in_usd > 50000
ORDER BY amount_in_usd DESC
LIMIT 100;
Monitor query performance
Use Snowflake's query profile to identify bottlenecks:
-- Check your query history
SELECT
  query_id,
  query_text,
  execution_time,
  bytes_scanned,
  rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%ethereum%'
ORDER BY start_time DESC
LIMIT 10;
Estimate query cost
-- See bytes scanned (proxy for cost)
SELECT
  query_id,
  query_text,
  bytes_scanned / POWER(1024, 3) AS gb_scanned,
  execution_time / 1000 AS seconds
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE start_time >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
ORDER BY bytes_scanned DESC;
Avoiding common pitfalls
SELECT * from large tables
Problem: Selecting all columns scans unnecessary data.
Solution: Only select the columns you need.
-- ❌ Bad
SELECT * FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 7;
-- ✅ Good
SELECT
  block_timestamp,
  tx_hash,
  from_address,
  to_address
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 7;
Missing time filters on joins
Problem: Joining without time constraints scans entire tables.
Solution: Add time filters to both sides of the join.
-- ✅ Good
SELECT t.*, s.amount_usd
FROM ethereum.core.fact_transactions t
JOIN ethereum.defi.ez_dex_swaps s
  ON t.tx_hash = s.tx_hash
WHERE t.block_timestamp >= CURRENT_DATE - 7
  AND s.block_timestamp >= CURRENT_DATE - 7;
Overly broad time ranges
Problem: Querying years of data when you only need recent data.
Solution: Use explicit, narrow time ranges.
-- ❌ Bad: Scans years of data
WHERE block_timestamp >= '2020-01-01'
-- ✅ Good: Scans only what you need
WHERE block_timestamp >= CURRENT_DATE - 30
No LIMIT on exploratory queries
Problem: Returning millions of rows accidentally.
Solution: Always use LIMIT when exploring.
-- ✅ Good for exploration
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 1
LIMIT 1000;
Advanced optimization techniques
Using search optimization service
For frequent point lookups on specific addresses:
-- Enable search optimization (requires privileges)
ALTER TABLE ethereum.core.fact_transactions
  ADD SEARCH OPTIMIZATION ON EQUALITY(from_address, to_address);
Contact Flipside if you need specific tables optimized for your use case.
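Once enabled, equality lookups on those columns can skip partitions even across wide time ranges. A sketch (the '0x...' placeholder follows the document's own convention):

```sql
-- Point lookup that benefits from search optimization on from_address
SELECT block_timestamp, tx_hash, eth_value
FROM ethereum.core.fact_transactions
WHERE block_timestamp >= CURRENT_DATE - 365
  AND from_address = LOWER('0x...');
```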
Incremental processing patterns
For data pipelines, use incremental loads:
-- Track last processed timestamp
-- Track last processed timestamp
CREATE OR REPLACE TABLE my_checkpoint (
  table_name VARCHAR,
  last_processed_timestamp TIMESTAMP_NTZ
);
-- Incremental query
SELECT *
FROM ethereum.core.fact_transactions
WHERE block_timestamp > (
    SELECT last_processed_timestamp
    FROM my_checkpoint
    WHERE table_name = 'fact_transactions'
  )
  AND block_timestamp <= CURRENT_TIMESTAMP() - INTERVAL '10 minutes' -- Buffer for late-arriving data
ORDER BY block_timestamp;
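After each successful load, the checkpoint needs to advance. One sketch, assuming the rows just processed are staged in a hypothetical my_staging_batch table:

```sql
-- Advance the checkpoint to the newest timestamp just processed
MERGE INTO my_checkpoint c
USING (
  SELECT 'fact_transactions' AS table_name,
         MAX(block_timestamp) AS last_ts
  FROM my_staging_batch          -- hypothetical: holds the batch just loaded
) b
  ON c.table_name = b.table_name
WHEN MATCHED THEN
  UPDATE SET last_processed_timestamp = b.last_ts
WHEN NOT MATCHED THEN
  INSERT (table_name, last_processed_timestamp) VALUES (b.table_name, b.last_ts);
```

Using MERGE keeps the first run (no checkpoint row yet) and subsequent runs in a single idempotent statement.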
Next steps