Google BigQuery is one of the most popular cloud-based data warehouses, widely used for large-scale analytics. As a Data Engineer, knowing the right commands not only saves time but also ensures efficient resource utilization and cost optimization.
Here are the Top 20 BigQuery commands with examples you should add to your toolkit.
1. List all datasets in a project

SELECT schema_name
FROM `<project_id>.INFORMATION_SCHEMA.SCHEMATA`;
2. List all tables in a dataset

SELECT table_name
FROM `<project_id>.<dataset>.INFORMATION_SCHEMA.TABLES`;
3. Check table sizes and row counts

SELECT table_id, row_count, size_bytes/1024/1024 AS size_mb,
       TIMESTAMP_MILLIS(creation_time) AS created
FROM `<project_id>.<dataset>.__TABLES__`;

(Note: INFORMATION_SCHEMA.TABLES does not expose row_count or size_bytes; the legacy __TABLES__ metadata table does.)
4. Check total dataset size

SELECT SUM(size_bytes)/1024/1024/1024 AS dataset_size_gb
FROM `<project_id>.<dataset>.__TABLES__`;
5. View the most recent completed jobs

SELECT user_email, query, start_time, end_time, total_bytes_processed/1024/1024/1024 AS processed_gb
FROM `<project_id>.region-<location>.INFORMATION_SCHEMA.JOBS`
WHERE state='DONE'
ORDER BY start_time DESC
LIMIT 10;
6. Find the slowest queries

SELECT query,
       TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_sec,
       total_bytes_processed/1024/1024/1024 AS processed_gb
FROM `<project_id>.region-<location>.INFORMATION_SCHEMA.JOBS`
WHERE state='DONE'
ORDER BY duration_sec DESC
LIMIT 5;
7. Check whether query results were served from cache

SELECT query, cache_hit
FROM `<project_id>.region-<location>.INFORMATION_SCHEMA.JOBS`
WHERE state='DONE'
ORDER BY start_time DESC
LIMIT 10;
8. Estimate query cost before running (dry run)

BigQuery does not support an EXPLAIN statement. To see how many bytes a query would scan without running it, use a dry run from the bq CLI:

bq query --use_legacy_sql=false --dry_run \
'SELECT COUNT(*) FROM `<project_id>.<dataset>.<table>`'
9. Inspect a query's execution plan

Execution details live on the job rather than in an EXPLAIN statement. View them in the console's "Execution details" tab, or fetch the job metadata (including the query plan) for a finished query:

bq show -j --format=prettyjson <job_id>
10. Preview rows from a table

SELECT * FROM `<project_id>.<dataset>.<table>` LIMIT 10;

Keep in mind that LIMIT does not reduce the bytes scanned. For a free preview, use the console's table preview or bq head -n 10 <project_id>:<dataset>.<table>.
11. Inspect table partitions

SELECT partition_id, total_rows, total_logical_bytes/1024/1024 AS size_mb
FROM `<project_id>.<dataset>.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name='your_table';
12. Check clustering columns

SELECT clustering_ordinal_position, column_name
FROM `<project_id>.<dataset>.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name='your_table'
  AND clustering_ordinal_position IS NOT NULL
ORDER BY clustering_ordinal_position;
13. List sessions in a project

SELECT user_email, is_active, creation_time
FROM `<project_id>.region-<location>.INFORMATION_SCHEMA.SESSIONS_BY_PROJECT`;

(BigQuery exposes sessions through SESSIONS_BY_PROJECT and SESSIONS_BY_USER rather than a plain SESSIONS view.)
14. Find duplicate values in a column

SELECT column1, COUNT(*)
FROM `<project_id>.<dataset>.<table>`
GROUP BY column1
HAVING COUNT(*) > 1;
15. Count NULL values in a column

SELECT COUNT(*)
FROM `<project_id>.<dataset>.<table>`
WHERE column_name IS NULL;
16. Cancel a running job

BigQuery doesn't support a KILL statement; instead, cancel jobs from the console, the API, or the bq CLI:

bq cancel <job_id>
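As a sketch of the API route mentioned above, a job can also be cancelled by POSTing to the BigQuery Jobs API's jobs.cancel endpoint. This assumes you already hold a valid OAuth access token; the helper names below are illustrative, not part of any official client library:

```python
# Hypothetical sketch: cancel a BigQuery job via the REST Jobs API.
# Endpoint: POST /bigquery/v2/projects/{projectId}/jobs/{jobId}/cancel
import urllib.request


def cancel_url(project_id: str, job_id: str, location: str = "US") -> str:
    """Build the jobs.cancel endpoint URL for a given project and job."""
    return (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/jobs/{job_id}/cancel?location={location}"
    )


def cancel_job(project_id: str, job_id: str, access_token: str,
               location: str = "US") -> int:
    """POST the cancel request with a bearer token; returns the HTTP status."""
    req = urllib.request.Request(
        cancel_url(project_id, job_id, location),
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Note that cancellation is best-effort: the API acknowledges the request, but the job may still run to completion before it takes effect.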
17. Create a partitioned and clustered table

CREATE OR REPLACE TABLE `<project_id>.<dataset>.<table>`
PARTITION BY DATE(timestamp_column)
CLUSTER BY column1, column2 AS
SELECT * FROM `<project_id>.<dataset>.<raw_table>`;
18. View column names and data types

SELECT column_name, data_type
FROM `<project_id>.<dataset>.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name='your_table';
19. Check access privileges on a dataset

SELECT object_name, privilege_type, grantee
FROM `<project_id>.region-<location>.INFORMATION_SCHEMA.OBJECT_PRIVILEGES`
WHERE object_schema='<dataset>';

(BigQuery exposes grants through the OBJECT_PRIVILEGES view rather than a SCHEMA_PRIVILEGES view.)
20. Track usage and bytes processed per user

SELECT user_email, COUNT(*) AS query_count, SUM(total_bytes_processed)/1024/1024/1024 AS total_gb
FROM `<project_id>.region-<location>.INFORMATION_SCHEMA.JOBS`
WHERE state='DONE'
GROUP BY user_email
ORDER BY total_gb DESC;
Conclusion:
BigQuery is powerful, but cost and performance optimization rely heavily on understanding how your queries work. These commands will give you visibility into your datasets, queries, and overall usage—helping you become a more efficient Data Engineer.