Interview Prep/SQL

Top 50 SQL Interview Questions & Answers 2025

Master SQL interviews with 50+ questions on queries, joins, optimization, and database design patterns.

19 Questions~30 min read8 CategoriesUpdated 2025
Practice SQL Quiz

Joins

01 · 2q

INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table plus matched rows from right (NULL for non-matches). RIGHT JOIN returns all rows from right table plus matched rows from left. FULL OUTER JOIN returns all rows from both tables. Choose based on which records you need to preserve.

INNER JOIN: only matching rows from both tables. LEFT/RIGHT OUTER JOIN: all rows from one side, NULLs for non-matches. FULL OUTER JOIN: all rows from both, NULLs where no match. CROSS JOIN: cartesian product (every combination). SELF JOIN: table joined to itself (hierarchies). ANTI JOIN (NOT EXISTS / NOT IN): rows in left with no match in right. Choose based on which rows you need.

Filtering

02 · 1q

WHERE filters rows before grouping (works on individual rows). HAVING filters groups after GROUP BY (works on aggregated results). Use WHERE for row-level conditions, HAVING for conditions on aggregates like COUNT, SUM, AVG. Example: WHERE salary > 50000 (row filter) vs HAVING COUNT(*) > 5 (group filter).

Optimization

03 · 6q

Indexes are data structures that speed up data retrieval by creating sorted references to table rows. Use indexes on: columns in WHERE clauses, JOIN conditions, ORDER BY columns, columns with high selectivity. Avoid over-indexing as they slow writes and use storage. Common types: B-tree (default), Hash, GiST, GIN.

Steps: (1) Use EXPLAIN/EXPLAIN ANALYZE to understand query plan, (2) Add appropriate indexes, (3) Avoid SELECT * - select only needed columns, (4) Optimize JOINs and subqueries, (5) Use LIMIT for large results, (6) Consider query rewriting, (7) Check for missing statistics, (8) Consider partitioning large tables, (9) Use connection pooling.

Clustered index determines physical order of data in the table - only one per table (usually primary key). Non-clustered indexes are separate structures pointing to data rows - multiple allowed. Clustered is faster for range queries; non-clustered is faster when you need specific rows. Choose clustered index based on most common access patterns.

EXPLAIN shows query execution plan without running it. EXPLAIN ANALYZE runs the query and shows actual vs estimated times. Key nodes: Seq Scan (full table scan — check if index needed), Index Scan, Nested Loop / Hash Join / Merge Join. Look for: high rows estimates vs actual, large sort operations, repeated scans. Use to identify missing indexes and inefficient joins.

OFFSET/LIMIT: simple but slow for large pages (must scan all preceding rows). Cursor-based (keyset) pagination: WHERE id > lastId ORDER BY id LIMIT n — constant time, better for large datasets. Use cursor pagination for APIs with sequential access. OFFSET pagination acceptable for small tables or early pages. Always add ORDER BY for deterministic results.

Partitioning divides a table into smaller pieces stored separately. Types: Range (by date ranges), Hash (distribute evenly), List (by specific values). Benefits: query performance (partition pruning), easier archiving, parallel processing. PostgreSQL: declarative partitioning. Trade-offs: complexity, cross-partition queries, foreign key limitations. Shard when partitioning isn't enough.

Transactions

04 · 3q

ACID ensures reliable transactions: Atomicity (all or nothing - transaction fully completes or fully rolls back), Consistency (database moves from one valid state to another), Isolation (concurrent transactions don't interfere), Durability (committed transactions survive failures). Critical for financial systems, inventory management, and data integrity.

A deadlock occurs when two transactions each hold locks the other needs, causing both to wait forever. Prevention: access tables in consistent order, keep transactions short, use lower isolation levels when possible, use lock timeouts, avoid user interaction during transactions. Detection: databases automatically detect and roll back one transaction.

Isolation levels control visibility of concurrent transaction changes: READ UNCOMMITTED (dirty reads), READ COMMITTED (default in most DBs), REPEATABLE READ (no non-repeatable reads), SERIALIZABLE (full isolation). Problems: dirty read, non-repeatable read, phantom read. Higher isolation = fewer anomalies but more locking/contention. PostgreSQL uses MVCC to avoid most locking.

Design

05 · 3q

1NF: Atomic values, no repeating groups. 2NF: 1NF + no partial dependencies (non-key attributes depend on entire primary key). 3NF: 2NF + no transitive dependencies (non-key attributes depend only on primary key). BCNF: Every determinant is a candidate key. Normalization reduces redundancy; sometimes denormalize for read performance.

View: virtual table based on a SELECT query — no stored data, always current. Materialized view: stores query results physically, refreshed periodically (REFRESH MATERIALIZED VIEW). Use views for: security (expose subset of columns), simplifying complex queries. Use materialized views for: expensive aggregations queried frequently, when slightly stale data is acceptable.

Referential integrity ensures relationships between tables remain consistent. Foreign keys enforce this: a FK column must match a PK in the referenced table or be NULL. ON DELETE/UPDATE actions: CASCADE (propagate change), RESTRICT (block change), SET NULL, SET DEFAULT. Cascades are powerful but dangerous — understand the chain before using. Index FK columns for performance.

Set Operations

06 · 1q

UNION combines result sets and removes duplicates (requires sorting/hashing). UNION ALL combines result sets keeping all rows including duplicates (faster). Use UNION when you need unique results; use UNION ALL when duplicates are acceptable or impossible, for better performance.

Advanced Queries

07 · 2q

Window functions perform calculations across related rows without collapsing them (unlike GROUP BY). Syntax: function() OVER (PARTITION BY col ORDER BY col). Examples: ROW_NUMBER() for ranking, LAG/LEAD for previous/next values, SUM() OVER for running totals, RANK/DENSE_RANK for rankings with ties. Essential for analytics and reporting.

CTE (WITH clause) defines a named temporary result set used within a query. Benefits: readability, reuse within query, replace subqueries. Recursive CTE references itself to traverse hierarchical data (org charts, file trees). Structure: anchor member UNION ALL recursive member. Add termination condition (depth limit or WHERE clause) to prevent infinite loops.

Programmability

08 · 1q

Stored procedure: can have side effects, multiple result sets, no return value required, called with EXEC/CALL. Function: returns a value, no side effects (in most databases), can be used in SELECT. Functions enable use in queries (WHERE, JOIN). Stored procedures better for complex business logic, batch operations. Functions for reusable calculations in queries.

Ready to test your SQL skills?

Practice with interactive quizzes and get instant feedback.

Start Free Practice