Use Common Table Expressions (WITH
clauses; like a let for SQL) combined with
the row_number()
window function to pick out the biggest (or first or
smallest) row within a grouping. In the following example, we want to pick the
student with the top grade per class:
$sqlite3
sqlite> create table t(class string, name string, grade int);
sqlite> insert into t values ("Math", "Alice", 99),
("Math", "Bob", 87),
("Science", "Charlie", 88),
("Science", "David", 93);
sqlite> select * from t;
class name grade
---------- ---------- ----------
Math Alice 99
Math Bob 87 <-- we want this one
Science Charlie 88
Science David 93 <-- and this one
sqlite> WITH ranked AS (
SELECT
*,
row_number() OVER (PARTITION BY class
ORDER BY grade DESC) AS r
FROM t)
SELECT * FROM ranked WHERE r=1;
class name grade r
---------- ---------- ---------- ----------
Math Alice 99 1
Science David 93 1
This creates a temporary that's just like our original data, but has an extra
column called "r" which is the row_number, partitioned by class, and ordered by
grade (descending). Picking out the student with the top grade is picking out
the rows where r=1
.
This trick can be applied to get the slowest request per user, sample one request per day, and so forth.
Before I knew about this trick, I'd join against an aggregation, but that had a tendency to pick out extra rows for duplicates and was (more) painful to write.
If you're using DuckDB, there's a QUALIFY clause that makes it even easier:
$duckdb
v0.8.1 6536a77232
D create table t(class string, name string, grade int);
D insert into t values ('Math', 'Alice', 99);
D insert into t values ('Math', 'Bob', 87);
D insert into t values ('Science', 'Charlie', 88);
D insert into t values ('Science', 'David', 93);
D SELECT * FROM t QUALIFY row_number() OVER (PARTITION BY class ORDER BY grade DESC);
┌─────────┬─────────┬───────┐
│ class │ name │ grade │
│ varchar │ varchar │ int32 │
├─────────┼─────────┼───────┤
│ Science │ David │ 93 │
│ Science │ Charlie │ 88 │
│ Math │ Alice │ 99 │
│ Math │ Bob │ 87 │
└─────────┴─────────┴───────┘
D SELECT * FROM t QUALIFY row_number() OVER (PARTITION BY class ORDER BY grade DESC) = 1;
┌─────────┬─────────┬───────┐
│ class │ name │ grade │
│ varchar │ varchar │ int32 │
├─────────┼─────────┼───────┤
│ Science │ David │ 93 │
│ Math │ Alice │ 99 │
└─────────┴─────────┴───────┘
Originally posted at https://collectednotes.com/philz/handy-window-function-trick.