SQL - Tips
28 Nov 2017- (how to) specify condition in COUNT
- [PostgreSQL] (how to) count occurrences of specific element in array column
- [PostgreSQL] (how to) expand multiple array columns to a set of rows
- [PostgreSQL] (how to) to find the first/last rows within a group
- [PostgreSQL] (how to) cast JSONB object fields
(how to) specify condition in COUNT
SELECT COUNT(CASE WHEN is_admin IS TRUE THEN 1 END) AS admin_count FROM users;
SELECT COUNT(CASE WHEN is_admin IS TRUE THEN 1 ELSE NULL END) AS admin_count FROM users;
NOTE: don’t specify ELSE 0
in condition - all rows will be counted then!
[PostgreSQL] (how to) count occurrences of specific element in array column
count occurrences of player id 2503 in all rows (in all games):
=# select * from game_stats;
id | game_id | player_ids |
----+---------+-------------------+
1 | 17233 | {} |
2 | 17235 | {} |
3 | 17210 | {2503} |
4 | 17213 | {2502,2503,2503} |
SELECT
unnested_game_stats.game_id AS game_id,
COUNT(*) AS player_count
FROM (
SELECT
game_stats.game_id AS game_id,
UNNEST(game_stats.player_ids) AS player_id
FROM game_stats
) unnested_game_stats
WHERE unnested_game_stats.player_id = 2503
[PostgreSQL] (how to) expand multiple array columns to a set of rows
- https://www.postgresql.org/docs/9.5/static/functions-array.html
- http://www.sql.ru/forum/1112093/funkciya-unnest
- https://www.depesz.com/2013/07/30/waiting-for-9-4-add-sql-standard-with-ordinality-support-for-unnest-and-any-other-srf/
- https://www.postgresql.org/message-id/52124690.8050309%40cybertec.at
calculate total penalty time for player id 2503 in all rows (in all games):
=# select * from game_stats;
id | game_id | penalized_player_ids | penalty_times
-----+---------+----------------------+---------------
908 | 40885 | {2503, 2503} | {6,8}
907 | 40882 | {2502, 2503} | {2,4}
this
email from pgsql-hackers
mailing list contains the only (!) example of using
UNNEST
function with table columns instead of arrays I’ve managed to find on
the Internet.
to expand multiple array columns to a set of rows so that the 1st row contains the 1st elements of all array columns, the 2nd row contains the 2nd elements of all array columns, etc. you’ve got to:
- select table first
- pass its array column names to
UNNEST
function - alias new columns if you want to use them in
SELECT
clause
SELECT
unnested_game_stats.game_id AS game_id,
COALESCE(CAST(SUM(unnested_game_stats.penalty_time) AS INTEGER), 0)
AS penalty_time_game_sum
FROM (
SELECT
game_stats.game_id AS game_id,
penalties.penalized_player_id AS penalized_player_id,
penalties.penalty_time AS penalty_time
FROM game_stats, UNNEST(
game_stats.penalized_player_ids,
game_stats.penalty_times
) AS penalties(penalized_player_id, penalty_time)
) unnested_game_stats
WHERE unnested_game_stats.penalized_player_id = 2503
here we select from table and a set of rows returned by UNNEST
function
(that is passed this table array column names) - but the result is not a
Cartesian product (cross join) of all sources: each table row is properly
joined with its own unnested values of array columns.
should you replace array column names with arrays, you’ll get a typical cross join:
SELECT game_stats.id, first, second
FROM game_stats, UNNEST(array[1, 2], array[3, 4]) AS t(first, second)
you can get the same result by adding 2 array columns to the table and
updating ALL records with {1,2}
and {3,4}
values in these columns.
[PostgreSQL] (how to) to find the first/last rows within a group
for example, this is how to find creation date of the youngest post for each user:
-
using
GROUP BY user_id
+MIN(created_at)
SELECT users.*, p.created_at FROM users INNER JOIN ( SELECT user_id, MIN(created_at) AS created_at FROM posts GROUP BY user_id ) AS p ON p.user_id = users.id
use
MAX(created_at)
to find creation date of the oldest post. -
using
ORDER BY user_id, created_at ASC
+DISTINCT ON (user_id)
- https://zaiste.net/postgresql_distinct_on
- https://stackoverflow.com/questions/15206809/activerecord-select-first-record-from-each-group
SELECT users.*, p.created_at FROM users INNER JOIN ( SELECT DISTINCT ON (user_id) user_id, created_at FROM posts ORDER BY user_id, created_at ASC ) AS p ON p.user_id = users.id
order by
created_at DESC
to find creation date of the oldest post.advantage of
DISTINCT_ON
is that it allows to select any columns in the first row of the group while withGROUP BY
you’re limited to aggregates and the column which appears inGROUP BY
clause only.say, we can select title of the youngest post as well:
SELECT users.*, p.title, p.created_at FROM users INNER JOIN ( SELECT DISTINCT ON (user_id) user_id, title, created_at FROM posts ORDER BY user_id, created_at ASC ) AS p ON p.user_id = users.id
-
using
OVER (PARTITION BY user_id ORDER BY created_at ASC)
+ROW_NUMBER()
- https://www.postgresql.org/docs/9.1/tutorial-window.html
- http://www.postgresqltutorial.com/postgresql-window-function
- https://zaiste.net/postgresql_distinct_on
SELECT users.*, p.name, p.created_at INNER JOIN ( SELECT user_id, created_at, ROW_NUMBER() AS rn OVER (PARTITION BY user_id ORDER BY created_at ASC) FROM posts ) AS p ON p.user_id = users.id AND p.rn = 1
using
ROW_NUMBER()
window function allows to find not only the first or the last row but any row in between too (say, the post created second).just like with
DISTINCT_ON
clause it’s possible to select any columns in requested row.ROW_NUMBER()
vs.RANK()
:https://stackoverflow.com/questions/7747327/sql-rank-versus-row-number
ROW_NUMBER: Returns a unique number for each row starting with 1. For rows that have duplicate values, numbers are arbitarily assigned.
RANK: Assigns a unique number for each row starting with 1, except for rows that have duplicate values, in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.
[PostgreSQL] (how to) cast JSONB object fields
it’s not allowed to use non-text JSONB objects fields directly without casting:
postgres=# SELECT data->'item_count' > 10 FROM shops;
ERROR: operator does not exist: jsonb > integer
=> cast these fields, say, before using in WHERE
clauses:
-
get JSONB object field as text and cast
this method always works in all versions of PostgreSQL:
SELECT (data->>'item_count')::INTEGER > 10 FROM shops; SELECT CAST(data->>'item_count' AS INTEGER) > 10 FROM shops;
-
get JSONB object field by key and cast
only PostgreSQL 11 allows to cast JSONB scalar values to numeric and boolean data types:
SELECT (data->'item_count')::INTEGER > 10 FROM shops; SELECT CAST(data->'item_count' AS INTEGER) > 10 FROM shops;
in earlier versions of PostgreSQL you would get this error:
postgres=# SELECT (data->'item_count')::INTEGER > 10 FROM shops; ERROR: cannot cast type jsonb to integer