Postgresql: Detect Status Changes in a Table
This is another challenging problem I had, and it took me some hours of work to find out how to solve it. I was very tempted to use a script in a procedural language like Perl, which would make the problem easy and the solution straightforward, but there had to be a pure SQL way and here’s how I did it!
Anytime a patient has to do some kind of (non urgent) surgery, we create a new event with all patient info on the main table
Event, then we log all updates to this event on the
- at first the patient is in it’s initial state: he/she has to provide documents, papers, previous results, insurance, allergies, etc.
- then the preparation phase can begin: he/she has to do some tests, talk to some doctors, wait for the results;
- when everything the preparation phase is completed, he/she’s ready and he has to wait for instructions;
- then the hospital will give an appointment, which can be updated (different room, different time) or changed (different day);
- finally he/she’s got the surgery and the event is considered completed:
There are a lot of useful informations than can be calculated here: how long does it take for a Ready event to be Completed? How long does it take the initial or the preparation phase? And what about the whole process?
The table is not normalized: it is optimized for entering data, not for querying it, that’s why the query isn’t simple.
Detect status changes
To detect status changes we can make use of the LAG window function:
LAG(status, 1, status) over (PARTITION BY event_id ORDER BY id)
that will return, for every row, the status value of the previous (1) row. If there’s no such row in the window partitioned by the
it will return the current status. Then we can compare the previous value with the current value, and return 1 if a change is detected, and 0 otherwise:
with c1_detect_changes as ( select event_id, id, status, case when LAG(status, 1, status) OVER(PARTITION BY event_id ORDER BY id) = status then 0 else 1 end as n from changes order by event_id, id ) select * from c1_detect_changes
and the result is:
(it really doesn’t matter if the first row is set to 0 -no status change- or 1 -status change detected-, but what’s really important is that all other status changes are detected correctly with a 1).
Create a group for all consecutive rows with the same status
We can calculate a running sum:
sum(n) over (partition by event_id order by id) g
our query becomes:
with c2_running_sum as ( select event_id, id, status, sum(n) over (partition by event_id order by id) g from c1_detect_changes ) select * from c2_running_sum;
can you see it? Every row that shares the same consecutive status is now part of the same group:
Get the first status change for each group
Now we can get the first (minimum) id for every group (g):
whti c3_get_first_id_per_group as ( select event_id, g, status, min(id) as min_id from c2_running_sum group by event_id, g, status ) select * from c3_get_first_id_per_group;
and here’s the result:
Back to Initial
We also have an additional requirement: sometimes it might happen that an event has to be sent back to the Initial status, so I want to ignore all things that happened previously, what is done is done, and only consider the last Initial status:
with c4_get_last_initial as ( select event_id, max(min_id) max_initial from c3_get_first_id_per_group where status='Initial' group by event_id ), c5_latest_status_changes as ( select c3.event_id, min_id from c3_get_first_id_per_group as c3 inner join c4_get_last_initial as c4 on c3.event_id=c4.event_id where c3.min_id >= c4.max_initial )
c4 will find the latest initial status, c5 will return all events after the last initial status.
Getting all the rows
The latest query C5 will return the event_id and the min_id of the rows to be considered:
select changes.* from changes where (event_id, id) in (select * from c5_latest_status_changes)
and we can pivot the results with FILTER (if we have at least PostgreSQL 9.4)
select event_id, max(change_date) filter (where status='Initial') as initial, max(change_date) filter (where status='Preparation') as preparation, max(change_date) filter (where status='Ready') as ready, max(change_date) filter (where status='Appointment-Given') as app_given, max(change_date) filter (where status='Done') as done from changes where (event_id, id) in (select * from c5) group by event_id
A fiddle to play with some data is here.