Postgresql: Detect Status Changes in a Table
This is another challenging problem I had, and it took me some hours of work to find out how to solve it. I was very tempted to use a script in a procedural language like Perl, which would make the problem easy and the solution straightforward, but there had to be a pure SQL way and here’s how I did it!
The problem
Anytime a patient has to do some kind of (non urgent) surgery, we create a new event with all patient info on the main table
Event
, then we log all updates to this event on the Changes
table.
- at first the patient is in it’s initial state: he/she has to provide documents, papers, previous results, insurance, allergies, etc.
- then the preparation phase can begin: he/she has to do some tests, talk to some doctors, wait for the results;
- when everything the preparation phase is completed, he/she’s ready and he has to wait for instructions;
- then the hospital will give an appointment, which can be updated (different room, different time) or changed (different day);
- finally he/she’s got the surgery and the event is considered completed:
id | event_id | status | change_date | other |
---|---|---|---|---|
1 | 1 | Initial | 2017-10-07 | … |
2 | 1 | Initial | 2017-10-08 | … |
3 | 1 | Preparation | 2017-10-09 | … |
4 | 1 | Preparation | 2017-10-10 | … |
5 | 1 | Ready | 2017-10-11 | … |
6 | 1 | Appoinment-Given | 2017-10-12 | … |
7 | 1 | Appoinment-Given | 2017-10-13 | … |
8 | 1 | Appoinment-Given | 2017-10-14 | … |
9 | 1 | Completed | 2017-10-15 | … |
There are a lot of useful informations than can be calculated here: how long does it take for a Ready event to be Completed? How long does it take the initial or the preparation phase? And what about the whole process?
The table is not normalized: it is optimized for entering data, not for querying it, that’s why the query isn’t simple.
Detect status changes
To detect status changes we can make use of the LAG window function:
LAG(status, 1, status) over (PARTITION BY event_id ORDER BY id)
that will return, for every row, the status value of the previous (1) row. If there’s no such row in the window partitioned by the event_id
,
it will return the current status. Then we can compare the previous value with the current value, and return 1 if a change is detected, and 0 otherwise:
and the result is:
event_id | id | status | n |
---|---|---|---|
1 | 1 | Initial | 0 |
1 | 2 | Initial | 0 |
1 | 3 | Preparation | 1 |
1 | 4 | Preparation | 0 |
1 | 5 | Ready | 1 |
1 | 6 | Appoinment-Given | 1 |
1 | 7 | Appoinment-Given | 0 |
1 | 8 | Appoinment-Given | 0 |
1 | 9 | Done | 1 |
(it really doesn’t matter if the first row is set to 0 -no status change- or 1 -status change detected-, but what’s really important is that all other status changes are detected correctly with a 1).
Create a group for all consecutive rows with the same status
We can calculate a running sum:
sum(n) over (partition by event_id order by id) g
our query becomes:
can you see it? Every row that shares the same consecutive status is now part of the same group:
event_id | id | status | g |
---|---|---|---|
1 | 1 | Initial | 0 |
1 | 2 | Initial | 0 |
1 | 3 | Preparation | 1 |
1 | 4 | Preparation | 1 |
1 | 5 | Ready | 2 |
1 | 6 | Appoinment-Given | 3 |
1 | 7 | Appoinment-Given | 3 |
1 | 8 | Appoinment-Given | 3 |
1 | 9 | Done | 4 |
Get the first status change for each group
Now we can get the first (minimum) id for every group (g):
and here’s the result:
event_id | min_id | status |
---|---|---|
1 | 9 | Done |
1 | 6 | Appoinment-Given |
1 | 3 | Preparation |
1 | 5 | Ready |
1 | 1 | Initial |
Back to Initial
We also have an additional requirement: sometimes it might happen that an event has to be sent back to the Initial status, so I want to ignore all things that happened previously, what is done is done, and only consider the last Initial status:
c4 will find the latest initial status, c5 will return all events after the last initial status.
Getting all the rows
The latest query C5 will return the event_id and the min_id of the rows to be considered:
and we can pivot the results with FILTER (if we have at least PostgreSQL 9.4)
A fiddle to play with some data is here.