This is another challenging problem I had, and it took me some hours of work to find out how to solve it. I was very tempted to use a script in a procedural language like Perl, which would make the problem easy and the solution straightforward, but there had to be a pure SQL way and here’s how I did it!
Anytime a patient has to do some kind of (non urgent) surgery, we create a new event with all patient info on the main table
Event, then we log all updates to this event on the
- at first the patient is in it’s initial state: he/she has to provide documents, papers, previous results, insurance, allergies, etc.
- then the preparation phase can begin: he/she has to do some tests, talk to some doctors, wait for the results;
- when everything the preparation phase is completed, he/she’s ready and he has to wait for instructions;
- then the hospital will give an appointment, which can be updated (different room, different time) or changed (different day);
- finally he/she’s got the surgery and the event is considered completed:
There are a lot of useful informations than can be calculated here: how long does it take for a Ready event to be Completed? How long does it take the initial or the preparation phase? And what about the whole process?
The table is not normalized: it is optimized for entering data, not for querying it, that’s why the query isn’t simple.
Detect status changes
To detect status changes we can make use of the LAG window function:
LAG(status, 1, status) over (PARTITION BY event_id ORDER BY id)
that will return, for every row, the status value of the previous (1) row. If there’s no such row in the window partitioned by the
it will return the current status. Then we can compare the previous value with the current value, and return 1 if a change is detected, and 0 otherwise:
and the result is:
(it really doesn’t matter if the first row is set to 0 -no status change- or 1 -status change detected-, but what’s really important is that all other status changes are detected correctly with a 1).
Create a group for all consecutive rows with the same status
We can calculate a running sum:
sum(n) over (partition by event_id order by id) g
our query becomes:
can you see it? Every row that shares the same consecutive status is now part of the same group:
Get the first status change for each group
Now we can get the first (minimum) id for every group (g):
and here’s the result:
Back to Initial
We also have an additional requirement: sometimes it might happen that an event has to be sent back to the Initial status, so I want to ignore all things that happened previously, what is done is done, and only consider the last Initial status:
c4 will find the latest initial status, c5 will return all events after the last initial status.
Getting all the rows
The latest query C5 will return the event_id and the min_id of the rows to be considered:
and we can pivot the results with FILTER (if we have at least PostgreSQL 9.4)
A fiddle to play with some data is here.