Help with select and repeated records

Posted on

Question :

I have a postgresql database with a table like this:

id_usuario | dt_atualizacao | atualizacao | outros_dados
123        | 01-01-2001     | 0           | abc
123        | 01-01-2005     | 1           | abe
123        | 01-01-2012     | 1           | acd
123        | 01-01-2018     | 1           | acc
124        | 01-01-2017     | 0           | acj
124        | 01-01-2018     | 1           | agy
125        | 01-01-2018     | 0           | awe

What happens is this: In this application, when a user is registered for the first time, he leaves the field atualizacao = 0 and sets the current date.

But when an update is made to this register (say I changed the field outros_dados ), instead of updating the current record, it creates a new record with a new date and field atualização = 1 .

What I need is a select that takes all user ids, without repeating it and being the most current one. In the example above, I would need the following result:

id_usuario | dt_atualizacao | atualizacao | outros_dados
123        | 01-01-2018     | 1           | acc
124        | 01-01-2018     | 1           | agy
125        | 01-01-2018     | 0           | awe

Any tips on how to do this?


Answer :

The table structure is not exactly “great”.

If you want to have an update history then you should, as well as @Dudaskank , is to change the data type from DATE to TIMESTAMP to be able to store the time, minute and second of the update and thus be able to pick up the last update in a simpler and cleaner way.

The best way to get the latest updates, given the change in the dt_atualizacao column type, is to use the DISTINCT ON :

SELECT      DISTINCT ON (id_usuario) id_usuario
        ,   dt_atualizacao
        ,   outros_dados
FROM        historico
ORDER BY    id_usuario
        ,   dt_atualizacao DESC

The Historico table will be your table with the updates.


Some points to note:

  • Instead of using a date field, which seems to save only the date in this example, use timestamp or similar, so if there is more than one update on the same date, you can differentiate between them which is the most recent.
  • I believe you do not need this atualizacao column. Just get the highest date for given id to know the most recent.
  • Update 8/29/2018 1:49 PM

    As the author has reported that only one update per day is what happens in practice, even using date will work without problems.

    I’ve also created a SQL Fiddle with the example data, both with my solution and colleague’s solution @ João Martins . For some reason, which I do not understand much to speak the truth, my query ended up being faster, so the tip is (but his is much easier to read by a human, if you do not need more speed is also another tip) .

    For the query, try something like this, where historico is the name of the table and dt_atualizacao the field with the date and time of that record:

    SELECT * FROM historico a WHERE dt_atualizacao=(SELECT MAX(dt_atualizacao) FROM historico b WHERE b.id_usuario=a.id_usuario);


    Leave a Reply

    Your email address will not be published. Required fields are marked *