SQL How To ...

Delete duplicate records


Delete Duplicate Records using SQL

By duplicate record I mean that every field in one record is identical to every field in a different record, i.e. a duplicate is where there is no way of telling two or more records apart. If you just need to remove records which are similar (i.e. one or more fields are identical but there are one or more fields which are different) then instead refer to how to delete similar records.

To check that you have duplicate records in your table do the following:

select count(*) from MyTable

and

select distinct * from MyTable

unfortunately SQL does not permit a select count(distinct). You have duplicate records if the number of records returned by the second query is less than the number of records returned by the first.

Unfortunately there is no way in SQL to delete one of these duplicates without deleting all of them. They are identical after all, so there is no SQL query that you could put together which could distinguish between them.

What you can do is to copy all the distinct records into a new table:

select distinct *
into NewTable
from MyTable

This query will create a new table (NewTable in my example) containing all the records in the original table but without any records being duplicated. It will therefore preserve a single copy of those records which were duplicated.

Because this query creates (and populates) a new table, it will fail if the table already exists.

Delete Similar Records

To delete similar records, i.e. where the records are not the same but one field is the same and only one copy needs to be preserved, try the following SQL:

delete T1
from MyTable T1, MyTable T2
where T1.dupField = T2.dupField
and T1.uniqueField > T2.uniqueField

This will delete all records from the table MyTable which have the same value for the field dupField, leaving that record which has the lowest value in uniqueField.

The above SQL works on SQL-Server and MySQL but not in MS-Access. For MS-Access try:

delete from MyTable 
where uniqueField not in (select min(uniqueField) from MyTable T2
where T2.dupField=MyTable.dupField)

This syntax also works in SQLServer (but not in MySQL) but may not be as efficient for large tables.



About the author: is a dedicated software developer and webmaster. For his day job he develops websites and desktop applications as well as providing IT services. He moonlights as a technical author and consultant.