Finding duplicate values in a SQL table

ID : 381

viewed : 320

Tags : sqlduplicatessql





Top 5 Answer for Finding duplicate values in a SQL table

vote vote

99

SELECT     name, email, COUNT(*) FROM     users GROUP BY     name, email HAVING      COUNT(*) > 1 

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

vote vote

84

try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))  INSERT @YourTable VALUES (1,'John','John-email') INSERT @YourTable VALUES (2,'John','John-email') INSERT @YourTable VALUES (3,'fred','John-email') INSERT @YourTable VALUES (4,'fred','fred-email') INSERT @YourTable VALUES (5,'sam','sam-email') INSERT @YourTable VALUES (6,'sam','sam-email')  SELECT     name,email, COUNT(*) AS CountOf     FROM @YourTable     GROUP BY name,email     HAVING COUNT(*)>1 

OUTPUT:

name       email       CountOf ---------- ----------- ----------- John       John-email  2 sam        sam-email   2  (2 row(s) affected) 

if you want the IDs of the dups use this:

SELECT     y.id,y.name,y.email     FROM @YourTable y         INNER JOIN (SELECT                         name,email, COUNT(*) AS CountOf                         FROM @YourTable                         GROUP BY name,email                         HAVING COUNT(*)>1                     ) dt ON y.name=dt.name AND y.email=dt.email 

OUTPUT:

id          name       email ----------- ---------- ------------ 1           John       John-email 2           John       John-email 5           sam        sam-email 6           sam        sam-email  (4 row(s) affected) 

to delete the duplicates try:

DELETE d     FROM @YourTable d         INNER JOIN (SELECT                         y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank                         FROM @YourTable y                             INNER JOIN (SELECT                                             name,email, COUNT(*) AS CountOf                                             FROM @YourTable                                             GROUP BY name,email                                             HAVING COUNT(*)>1                                         ) dt ON y.name=dt.name AND y.email=dt.email                    ) dt2 ON d.id=dt2.id         WHERE dt2.RowRank!=1 SELECT * FROM @YourTable 

OUTPUT:

id          name       email ----------- ---------- -------------- 1           John       John-email 3           fred       John-email 4           fred       fred-email 5           sam        sam-email  (4 row(s) affected) 
vote vote

76

Try this:

SELECT name, email FROM users GROUP BY name, email HAVING ( COUNT(*) > 1 ) 
vote vote

61

If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows into a triple sub-select:

SELECT id, name, email  FROM users u, users u2 WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id 

And so to delete:

DELETE FROM users WHERE id IN (     SELECT id/*, name, email*/     FROM users u, users u2     WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id ) 

Much more easier to read and understand IMHO

Note: The only issue is that you have to execute the request until there is no rows deleted, since you delete only 1 of each duplicate each time

vote vote

55

In contrast to other answers you can view the whole records containing all columns if there are any. In the PARTITION BY part of row_number function choose the desired unique/duplicit columns.

SELECT  * FROM    (  SELECT a.*  ,      Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r  FROM   Customers AS a )       AS b WHERE   r > 1; 

When you want to select ALL duplicated records with ALL fields you can write it like

CREATE TABLE test (         id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY ,       c1      integer ,       c2      text ,       d       date DEFAULT now() ,       v       text );  INSERT INTO test (c1, c2, v) VALUES (1, 'a', 'Select'), (1, 'a', 'ALL'), (1, 'a', 'multiple'), (1, 'a', 'records'), (2, 'b', 'in columns'), (2, 'b', 'c1 and c2'), (3, 'c', '.'); SELECT * FROM test ORDER BY 1;  SELECT  * FROM    test WHERE   (c1, c2) IN (  SELECT c1, c2  FROM   test  GROUP  BY 1,2  HAVING count(*) > 1 ) ORDER   BY 1; 

Tested in PostgreSQL.

Top 3 video Explaining Finding duplicate values in a SQL table







Related QUESTION?