Ways to remove duplicate values

There are different methods for deleting duplicate (de-duplication) records from a table, each of them has its own pros and cons. I am going to discuss these methods, prerequisite of each of these methods along with its pros and cons.

  1.  Using correlated subquery 
  2. Using temporary table 
  3. Creating new table with distinct records and renaming it.. 
  4. Using Common Table Expression (CTE) 
  5. Using Fuzzy Group Transformation in SSIS 
  6. Using MERGE Statement 
Using Temporary Table

In this approach we pull distinct records from the target table into a temporary table, then truncate the target table and finally insert the records from the temporary table back to the target table



CREATE TABLE Employee

[FirstName] 
Varchar(100), 
[LastName] 
Varchar(100), 
[Address] 
Varchar(100), 

GO 
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('Linda', 'Mitchel', 'America'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('Linda', 'Mitchel', 'America'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('John', 'Albert', 'Australia'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('John', 'Albert', 'Australia'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('John', 'Albert', 'Australia'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('Arshad', 'Ali', 'India'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('Arshad', 'Ali', 'India'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('Arshad', 'Ali', 'India'
INSERT
 INTO Employee([FirstName], [LastName], [Address]
VALUES 
('Arshad', 'Ali', 'India'
GO 
SELECT
 * FROM Employee 
GO

Script:

BEGIN TRAN
-- Pull distinct records in the temporary table 
SELECT DISTINCT * INTO #Employee FROM Employee --Truncate the target table 
TRUNCATE TABLE Employee --Insert the distinct records from temporary table 
--back to target table 
INSERT INTO Employee SELECT * FROM #Employee --Drop the temporary table 
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL 
DROP TABLE #Employee 
COMMIT
 TRAN
GO 
SELECT
 * FROM Employee 
GO




Creating new table with distinct records and renaming it
In this approach we create a new table with all distinct records, drop the existing target table and rename the newly created table with the original target table name. Please note, with this approach the meta-data about the target table will change for example object id, object creation date etc. so if you have any dependencies on these you have to take them into consideration.
Three things you need to aware of when you are using this approach.
  • First you need to make sure you have enough space in your database in the default filgroup (if you want your new table to be on some other file group than the default filegroup then you need to create a table first and then use INSERT INTO....SELECT * FROM) to hold all the distinct records especially if it is very large result-set.
  • Second you need to make sure you perform this operation in a transaction, at least the DROP and RENAME part so that you are not left with an another problem if it fails in between for any reason. 
  • Third you need to have required permissions for object creation/drop.
Script #4 - New table with distinct only
BEGIN TRAN
-- Pull distinct records in a new table 
SELECT DISTINCT * INTO EmployeeNew FROM Employee --Drop the old target table 
DROP TABLE Employee --rename the new table 
EXEC sp_rename 'EmployeeNew', 'Employee' 
COMMIT
 TRAN
GO 
SELECT
 * FROM Employee 
GO
























1 comment:

  1. Sands Casino | Situs Judi Slot Online Terpercaya, Judi Online
    Situs Judi septcasino Slot Online Terpercaya 2021 & 2021 — Dengan pelayanan game judi online terbaik, juga live casino online terlengkap 메리트 카지노 seperti judi bola sbobet, kadangpintar live casino online

    ReplyDelete

What is Normalization?

Database normalization is a data design and organization process applied to data structures based on rules that help building relational dat...