I've 250 million row citizen data with field, name, gender, birth date, birth place, could u share best sql query to get potencial data duplication?

Question

SQL, Oracle, Query, Script

Wameedh Fawzi · Answer

select t1.*

from

(Select

a.name

,a.gender

,a.birth_date

,a.birth_place

,row_number() over (partition by a.name,a.gender, a.birth_date order by a.name) as rnk

from Citizen_data a

) t1

where t1.rnk >1

Tahir Rehman · Answer

select name,gender, birthdate,birthplace,count(*) from Citizen group by name,gender, birthdate,birthplace having count(*) > 1;

Kasiviswanathan Selvakumar · Answer

Let us consider the table name be 'citizen_table' with 'name', 'gender', 'birth_date' and 'birth_place' as column names

The following sql query gets the potential data duplicates

select count(*) as number_of_repeated_rows, name, gender, birth_date, birth_place

from citizen_table

group by name, gender, birth_date, birth_place