dedup
The dedup command removes duplicate rows from a CSV file.
Syntax
qsv dedup [options] [<input>]
Description
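The dedup command reads the input, sorts its rows, and keeps one copy of each distinct row. By default, two rows count as duplicates only when every field matches. Because the rows are sorted first, dedup reads all of the data into memory and writes its output in sorted order, not in the original row order. This makes it useful for cleaning datasets before further processing.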
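The sort-then-compare approach described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not qsv's actual implementation (qsv is written in Rust); the function name dedup_rows is hypothetical. After sorting, identical rows sit next to each other, so a single pass that skips any row equal to the previously kept one removes every duplicate.

```python
def dedup_rows(rows):
    """Hypothetical helper: return the distinct rows in sorted order.

    Sorting places identical rows next to each other, so one pass that
    keeps a row only when it differs from the last kept row is enough.
    """
    result = []
    for row in sorted(rows):
        # A row is a duplicate only if every field equals the previous row's.
        if not result or row != result[-1]:
            result.append(row)
    return result


rows = [
    ["b", "2"],
    ["a", "1"],
    ["b", "2"],  # exact duplicate of the first row: dropped
    ["a", "9"],  # same first field, different second field: kept
]
print(dedup_rows(rows))  # [['a', '1'], ['a', '9'], ['b', '2']]
```

Note that the result comes back sorted, which mirrors why dedup's output does not preserve the input order.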
Options
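-o, --output <file>
: Write output to <file> instead of stdout.
-n, --no-headers
: When set, the first row is not interpreted as headers, so it is sorted and deduplicated like any other row.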
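The effect of the header option can be sketched with Python's standard csv module. This is an illustrative model, not qsv code; the helper dedup_csv and its no_headers parameter are hypothetical stand-ins for the --no-headers flag. The sample data repeats the header text inside the body, which only collapses when the first row is treated as data.

```python
import csv
import io


def dedup_csv(text, no_headers=False):
    """Hypothetical sketch of how --no-headers changes deduplication.

    By default the first row is a header: it is written first and never
    compared. With no_headers=True it is treated as an ordinary data row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header = [] if no_headers or not rows else [rows[0]]
    body = rows if no_headers else rows[1:]

    # Sort, then drop rows identical to the previously kept row.
    distinct = []
    for row in sorted(body):
        if not distinct or row != distinct[-1]:
            distinct.append(row)

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(header + distinct)
    return out.getvalue()


data = "name\nbob\nname\nbob\n"
print(dedup_csv(data))                   # header kept, body rows: bob, name
print(dedup_csv(data, no_headers=True))  # all rows are data: bob, name
```

With the header in place, the body row "name" is distinct from the header and survives; with no_headers=True, the first "name" row is just another duplicate.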
Examples
Remove Duplicate Rows
Remove duplicate rows from a CSV file and pipe the result to qsv table, which aligns the columns for display:
qsv dedup DLD_Transactions_English_500.csv | qsv table
Output:
transaction_id trans_group_en procedure_name_en instance_date property_type_en property_sub_type_en property_usage_en reg_type_en area_name_en building_name_en project_number project_name_en master_project_en nearest_landmark_en nearest_metro_en nearest_mall_en rooms_en has_parking procedure_area actual_worth meter_sale_price rent_value meter_rent_price no_of_parties_role_1 no_of_parties_role_2 no_of_parties_role_3
1-102-2022-29434 Sales Sell - Pre registration 28-09-2022 Unit Hotel Rooms Hospitality Off-Plan Properties Al Barsha South Fifth TERHAB HOTEL & TOWERS - TOWER 3 1722 TERHAB HOTEL & TOWERS AT JUMEIRAH VILLAGE TRIANGLE Jumeirah Village Triangle Sports City Swimming Academy Damac Properties Marina Mall Studio 1 34.35 555379.00 16168.24 null null 2 1 0
1-11-2024-19676 Sales Sell 31-05-2024 Unit Flat Residential Existing Properties Marsa Dubai MARINA STAR 551 MARINA STAR Dubai Marina Sports City Swimming Academy Jumeirah Beach Residency Marina Mall 2 B/R 1 161.71 3481260.00 21527.80 null null 1 1 0
...
Common Use Cases
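- Cleaning datasets, for example removing rows repeated after several exports are combined
- Ensuring data integrity, so that repeated records are not counted twice in later summaries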
Tips
- Verify the result by comparing row counts before and after, for example with qsv count
- Combine dedup with other qsv commands through pipes, as in the qsv table example above, for multi-step processing
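Beyond comparing counts, a check can also confirm that the output holds exactly the distinct input rows. The Python sketch below is one hypothetical way to do that; check_dedup is not part of qsv, and it assumes both inputs are CSV text with a header row. For real files, pass in the contents read with open(path, newline="").read().

```python
import csv
import io


def check_dedup(original_text, deduped_text):
    """Hypothetical check: confirm no duplicates remain and no rows were
    lost, then return how many rows were removed. Assumes a header row."""
    before = list(csv.reader(io.StringIO(original_text)))[1:]
    after = list(csv.reader(io.StringIO(deduped_text)))[1:]

    # Lists are not hashable, so compare rows as tuples inside sets.
    after_set = {tuple(row) for row in after}
    if len(after_set) != len(after):
        raise ValueError("duplicate rows remain in the output")
    if after_set != {tuple(row) for row in before}:
        raise ValueError("output rows do not match the distinct input rows")
    return len(before) - len(after)


original = "id,v\n1,a\n2,b\n1,a\n"
deduped = "id,v\n1,a\n2,b\n"
print(check_dedup(original, deduped))  # 1
```

Comparing sets rather than ordered lists keeps the check valid even though dedup reorders rows.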