partition
The partition command splits a CSV file into multiple files based on a specified column.
Syntax
qsv partition <column> [<input_file>] [-o <output_directory>]
Description
The partition command divides the input file (or stdin if no file is specified) into smaller files, one for each unique value in the specified column(s). Each output file contains only the rows whose column value matches that file's name.
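The grouping behavior described above can be illustrated with a minimal Python sketch. It is not how qsv is implemented; `partition_rows` is a hypothetical helper, and keying each output by the raw column value is an assumption drawn from the description.

```python
import csv
import io
from collections import defaultdict


def partition_rows(csv_text, column):
    """Group CSV rows by the value found in `column`.

    Returns a dict mapping each unique value to CSV text that holds
    the header plus only the rows carrying that value.
    (Hypothetical helper for illustration; not part of qsv.)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        groups[row[column]].append(row)

    outputs = {}
    for value, rows in groups.items():
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=reader.fieldnames, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        outputs[value] = buf.getvalue()
    return outputs


# Made-up sample data, used only to exercise the sketch.
sample = "Country,Athlete\nKenya,A\nJapan,B\nKenya,C\n"
parts = partition_rows(sample, "Country")
# parts has one entry per unique Country value ("Kenya" and "Japan")
```

Every output repeats the header row, so each piece remains a valid CSV file on its own.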
Options
-o, --output <directory>: Specify the output directory for the partitioned files
Exit Codes
- 0: Partition successful
- Non-zero: An error occurred
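A script that calls the command can branch on these exit codes. The following is a rough sketch, assuming only the zero/non-zero convention listed above; `run_ok` is a hypothetical helper, and the stand-in commands let the example run without qsv installed.

```python
import subprocess
import sys


def run_ok(cmd):
    """Run `cmd` (a list of arguments) and return (succeeded, stderr text).

    `succeeded` is True only when the process exits with code 0.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stderr


# Stand-in commands that simply exit with a chosen code.
# In practice cmd might be ["qsv", "partition", "Country", "olympics2024.csv"].
ok_zero, _ = run_ok([sys.executable, "-c", "raise SystemExit(0)"])
ok_nonzero, _ = run_ok([sys.executable, "-c", "raise SystemExit(2)"])
```

Capturing stderr alongside the status makes it possible to log the error message when the exit code is non-zero.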
Examples
Basic Partitioning
Partition the 'olympics2024.csv' file by the 'Country' column:
qsv partition Country olympics2024.csv
Output:
Created 80 files in the current directory
Specifying Output Directory
Partition the 'olympics2024.csv' file by the 'Country' column and output to a specific directory:
qsv partition Country olympics2024.csv -o partitioned_data
This will create a new directory called 'partitioned_data' and place the partitioned files inside it.
Partitioning by Multiple Columns
Partition the 'olympics2024.csv' file by the 'Country' and 'Event' columns:
qsv partition Country,Event olympics2024.csv -o partitioned_data
This will create separate files for each unique combination of Country and Event values.
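To estimate how many files a multi-column partition would produce, one can count the distinct value combinations first. This is a small sketch with made-up sample data; `count_combinations` is a hypothetical helper, and how the tool names files for combined keys is not covered here.

```python
import csv
import io
from collections import Counter


def count_combinations(csv_text, columns):
    """Count rows per unique combination of values in `columns`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(tuple(row[c] for c in columns) for row in reader)


# Made-up sample data, used only to exercise the sketch.
sample = (
    "Country,Event,Athlete\n"
    "Kenya,Marathon,A\n"
    "Kenya,Marathon,B\n"
    "Kenya,5000m,C\n"
    "Japan,Judo,D\n"
)
combos = count_combinations(sample, ["Country", "Event"])
# One key per (Country, Event) pair; the value is that pair's row count.
```

The number of keys in `combos` is the number of separate files to expect, which is worth checking before partitioning a column pair with many distinct combinations.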
Common Use Cases
- Breaking down large datasets into more manageable pieces
- Organizing data by categories for easier analysis
- Preparing data for parallel processing or distributed computing
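As an illustration of the parallel-processing use case, the sketch below handles each partitioned file in its own task. It assumes the partition files are ordinary `.csv` files with a header row; `count_rows` and `count_all` are hypothetical helpers, and the temporary directory stands in for a real output directory such as 'partitioned_data'.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_rows(path):
    """Return (file name, number of data rows) for one partition file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return path.name, sum(1 for _ in reader)


def count_all(directory):
    """Count rows in every .csv file in `directory`, one task per file."""
    files = sorted(Path(directory).glob("*.csv"))
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(count_rows, files))


# Build a tiny stand-in for a directory of partitioned files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Kenya.csv").write_text("Country,Athlete\nKenya,A\nKenya,C\n")
    Path(tmp, "Japan.csv").write_text("Country,Athlete\nJapan,B\n")
    counts = count_all(tmp)
```

Because each file is independent, the per-file function can be swapped for any heavier analysis step without changing the surrounding structure.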
Tips
- Use partition to split data into smaller files for more efficient processing
- Combine partition with other qsv commands for more complex data workflows