
partition

The partition command splits a CSV file into multiple files, one for each distinct value in a specified column.

Syntax

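qsv partition [options] <column> <outdir> [<input_file>]

<column> selects the column to partition on, and <outdir> is the directory the output files are written to; it is created if it does not already exist.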

Description

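The partition command reads the input file (or stdin if no file is specified) and writes its rows to separate files in the output directory, one file per distinct value in the specified column. Each output file contains only the rows that share that value, and its name is derived from the value (by default <value>.csv, with the value sanitized so it is safe to use as a file name). Unless --no-headers is set, the input's header row is repeated at the top of every output file.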
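The grouping described above can be modeled in a few lines of Python. This is a minimal single-pass sketch, not qsv's implementation: partition_csv is a hypothetical helper, it assumes the input has a header row, it uses the raw column value as the file name (qsv sanitizes it), and it does not model any of partition's options.

```python
import csv
import io
import os
import tempfile
from typing import Dict, TextIO


def partition_csv(src: TextIO, column: str, outdir: str) -> Dict[str, int]:
    """Write each row of src to <outdir>/<value>.csv, keyed on `column`.

    Returns the number of data rows written per distinct value.
    """
    reader = csv.reader(src)
    header = next(reader)               # assumes the input has a header row
    key_idx = header.index(column)      # ValueError if the column is missing
    os.makedirs(outdir, exist_ok=True)  # create the output directory if needed

    files = []                          # open handles, closed in `finally`
    writers = {}                        # value -> csv.writer for its file
    counts: Dict[str, int] = {}
    try:
        for row in reader:              # one streaming pass over the input
            value = row[key_idx]
            if value not in writers:
                # First row with this value: open its file and write the header.
                # Simplification: the raw value is the file name (no sanitizing).
                f = open(os.path.join(outdir, f"{value}.csv"), "w", newline="")
                files.append(f)
                writers[value] = csv.writer(f)
                writers[value].writerow(header)
            writers[value].writerow(row)
            counts[value] = counts.get(value, 0) + 1
    finally:
        for f in files:
            f.close()
    return counts


if __name__ == "__main__":
    sample = "Country,Medal\nUSA,Gold\nFRA,Silver\nUSA,Bronze\n"
    with tempfile.TemporaryDirectory() as outdir:
        print(partition_csv(io.StringIO(sample), "Country", outdir))  # {'USA': 2, 'FRA': 1}
        print(sorted(os.listdir(outdir)))  # ['FRA.csv', 'USA.csv']
```

Keeping one open writer per value lets rows be written as they are read, in input order, without buffering the whole file. The trade-off is one open file handle per distinct value, which can run into the operating system's open-file limit on very high-cardinality columns.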

Options

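  • --filename <template>: Template for the output file names; {} is replaced by the column value, sanitized to be safe as a file name (default: {}.csv)
  • -p, --prefix-length <n>: Use only the first n bytes of the column value as the partition key
  • --drop: Drop the partition column from the output files
  • -n, --no-headers: Treat the first row as data rather than as column names (by default, the header row is repeated in every output file)
  • -d, --delimiter <arg>: Field delimiter for reading the CSV data; must be a single character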

Exit Codes

  • 0: Partition successful
  • Non-zero: An error occurred

Examples

Basic Partitioning

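Partition the 'olympics2024.csv' file by the 'Country' column, writing the output files to the current directory (.):

qsv partition Country . olympics2024.csv

This writes one file per distinct Country value into the current directory; for example, rows whose Country value is France would go to France.csv.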
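Because each file name comes from a field value, values containing spaces, punctuation, or accented letters have to be turned into safe names, and two different values can end up with the same name. The Python sketch below shows one way to handle both with a {}.csv-style template, where {} stands for the value. FilenameGenerator is a hypothetical class, and its rules (replace anything outside ASCII letters, digits, '-' and '_' with '_', use the placeholder 'empty' for blank values, append _1, _2, ... on a case-insensitive clash) are illustrative assumptions, not qsv's exact behavior.

```python
import re
from typing import Dict, Set

# Illustrative sanitizing rule (an assumption, not qsv's exact rule):
# every character other than ASCII letters, digits, '-' and '_' becomes '_'.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


class FilenameGenerator:
    """Assign each distinct field value a unique, file-system-safe name."""

    def __init__(self, template: str = "{}.csv") -> None:
        if "{}" not in template:
            raise ValueError("template must contain '{}'")
        self.template = template
        self.assigned: Dict[str, str] = {}  # raw value -> file name
        self.taken: Set[str] = set()        # lower-cased names already in use

    def filename(self, value: str) -> str:
        if value in self.assigned:          # same value always maps to the same file
            return self.assigned[value]
        base = _UNSAFE.sub("_", value) or "empty"  # hypothetical name for blank values
        unique, n = base, 1
        # Compare case-insensitively so "USA" and "usa" never share a file;
        # on a clash, try base_1, base_2, ... until a free name is found.
        while unique.lower() in self.taken:
            unique = f"{base}_{n}"
            n += 1
        self.taken.add(unique.lower())
        name = self.template.replace("{}", unique, 1)
        self.assigned[value] = name
        return name


if __name__ == "__main__":
    names = FilenameGenerator()
    for v in ["USA", "Côte d'Ivoire", "", "usa", "USA"]:
        print(repr(v), "->", names.filename(v))
```

Treating names case-insensitively costs nothing on case-sensitive file systems and keeps two values from silently sharing one file on case-insensitive ones, such as the default file systems on macOS and Windows.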

Specifying Output Directory

Partition the 'olympics2024.csv' file by the 'Country' column and write the output to a specific directory, passed as the second argument:

qsv partition Country partitioned_data olympics2024.csv

If the directory 'partitioned_data' does not exist yet, it is created, and the partitioned files are placed inside it.
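A quick sanity check after a run is that the files in 'partitioned_data' together hold every input row exactly once, that each repeats the input header, and that each holds a single Country value. A minimal Python sketch of that check; check_partitions and read_csv are hypothetical helpers, and the sketch assumes the input has a header row, every output file keeps the full header including the key column, and the directory contains only the partition output as *.csv files.

```python
import csv
import glob
import os
from typing import List, Tuple


def read_csv(path: str) -> Tuple[List[str], List[List[str]]]:
    """Return (header, data rows) for a CSV file that has a header row."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


def check_partitions(input_path: str, outdir: str, column: str) -> int:
    """Verify that the *.csv files in outdir are a clean partition of input_path.

    Returns the number of partition files; raises ValueError on a mismatch.
    """
    header, rows = read_csv(input_path)
    key = header.index(column)
    parts = sorted(glob.glob(os.path.join(outdir, "*.csv")))
    collected: List[List[str]] = []
    for part in parts:
        part_header, part_rows = read_csv(part)
        if part_header != header:
            raise ValueError(f"{part}: header differs from the input")
        if len({r[key] for r in part_rows}) > 1:
            raise ValueError(f"{part}: holds more than one {column} value")
        collected.extend(part_rows)
    # Same rows with the same multiplicities: nothing lost, nothing duplicated.
    if sorted(collected) != sorted(rows):
        raise ValueError("partition files do not add up to the input rows")
    return len(parts)
```

For the example above, the call would be check_partitions("olympics2024.csv", "partitioned_data", "Country").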

Partitioning by Multiple Columns

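The partition command keys on a single column, so a selection such as Country,Event that resolves to two columns is rejected. To get a separate file for each unique combination of Country and Event values, partition by Country first, then partition each resulting file by Event:

qsv partition Country by_country olympics2024.csv
for f in by_country/*.csv; do qsv partition Event "by_country_event/$(basename "$f" .csv)" "$f"; done

This leaves one subdirectory per Country value under 'by_country_event', each holding one file per Event value.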
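Splitting on two columns is equivalent to grouping on the pair of values, so the number of output files equals the number of distinct (Country, Event) pairs, which can be far larger than the number of distinct values in either column alone. A minimal Python sketch for counting those pairs before partitioning; combination_counts is a hypothetical helper, and it assumes the input has a header row naming both columns.

```python
import csv
import io
from collections import Counter
from typing import Sequence, TextIO


def combination_counts(src: TextIO, columns: Sequence[str]) -> Counter:
    """Count data rows per distinct combination of values in `columns`."""
    reader = csv.DictReader(src)  # assumes a header row naming every column
    return Counter(tuple(row[c] for c in columns) for row in reader)


if __name__ == "__main__":
    sample = (
        "Country,Event,Medal\n"
        "USA,100m,Gold\nUSA,100m,Silver\nUSA,Judo,Bronze\nFRA,Judo,Gold\n"
    )
    counts = combination_counts(io.StringIO(sample), ["Country", "Event"])
    print(len(counts), "files would be written")  # one per (Country, Event) pair
    for (country, event), n in sorted(counts.items()):
        print(country, event, n)
```

Checking len(counts) first is a cheap way to avoid accidentally creating thousands of small files.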

Common Use Cases

  • Breaking down large datasets into more manageable pieces
  • Organizing data by categories for easier analysis
  • Preparing data for parallel processing or distributed computing
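For the parallel-processing case, each partition file can be handed to its own worker. A minimal Python sketch using concurrent.futures, assuming the partitions sit in one directory as *.csv files with a header row; process_partitions and count_rows are hypothetical helpers, and counting data rows stands in for whatever real per-file work you need.

```python
import csv
import glob
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def count_rows(path: str) -> int:
    """Stand-in for real per-partition work: count data rows (header excluded)."""
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def process_partitions(outdir: str, workers: int = 4) -> Dict[str, int]:
    """Run count_rows on every *.csv file in outdir concurrently."""
    paths = sorted(glob.glob(os.path.join(outdir, "*.csv")))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as `paths`
        results = list(pool.map(count_rows, paths))
    return {os.path.basename(p): n for p, n in zip(paths, results)}
```

Threads suit I/O-bound work like this; for CPU-heavy per-file work, ProcessPoolExecutor is the usual drop-in replacement, with the pool created under an if __name__ == "__main__": guard.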

Tips

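  • When later steps need only one group (for example, a single Country), partition once and work on that group's file instead of filtering the whole dataset each time
  • Combine partition with other qsv commands, for example by running qsv stats or qsv frequency on individual partition files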

See Also

  • split - for splitting files based on size or number of lines
  • join - for combining data from multiple files