select
The select command manipulates columns in CSV data, allowing you to reorder, duplicate, reverse, or drop them. Columns can be referenced by 1-based index or, if there is a header row, by name. Duplicate column names are disambiguated with an occurrence index in brackets. Column ranges and regular expressions can also be used to select columns.
Syntax
qsv select [options] [--] <selection> [<input_file>]
Description
The select command takes a selection, which is a comma-separated list of selectors. Each selector is an index, a column name, a range, or a regular expression, and the columns appear in the output in the order the selectors are given. A selection prefixed with ! is inverted, so the matching columns are dropped instead of kept.
Options
-R, --random
: Randomly shuffle the columns in the selection.
--seed <number>
: Seed for the random number generator.
-S, --sort
: Sort the selected columns lexicographically, i.e., by their byte values.
-h, --help
: Display this message.
-o, --output <file>
: Write output to <file> instead of stdout.
-n, --no-headers
: When set, the first row will not be interpreted as headers.
-d, --delimiter <arg>
: The field delimiter for reading CSV data. Must be a single character. (default: ,)
Examples
Basic Selection
Select the first and fourth columns:
qsv select 1,4
Selecting Columns by Index and Name
Select the first 4 columns by index and by name:
qsv select 1-4
qsv select Header1-Header4
Ignoring Columns
Ignore the first 2 columns by range and by omission:
qsv select 3-
qsv select '!1-2'
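To make the equivalence between the two commands concrete, here is a minimal Python sketch of the index semantics. It is an illustration under stated assumptions, not qsv's implementation: the helper select_by_index and the sample data are hypothetical, and only 1-based indexes, an open-ended range, and inversion are modeled.

```python
import csv
import io


def select_by_index(rows, indices, invert=False):
    """Keep the given 1-based columns, or drop them when invert is True."""
    if invert:
        dropped = set(indices)
        indices = [i for i in range(1, len(rows[0]) + 1) if i not in dropped]
    return [[row[i - 1] for i in indices] for row in rows]


# Hypothetical sample data with a header row and four columns.
rows = list(csv.reader(io.StringIO("a,b,c,d\n1,2,3,4\n")))
column_count = len(rows[0])

# Analogue of `3-`: columns 3 through the last one.
open_range = select_by_index(rows, list(range(3, column_count + 1)))
# Analogue of `'!1-2'`: every column except 1 and 2.
inverted = select_by_index(rows, [1, 2], invert=True)

print(open_range)
print(inverted)
```

Both calls keep the same two columns, which is why the two selections above are interchangeable for this purpose.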
Selecting Columns by Name
Select the third column named 'Foo' (the index in brackets is zero-based, so [2] is the third occurrence):
qsv select 'Foo[2]'
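The lookup can be sketched in Python as follows. This is a hedged illustration of how a zero-based occurrence index could resolve duplicate names. The function find_named_column and the header list are hypothetical, not part of qsv.

```python
def find_named_column(headers, name, occurrence=0):
    """Return the 0-based position of the occurrence-th header equal to name.

    occurrence is zero-based, mirroring the bracket index in 'Foo[2]'.
    """
    matches = [pos for pos, header in enumerate(headers) if header == name]
    if occurrence >= len(matches):
        raise ValueError(f"no occurrence {occurrence} of column {name!r}")
    return matches[occurrence]


# Hypothetical header row with three columns named Foo.
headers = ["Foo", "Bar", "Foo", "Baz", "Foo"]

# Analogue of 'Foo[2]': the third Foo, which sits in the fifth column.
position = find_named_column(headers, "Foo", 2)
print(position)
```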
Selecting First and Last Columns
Select the first and last columns, using _ as a special character for the last column:
qsv select 1,_
Reversing Column Order
Reverse the order of columns:
qsv select _-1
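As a rough model of why _-1 reverses the columns, the sketch below expands a selector into 1-based indexes and treats a descending range as a reversed one. It assumes only plain indexes, the _ token, and closed ranges. The helpers resolve and expand are hypothetical names, and open-ended ranges such as 3- are deliberately not handled.

```python
def resolve(token, column_count):
    """Map a token to a 1-based index; `_` stands for the last column."""
    return column_count if token == "_" else int(token)


def expand(selector, column_count):
    """Expand one index or `start-end` range into a list of 1-based indexes.

    A range whose start is greater than its end is walked backwards.
    """
    if "-" not in selector:
        return [resolve(selector, column_count)]
    start_token, end_token = selector.split("-", 1)
    start = resolve(start_token, column_count)
    end = resolve(end_token, column_count)
    step = 1 if start <= end else -1
    return list(range(start, end + step, step))


# Hypothetical header row.
headers = ["a", "b", "c", "d"]
count = len(headers)

# Analogue of `1,_`: first and last columns.
first_and_last = [headers[i - 1] for s in "1,_".split(",") for i in expand(s, count)]
# Analogue of `_-1`: all columns from last to first.
reversed_all = [headers[i - 1] for i in expand("_-1", count)]

print(first_and_last)
print(reversed_all)
```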
Sorting Columns
Sort the columns lexicographically:
qsv select 1- --sort
Selecting and Sorting Columns
Select some columns and then sort them:
qsv select 1,4,5-7 --sort
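Sorting "by byte values" differs from case-insensitive alphabetical order, which is easy to miss. The Python sketch below illustrates the ordering on a hypothetical set of selected column names. It assumes the sort key is the header name encoded as UTF-8, and it models the ordering rule only, not qsv's code.

```python
def sort_by_bytes(names):
    """Sort names by their UTF-8 byte values, so uppercase ASCII sorts first."""
    return sorted(names, key=lambda name: name.encode("utf-8"))


# Hypothetical names of the selected columns, in selection order.
selected = ["banana", "Cherry", "apple", "Apple"]

ordered = sort_by_bytes(selected)
print(ordered)
```

Because uppercase ASCII letters have lower byte values than lowercase ones, Apple and Cherry come before apple and banana.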
Shuffling Columns
Randomly shuffle the columns:
qsv select 1- --random
Shuffling Columns with a Seed
Randomly shuffle the columns with a seed:
qsv select 1- --random --seed 42
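The point of a seed is reproducibility: the same seed gives the same column order on every run. The Python sketch below demonstrates that property with the standard random module. The function shuffle_columns and the header list are hypothetical, and because Python's generator is not the one qsv uses, the resulting permutation should not be expected to match qsv's output for the same seed.

```python
import random


def shuffle_columns(headers, seed):
    """Return a shuffled copy of headers using a generator seeded with seed."""
    rng = random.Random(seed)
    shuffled = list(headers)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical header row.
headers = ["a", "b", "c", "d", "e"]

first_run = shuffle_columns(headers, seed=42)
second_run = shuffle_columns(headers, seed=42)

# The same seed reproduces the same order; no column is lost or duplicated.
print(first_run == second_run)
print(sorted(first_run) == sorted(headers))
```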
Selecting Columns with a Regex
Select columns using a regex:
# select columns starting with 'a'
qsv select /^a/
# select columns with a digit
qsv select '/^.*\d.*$/'
# remove SSN, account_no and password columns
qsv select '!/SSN|account_no|password/'
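The three patterns above can be tried against a header row with the Python sketch below. It assumes a pattern selects a column when it matches anywhere in the name (unanchored search), and it uses Python's re module rather than the regex engine qsv uses, so treat it as an approximation. The function select_by_regex and the headers are hypothetical.

```python
import re


def select_by_regex(headers, pattern, invert=False):
    """Keep headers the pattern matches anywhere, or drop them when invert is True."""
    regex = re.compile(pattern)
    return [h for h in headers if bool(regex.search(h)) != invert]


# Hypothetical header row.
headers = ["account_no", "amount", "SSN", "zip5", "password", "name"]

# Analogue of /^a/: names starting with 'a'.
starts_with_a = select_by_regex(headers, r"^a")
# Analogue of '/^.*\d.*$/': names containing a digit.
has_digit = select_by_regex(headers, r"^.*\d.*$")
# Analogue of '!/SSN|account_no|password/': everything except the sensitive columns.
non_sensitive = select_by_regex(headers, r"SSN|account_no|password", invert=True)

print(starts_with_a)
print(has_digit)
print(non_sensitive)
```

Under the unanchored assumption, a pattern such as SSN would also match a name like SSN_last4. Anchor the pattern with ^ and $ when an exact name match is needed.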
Reordering and Duplicating Columns
Reorder and duplicate columns arbitrarily using different types of selectors:
qsv select '3-1,Header3-Header1,Header1,Foo[2],Header1'
Quoting Column Names
Quote column names that conflict with selector syntax:
qsv select '"Date - Opening","Date - Actual Closing"'
Common Use Cases
- Focusing on specific data columns for analysis or reporting
- Reducing dataset size by excluding unnecessary columns
- Preparing data for import into systems with specific column requirements
- Data cleaning and preprocessing
Tips
- Use select in combination with other qsv commands for more complex data processing
- Verify the selected columns by inspecting the output