Skip to main content

apply

The apply command applies a series of string, format, currency, regex, and NLP transformations to given CSV columns.

Syntax

qsv apply operations [options] <column> <operation> [<input_file>] [<output_file>]

Description

The apply command is a powerful tool for data transformation. It can perform various operations on specified columns, including string manipulations, format conversions, numerical calculations, and natural language processing tasks.

Options

  • -c, --new-column <name>: Put the transformed values in a new column instead
  • -r, --rename <name>: New name for the transformed column
  • -C, --comparand=<string>: The string to compare against for replace and similarity operations
  • -R, --replacement=<string>: The string to use for replace operations
  • -f, --formatstr=<string>: This option is used by several subcommands

String Operations

Common Combinations

qsv apply operations lower trans_group_en -c trans_group_en_lower DLD_Transactions_English_500.csv \
| qsv apply operations upper project_name_en -c project_name_en_upper \
| qsv select 'trans_group_en_lower,project_name_en_upper' \
| qsv sample 10 \
| qsv table

Lower

The lower operation converts all characters in the specified column to lowercase. This is helpful for standardizing text data, especially when case sensitivity is not important.

Apply lower on the trans_group_en column:

qsv apply operations lower trans_group_en -c trans_group_en_lower DLD_Transactions_English_500.csv \
| qsv select 'trans_group_en,trans_group_en_lower' \
| qsv sample 10 \
| qsv table
Output:
trans_group_en  trans_group_en_lower
Sales sales
Sales sales
Sales sales
Sales sales
Sales sales
Mortgages mortgages
Mortgages mortgages
Mortgages mortgages
Sales sales
Mortgages mortgages

Upper

The upper operation converts all characters in the specified column to uppercase. This is useful for ensuring consistency in text data, such as converting names or codes to a uniform format.

Apply upper on the area_name_en column:

qsv apply operations upper trans_group_en -c trans_group_en_upper DLD_Transactions_English_500.csv \
| qsv select 'trans_group_en,trans_group_en_upper' \
| qsv sample 10 \
| qsv table
Output:
trans_group_en  trans_group_en_upper
Sales SALES
Sales SALES
Sales SALES
Mortgages MORTGAGES
Mortgages MORTGAGES
Sales SALES
Sales SALES
Sales SALES
Sales SALES
Sales SALES

Titlecase

The titlecase operation converts the text in the specified column to title case, where the first letter of each major word is capitalized. This is useful for formatting names, titles, and other text where capitalization is important.

note

The titlecase operation differs from proper case in that it capitalizes the first letter of each major word, while attempting to exclude certain common words like 'a', 'an', and 'the' from capitalization. This list of excluded words is based on the New York Times Manual of Style, with the addition of 'vs' and 'v'.

Apply titlecase on the procedure_name_en column:

qsv apply operations titlecase building_name_en -c building_name_en_titlecase DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_titlecase' \
| qsv sample 10 \
| qsv table
Output:
building_name_en                               building_name_en_titlecase
PALM BEACH TOWERS -1 Palm Beach Towers -1
WEST HEIGHTS 5 West Heights 5
HAMZA TOWER Hamza Tower
Burj Khalifa Burj Khalifa

CITY APARTMENTS City Apartments
Binghatti Corner Binghatti Corner

Trim

The trim operation removes any leading and trailing whitespaces from the string in the specified column. This is commonly used to clean up data where extra spaces might have been inadvertently added.

Apply trim on the procedure_name_en column:

qsv search -s building_name_en '^\s+|\s+$' DLD_Transactions_English_500.csv \
| qsv apply operations trim building_name_en -c building_name_en_trimmed \
| qsv select 'building_name_en,building_name_en_trimmed' \
| qsv sample 10 \
| qsv table
Output:
building_name_en                  building_name_en_trimmed
LE GRAND CHATEAU LE GRAND CHATEAU
DM MESK DM MESK
Emirates Garden II - Magnolia 1 Emirates Garden II - Magnolia 1
DM MESK DM MESK
PALLADIUM PALLADIUM
tip

You can search for values in a column with leading or trailing spaces using the search command with the following regular expression.

qsv search -s building_name_en '^\s+|\s+$' DLD_Transactions_English_500.csv \
| qsv sample 10 \
| qsv table

Ltrim

The ltrim operation removes any leading whitespaces from the string in the specified column. This is useful for cleaning up text data where leading spaces are present.

Apply ltrim on the procedure_name_en column:

qsv search -s building_name_en '^\s+' DLD_Transactions_English.csv \
| qsv apply operations ltrim building_name_en -c building_name_en_ltrimmed \
| qsv select 'building_name_en,building_name_en_ltrimmed' \
| qsv sample 10 \
| qsv table
Output:
building_name_en  building_name_en_ltrimmed
Starz Tower 1 Starz Tower 1
PALLADIUM PALLADIUM
PALLADIUM PALLADIUM
Starz Tower 1 Starz Tower 1
DM MESK DM MESK
LOFTS T EAST LOFTS T EAST
PALLADIUM PALLADIUM
Starz Tower 2 Starz Tower 2
PALLADIUM PALLADIUM
Starz Tower 1 Starz Tower 1
tip

You can search for values in a column with leading spaces using the search command with the following regular expression.

qsv search -s building_name_en '^\s+' DLD_Transactions_English_500.csv \
| qsv sample 10 \
| qsv table

Rtrim

The rtrim operation removes any trailing whitespaces from the string in the specified column. This is useful for cleaning up text data where trailing spaces are present.

Apply rtrim on the procedure_name_en column:

qsv search -s building_name_en '\s+$' DLD_Transactions_English.csv \
| qsv apply operations regex_replace building_name_en --comparand '\s+$' --replacement '[TRAILING_SPACE]' -c building_name_en_modified \
| qsv apply operations rtrim building_name_en -c building_name_en_rtrimmed \
| qsv select 'building_name_en,building_name_en_modified,building_name_en_rtrimmed' \
| qsv sample 10 \
| qsv table
Output:
building_name_en                  building_name_en_modified                        building_name_en_rtrimmed
SHERENA RESIDENCE SHERENA RESIDENCE[TRAILING_SPACE] SHERENA RESIDENCE
J One Building J One Building[TRAILING_SPACE] J One Building
Studio One Studio One[TRAILING_SPACE] Studio One
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
Studio One Studio One[TRAILING_SPACE] Studio One
Studio One Studio One[TRAILING_SPACE] Studio One
Emirates Garden II - Magnolia 2 Emirates Garden II - Magnolia 2[TRAILING_SPACE] Emirates Garden II - Magnolia 2
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
tip

To detect and emphasize trailing spaces in a column, you can utilize the search command in conjunction with the apply command, using a regular expression that matches trailing whitespace.

qsv search -s building_name_en '\s+$' DLD_Transactions_English.csv \
| qsv apply operations regex_replace building_name_en --comparand '\s+$' --replacement '[TRAILING_SPACE]' -c building_name_en_modified \
| qsv select 'building_name_en,building_name_en_modified' \
| qsv sample 10 \
| qsv table

Mtrim

The mtrim operation removes all instances of the specified characters (provided by --comparand) from both the left and right sides of the string in the specified column. This operation is useful for cleaning up strings with unwanted characters at the start or end.

qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '(<{area_name_en}&>)' --new-column area_name_en_modified \
| qsv apply operations mtrim area_name_en_modified --comparand '()<>&' -c area_name_en_mtrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mtrimmed' \
| qsv table
Output:
area_name_en                  area_name_en_modified              area_name_en_mtrimmed
Al Hebiah Fifth (<Al Hebiah Fifth&>) Al Hebiah Fifth
Al Hebiah Fourth (<Al Hebiah Fourth&>) Al Hebiah Fourth
Nad Shamma (<Nad Shamma&>) Nad Shamma
Dubai Investment Park Second (<Dubai Investment Park Second&>) Dubai Investment Park Second
Jabal Ali First (<Jabal Ali First&>) Jabal Ali First
Al Khairan First (<Al Khairan First&>) Al Khairan First
Al Thanyah Third (<Al Thanyah Third&>) Al Thanyah Third
Nad Al Hamar (<Nad Al Hamar&>) Nad Al Hamar
Palm Jumeirah (<Palm Jumeirah&>) Palm Jumeirah
Business Bay (<Business Bay&>) Business Bay
warning

The mtrim operation does not search for exact matches of the operand; instead, it removes all instances of the specified characters from both ends of the string.

Details

In this example, the first character on the left of area_name_en_mtrimmed is being removed because the mtrim operation is set to strip all instances of the characters [, P, A, D, I, N, G, and ] from both ends. This is why "Palm Jumeirah" becomes "alm Jumeirah" as the initial "P" is part of the characters to be removed.

qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '[PADDING]{area_name_en}[PADDING]' --new-column area_name_en_modified \
| qsv apply operations mtrim area_name_en_modified --comparand '[PADDING]' -c area_name_en_mtrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mtrimmed' \
| qsv table
Output:
area_name_en            area_name_en_modified                     area_name_en_mtrimmed
Palm Jumeirah [PADDING]Palm Jumeirah[PADDING] alm Jumeirah
Al Barsha South Fourth [PADDING]Al Barsha South Fourth[PADDING] l Barsha South Fourth
Al Yufrah 1 [PADDING]Al Yufrah 1[PADDING] l Yufrah 1
Al Hebiah First [PADDING]Al Hebiah First[PADDING] l Hebiah First
Al Thanyah Third [PADDING]Al Thanyah Third[PADDING] l Thanyah Third
Al Barsha South Fifth [PADDING]Al Barsha South Fifth[PADDING] l Barsha South Fifth
Jabal Ali First [PADDING]Jabal Ali First[PADDING] Jabal Ali First
Nadd Hessa [PADDING]Nadd Hessa[PADDING] add Hessa
Business Bay [PADDING]Business Bay[PADDING] Business Bay
Al Hebiah Fourth [PADDING]Al Hebiah Fourth[PADDING] l Hebiah Fourth

Mltrim

The mltrim operation removes all instances of the specified characters (provided by --comparand) from the left side of the string in the specified column. This is useful when you want to clean up a specific prefix.

Apply mltrim on the area_name_en --comparand 'Al' column:

 qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '[PADDING]{area_name_en}[PADDING]' --new-column area_name_en_modified \
| qsv apply operations mltrim area_name_en_modified --comparand '[PADDING]' -c area_name_en_mltrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mltrimmed' \
| qsv table
Output:
area_name_en            area_name_en_modified                     area_name_en_mltrimmed
Jabal Ali First [PADDING]Jabal Ali First[PADDING] Jabal Ali First[PADDING]
Al Barsha South Fourth [PADDING]Al Barsha South Fourth[PADDING] Al Barsha South Fourth[PADDING]
Me'Aisem First [PADDING]Me'Aisem First[PADDING] Me'Aisem First[PADDING]
Al Hebiah First [PADDING]Al Hebiah First[PADDING] Al Hebiah First[PADDING]
Al Safouh First [PADDING]Al Safouh First[PADDING] Al Safouh First[PADDING]
Palm Jumeirah [PADDING]Palm Jumeirah[PADDING] Palm Jumeirah[PADDING]
Jabal Ali First [PADDING]Jabal Ali First[PADDING] Jabal Ali First[PADDING]
Al Barshaa South Third [PADDING]Al Barshaa South Third[PADDING] Al Barshaa South Third[PADDING]
Marsa Dubai [PADDING]Marsa Dubai[PADDING] Marsa Dubai[PADDING]
Al Barshaa South Third [PADDING]Al Barshaa South Third[PADDING] Al Barshaa South Third[PADDING]

Mrtrim

The mrtrim operation removes all instances of the specified characters (provided by --comparand) from the right side of the string in the specified column. This is useful when you want to clean up a specific suffix.

Apply mrtrim on the area_name_en --comparand 'Dubai' column:

 qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '[PADDING]{area_name_en}[PADDING]' --new-column area_name_en_modified \
| qsv apply operations mrtrim area_name_en_modified --comparand '[PADDING]' -c area_name_en_mrtrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mrtrimmed' \
| qsv table
Output:
area_name_en                  area_name_en_modified                           area_name_en_mrtrimmed
Al Hebiah Fourth [PADDING]Al Hebiah Fourth[PADDING] [PADDING]Al Hebiah Fourth
Al Barsha South Fourth [PADDING]Al Barsha South Fourth[PADDING] [PADDING]Al Barsha South Fourth
Jabal Ali First [PADDING]Jabal Ali First[PADDING] [PADDING]Jabal Ali First
Al Manara [PADDING]Al Manara[PADDING] [PADDING]Al Manara
Al Suq Al Kabeer [PADDING]Al Suq Al Kabeer[PADDING] [PADDING]Al Suq Al Kabeer
Al Barsha South Fifth [PADDING]Al Barsha South Fifth[PADDING] [PADDING]Al Barsha South Fifth
Burj Khalifa [PADDING]Burj Khalifa[PADDING] [PADDING]Burj Khalifa
Ras Al Khor Industrial First [PADDING]Ras Al Khor Industrial First[PADDING] [PADDING]Ras Al Khor Industrial First
Al Yelayiss 2 [PADDING]Al Yelayiss 2[PADDING] [PADDING]Al Yelayiss 2
Madinat Hind 4 [PADDING]Madinat Hind 4[PADDING] [PADDING]Madinat Hind 4

Len

The len operation returns the length of the string in the specified column, measured in characters. This is useful for analyzing text data, such as determining the number of characters in a field.

Apply len on the property_type_en column:

qsv apply operations len building_name_en -c building_name_en_len DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_len' \
| qsv sample 10 \
| qsv table
Output:
building_name_en                   building_name_en_len
Azizi Riviera 12 16
PARK GATE RESIDENCES 4 22
ME DO RE 8
RED RESIDENCE 13
Palace Beach Residence Tower 2 30
REVA RESIDENCES 15
Address Residences Sky View 27
0
Vida Residences 3 & 4 - The Hills 33
AYKON CITY 3 - TOWER D 22

Squeeze

The squeeze operation compresses consecutive whitespaces in the specified column to a single space. This is useful for cleaning up text data with irregular spacing.

Apply squeeze on the building_name_en column:

qsv apply operations squeeze building_name_en -c building_name_en_squeezed DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_squeezed' \
| qsv search -s building_name_en '\s{2,}' \
| qsv sample 10 \
| qsv table
Output:
building_name_en   building_name_en_squeezed
BINGHATTI VIEWS BINGHATTI VIEWS
J & G BUILDING 2 J & G BUILDING 2
EMIRATES CROWN R EMIRATES CROWN R
PALLADIUM PALLADIUM

Squeeze0

The squeeze0 operation removes all whitespaces in the specified column. This is useful for data normalization tasks where you need to eliminate all spaces from text data.

Apply squeeze0 on the building_name_en column:

qsv apply operations squeeze0 building_name_en -c building_name_en_squeezed DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_squeezed' \
| qsv search -s building_name_en '\s{1,}' \
| qsv sample 10 \
| qsv table
Output:
building_name_en                             building_name_en_squeezed
DM Yass DMYass
CANDACE ACACIA HOTEL APARTMENTS CANDACEACACIAHOTELAPARTMENTS
Address Fountain Views Residences - Tower 2 AddressFountainViewsResidences-Tower2
DAMAC HILLS - GOLF VITA - A DAMACHILLS-GOLFVITA-A
AXIS RESIDENCES 6 AXISRESIDENCES6
DAMAC HILLS - GOLF VITA - A DAMACHILLS-GOLFVITA-A
ABBEY CRESCENT 1 ABBEYCRESCENT1
KEMPINSKI RESIDENCES KEMPINSKIRESIDENCES
Jumeirah Business Center 5 JumeirahBusinessCenter5
Hilliana Tower HillianaTower

Encoding Operations

Escape

The escape operation escapes all non-printable and special characters in the specified column. This is useful for preparing text data for storage or transmission in formats where certain characters have special meaning.

Apply escape on the area_name_en column:

qsv apply dynfmt --formatstr '{area_name_en}\n\t' --new-column area_name_en_modified DLD_Transactions_English_500.csv \
| qsv apply operations escape area_name_en_modified -c area_name_en_escaped \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_escaped' \
| qsv sample 10 \
| qsv table
Output:
area_name_en            area_name_en_modified       area_name_en_escaped
Al Warsan First Al Warsan First\n\t Al Warsan First\\n\\t
Al Thanyah First Al Thanyah First\n\t Al Thanyah First\\n\\t
Al Barsha South Fourth Al Barsha South Fourth\n\t Al Barsha South Fourth\\n\\t
Wadi Al Safa 5 Wadi Al Safa 5\n\t Wadi Al Safa 5\\n\\t
Burj Khalifa Burj Khalifa\n\t Burj Khalifa\\n\\t
Palm Jumeirah Palm Jumeirah\n\t Palm Jumeirah\\n\\t
Me'Aisem First Me'Aisem First\n\t Me\'Aisem First\\n\\t
Al Yelayiss 2 Al Yelayiss 2\n\t Al Yelayiss 2\\n\\t
Al Warsan First Al Warsan First\n\t Al Warsan First\\n\\t
Nadd Hessa Nadd Hessa\n\t Nadd Hessa\\n\\t

Encode64

The encode64 operation encodes the string in the specified column into Base64. This is useful for encoding binary data or for simple obfuscation.

Apply encode64 on the area_name_en column:

qsv apply operations encode64 area_name_en -c area_name_en_encoded64 DLD_Transactions_English_500.csv \
| qsv select 'area_name_en,area_name_en_encoded64' \
| qsv sample 10 \
| qsv table
Output:
area_name_en      area_name_en_encoded64
Marsa Dubai TWFyc2EgRHViYWk=
Al Barsha Second QWwgQmFyc2hhIFNlY29uZA==
Business Bay QnVzaW5lc3MgQmF5
Al Hebiah Third QWwgSGViaWFoIFRoaXJk
Business Bay QnVzaW5lc3MgQmF5
Al Yelayiss 2 QWwgWWVsYXlpc3MgMg==
Business Bay QnVzaW5lc3MgQmF5
Al Hebiah Fifth QWwgSGViaWFoIEZpZnRo
Mirdif TWlyZGlm
Burj Khalifa QnVyaiBLaGFsaWZh

Decode64

The decode64 operation decodes the Base64 encoded string in the specified column. This is useful for decoding data that has been encoded in Base64 format.

Apply decode64 on the area_name_en column:

qsv apply operations encode64 area_name_en -c area_name_en_encoded DLD_Transactions_English_500.csv \
| qsv apply operations decode64 area_name_en_encoded -c area_name_en_decoded \
| qsv select 'area_name_en,area_name_en_encoded,area_name_en_decoded' \
| qsv sample 10 \
| qsv table
Output:
area_name_en                       area_name_en_encoded                          area_name_en_decoded
Burj Khalifa QnVyaiBLaGFsaWZh Burj Khalifa
Al Khairan First QWwgS2hhaXJhbiBGaXJzdA== Al Khairan First
Burj Khalifa QnVyaiBLaGFsaWZh Burj Khalifa
Business Bay QnVzaW5lc3MgQmF5 Business Bay
Al Hebiah Fourth QWwgSGViaWFoIEZvdXJ0aA== Al Hebiah Fourth
Business Bay QnVzaW5lc3MgQmF5 Business Bay
Al Khairan First QWwgS2hhaXJhbiBGaXJzdA== Al Khairan First
Hadaeq Sheikh Mohammed Bin Rashid SGFkYWVxIFNoZWlraCBNb2hhbW1lZCBCaW4gUmFzaGlk Hadaeq Sheikh Mohammed Bin Rashid
Jabal Ali First SmFiYWwgQWxpIEZpcnN0 Jabal Ali First
Marsa Dubai TWFyc2EgRHViYWk= Marsa Dubai

Replacement Operations

Replace

The replace operation replaces all occurrences of a specified pattern (provided by --comparand) with a new string (provided by --replacement) in the specified column. This is useful for standardizing or cleaning text data.

Apply replace on the area_name_en --comparand 'Dubai' --replacement 'DXB' column:

qsv search -s area_name_en 'Dubai' DLD_Transactions_English_500.csv \
| qsv apply operations replace area_name_en --comparand 'Dubai' --replacement 'DXB' -c area_name_en_modified \
| qsv select 'area_name_en,area_name_en_modified' \
| qsv sample 10 \
| qsv table

Output:
area_name_en                  area_name_en_modified
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Madinat Dubai Almelaheyah Madinat DXB Almelaheyah
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Dubai Investment Park Second DXB Investment Park Second
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB

Regex_replace

The regex_replace operation replaces all occurrences of a regex pattern (provided by --comparand) with a new string (provided by --replacement) in the specified column. This is particularly useful for complex string transformations using regular expressions.

Apply regex_replace on the area_name_en --comparand 'D.*i' --replacement 'DXB' column:

building_name_en                             building_name_en_modified
Churchill Tower 1-Commercial Churchill Building X-Commercial
CREEK EDGE Tower 2 CREEK EDGE Building X
Act One Act Two Tower 1 Act One Act Two Building X
The St. Regis Residences - Tower 1 The St. Regis Residences - Building X
Palace Beach Residence Tower 2 Palace Beach Residence Building X
Socio Tower 1 Socio Building X
Socio Tower 1 Socio Building X
Address Fountain Views Residences - Tower 2 Address Fountain Views Residences - Building X
BURJ VISTA Tower 2 BURJ VISTA Building X
Harbour Gate Tower 2 Harbour Gate Building X
Output:
[Sample output data preview...]

Formatting Operations

Round

The round operation standardizes numerical data by rounding values in the specified column to a consistent precision, specified by --formatstr, using the Midpoint Nearest Even Rounding Strategy, also known as "Bankers Rounding." This is useful for standardizing numerical data to a consistent precision.

Apply round on the meter_sale_price --formatstr 2 column:

qsv apply operations round meter_sale_price --formatstr 1 -c meter_sale_price_rounded DLD_Transactions_English_500.csv \
| qsv select 'meter_sale_price,meter_sale_price_rounded' \
| qsv sample 10 \
| qsv table
Output:
meter_sale_price  meter_sale_price_rounded
12503.98 12504
26025.72 26025.7
7273.53 7273.5
13908.37 13908.4
12049.42 12049.4
7992.73 7992.7
1818.23 1818.2
6580.90 6580.9
12939.12 12939.1
10611.54 10611.5

Thousands

The thousands operation adds thousands separators to numeric values in the specified column. This is useful for enhancing readability of large numbers, with various separator policies available (e.g., comma, dot).

Apply thousands on the actual_worth --formatstr comma column:

The thousands operation converts numeric values to strings.

While this is useful for visualizing numbers in the terminal, it may lead to issues if the formatted data is saved and used in subsequent operations or systems that require numeric data types.

qsv apply operations thousands actual_worth --formatstr comma -c actual_worth_thousands DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_thousands' \
| qsv sample 10 \
| qsv table
Output:
actual_worth  actual_worth_thousands
771815.00 771,815
2500000.00 2,500,000
730000.00 730,000
500000.00 500,000
2071888.00 2,071,888
1244000.00 1,244,000
1500000.00 1,500,000
2440459.00 2,440,459
654458.00 654,458
2450000.00 2,450,000

Currencytonum

The currencytonum operation gets the numeric value of a currency. It supports currency symbols (e.g. $,¥,£,€,֏,₱,₽,₪,₩,ƒ,฿,₫) and strings (e.g. USD, EUR, RMB, JPY, etc.). It recognizes point, comma, and space separators.

Apply currencytonum on the actual_worth column:

qsv apply operations numtocurrency actual_worth -C $ -R 3.67 -c actual_worth_currency DLD_Transactions_English_500.csv \
| qsv apply operations currencytonum actual_worth_currency -c actual_worth_numeric \
| qsv select 'actual_worth,actual_worth_currency,actual_worth_numeric' \
| qsv sample 10 \
| qsv table
Output:
actual_worth  actual_worth_currency  actual_worth_numeric
4850000.00 $17,799,500.00 17799500.00
4000000.00 $14,680,000.00 14680000.00
1287200.00 $4,724,024.00 4724024.00
13190578.00 $48,409,421.26 48409421.26
2437500.00 $8,945,625.00 8945625.00
1000000.00 $3,670,000.00 3670000.00
9400000.00 $34,498,000.00 34498000.00
850000.00 $3,119,500.00 3119500.00
3850000.00 $14,129,500.00 14129500.00
10784800.00 $39,580,216.00 39580216.00

Numtocurrency

The numtocurrency operation converts a numeric value to a currency format. It supports various options to customize the output. Here's how to use it:

  • Specify the currency symbol using the --comparand option.
  • The operation automatically rounds values to two decimal places.
  • To use "euro" formatting (e.g., 1.000,00 instead of 1,000.00), set the --formatstr option to "euro".
  • To apply a conversion rate, set the --replacement option to the desired rate.

To apply numtocurrency on the actual_worth column with a conversion rate of 3.67 and a currency symbol of $, use the following command:

qsv apply operations numtocurrency actual_worth --comparand $ --replacement 3.67 -c actual_worth_currency DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency' \
| qsv sample 10 \
| qsv table

Output:
actual_worth  actual_worth_currency
4061887.00 $14,907,125.29
3000000.00 $11,010,000.00
2475000.00 $9,083,250.00
2800000.00 $10,276,000.00
2850000.00 $10,459,500.00
2000000.00 $7,340,000.00
580000.00 $2,128,600.00
3208000.00 $11,773,360.00
456600.00 $1,675,722.00
7019840.00 $25,762,812.80
Additional Samples:

Example 1: Converting to Euros

qsv apply operations numtocurrency actual_worth --comparand--replacement 3.67 -c actual_worth_currency_euro DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency_euro' \
| qsv sample 10 \
| qsv table

This example demonstrates how to convert the actual_worth column to Euros using the numtocurrency operation. The --comparand option is set to to specify the Euro symbol, and the --replacement option is set to 3.67 to apply the conversion rate. The resulting output is in the format €X,XXX,XXX.XX.

actual_worth  actual_worth_currency_euro
1000000.00 €3.670.000,00
1500000.00 €5.505.000,00
2500000.00 €9.175.000,00
3000000.00 €11.010.000,00
4000000.00 €14.680.000,00
5000000.00 €18.350.000,00
6000000.00 €22.020.000,00
7000000.00 €25.690.000,00
8000000.00 €29.360.000,00
9000000.00 €33.030.000,00

Example 2: Converting to Japanese Yen

qsv apply operations numtocurrency actual_worth --comparand ¥ --replacement 3.67 -c actual_worth_currency_yen DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency_yen' \
| qsv sample 10 \
| qsv table

This example demonstrates how to convert the actual_worth column to Japanese Yen using the numtocurrency operation. The --comparand option is set to ¥ to specify the Yen symbol, and the --replacement option is set to 3.67 to apply the conversion rate. The resulting output is in the format ¥X,XXX,XXX.

actual_worth  actual_worth_currency_yen
1000000.00 ¥3,670,000
1500000.00 ¥5,505,000
2500000.00 ¥9,175,000
3000000.00 ¥11,010,000
4000000.00 ¥14,680,000
5000000.00 ¥18,350,000
6000000.00 ¥22,020,000
7000000.00 ¥25,690,000
8000000.00 ¥29,360,000
9000000.00 ¥33,030,000

Example 3: Converting to US Dollars with Euro Formatting

qsv apply operations numtocurrency actual_worth --comparand $ --replacement 3.67 --formatstr euro -c actual_worth_currency_usd_euro DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency_usd_euro' \
| qsv sample 10 \
| qsv table

This example demonstrates how to convert the actual_worth column to US Dollars using the numtocurrency operation, but with Euro formatting. The --comparand option is set to $ to specify the US Dollar symbol, the --replacement option is set to 3.67 to apply the conversion rate, and the --formatstr option is set to euro to use Euro formatting. The resulting output is in the format $X.XXX.XXX,XX.

actual_worth  actual_worth_currency_usd_euro
1000000.00 $3.670.000,00
1500000.00 $5.505.000,00
2500000.00 $9.175.000,00
3000000.00 $11.010.000,00
4000000.00 $14.680.000,00
5000000.00 $18.350.000,00
6000000.00 $22.020.000,00
7000000.00 $25.690.000,00
8000000.00 $29.360.000,00
9000000.00 $33.030.000,00

Copy Operations

Copy

The copy operation duplicates the values in the specified column to a new column. This is useful for preserving the original data while applying further transformations to the copied column.

Apply copy on the area_name_en column:

qsv apply operations copy area_name_en -c area_name_copy DLD_Transactions_English_500.csv \
| qsv select 'area_name_en,area_name_copy' \
| qsv sample 10 \
| qsv table

Output:
area_name_en             area_name_copy
Al Hebiah First Al Hebiah First
Marsa Dubai Marsa Dubai
Jabal Ali First Jabal Ali First
Business Bay Business Bay
Al Barsha South Fourth Al Barsha South Fourth
Al Barshaa South Second Al Barshaa South Second
Business Bay Business Bay
Jabal Ali First Jabal Ali First
Burj Khalifa Burj Khalifa
Al Hebiah Fourth Al Hebiah Fourth

Common Use Cases

  • Standardizing text data (e.g., converting to lowercase, uppercase, or title case)
  • Cleaning and formatting strings (e.g., trimming whitespace, squeezing multiple spaces)
  • Encoding and decoding data (e.g., Base64 encoding/decoding)
  • Performing string replacements and regex-based transformations
  • Formatting numerical data (e.g., rounding, adding thousands separators)
  • Converting between numeric and currency formats
  • Creating copies of columns for further manipulation

Tips

  • Use the -c, --new-column <name> option to preserve the original data while creating transformed columns
  • Combine multiple apply operations for complex transformations
  • When working with currency data, be mindful of the formatting options and conversion rates
  • Use regex_replace for more complex string manipulations
  • Consider the impact of transformations on data types, especially when formatting numbers as strings

See Also

  • select - for selecting specific columns after transformation
  • search - for finding specific patterns in data before or after transformation
  • sample - for working with a subset of data when testing transformations
  • table - for displaying transformed data in a readable format