apply
The apply
command applies a series of string, format, currency, regex, and NLP transformations to given CSV columns.
Syntax
qsv apply operations [options] <column> <operation> [<input_file>] [<output_file>]
Description
The apply
command is a powerful tool for data transformation. It can perform various operations on specified columns, including string manipulations, format conversions, numerical calculations, and natural language processing tasks.
Options
-c, --new-column <name>
: Put the transformed values in a new column instead-r, --rename <name>
: New name for the transformed column-C, --comparand=<string>
: The string to compare against for replace and similarity operations-R, --replacement=<string>
: The string to use for replace operations-f, --formatstr=<string>
: This option is used by several subcommands
String Operations
Common Combinations
qsv apply operations lower trans_group_en -c trans_group_en_lower DLD_Transactions_English_500.csv \
| qsv apply operations upper project_name_en -c project_name_en_upper \
| qsv select 'trans_group_en_lower,project_name_en_upper' \
| qsv sample 10 \
| qsv table
Lower
The lower
operation converts all characters in the specified column to lowercase. This is helpful for standardizing text data, especially when case sensitivity is not important.
Apply lower
on the trans_group_en
column:
qsv apply operations lower trans_group_en -c trans_group_en_lower DLD_Transactions_English_500.csv \
| qsv select 'trans_group_en,trans_group_en_lower' \
| qsv sample 10 \
| qsv table
Output:
trans_group_en trans_group_en_lower
Sales sales
Sales sales
Sales sales
Sales sales
Sales sales
Mortgages mortgages
Mortgages mortgages
Mortgages mortgages
Sales sales
Mortgages mortgages
Upper
The upper
operation converts all characters in the specified column to uppercase. This is useful for ensuring consistency in text data, such as converting names or codes to a uniform format.
Apply upper
on the area_name_en
column:
qsv apply operations upper trans_group_en -c trans_group_en_upper DLD_Transactions_English_500.csv \
| qsv select 'trans_group_en,trans_group_en_upper' \
| qsv sample 10 \
| qsv table
Output:
trans_group_en trans_group_en_upper
Sales SALES
Sales SALES
Sales SALES
Mortgages MORTGAGES
Mortgages MORTGAGES
Sales SALES
Sales SALES
Sales SALES
Sales SALES
Sales SALES
Titlecase
The titlecase
operation converts the text in the specified column to title case, where the first letter of each major word is capitalized. This is useful for formatting names, titles, and other text where capitalization is important.
The titlecase
operation differs from proper case in that it capitalizes the first letter of each major word, while attempting to exclude certain common words like 'a', 'an', and 'the' from capitalization. This list of excluded words is based on the New York Times Manual of Style, with the addition of 'vs' and 'v'.
Apply titlecase
on the procedure_name_en
column:
qsv apply operations titlecase building_name_en -c building_name_en_titlecase DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_titlecase' \
| qsv sample 10 \
| qsv table
Output:
building_name_en building_name_en_titlecase
PALM BEACH TOWERS -1 Palm Beach Towers -1
WEST HEIGHTS 5 West Heights 5
HAMZA TOWER Hamza Tower
Burj Khalifa Burj Khalifa
CITY APARTMENTS City Apartments
Binghatti Corner Binghatti Corner
Trim
The trim
operation removes any leading and trailing whitespaces from the string in the specified column. This is commonly used to clean up data where extra spaces might have been inadvertently added.
Apply trim
on the procedure_name_en
column:
qsv search -s building_name_en '^\s+|\s+$' DLD_Transactions_English_500.csv \
| qsv apply operations trim building_name_en -c building_name_en_trimmed \
| qsv select 'building_name_en,building_name_en_trimmed' \
| qsv sample 10 \
| qsv table
Output:
building_name_en building_name_en_trimmed
LE GRAND CHATEAU LE GRAND CHATEAU
DM MESK DM MESK
Emirates Garden II - Magnolia 1 Emirates Garden II - Magnolia 1
DM MESK DM MESK
PALLADIUM PALLADIUM
You can search for values in a column with leading or trailing spaces using the search
command with the following regular expression.
qsv search -s building_name_en '^\s+|\s+$' DLD_Transactions_English_500.csv \
| qsv sample 10 \
| qsv table
Ltrim
The ltrim
operation removes any leading whitespaces from the string in the specified column. This is useful for cleaning up text data where leading spaces are present.
Apply ltrim
on the procedure_name_en
column:
qsv search -s building_name_en '^\s+' DLD_Transactions_English.csv \
| qsv apply operations ltrim building_name_en -c building_name_en_ltrimmed \
| qsv select 'building_name_en,building_name_en_ltrimmed' \
| qsv sample 10 \
| qsv table
Output:
building_name_en building_name_en_ltrimmed
Starz Tower 1 Starz Tower 1
PALLADIUM PALLADIUM
PALLADIUM PALLADIUM
Starz Tower 1 Starz Tower 1
DM MESK DM MESK
LOFTS T EAST LOFTS T EAST
PALLADIUM PALLADIUM
Starz Tower 2 Starz Tower 2
PALLADIUM PALLADIUM
Starz Tower 1 Starz Tower 1
You can search for values in a column with leading spaces using the search
command with the following regular expression.
qsv search -s building_name_en '^\s+' DLD_Transactions_English_500.csv \
| qsv sample 10 \
| qsv table
Rtrim
The rtrim
operation removes any trailing whitespaces from the string in the specified column. This is useful for cleaning up text data where trailing spaces are present.
Apply rtrim
on the procedure_name_en
column:
qsv search -s building_name_en '\s+$' DLD_Transactions_English.csv \
| qsv apply operations regex_replace building_name_en --comparand '\s+$' --replacement '[TRAILING_SPACE]' -c building_name_en_modified \
| qsv apply operations rtrim building_name_en -c building_name_en_rtrimmed \
| qsv select 'building_name_en,building_name_en_modified,building_name_en_rtrimmed' \
| qsv sample 10 \
| qsv table
Output:
building_name_en building_name_en_modified building_name_en_rtrimmed
SHERENA RESIDENCE SHERENA RESIDENCE[TRAILING_SPACE] SHERENA RESIDENCE
J One Building J One Building[TRAILING_SPACE] J One Building
Studio One Studio One[TRAILING_SPACE] Studio One
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
Studio One Studio One[TRAILING_SPACE] Studio One
Studio One Studio One[TRAILING_SPACE] Studio One
Emirates Garden II - Magnolia 2 Emirates Garden II - Magnolia 2[TRAILING_SPACE] Emirates Garden II - Magnolia 2
PULSE SMART RESIDENCE PULSE SMART RESIDENCE[TRAILING_SPACE] PULSE SMART RESIDENCE
To detect and emphasize trailing spaces in a column, you can utilize the search
command in conjunction with the apply
command, using a regular expression that matches trailing whitespace.
qsv search -s building_name_en '\s+$' DLD_Transactions_English.csv \
| qsv apply operations regex_replace building_name_en --comparand '\s+$' --replacement '[TRAILING_SPACE]' -c building_name_en_modified \
| qsv select 'building_name_en,building_name_en_modified' \
| qsv sample 10 \
| qsv table
Mtrim
The mtrim
operation removes all instances of the specified characters (provided by --comparand
) from both the left and right sides of the string in the specified column. This operation is useful for cleaning up strings with unwanted characters at the start or end.
qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '(<{area_name_en}&>)' --new-column area_name_en_modified \
| qsv apply operations mtrim area_name_en_modified --comparand '()<>&' -c area_name_en_mtrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mtrimmed' \
| qsv table
Output:
area_name_en area_name_en_modified area_name_en_mtrimmed
Al Hebiah Fifth (<Al Hebiah Fifth&>) Al Hebiah Fifth
Al Hebiah Fourth (<Al Hebiah Fourth&>) Al Hebiah Fourth
Nad Shamma (<Nad Shamma&>) Nad Shamma
Dubai Investment Park Second (<Dubai Investment Park Second&>) Dubai Investment Park Second
Jabal Ali First (<Jabal Ali First&>) Jabal Ali First
Al Khairan First (<Al Khairan First&>) Al Khairan First
Al Thanyah Third (<Al Thanyah Third&>) Al Thanyah Third
Nad Al Hamar (<Nad Al Hamar&>) Nad Al Hamar
Palm Jumeirah (<Palm Jumeirah&>) Palm Jumeirah
Business Bay (<Business Bay&>) Business Bay
The mtrim
operation does not search for exact matches of the operand; instead, it removes all instances of the specified characters from both ends of the string.
Details
In this example, the first character on the left of area_name_en_mtrimmed
is being removed because the mtrim
operation is set to strip all instances of the characters [, P, A, D, I, N, G, and ] from both ends. This is why "Palm Jumeirah" becomes "alm Jumeirah" as the initial "P" is part of the characters to be removed.
qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '[PADDING]{area_name_en}[PADDING]' --new-column area_name_en_modified \
| qsv apply operations mtrim area_name_en_modified --comparand '[PADDING]' -c area_name_en_mtrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mtrimmed' \
| qsv table
Output:
area_name_en area_name_en_modified area_name_en_mtrimmed
Palm Jumeirah [PADDING]Palm Jumeirah[PADDING] alm Jumeirah
Al Barsha South Fourth [PADDING]Al Barsha South Fourth[PADDING] l Barsha South Fourth
Al Yufrah 1 [PADDING]Al Yufrah 1[PADDING] l Yufrah 1
Al Hebiah First [PADDING]Al Hebiah First[PADDING] l Hebiah First
Al Thanyah Third [PADDING]Al Thanyah Third[PADDING] l Thanyah Third
Al Barsha South Fifth [PADDING]Al Barsha South Fifth[PADDING] l Barsha South Fifth
Jabal Ali First [PADDING]Jabal Ali First[PADDING] Jabal Ali First
Nadd Hessa [PADDING]Nadd Hessa[PADDING] add Hessa
Business Bay [PADDING]Business Bay[PADDING] Business Bay
Al Hebiah Fourth [PADDING]Al Hebiah Fourth[PADDING] l Hebiah Fourth
Mltrim
The mltrim
operation removes all instances of the specified characters (provided by --comparand
) from the left side of the string in the specified column. This is useful when you want to clean up a specific prefix.
Apply mltrim
on the area_name_en --comparand 'Al'
column:
qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '[PADDING]{area_name_en}[PADDING]' --new-column area_name_en_modified \
| qsv apply operations mltrim area_name_en_modified --comparand '[PADDING]' -c area_name_en_mltrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mltrimmed' \
| qsv table
Output:
area_name_en area_name_en_modified area_name_en_mltrimmed
Jabal Ali First [PADDING]Jabal Ali First[PADDING] Jabal Ali First[PADDING]
Al Barsha South Fourth [PADDING]Al Barsha South Fourth[PADDING] Al Barsha South Fourth[PADDING]
Me'Aisem First [PADDING]Me'Aisem First[PADDING] Me'Aisem First[PADDING]
Al Hebiah First [PADDING]Al Hebiah First[PADDING] Al Hebiah First[PADDING]
Al Safouh First [PADDING]Al Safouh First[PADDING] Al Safouh First[PADDING]
Palm Jumeirah [PADDING]Palm Jumeirah[PADDING] Palm Jumeirah[PADDING]
Jabal Ali First [PADDING]Jabal Ali First[PADDING] Jabal Ali First[PADDING]
Al Barshaa South Third [PADDING]Al Barshaa South Third[PADDING] Al Barshaa South Third[PADDING]
Marsa Dubai [PADDING]Marsa Dubai[PADDING] Marsa Dubai[PADDING]
Al Barshaa South Third [PADDING]Al Barshaa South Third[PADDING] Al Barshaa South Third[PADDING]
Mrtrim
The mrtrim
operation removes all instances of the specified characters (provided by --comparand
) from the right side of the string in the specified column. This is useful when you want to clean up a specific suffix.
Apply mrtrim
on the area_name_en --comparand 'Dubai'
column:
qsv sample 10 DLD_Transactions_English_500.csv \
| qsv apply dynfmt --formatstr '[PADDING]{area_name_en}[PADDING]' --new-column area_name_en_modified \
| qsv apply operations mrtrim area_name_en_modified --comparand '[PADDING]' -c area_name_en_mrtrimmed \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_mrtrimmed' \
| qsv table
Output:
area_name_en area_name_en_modified area_name_en_mrtrimmed
Al Hebiah Fourth [PADDING]Al Hebiah Fourth[PADDING] [PADDING]Al Hebiah Fourth
Al Barsha South Fourth [PADDING]Al Barsha South Fourth[PADDING] [PADDING]Al Barsha South Fourth
Jabal Ali First [PADDING]Jabal Ali First[PADDING] [PADDING]Jabal Ali First
Al Manara [PADDING]Al Manara[PADDING] [PADDING]Al Manara
Al Suq Al Kabeer [PADDING]Al Suq Al Kabeer[PADDING] [PADDING]Al Suq Al Kabeer
Al Barsha South Fifth [PADDING]Al Barsha South Fifth[PADDING] [PADDING]Al Barsha South Fifth
Burj Khalifa [PADDING]Burj Khalifa[PADDING] [PADDING]Burj Khalifa
Ras Al Khor Industrial First [PADDING]Ras Al Khor Industrial First[PADDING] [PADDING]Ras Al Khor Industrial First
Al Yelayiss 2 [PADDING]Al Yelayiss 2[PADDING] [PADDING]Al Yelayiss 2
Madinat Hind 4 [PADDING]Madinat Hind 4[PADDING] [PADDING]Madinat Hind 4
Len
The len
operation returns the length of the string in the specified column, measured in characters. This is useful for analyzing text data, such as determining the number of characters in a field.
Apply len
on the property_type_en
column:
qsv apply operations len building_name_en -c building_name_en_len DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_len' \
| qsv sample 10 \
| qsv table
Output:
building_name_en building_name_en_len
Azizi Riviera 12 16
PARK GATE RESIDENCES 4 22
ME DO RE 8
RED RESIDENCE 13
Palace Beach Residence Tower 2 30
REVA RESIDENCES 15
Address Residences Sky View 27
0
Vida Residences 3 & 4 - The Hills 33
AYKON CITY 3 - TOWER D 22
Squeeze
The squeeze
operation compresses consecutive whitespaces in the specified column to a single space. This is useful for cleaning up text data with irregular spacing.
Apply squeeze
on the building_name_en
column:
qsv apply operations squeeze building_name_en -c building_name_en_squeezed DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_squeezed' \
| qsv search -s building_name_en '\s{2,}' \
| qsv sample 10 \
| qsv table
Output:
building_name_en building_name_en_squeezed
BINGHATTI VIEWS BINGHATTI VIEWS
J & G BUILDING 2 J & G BUILDING 2
EMIRATES CROWN R EMIRATES CROWN R
PALLADIUM PALLADIUM
Squeeze0
The squeeze0
operation removes all whitespaces in the specified column. This is useful for data normalization tasks where you need to eliminate all spaces from text data.
Apply squeeze0
on the building_name_en
column:
qsv apply operations squeeze0 building_name_en -c building_name_en_squeezed DLD_Transactions_English_500.csv \
| qsv select 'building_name_en,building_name_en_squeezed' \
| qsv search -s building_name_en '\s{1,}' \
| qsv sample 10 \
| qsv table
Output:
building_name_en building_name_en_squeezed
DM Yass DMYass
CANDACE ACACIA HOTEL APARTMENTS CANDACEACACIAHOTELAPARTMENTS
Address Fountain Views Residences - Tower 2 AddressFountainViewsResidences-Tower2
DAMAC HILLS - GOLF VITA - A DAMACHILLS-GOLFVITA-A
AXIS RESIDENCES 6 AXISRESIDENCES6
DAMAC HILLS - GOLF VITA - A DAMACHILLS-GOLFVITA-A
ABBEY CRESCENT 1 ABBEYCRESCENT1
KEMPINSKI RESIDENCES KEMPINSKIRESIDENCES
Jumeirah Business Center 5 JumeirahBusinessCenter5
Hilliana Tower HillianaTower
Encoding Operations
Escape
The escape
operation escapes all non-printable and special characters in the specified column. This is useful for preparing text data for storage or transmission in formats where certain characters have special meaning.
Apply escape
on the area_name_en
column:
qsv apply dynfmt --formatstr '{area_name_en}\n\t' --new-column area_name_en_modified DLD_Transactions_English_500.csv \
| qsv apply operations escape area_name_en_modified -c area_name_en_escaped \
| qsv select 'area_name_en,area_name_en_modified,area_name_en_escaped' \
| qsv sample 10 \
| qsv table
Output:
area_name_en area_name_en_modified area_name_en_escaped
Al Warsan First Al Warsan First\n\t Al Warsan First\\n\\t
Al Thanyah First Al Thanyah First\n\t Al Thanyah First\\n\\t
Al Barsha South Fourth Al Barsha South Fourth\n\t Al Barsha South Fourth\\n\\t
Wadi Al Safa 5 Wadi Al Safa 5\n\t Wadi Al Safa 5\\n\\t
Burj Khalifa Burj Khalifa\n\t Burj Khalifa\\n\\t
Palm Jumeirah Palm Jumeirah\n\t Palm Jumeirah\\n\\t
Me'Aisem First Me'Aisem First\n\t Me\'Aisem First\\n\\t
Al Yelayiss 2 Al Yelayiss 2\n\t Al Yelayiss 2\\n\\t
Al Warsan First Al Warsan First\n\t Al Warsan First\\n\\t
Nadd Hessa Nadd Hessa\n\t Nadd Hessa\\n\\t
Encode64
The encode64
operation encodes the string in the specified column into Base64. This is useful for encoding binary data or for simple obfuscation.
Apply encode64
on the area_name_en
column:
qsv apply operations encode64 area_name_en -c area_name_en_encoded64 DLD_Transactions_English_500.csv \
| qsv select 'area_name_en,area_name_en_encoded64' \
| qsv sample 10 \
| qsv table
Output:
area_name_en area_name_en_encoded64
Marsa Dubai TWFyc2EgRHViYWk=
Al Barsha Second QWwgQmFyc2hhIFNlY29uZA==
Business Bay QnVzaW5lc3MgQmF5
Al Hebiah Third QWwgSGViaWFoIFRoaXJk
Business Bay QnVzaW5lc3MgQmF5
Al Yelayiss 2 QWwgWWVsYXlpc3MgMg==
Business Bay QnVzaW5lc3MgQmF5
Al Hebiah Fifth QWwgSGViaWFoIEZpZnRo
Mirdif TWlyZGlm
Burj Khalifa QnVyaiBLaGFsaWZh
Decode64
The decode64
operation decodes the Base64 encoded string in the specified column. This is useful for decoding data that has been encoded in Base64 format.
Apply decode64
on the area_name_en
column:
qsv apply operations encode64 area_name_en -c area_name_en_encoded DLD_Transactions_English_500.csv \
| qsv apply operations decode64 area_name_en_encoded -c area_name_en_decoded \
| qsv select 'area_name_en,area_name_en_encoded,area_name_en_decoded' \
| qsv sample 10 \
| qsv table
Output:
area_name_en area_name_en_encoded area_name_en_decoded
Burj Khalifa QnVyaiBLaGFsaWZh Burj Khalifa
Al Khairan First QWwgS2hhaXJhbiBGaXJzdA== Al Khairan First
Burj Khalifa QnVyaiBLaGFsaWZh Burj Khalifa
Business Bay QnVzaW5lc3MgQmF5 Business Bay
Al Hebiah Fourth QWwgSGViaWFoIEZvdXJ0aA== Al Hebiah Fourth
Business Bay QnVzaW5lc3MgQmF5 Business Bay
Al Khairan First QWwgS2hhaXJhbiBGaXJzdA== Al Khairan First
Hadaeq Sheikh Mohammed Bin Rashid SGFkYWVxIFNoZWlraCBNb2hhbW1lZCBCaW4gUmFzaGlk Hadaeq Sheikh Mohammed Bin Rashid
Jabal Ali First SmFiYWwgQWxpIEZpcnN0 Jabal Ali First
Marsa Dubai TWFyc2EgRHViYWk= Marsa Dubai
Replacement Operations
Replace
The replace
operation replaces all occurrences of a specified pattern (provided by --comparand
) with a new string (provided by --replacement
) in the specified column. This is useful for standardizing or cleaning text data.
Apply replace
on the area_name_en --comparand 'Dubai' --replacement 'DXB'
column:
qsv search -s area_name_en 'Dubai' DLD_Transactions_English_500.csv \
| qsv apply operations replace area_name_en --comparand 'Dubai' --replacement 'DXB' -c area_name_en_modified \
| qsv select 'area_name_en,area_name_en_modified' \
| qsv sample 10 \
| qsv table
Output:
area_name_en area_name_en_modified
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Madinat Dubai Almelaheyah Madinat DXB Almelaheyah
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Dubai Investment Park Second DXB Investment Park Second
Marsa Dubai Marsa DXB
Marsa Dubai Marsa DXB
Regex_replace
The regex_replace
operation replaces all occurrences of a regex pattern (provided by --comparand
) with a new string (provided by --replacement
) in the specified column. This is particularly useful for complex string transformations using regular expressions.
Apply regex_replace
on the area_name_en --comparand 'D.*i' --replacement 'DXB'
column:
building_name_en building_name_en_modified
Churchill Tower 1-Commercial Churchill Building X-Commercial
CREEK EDGE Tower 2 CREEK EDGE Building X
Act One Act Two Tower 1 Act One Act Two Building X
The St. Regis Residences - Tower 1 The St. Regis Residences - Building X
Palace Beach Residence Tower 2 Palace Beach Residence Building X
Socio Tower 1 Socio Building X
Socio Tower 1 Socio Building X
Address Fountain Views Residences - Tower 2 Address Fountain Views Residences - Building X
BURJ VISTA Tower 2 BURJ VISTA Building X
Harbour Gate Tower 2 Harbour Gate Building X
Output:
[Sample output data preview...]
Formatting Operations
Round
The round
operation standardizes numerical data by rounding values in the specified column to a consistent precision, specified by --formatstr
, using the Midpoint Nearest Even Rounding Strategy, also known as "Bankers Rounding." This is useful for standardizing numerical data to a consistent precision.
Apply round
on the meter_sale_price --formatstr 2
column:
qsv apply operations round meter_sale_price --formatstr 1 -c meter_sale_price_rounded DLD_Transactions_English_500.csv \
| qsv select 'meter_sale_price,meter_sale_price_rounded' \
| qsv sample 10 \
| qsv table
Output:
meter_sale_price meter_sale_price_rounded
12503.98 12504
26025.72 26025.7
7273.53 7273.5
13908.37 13908.4
12049.42 12049.4
7992.73 7992.7
1818.23 1818.2
6580.90 6580.9
12939.12 12939.1
10611.54 10611.5
Thousands
The thousands
operation adds thousands separators to numeric values in the specified column. This is useful for enhancing readability of large numbers, with various separator policies available (e.g., comma, dot).
Apply thousands
on the actual_worth --formatstr comma
column:
thousands
operation converts numeric values to strings.While this is useful for visualizing numbers in the terminal, it may lead to issues if the formatted data is saved and used in subsequent operations or systems that require numeric data types.
qsv apply operations thousands actual_worth --formatstr comma -c actual_worth_thousands DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_thousands' \
| qsv sample 10 \
| qsv table
Output:
actual_worth actual_worth_thousands
771815.00 771,815
2500000.00 2,500,000
730000.00 730,000
500000.00 500,000
2071888.00 2,071,888
1244000.00 1,244,000
1500000.00 1,500,000
2440459.00 2,440,459
654458.00 654,458
2450000.00 2,450,000
Currencytonum
The currencytonum
operation gets the numeric value of a currency. It supports currency symbols (e.g. $,¥,£,€,֏,₱,₽,₪,₩,ƒ,฿,₫) and strings (e.g. USD, EUR, RMB, JPY, etc.). It recognizes point, comma, and space separators.
Apply currencytonum
on the actual_worth
column:
qsv apply operations numtocurrency actual_worth -C $ -R 3.67 -c actual_worth_currency DLD_Transactions_English_500.csv \
| qsv apply operations currencytonum actual_worth_currency -c actual_worth_numeric \
| qsv select 'actual_worth,actual_worth_currency,actual_worth_numeric' \
| qsv sample 10 \
| qsv table
Output:
actual_worth actual_worth_currency actual_worth_numeric
4850000.00 $17,799,500.00 17799500.00
4000000.00 $14,680,000.00 14680000.00
1287200.00 $4,724,024.00 4724024.00
13190578.00 $48,409,421.26 48409421.26
2437500.00 $8,945,625.00 8945625.00
1000000.00 $3,670,000.00 3670000.00
9400000.00 $34,498,000.00 34498000.00
850000.00 $3,119,500.00 3119500.00
3850000.00 $14,129,500.00 14129500.00
10784800.00 $39,580,216.00 39580216.00
Numtocurrency
The numtocurrency
operation converts a numeric value to a currency format. It supports various options to customize the output. Here's how to use it:
- Specify the currency symbol using the
--comparand
option. - The operation automatically rounds values to two decimal places.
- To use "euro" formatting (e.g., 1.000,00 instead of 1,000.00), set the
--formatstr
option to "euro". - To apply a conversion rate, set the
--replacement
option to the desired rate.
To apply numtocurrency
on the actual_worth
column with a conversion rate of 3.67 and a currency symbol of $
, use the following command:
qsv apply operations numtocurrency actual_worth --comparand $ --replacement 3.67 -c actual_worth_currency DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency' \
| qsv sample 10 \
| qsv table
Output:
actual_worth actual_worth_currency
4061887.00 $14,907,125.29
3000000.00 $11,010,000.00
2475000.00 $9,083,250.00
2800000.00 $10,276,000.00
2850000.00 $10,459,500.00
2000000.00 $7,340,000.00
580000.00 $2,128,600.00
3208000.00 $11,773,360.00
456600.00 $1,675,722.00
7019840.00 $25,762,812.80
Additional Samples:
Example 1: Converting to Euros
qsv apply operations numtocurrency actual_worth --comparand € --replacement 3.67 -c actual_worth_currency_euro DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency_euro' \
| qsv sample 10 \
| qsv table
This example demonstrates how to convert the actual_worth
column to Euros using the numtocurrency
operation. The --comparand
option is set to €
to specify the Euro symbol, and the --replacement
option is set to 3.67
to apply the conversion rate. The resulting output is in the format €X,XXX,XXX.XX
.
actual_worth actual_worth_currency_euro
1000000.00 €3.670.000,00
1500000.00 €5.505.000,00
2500000.00 €9.175.000,00
3000000.00 €11.010.000,00
4000000.00 €14.680.000,00
5000000.00 €18.350.000,00
6000000.00 €22.020.000,00
7000000.00 €25.690.000,00
8000000.00 €29.360.000,00
9000000.00 €33.030.000,00
Example 2: Converting to Japanese Yen
qsv apply operations numtocurrency actual_worth --comparand ¥ --replacement 3.67 -c actual_worth_currency_yen DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency_yen' \
| qsv sample 10 \
| qsv table
This example demonstrates how to convert the actual_worth
column to Japanese Yen using the numtocurrency
operation. The --comparand
option is set to ¥
to specify the Yen symbol, and the --replacement
option is set to 3.67
to apply the conversion rate. The resulting output is in the format ¥X,XXX,XXX
.
actual_worth actual_worth_currency_yen
1000000.00 ¥3,670,000
1500000.00 ¥5,505,000
2500000.00 ¥9,175,000
3000000.00 ¥11,010,000
4000000.00 ¥14,680,000
5000000.00 ¥18,350,000
6000000.00 ¥22,020,000
7000000.00 ¥25,690,000
8000000.00 ¥29,360,000
9000000.00 ¥33,030,000
Example 3: Converting to US Dollars with Euro Formatting
qsv apply operations numtocurrency actual_worth --comparand $ --replacement 3.67 --formatstr euro -c actual_worth_currency_usd_euro DLD_Transactions_English_500.csv \
| qsv select 'actual_worth,actual_worth_currency_usd_euro' \
| qsv sample 10 \
| qsv table
This example demonstrates how to convert the actual_worth
column to US Dollars using the numtocurrency
operation, but with Euro formatting. The --comparand
option is set to $
to specify the US Dollar symbol, the --replacement
option is set to 3.67
to apply the conversion rate, and the --formatstr
option is set to euro
to use Euro formatting. The resulting output is in the format $X.XXX.XXX,XX
.
actual_worth actual_worth_currency_usd_euro
1000000.00 $3.670.000,00
1500000.00 $5.505.000,00
2500000.00 $9.175.000,00
3000000.00 $11.010.000,00
4000000.00 $14.680.000,00
5000000.00 $18.350.000,00
6000000.00 $22.020.000,00
7000000.00 $25.690.000,00
8000000.00 $29.360.000,00
9000000.00 $33.030.000,00
Copy Operations
Copy
The copy
operation duplicates the values in the specified column to a new column. This is useful for preserving the original data while applying further transformations to the copied column.
Apply copy
on the area_name_en
column:
qsv apply operations copy area_name_en -c area_name_copy DLD_Transactions_English_500.csv \
| qsv select 'area_name_en,area_name_copy' \
| qsv sample 10 \
| qsv table
Output:
area_name_en area_name_copy
Al Hebiah First Al Hebiah First
Marsa Dubai Marsa Dubai
Jabal Ali First Jabal Ali First
Business Bay Business Bay
Al Barsha South Fourth Al Barsha South Fourth
Al Barshaa South Second Al Barshaa South Second
Business Bay Business Bay
Jabal Ali First Jabal Ali First
Burj Khalifa Burj Khalifa
Al Hebiah Fourth Al Hebiah Fourth
Common Use Cases
- Standardizing text data (e.g., converting to lowercase, uppercase, or title case)
- Cleaning and formatting strings (e.g., trimming whitespace, squeezing multiple spaces)
- Encoding and decoding data (e.g., Base64 encoding/decoding)
- Performing string replacements and regex-based transformations
- Formatting numerical data (e.g., rounding, adding thousands separators)
- Converting between numeric and currency formats
- Creating copies of columns for further manipulation
Tips
- Use the
-c, --new-column <name>
option to preserve the original data while creating transformed columns - Combine multiple
apply
operations for complex transformations - When working with currency data, be mindful of the formatting options and conversion rates
- Use regex_replace for more complex string manipulations
- Consider the impact of transformations on data types, especially when formatting numbers as strings