Find Faulty Lines in a CSV File
This section explains how to use awk
to find faulty lines in a CSV file, specifically checking if each row has the same number of fields as the first row.
Interactive Command
Find Faulty Lines Command
awk -F "," '{
if (NR == 1) {
expected_fields = NF
}
if (NF != expected_fields) {
print "Row " NR " is faulty with " NF-1 " delimiters."
}
}' sample.csv
Breakdown
awk -F "," '{...}' sample.csv
:- This part runs the
awk
command on the input file, where the code inside{}
is applied to each line of the file. -F ","
sets the field separator to a comma (customize this in the interactive command).
- This part runs the
if (NR == 1) { expected_fields = NF }
:NR
is the current row number. This condition checks if it's the first row.NF
is the number of fields in the current row.- For the first row, we store the number of fields as the expected number for all rows.
if (NF != expected_fields)
:- This checks if the current row has a different number of fields than expected.
print "Row " NR " is faulty..."
:- If a row is faulty, this prints a message indicating the row number and the number of delimiters found.
Example
Let's see an example of how this command works:
Find Faulty Lines Example
awk -F "," '{
if (NR == 1) {
expected_fields = NF
}
if (NF != expected_fields) {
print "Row " NR " is faulty with " NF-1 " delimiters."
}
}' sample.csv
This command processes the sample CSV file to find faulty lines.
Sample Output:
Row 3 is faulty with 2 delimiters.
Row 4 is faulty with 4 delimiters.
Assuming the sample.csv
file contains:
Name,Age,City,Country
John,30,New York,USA
Alice,25,London
Bob,35,Paris,France,Engineer
Eva,28,Berlin,Germany
As you can see, the command identified two faulty rows:
- Row 3 is faulty because it has only 3 fields (2 delimiters) instead of the expected 4 fields (3 delimiters).
- Row 4 is faulty because it has 5 fields (4 delimiters) instead of the expected 4 fields (3 delimiters).
note
This script compares all rows to the first row to check if they have the same number of fields.