Skip to main content

Find Faulty Lines in a CSV File

This section explains how to use awk to find faulty lines in a CSV file, specifically checking if each row has the same number of fields as the first row.

Interactive Command

Find Faulty Lines Command
awk -F "," '{
if (NR == 1) {
expected_fields = NF
}
if (NF != expected_fields) {
print "Row " NR " is faulty with " NF-1 " delimiters."
}
}' sample.csv

Breakdown

  1. awk -F "," '{...}' sample.csv:
    • This part runs the awk command on the input file, where the code inside {} is applied to each line of the file.
    • -F "," sets the field separator to a comma (customize this in the interactive command).
  2. if (NR == 1) { expected_fields = NF }:
    • NR is the current row number. This condition checks if it's the first row.
    • NF is the number of fields in the current row.
    • For the first row, we store the number of fields as the expected number for all rows.
  3. if (NF != expected_fields):
    • This checks if the current row has a different number of fields than expected.
  4. print "Row " NR " is faulty...":
    • If a row is faulty, this prints a message indicating the row number and the number of delimiters found.

Example

Let's see an example of how this command works:

Find Faulty Lines Example
awk -F "," '{
if (NR == 1) {
expected_fields = NF
}
if (NF != expected_fields) {
print "Row " NR " is faulty with " NF-1 " delimiters."
}
}' sample.csv

This command processes the sample CSV file to find faulty lines.

Sample Output:

Row 3 is faulty with 2 delimiters. Row 4 is faulty with 4 delimiters.

Assuming the sample.csv file contains:

Name,Age,City,Country John,30,New York,USA Alice,25,London Bob,35,Paris,France,Engineer Eva,28,Berlin,Germany

As you can see, the command identified two faulty rows:

  • Row 3 is faulty because it has only 3 fields (2 delimiters) instead of the expected 4 fields (3 delimiters).
  • Row 4 is faulty because it has 5 fields (4 delimiters) instead of the expected 4 fields (3 delimiters).
note

This script compares all rows to the first row to check if they have the same number of fields.