Text::CSV::Hash - hash based CSV file handing, including comparisons
Text::CSV::Hash provides simple hash based methods for loading, saving, and comparing CSV files. Will handle standard quoting and entries with newlines.
config()
lookup()
for more details. Load will fail if any are missing.
labels()
labels_add(@labels)
save()
etc.
labels_set(@labels)
save()
etc.
The first line of the file is expected to contain column labels, except where overridden by name_row_offset or skip_blank_lines.
Returns undef on error and error value can be displayed with $csv->error().
lookup()
is intended to handle the case where some fields may
mismatch, and multiple match()
calls may be needed to find a suitable
row. If it is unable to match all fields in $lookupscores it will
cycle through all possible combnations of fields in $lookupscores
starting with the highest possible score and working down. The
first combination that matches will be used. Any match with a score
lower than $minscore will be rejected. For example, if the csv file
has the fields 'first name', 'last name', 'country' and you are
willing to accept a mismatch of either 'first name' or 'country',
you could use
$lookupscores = { 'last name' => 2, 'first name' => 1, 'country' => 1 }; $minscore = 3;
Lookup returns a list of three items ($match, $ambig, $mismatch);
If you do not care about details of ambiguous matches or mismatched fields you can treat them as boolean or just ignore them, as in ($match) = $csv->lookup($row);
match()
builds
a hash cache internally to speedup repeated lookups on the same csv object.
If the returned list has one entry it is an exact match, more than one indicates an ambiguous match, and zero no matches.
For more sophisticated matching see lookup().
rows()
Returns undef on error and error value can be displayed with $csv->error().
If present, $conf can specify characteristics of the file. Note these will default to those set when the file was loaded.
# Sample script to load file.csv, double the 'cost' columns, and save my($csv);
$csv = new Text::CSV::Hash;
$csv->load('file.csv') || die $csv->error(); foreach my $row (@{$csv->rows()}) { $row->{cost} *= 2; } $csv->save() || die $csv->error();
# Sample script to produce merged versions of file1.csv and file2.csv # - matching on 'first name', 'last name', and 'country' columns # - add two columns 'results', and 'mismatches' # - allowing for any one of the three fields to mismatch # - trimming spaces from each field.
my($csv1, $csv2, %merge_match);
$csv1 = new Text::CSV::Hash; $csv2 = new Text::CSV::Hash;
%merge_match = ('first name' => 1, 'last name' => 1, 'country' => 1);
$csv1->load('file1.csv', {trim => 1}, \%lookups) || die $csv1->error(); $csv2->load('file2.csv', {trim => 1}, \%lookups) || die $csv2->error();
$csv2->labels_add('results', 'mismatches');
foreach my $row2 (@{$csv2->rows()}) { my($match1, $ambig, $mismatches, $resfield, $misfield);
($match1, $ambig, $mismatches) = $csv1->lookup($row2, 2); if ($ambig) { $resfield = 'Ambiguous'; } elsif (!$match1) { $resfield = 'Not Found'; } elsif ($mismatches) { $resfield = "Mismatch ".join(' + ', @${mismatches}); foreach my $mis (@${mismatches}) { $misfield .= qq#("$row2->{$mis}" - "$match1->{$mis}") #; } chop $misfield; } else { $resfield = 'Match all'; }
$row2->{results} = $resfield; $row2->{mismatches} = $misfield;
if ($match1) { # Merge fields both ways
foreach($csv1->labels()) { $row2->{$_} ||= $match1->{$_}; }
foreach($csv2->labels()) { $match1->{$_} ||= $csv2->{$_}; } } }
$csv1->labels_add($csv2->labels()); $csv1->save('merged_'.$csv1->{filename}) || die $csv1->error();
$csv2->labels_add($csv1->labels()); $csv2->save('merged_'.$csv2->{filename}) || die $csv2->error();