Sort by Library of Congress call number in Perl
In redesigning my book collection page this evening, I ran across the need for a routine to sort by Library of Congress call number. This is actually nontrivial, as the following are all valid numbers:
- DA870.F64
- DK602.3.B76 1996
- Q335.P416 1994
- QA76.73.P22W35 1991
- RS75.P5
To add even more complexity, some number fields are sorted in strict ascending numeric order (e.g., in “DK602.3.B76 1996” the class number 602.3 comes after 9, after 80 and after 600, but before 603), while others are sorted as decimals (e.g., in “Q335.P416 1994” the 416 of the Cutter number .P416 comes after 3000 and after 35, but before 4161). I wrote some Perl code for this, and it understands all call number forms that I am aware of. If you stumbled upon this page looking for something like this, here it is:
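# Usage: my @sorted = sort locsort @call_numbers;
sub locsort
{
    # sort() hands the two items to compare to this sub in the package
    # variables $a and $b, so it takes no parameter list.
    my $re = qr/^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)(?: (\d\d\d\d))?/;
    my @a = ($a =~ $re);
    my @b = ($b =~ $re);
    return
        $a[0] cmp $b[0]                   # class letters
        ||
        $a[1] <=> $b[1]                   # class number, numeric
        ||
        $a[2] cmp $b[2]                   # first Cutter letter(s)
        ||
        "0.$a[3]" <=> "0.$b[3]"           # first Cutter number, as a decimal
        ||
        $a[4] cmp $b[4]                   # second Cutter letter(s)
        ||
        "0.$a[5]" <=> "0.$b[5]"           # second Cutter number, as a decimal
        ||
        ($a[6] || 0) <=> ($b[6] || 0)     # year (a missing year sorts first)
        ;
}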
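To see why the Cutter numbers are compared as "0.$a[3]" rather than as plain integers, here is a small sketch that sorts the digits from the paragraph above both ways. The list of numbers is only illustrative; prefixing "0." is the same trick locsort uses.

```perl
use strict;
use warnings;

# Digits as they might appear in Cutter numbers such as the ".P416" in "Q335.P416 1994".
my @cutter_digits = (416, 4161, 35, 3000);

# Plain integer order, which is NOT how Cutter numbers file on the shelf.
my @as_integers = sort { $a <=> $b } @cutter_digits;

# Prefixing "0." turns each into a decimal fraction (.416, .4161, .35, .3000),
# so they sort the way the Cutter table intends.
my @as_decimals = sort { "0.$a" <=> "0.$b" } @cutter_digits;

print "Integer order: @as_integers\n";
print "Decimal order: @as_decimals\n";
```

Only the decimal order puts 416 after 3000 and 35 but before 4161, matching the .P416 example.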
August 14th, 2006 at 2:52 pm
Thanks for posting this code!! I had a little trouble getting it to work, but then I figured out that the backslashes before the d’s didn’t show up in your code sample. In any case, here’s the script as I got it to work, with a little extra context. I added another element to the sort, because sometimes our LC numbers have additions after the year. I’ve included some of these in the example. Now…will the backslashes show?
Geoff
>>>>
use strict;
use warnings;

my @lcList = ('DK602.3.B76 1996', 'Q335.P416 1994', 'DK602.3.B76 1996a -text', 'QA76.73.P22W35 1991', 'RS75.P5', 'DA870.F64', 'DK602.3.B76 1996b -disc');
my $beforeList = join("\n", @lcList);
print "Before:\n$beforeList\n\n";
my @result = sort by_lc_number @lcList;
my $afterList = join("\n", @result);
print "\nAfter:\n$afterList\n\n";

sub by_lc_number {
    # print "A passed: $a\nB passed: $b\n";
    # Capture in list context so a failed match can't leave stale $1..$8 behind;
    # optional groups that did not match become empty strings.
    my $re = qr/^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
    my @a = map { defined $_ ? $_ : '' } ($a =~ $re);
    my @b = map { defined $_ ? $_ : '' } ($b =~ $re);
    # my $resultA = join("::", @a);
    # my $resultB = join("::", @b);
    # print "A parsed: $resultA\nB parsed: $resultB\n";
    return
        $a[0] cmp $b[0]
        ||
        $a[1] <=> $b[1]
        ||
        $a[2] cmp $b[2]
        ||
        "0.$a[3]" <=> "0.$b[3]"
        ||
        $a[4] cmp $b[4]
        ||
        "0.$a[5]" <=> "0.$b[5]"
        ||
        ($a[6] || 0) <=> ($b[6] || 0)   # year; empty means no year
        ||
        $a[7] cmp $b[7]                 # anything after the year
        ;
}
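Because a comparison sub like by_lc_number re-runs the regular expression on both strings every time sort() compares a pair, a long list may sort faster if each call number is parsed only once. Below is a sketch of that idea using the familiar Schwartzian transform (map, sort, map). It keeps the same pattern and field order as by_lc_number, but `lc_fields` and `sort_lc` are hypothetical names, the year is normalized to a plain number (0 when absent), and it assumes every string is a well-formed call number.

```perl
use strict;
use warnings;

# Same pattern as by_lc_number: class letters, class number, two Cutters,
# optional year, and any trailing text.
my $LC_RE = qr/^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;

# Hypothetical helper: parse one call number into the fields compared below.
sub lc_fields {
    my ($call) = @_;
    my @f = ($call =~ $LC_RE);
    $_ //= '' for @f;                          # unmatched optional groups become ''
    $f[6] = $f[6] =~ /(\d{4})/ ? $1 : 0;       # year as a number, 0 if absent
    return \@f;
}

# Hypothetical helper: sort call numbers, parsing each one only once.
sub sort_lc {
    return map  { $_->[0] }                    # 3. keep just the original string
           sort {                              # 2. compare the cached fields
               my ($x, $y) = ($a->[1], $b->[1]);
               $x->[0] cmp $y->[0]
               || $x->[1] <=> $y->[1]
               || $x->[2] cmp $y->[2]
               || "0.$x->[3]" <=> "0.$y->[3]"
               || $x->[4] cmp $y->[4]
               || "0.$x->[5]" <=> "0.$y->[5]"
               || $x->[6] <=> $y->[6]
               || $x->[7] cmp $y->[7]
           }
           map  { [ $_, lc_fields($_) ] }     # 1. parse each call number once
           @_;
}

my @sorted = sort_lc('DK602.3.B76 1996', 'Q335.P416 1994', 'DK602.3.B76 1996a -text',
                     'QA76.73.P22W35 1991', 'RS75.P5', 'DA870.F64', 'DK602.3.B76 1996b -disc');
print "$_\n" for @sorted;
```

With Geoff's seven sample call numbers this should print DA870.F64 first, then the three DK602.3.B76 1996 entries (plain, then the "a" and "b" additions), then the Q, QA and RS numbers.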
August 14th, 2006 at 2:55 pm
Thanks!! You’re missing the backslashes before the d’s. Did WordPress strip them out?
August 14th, 2006 at 2:58 pm
Here’s the same example with a little more context (the script is identical to the one in the 2:52 pm comment above). Hopefully it’s useful for newbies! I’ve added one more element to the sort, because our library sometimes makes additions after the year.
August 14th, 2006 at 4:10 pm
Yes, it sure did. It stripped out the backslashes before the dots, too. Thanks for catching that.
July 27th, 2007 at 7:53 am
I’m a newbie looking forward to making this work. If it does it’ll save me a ton of time at my library.
Question: Shouldn’t the phrase:
# print "A passed: $a\nB passed: $b\n";
use ‘parsed’ instead of ‘passed’?
Or, conversely, the subsequent phrase:
# print "A parsed: $resultA\nB parsed: $resultB\n";
Perhaps I don’t yet understand enough about Perl.
TC
July 28th, 2007 at 6:37 pm
Tom, I don’t think so. The first line, commented out, shows what was “passed” into the sort subroutine. The second line, also commented out, shows what one gets after the call number is chopped up, or “parsed”. But that’s Geoff’s modification of my code, not mine.
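To make the “passed” versus “parsed” distinction concrete, this sketch runs Geoff's regular expression on one of his sample call numbers and prints both kinds of debug line. The choice of sample is arbitrary, and groups that did not match are shown as empty strings.

```perl
use strict;
use warnings;

# Geoff's pattern: class letters, class number, two Cutters, year, trailing text.
my $re = qr/^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;

my $passed = 'DK602.3.B76 1996a -text';           # what sort() hands the sub
my @parsed = map { $_ // '' } ($passed =~ $re);   # what the regex chops it into

print "A passed: $passed\n";
print "A parsed: ", join('::', @parsed), "\n";
```

The parsed line should show eight fields: DK, 602.3, B, 76, two empty second-Cutter fields, " 1996" (its leading space is ignored by the numeric comparison), and "a -text".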
July 30th, 2007 at 6:11 am
I get it…sorry…like I said, a newbie. Thanks for responding.
TC
August 12th, 2008 at 12:36 pm