Sort by Library of Congress call number in Perl
In redesigning my book collection page this evening, I ran across the need for a routine to sort by Library of Congress call number. This is actually nontrivial, as the following are all valid numbers:
- DA870.F64
- DK602.3.B76 1996
- Q335.P416 1994
- QA76.73.P22W35 1991
- RS75.P5
To add even more complexity, some number fields are sorted in strict ascending order (e.g., in “DK602.3.B76 1996″ the bold number would come after 9, after 80, after 600 but before 603) and some are sorted as decimals (e.g., in “Q335.P416 1994″ the bold number would come after 3000 and after 35 but before 4161.) I wrote some Perl code for this, and it understands all call number forms that I am aware of. If you stumbled upon this page looking for something like this, here it is:
sub locsort ($a,$b)
{
@a = ($a =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)(?: (\d\d\d\d))?/);
@b = ($b =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)(?: (\d\d\d\d))?/);
return
$a[0] cmp $b[0]
||
$a[1] <=> $b[1]
||
$a[2] cmp $b[2]
||
"0.$a[3]" <=> "0.$b[3]"
||
$a[4] cmp $b[4]
||
"0.$a[5]" <=> "0.$b[5]"
||
$a[6] <=> $b[6]
;
}














August 14th, 2006 at 14h52
Thanks for posting this code!! I had a little trouble getting it to work, but then I figured out that the backslashes before the d’s didn’t show up in your code sample. In any case, here’s the script as I got it to work, with a little extra context. I added another element to the sort, because sometimes our LC numbers have additions after the year. I’ve included some of these in the example. Now…will the backslashes show?
Geoff
>>>>
my @lcList = (’DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);
$beforeList = join (”\n”, @lcList);
print “Before:\n$beforeList\n\n”;
my @result = sort by_lc_number @lcList;
$afterList = join (”\n”, @result);
print “\nAfter:\n$afterList\n\n”;
sub by_lc_number {
# print “A passed: $a\nB passed: $b\n”;
$a =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
@a = ($1,$2,$3,$4,$5,$6,$7,$8);
$b =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
@b = ($1,$2,$3,$4,$5,$6,$7,$8);
# $resultA = join (”::”, @a);
# $resultB = join (”::”, @b);
# print “A parsed: $resultA\nB parsed: $resultB\n”;
return
$a[0] cmp $b[0]
||
$a[1] <=> $b[1]
||
$a[2] cmp $b[2]
||
“0.$a[3]” <=> “0.$b[3]”
||
$a[4] cmp $b[4]
||
“0.$a[5]” <=> “0.$b[5]”
||
$a[6] <=> $b[6]
||
$a[7] cmp $b[7]
;
}
August 14th, 2006 at 14h55
Thanks!! You’re missing the backslashes before the d’s. Did WordPress strip them out?
August 14th, 2006 at 14h58
Here’s an example with a little more context. Hopefully it’s useful for newbies! I’ve added one more element to the sort, because our library sometimes makes additions after the year.
>>>>
my @lcList = (’DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);
$beforeList = join (”\n”, @lcList);
print “Before:\n$beforeList\n\n”;
my @result = sort by_lc_number @lcList;
$afterList = join (”\n”, @result);
print “\nAfter:\n$afterList\n\n”;
sub by_lc_number {
# print “A passed: $a\nB passed: $b\n”;
$a =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
@a = ($1,$2,$3,$4,$5,$6,$7,$8);
$b =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
@b = ($1,$2,$3,$4,$5,$6,$7,$8);
# $resultA = join (”::”, @a);
# $resultB = join (”::”, @b);
# print “A parsed: $resultA\nB parsed: $resultB\n”;
return
$a[0] cmp $b[0]
||
$a[1] <=> $b[1]
||
$a[2] cmp $b[2]
||
“0.$a[3]” <=> “0.$b[3]”
||
$a[4] cmp $b[4]
||
“0.$a[5]” <=> “0.$b[5]”
||
$a[6] <=> $b[6]
||
$a[7] cmp $b[7]
;
}
August 14th, 2006 at 16h10
Yes, it sure did. It stripped out the backslashes before the dots, too. Thanks for catching that.
July 27th, 2007 at 07h53
I’m a newbie looking forward to making this work. If it does it’ll save me a ton of time at my library.
Question: Shouldn’t the phrase:
# print “A passed: $a\nB passed: $b\n”;
use ‘parsed’ instead of ‘passed’ ?
or conversely for the subsequent phrase
# print “A parsed: $resultA\nB parsed: $resultB\n”;
Perhaps I don’t yet understand enough about Perl.
TC
July 28th, 2007 at 18h37
Tom, I don’t think so. The first line, commented out, shows what was “passed” into the script. The second line, also commented out, shows what one gets after the line is chopped up, or “parsed”. But that’s Geoff’s modification of my code, not mine.
July 30th, 2007 at 06h11
I get it…sorry…like I said, a newbie. Thanks for responding.
TC
August 12th, 2008 at 12h36
[...] needed a function that will compare Library of Congress call numbers. I found a function in PERL to do this on Joshua McGees’s site, but I needed it in PHP and our call numbers use a space instead of a period for some of the [...]
May 17th, 2009 at 22h04
An LC Call No.is composed of a “call letter” [A-Z]+ followed optionally by a call number [0-9]+(\.[0-9]+)?, followed by zero or more cutter numbers ([A-Z]+[0-9]+). Everything is sorted ASCII-betically except the integer part of the call number, which is sorted numerically. Therefore, you can just zero-pad the integer part of the call number, and sort the resulting strings using Perl’s built-in cmp operator. (The code below does some extra work to strip non-letters and non-digits, but this shouldn’t matter if all your LC numbers are properly formatted to begin with.)
my @lc_list = (’DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);
@lc_list = lc_sort(@lc_list);
print map “$_\n”, @lc_list;
sub lc_sort {
my @sorted = sort map make_sortable_lc($_).”=”.$_, @_;
foreach (@sorted) {
s/^[^=]*=//;
}
return @sorted;
}
sub make_sortable_lc {
local ($_) = @_;
s/([0-9])[.]([0-9])/$1:$2/;
s/[^:A-Z0-9]//g;
s/(.):(.)/$1.$2/;
s/^([A-Z]*)([0-9]+)/sprintf “%s%09d”, $1, $2/e;
return $_;
}
May 22nd, 2009 at 12h25
Well, Josh, I suppose if you wanted to be “efficient” or “elegant” or “clever” about it, you could use your method.
(Thanks. I’m feeling a bit foolish for my original solution.)