Sort by Library of Congress call number in Perl

In redesigning my book collection page this evening, I ran across the need for a routine to sort by Library of Congress call number.  This is actually nontrivial, as the following are all valid numbers:

  • DA870.F64
  • DK602.3.B76 1996
  • Q335.P416 1994
  • QA76.73.P22W35 1991
  • RS75.P5

To add even more complexity, some number fields are sorted in strict ascending order (e.g., in “DK602.3.B76 1996″ the bold number would come after 9, after 80, after 600 but before 603) and some are sorted as decimals (e.g., in “Q335.P416 1994″ the bold number would come after 3000 and after 35 but before 4161.)  I wrote some Perl code for this, and it understands all call number forms that I am aware of.  If you stumbled upon this page looking for something like this, here it is:

sub locsort ($a,$b)
{
    @a = ($a =~ /^([A-Z]+)(\\d+(?:\\.\\d+)?)\\.?([A-Z]*)(\\d*)\\.?([A-Z]*)(\\d*)(?: (\\d\\d\\d\\d))?/);
    @b = ($b =~ /^([A-Z]+)(\\d+(?:\\.\\d+)?)\\.?([A-Z]*)(\\d*)\\.?([A-Z]*)(\\d*)(?: (\\d\\d\\d\\d))?/);

    return
	$a[0] cmp $b[0]
	    ||
	$a[1] <=> $b[1]
	    ||
	$a[2] cmp $b[2]
	    ||
	"0.$a[3]" <=> "0.$b[3]"
	    ||
	$a[4] cmp $b[4]
	    ||
	"0.$a[5]" <=> "0.$b[5]"
	    ||
	$a[6] <=> $b[6]
	    ;
}
Your ad here for US$1/month.  Find out how.


7 Responses to “Sort by Library of Congress call number in Perl”

  1. Geoff Sinclair Says:

    Thanks for posting this code!! I had a little trouble getting it to work, but then I figured out that the backslashes before the d’s didn’t show up in your code sample. In any case, here’s the script as I got it to work, with a little extra context. I added another element to the sort, because sometimes our LC numbers have additions after the year. I’ve included some of these in the example. Now…will the backslashes show?

    Geoff
    >>>>

    my @lcList = (’DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);

    $beforeList = join (”\n”, @lcList);
    print “Before:\n$beforeList\n\n”;

    my @result = sort by_lc_number @lcList;
    $afterList = join (”\n”, @result);

    print “\nAfter:\n$afterList\n\n”;

    sub by_lc_number {

    # print “A passed: $a\nB passed: $b\n”;

    $a =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
    @a = ($1,$2,$3,$4,$5,$6,$7,$8);
    $b =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
    @b = ($1,$2,$3,$4,$5,$6,$7,$8);

    # $resultA = join (”::”, @a);
    # $resultB = join (”::”, @b);
    # print “A parsed: $resultA\nB parsed: $resultB\n”;

    return
    $a[0] cmp $b[0]
        ||
    $a[1] <=> $b[1]
        ||
    $a[2] cmp $b[2]
        ||
    “0.$a[3]” <=> “0.$b[3]”
        ||
    $a[4] cmp $b[4]
        ||
    “0.$a[5]” <=> “0.$b[5]”
        ||
    $a[6] <=> $b[6]
        ||
    $a[7] cmp $b[7]
        ;
    }

  2. Geoff Sinclair Says:

    Thanks!! You’re missing the backslashes before the d’s. Did WordPress strip them out?

  3. Geoff Sinclair Says:

    Here’s an example with a little more context. Hopefully it’s useful for newbies! I’ve added one more element to the sort, because our library sometimes makes additions after the year.

    >>>>

    my @lcList = (’DK602.3.B76 1996′, ‘Q335.P416 1994′, ‘DK602.3.B76 1996a -text’, ‘QA76.73.P22W35 1991′, ‘RS75.P5′, ‘DA870.F64′, ‘DK602.3.B76 1996b -disc’);

    $beforeList = join (”\n”, @lcList);
    print “Before:\n$beforeList\n\n”;

    my @result = sort by_lc_number @lcList;
    $afterList = join (”\n”, @result);

    print “\nAfter:\n$afterList\n\n”;

    sub by_lc_number {

    # print “A passed: $a\nB passed: $b\n”;

    $a =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
    @a = ($1,$2,$3,$4,$5,$6,$7,$8);
    $b =~ /^([A-Z]+)(\d+(?:\.\d+)?)\.?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/;
    @b = ($1,$2,$3,$4,$5,$6,$7,$8);

    # $resultA = join (”::”, @a);
    # $resultB = join (”::”, @b);
    # print “A parsed: $resultA\nB parsed: $resultB\n”;

    return
    $a[0] cmp $b[0]
        ||
    $a[1] <=> $b[1]
        ||
    $a[2] cmp $b[2]
        ||
    “0.$a[3]” <=> “0.$b[3]”
        ||
    $a[4] cmp $b[4]
        ||
    “0.$a[5]” <=> “0.$b[5]”
        ||
    $a[6] <=> $b[6]
        ||
    $a[7] cmp $b[7]
        ;
    }

  4. Joshua (Site Owner) Says:

    Yes, it sure did.  It stripped out the backslashes before the dots, too.  Thanks for catching that.

  5. Tom Clark Says:

    I’m a newbie looking forward to making this work.  If it does it’ll save me a ton of time at my library.

    Question:  Shouldn’t the phrase:

          # print “A passed: $a\nB passed: $b\n”;

    use ‘parsed’ instead of ‘passed’ ? 

    or conversely for the subsequent phrase

          # print “A parsed: $resultA\nB parsed: $resultB\n”;

    Perhaps I don’t yet understand enough about Perl.

    TC

  6. Joshua (Site Owner) Says:

    Tom, I don’t think so.  The first line, commented out, shows what was “passed” into the script.  The second line, also commented out, shows what one gets after the line is chopped up, or “parsed”.  But that’s Geoff’s modification of my code, not mine.

  7. Tom Clark Says:

    I get it…sorry…like I said, a newbie.  Thanks for responding.

    TC

Leave a Reply, but read first

  1. Feel free to leave replies even to very old posts.
  2. You have pretty much free rein to write whatever you like.  Just make it contentful and it will probably stay, even if you are abusing me.  Just:
  3. Don't bother spamming.  Your links are automatically tagged "nofollow".  You won't increase your Google rating.  Nobody will click them anyway.  Save us both some time.
  4. Advertising Policy: The URL field is for personal blogs, not commercial enterprises.  Have a valid website or product to advertise?  Those do get clicked, and it's cheap.  Click here to advertise.  Otherwise, your URL is subject to deletion at editor's discretion.