CPAN Pull Request Challenge for February: Archive::BagIt

About the challenge

For more information, see my post about January.

My assignment for February

This month my assignment is Archive::BagIt. I had never heard of BagIt before, but according to its Wikipedia page it is a format that libraries use to archive information so that it can be retrieved later.

I decided to check out the module to see what it’s about. It looks like a nice module if you need the BagIt format, which I haven’t so far.

When I wrote about January’s assignment I already mentioned that I typically make pull requests because I have an itch to scratch: I run into a small bug or an issue I’d like to get solved. With this CPAN assignment I have to look for things that are broken or can be improved in order to make a pull request. This is also OK, but it’s just… different. It also seems to lead to a noticeably larger number of pull requests across CPAN modules as a whole, which is nice!

So basically this means there is a lot of what I call ‘CPAN grooming’ going on. Many smaller improvements are being made to the modules out there on the CPAN.

When I checked out the module I noticed that the code samples did not render properly on MetaCPAN.org because of issues with the POD. I found that very annoying, and decided that fixing it would be my pull request. When I submitted my work to Neil, who runs the pull request challenge administration, he suggested I could make more changes, which I probably could. But then again, for me the challenge was not about making as many pull requests as possible, but about exploring new modules on CPAN and seeing if I could keep up with this challenge. Actually, I added one more pull request just for fun. So I guess Neil’s strategy worked: he tricked me into it!

The ‘receiving’ end

In January, I received a pull request for the File::MimeInfo module, which I have maintained for two years or so. The PR fixed some typos in the README; I merged it. I’m still waiting for contributions to the DBD::mysql module, which was assigned in both January and February.

My pull request colleagues

I have the impression that the communication on the mailing list and the IRC channel is not as lively as it was in January. This is only natural: back then many participants had questions about the procedure, which is clearer now.

But if you have an assignment for February and have not got around to making your contribution yet, don’t fear! You still have about a week left!

The coming months

I’m looking forward to my next assignments. So far it’s been fun!

New tar, paxheaders and installing from CPAN

I work most often with CentOS or Red Hat Enterprise Linux (RHEL) platforms, which are widely deployed in enterprise environments. Version 7 was released last year, and while it’s great, it has some features, like systemd, that scare some sysadmins off. This means the most widely used version is still version 6, which comes with perl 5.10.1.

Recently, I wanted to install SOAP::Lite using CPAN because I wanted the latest version – it’s pretty simple to install version 0.710 from repositories using

sudo yum install perl-SOAP-Lite

But anyway I wanted a new feature so I decided to install from CPAN.

I got this error message:

$ cpan SOAP::Lite
CPAN: Archive::Tar loaded ok (v1.58)
PaxHeader/SOAP-Lite-1.13
SOAP-Lite-1.13/
SOAP-Lite-1.13/PaxHeader/bin
....
CPAN: File::Temp loaded ok (v0.22)
CPAN: Time::HiRes loaded ok (v1.9721)
Package seems to come without Makefile.PL.
  (The test -f "/home/vagrant/.cpan/build/PHRED-wxPcgc/Makefile.PL" returned false.)
  Writing one on our own (setting NAME to SOAPLite)

  CPAN.pm: Going to build P/PH/PHRED/SOAP-Lite-1.13.tar.gz

Checking if your kit is complete...
Looks good
Warning: prerequisite Class::Inspector 0 not found.
Warning: prerequisite Crypt::SSLeay 0 not found.
Warning: prerequisite IO::SessionData 1.03 not found.
Warning: prerequisite IO::Socket::SSL 0 not found.
Warning: prerequisite Task::Weaken 0 not found.
Bareword found where operator expected at ./Makefile.PL line 1, near "17 gid"
    (Missing operator before gid?)
Number found where operator expected at ./Makefile.PL line 2, near "18"
    (Missing semicolon on previous line?)
....

That looks strange, right? Especially since version 1.12 of the SOAP::Lite module installs perfectly fine. Do you notice the message “Package seems to come without Makefile.PL”?

The problem here is that the archive is built with a ‘newer’ tar, using the ‘extended headers’ which are described in POSIX.1-2001. Yeah that’s right – these extended headers were standardized in 2001, have not found their way into RHEL6, but are available in newer tars on OS X and Linux now. The only problem is, they aren’t really backwards compatible.

The cpan client on my CentOS machine expands the archive, finds all kinds of headers it does not know what to do with, and creates separate directories called PaxHeader.

It puts files containing the extended headers in these directories. One of them is called Makefile.PL, and cpan tries to execute ALL the Makefile.PLs it finds. The file containing the extended headers is not valid perl, however, and that is the actual error you see above.
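For context: POSIX.1-2001 stores an extended header as a series of records of the form "<length> <keyword>=<value>" followed by a newline, which appears to be where the "17 gid" in the error output comes from. Below is a minimal Perl sketch of parsing such records. The function name parse_pax_records and the sample values are made up for illustration; they are not taken from the actual SOAP::Lite tarball.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the data block of a pax extended header.
# Each record is "<length> <keyword>=<value>\n"; <length> is decimal and
# counts the entire record, including its own digits and the newline.
sub parse_pax_records {
    my ($data) = @_;
    my %fields;
    my $pos = 0;
    while ($pos < length $data) {
        # The record starts with its own length, followed by a space.
        my ($len) = substr($data, $pos) =~ /\A(\d+) /;
        die "Malformed pax record at offset $pos\n" unless $len;

        my $record = substr($data, $pos, $len);
        die "Malformed pax record at offset $pos\n"
            unless length($record) == $len
                && $record =~ /\A\d+ ([^=]+)=(.*)\n\z/s;

        $fields{$1} = $2;
        $pos += $len;
    }
    return %fields;
}

# Made-up sample data: two records, 17 and 12 bytes long.
my %fields = parse_pax_records("17 gid=123456789\n12 uid=1000\n");
print "$_ = $fields{$_}\n" for sort keys %fields;
```

Run as-is, this prints the gid and uid values from the sample. Feeding the same text to perl as if it were a Makefile.PL gives the kind of syntax errors shown above.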

The directory structure it creates is shown below; you can inspect it by typing ‘look SOAP::Lite’ in your cpan client on RHEL6:

Makefile.PL  <-- this is actually created by cpan
PaxHeader
  |-- SOAP-Lite-1.13  <-- this contains extended headers
SOAP-Lite-1.13
  |-- bin
  |-- lib
  |-- Makefile.PL  <-- actual Makefile.PL
  | ...
  |-- PaxHeader  <-- all files below only contain extended headers
        |-- bin
        |-- lib
        |-- Makefile.PL # this is not valid perl!

How widespread is this issue?

Of course, if ONLY SOAP::Lite were affected by this, it would not be so bad. I wrote a little script and let it loose on my offline CPAN mirror: I ran it on all dists from authors whose IDs start with ‘A’ and found hundreds of tarballs that had this issue. I did not bother running it on the complete CPAN, but I guess you can say a pretty considerable portion of CPAN consists of the ‘new’ tar archives.
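The script itself is not reproduced here. A minimal sketch of such a check could look like the code below; it is an illustration, not the script I actually ran. It walks the 512-byte tar header blocks of a .tar.gz and counts entries whose type flag is 'x' or 'g', the two pax extended header types from POSIX.1-2001. The names count_pax_headers and read_exact are made up, and the sketch assumes plain octal size fields, which should hold for CPAN-sized files.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: read exactly $want bytes, or return undef at EOF.
sub read_exact {
    my ($fh, $want) = @_;
    my $data = '';
    while (length($data) < $want) {
        my $chunk = '';
        my $got = $fh->read($chunk, $want - length $data);
        die "Read error\n" if !defined $got || $got < 0;
        last if $got == 0;
        $data .= $chunk;
    }
    return length($data) == $want ? $data : undef;
}

# Hypothetical helper: count pax header entries in a gzipped tarball.
sub count_pax_headers {
    my ($file) = @_;
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "Cannot open $file: $GunzipError\n";

    my $count = 0;
    while (defined(my $header = read_exact($z, 512))) {
        last if $header !~ /[^\0]/;    # an all-zero block ends the archive

        # Type flag is the byte at offset 156: 'x' = per-file extended
        # header, 'g' = global extended header.
        my $typeflag = substr($header, 156, 1);
        $count++ if $typeflag eq 'x' || $typeflag eq 'g';

        # Size is octal ASCII at offset 124 (assumption: no base-256 sizes).
        my ($octal) = substr($header, 124, 12) =~ /([0-7]+)/;
        my $size = defined $octal ? oct($octal) : 0;

        # Skip the entry's data, which is padded to a multiple of 512 bytes.
        my $padded = 512 * int(($size + 511) / 512);
        read_exact($z, $padded) if $padded;
    }
    $z->close;
    return $count;
}

# Report every tarball on the command line that has pax headers.
for my $file (@ARGV) {
    my $n = count_pax_headers($file);
    print "$file: $n pax header(s)\n" if $n;
}
```

Saved under a name of your choosing, it takes the tarballs to check as arguments and prints only the affected ones. It reads the raw blocks instead of using Archive::Tar, so the result does not depend on which Archive::Tar version is installed.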

What can you do as a CPAN author?

I asked the author of SOAP::Lite, Fred Moyer, whether he did anything special or changed anything in his setup between the 1.12 and 1.13 releases. He said he did not think anything in particular had changed. He just used his OS X laptop and build chain, and out came the extended headers.

Tux wrote about this on PerlMonks. He ran into the issue on his openSUSE laptop and has a nice solution: add some tar flags to Makefile.PL, like below:

use ExtUtils::MakeMaker;
WriteMakefile (
    ABSTRACT     => "Comma-Separated Values manipulation routines",
    VERSION_FROM => "CSV_XS.pm",
    # Make `make dist` write plain ustar archives, without pax extended headers
    macro        => { TARFLAGS => "--format=ustar -c -v -f" },
    );

Workaround for cpan users

There are two possible workarounds for this:

Upgrading Archive::Tar

The cpan client extracts the tarballs using the Archive::Tar module. If you upgrade Archive::Tar using cpan, it will handle the archives properly and you can install SOAP::Lite. Of course, this only works as long as the author of Archive::Tar does not upload the module itself with extended headers.
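To see whether an upgrade made a difference, you can ask the installed Archive::Tar what it finds in a tarball. The sketch below is an assumption-laden illustration: the helper name pax_like_entries is made up, and it assumes that a fixed Archive::Tar no longer reports the PaxHeader entries through list_files, which I have not verified for every version.

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper: names in the tarball that look like pax header
# entries (BSD tar writes "PaxHeader/...", GNU tar "PaxHeaders.<pid>/...").
sub pax_like_entries {
    my ($file) = @_;
    my $tar = Archive::Tar->new;
    $tar->read($file)
        or die "Cannot read $file: " . $tar->error . "\n";
    return grep { m{(?:\A|/)PaxHeaders?(?:\.\d+)?/} } $tar->list_files;
}

# Print the module version and the number of suspicious entries per tarball.
for my $file (@ARGV) {
    my @entries = pax_like_entries($file);
    printf "Archive::Tar %s sees %d PaxHeader entries in %s\n",
        Archive::Tar->VERSION, scalar @entries, $file;
}
```

On my RHEL 6 box, Archive::Tar 1.58 lists such entries for SOAP-Lite-1.13.tar.gz, as the cpan output above shows. After the upgrade I would expect the count to drop to zero.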

Upgrading cpan

If you upgrade the CPAN module, it will start using /usr/bin/tar instead of the Perl module Archive::Tar for extraction. /usr/bin/tar on RHEL 6 actually ignores the extended headers and complains loudly about them, but it DOES work. Upgrading cpan will also only work as long as the new CPAN tarball itself is not created with extended headers!

Resolution upstream

Being a good citizen, I thought it might be helpful to file this issue with Red Hat; it’s #1184194. It was handled very swiftly by backporting a fix from Archive::Tar so that it ignores PaxHeader files; that’s great! It has been in QA for a little over three weeks now. I’m not sure exactly how this will progress, but hopefully it’ll be released soon, and you can just yum upgrade and be done with this issue!