DBD::mysql – even more utf8-issues fixed, super CI

In my previous post I explained how many utf-8 related issues are now fixed in the latest DBD::mysql development release, and I asked for feedback and testing from our users.

Even better utf-8 handling

I’m happy to inform you that, thanks to your feedback, we’ve now released yet another development version, with even more UTF8 issues fixed. Tanabe Yoshinori reported that column names and database warnings were not properly encoded and prolific contributor Pali Rohár fixed that issue, and much more.

Serialization issue under taint mode

One other issue which was reported by amavisd users has to do with how perl stores values internally under taint mode, which is the default way amavisd is run. This is now fixed, see for more information https://github.com/perl5-dbi/DBD-mysql/issues/78.

Extreme CI testing setup

And Pali also modified our Travis setup so we now do continuous integration testing not only on many different perl versions, but also using many different MySQL and MariaDB versions. This uncovered many smaller and larger issues and we can compile again all the way back with MySQL against version 4 if you would want that (and we had multiple people wanting that and filing bug reports for this in the past).

This also lead to Pali discovering a use-after-free security issue (CVE-2017-3302) in libmysqlclient which was fixed in MySQL 5.6 and up but still present in 5.5 and also in MariaDB. See for more info this thread on oss-security.

Your feedback is welcome!

Find the full change log below; when all is well on Wednesday 8th of March we’ll release the stable version 4.042, including all these changes, as well as the changes from the previous post.

You can leave your feedback via the DBI-users mailing list, or using our GitHub page.

2017-02-28 change log of version 4.041_2)

DBD::mysql – all your UTF-8 bugs are belong to us!!□□

After a couple of years of more or less “maintenance mode” on DBD::mysql – we had a hand full of people contributing occasional fixes and a whole slew of drive-by contributors – we now have a prolific contributor again: Pali Rohár.

It’s great to see some more long-standing issues taken care of!

This time around, in the new development release 4.041_01 that is on CPAN now (https://metacpan.org/release/MICHIELB/DBD-mysql-4.041_01), there are some important fixes for some Unicode-related issues that I would like to point out. The sections below I have distilled based on the descriptions made by Pali.

Automatically converting to UTF-8 for bind parameters

Before this release perl scalars (statements or bind parameters) without UTF8 status flag were not encoded to UTF-8 even if mysql_enable_utf8 was enabled. This caused perl scalars with internal Latin1 encoding to be sent to the mysql server as Latin1 even if mysql_enable_utf8 was enabled.

Now all statements and bind parameters which are not a DBI binary type (SQL_BIT, SQL_BLOB, SQL_BINARY, SQL_VARBINARY or SQL_LONGVARBINARY) are automatically encoded to UTF-8 when mysql_enable_utf8 is enabled.

If mysql_enable_utf8 is not enabled and your statement or bind parameter contains a wide Unicode character then DBD::mysql shows a warning. If a binary parameter contains a wide Unicode character then DBD::mysql shows a warning too, similar like function print without using a :utf8 perlio layer. (“Wide character in…”)

Perl’s SvPV() returns char* from a perl scalar and the following SvUTF8() call for that scalar returns true if SvPV returned the data in UTF-8 or Latin1.

Decoding of UTF-8 fields when mysql_enable_utf8 is enabled

For each fetched field mysql server tells us its charset id. Before this release when mysql_enable_utf8 was enabled DBD::mysql UTF-8 decoded all fields with a charset id different than 63 (which means binary).

Now DBD::mysql UTF-8 decodes only those fields which have their charset set to utf8 or utf8mb4. By default mysql server sends data in encoding specified by SET NAMES command, which is by default Latin1. So any received Latin1 data is not UTF-8 decoded anymore.

The mysql server sends a charset id, not a charset name. Each combination of charset name and collation pairs has its own charset id. A new function charsetnr_is_utf8() has hardcoded all utf8 and utf8mb4 charset ids from mysql (up to 8.0.0) and mariadb (up to 10.2.2) from their source code. So far it looks like those ids are not changing since old mysql 5.0, only new ones are added.

Conclusion

We hope these changes make DBD::mysql a lot more consistent for you. Since the changes are rather big, we’d urge you to test the development release 4.041_01 which is on CPAN and give feedback NOW; this allows us to make changes if needed before we create an actual stable release with these features.

And of course, if you test it with your software and all is good, we’d like to hear that as well!

You can leave your feedback via the DBI-users mailing list, or using our GitHub page.

CPAN Pull Request Challenge for February: Archive::BagIt

About the challenge

See for more information my post about January.

My assignment for February

This month my assignment is Archive::BagIt. I never heard of BagIt before, but according to the Wikipedia page it is a format used by libraries to archive information and to be able to retrieve it later.

I decided to check out the module to see what it’s about. I think it’s probably a nice module if you’re in need of the BagIt format; which I’ve not been so far.

When I wrote about January’s assignment I already mentioned that typically I make pull requests because I have an itch to scratch, I run into a small bug or an issue I’d like to get solved. With this CPAN assignment I have to look for stuff that is broken or can be improved in order to make a pull request. This is also OK, but it’s just… different. It also seems to lead to a noticeably larger amount of pull requests to all cpan modules in total, which is nice!

So basically this means there is a lot of what I call ‘CPAN grooming’ going on. Many smaller improvements are being made to the modules out there on the CPAN.

When I checked out the module I noticed the code samples did not render properly on MetaCPAN.org because of issues with the POD. I found that very annoying, and decided fixing that would be my pull request. When I submitted my work to Neil, who runs the pull request challenge administration, he suggested I could do more changes, which I probably could. But then again, for me the challenge was not about making as many pull requests as possible, but to explore new modules on CPAN and see if I could keep up with this challenge. Actually, I added one more pull request just for fun. So I guess Neils strategy worked, he tricked me into it!

The ‘receiving’ end

In January, I received a pull request for the File::MimeInfo module. I maintain this module since two years or so. The PR fixed some typos in the README; I merged it. I’m still awaiting on contributions to the DBD::mysql module which was assigned both in January and in February.

My pull request colleagues

I have the idea that the communication on the mailing list and the IRC channel is not as lively as it was in January. This is only natural, because many participants had questions about the procedure and this is more clear now.

But if you have got an assignment for February and have not got around to making you contribution yet, don’t fear! you’ll have about a week left!

Next months

I’m looking forward to my next assignments. So far it’s been fun!

New tar, paxheaders and installing from CPAN

I work most often with CentOS or Red Hat Enterprise Linux (RHEL) platforms, which are widely deployed in enterprise environments. Last year version 7 was released, and while it’s great it has some features like systemd that scares some sysadmins off. This means the version most widely used is version 6, which comes with perl 5.10.1.

Recently, I wanted to install SOAP::Lite using CPAN because I wanted the latest version – it’s pretty simple to install version 0.710 from repositories using

sudo yum install perl-SOAP-Lite

But anyway I wanted a new feature so I decided to install from CPAN.

I got this error message:

$ cpan SOAP::Lite
CPAN: Archive::Tar loaded ok (v1.58)
PaxHeader/SOAP-Lite-1.13
SOAP-Lite-1.13/
SOAP-Lite-1.13/PaxHeader/bin
....
CPAN: File::Temp loaded ok (v0.22)
CPAN: Time::HiRes loaded ok (v1.9721)
Package seems to come without Makefile.PL.
  (The test -f "/home/vagrant/.cpan/build/PHRED-wxPcgc/Makefile.PL" returned false.)
  Writing one on our own (setting NAME to SOAPLite)

  CPAN.pm: Going to build P/PH/PHRED/SOAP-Lite-1.13.tar.gz

Checking if your kit is complete...
Looks good
Warning: prerequisite Class::Inspector 0 not found.
Warning: prerequisite Crypt::SSLeay 0 not found.
Warning: prerequisite IO::SessionData 1.03 not found.
Warning: prerequisite IO::Socket::SSL 0 not found.
Warning: prerequisite Task::Weaken 0 not found.
Bareword found where operator expected at ./Makefile.PL line 1, near "17 gid"
    (Missing operator before gid?)
Number found where operator expected at ./Makefile.PL line 2, near "18"
    (Missing semicolon on previous line?)
....

That looks strange, right? Especially since version 1.12 of the SOAP::Lite module installs perfectly fine. Do you notice the message “Package seems to come without Makefile.PL”?

The problem here is that the archive is built with a ‘newer’ tar, using the ‘extended headers’ which are described in POSIX.1-2001. Yeah that’s right – these extended headers were standardized in 2001, have not found their way into RHEL6, but are available in newer tars on OS X and Linux now. The only problem is, they aren’t really backwards compatible.

The cpan on my CentOS machine expands the archive, finds all kind of headers it does not know what to do with, and creates separate directories called PaxHeaders.

It puts files in these directories with the extended headers; one is called Makefile.PL and cpan tries to execute ALL Makefile.PLs it finds, however the file containing the extended headers is not valid perl and that is the actual error you see above.

The directory structure created looks like below, you can inspect it by typing ‘look SOAP::Lite’ in your cpan client on RHEL6:

Makefile.PL  <-- this is actually created by cpan
PaxHeader
  |-- SOAP-Lite-1.13  <-- this contains extended headers
SOAP-Lite-1.13
  |-- bin
  |-- lib
  |-- Makefile.PL  <-- actual Makefile.PL
  | ...
  |-- PaxHeader  <-- all files below only contain extended headers
        |-- bin
        |-- lib
        |-- Makefile.PL # this is not valid perl!

How widespread is this issue?

Well of course if it’s ONLY SOAP::Lite that is affected by this, it would not be so bad. I wrote a little script and let it loose on my offline CPAN mirror, ran it on all dists from authors starting with ‘A’ and found hundreds of tarballs that had this issue. I did not bother running it on the complete CPAN but I guess you can say a pretty considerable portion of CPAN would have the ‘new’ tar archives.

What can you do as a CPAN author?

I asked the author of SOAP::Lite, Fred Moyer, if he did anything special or changed anything to his setup between the 1.12 and 1.13 release. He said he did not think anything particular should have changed. He just used his OS X laptop and build chain and out came the extended headers.

Tux wrote on perlmonks about this, he ran into the issue on his openSUSE laptop, and has a nice solution: add some tarflags to Makefile.PL like below

use ExtUtils::MakeMaker;
WriteMakefile (
    ABSTRACT     => "Comma-Separated Values manipulation routines",
    VERSION_FROM => "CSV_XS.pm",
    macro        => { TARFLAGS   => "--format=ustar -c -v -f",
                    },
    );

Workaround for cpan users

There are actually two possible workarounds for this:

 

Upgrading Archive::Tar

The cpan client extracts the tarballs using the module Archive::Tar. If you’d upgrade Archive::Tar using cpan, it will handle the archives properly and you can install SOAP::Lite properly. Of course this would only work as long as the author of Archive::Tar does not upload its module with extended headers.

Upgrading cpan

If you’d upgrade your cpan module, it will start using /usr/bin/tar instead of the perl module Archive::Tar for the extracting. /usr/bin/tar on RHEL 6 actually ignores the extended headers and will complain loudly about them, but it DOES work. Upgrading cpan also only will work as long as the new tarball itself is not created with extended headers!

Resolution upstream

Being a good citizen I thought it might be helpful to file this issue at Red Hat, it’s #1184194  – it was very swiftly handled by backporting a fix from Archive::Tar so it ignores PaxHeader files; that’s great! It’s in QA now for a little over three weeks. I’m not sure exactly about how this would progress, but hopefully it’ll be released and you can just yum upgrade and have this issue done with soon!

Installing DBD::mysql on Strawberry Perl is really easy!

DBD::mysql is the driver for the Perl DBI database interface. Basically, if you want to interact with your MySQL or MariaDB  server from Perl, this is the driver you’d use.

I co-maintain the driver and heard people complaining that it’s difficult to install it on Windows. This post basically explains that it’s not so bad.

It is already installed

First of all, you do know this right? It IS already installed with Strawberry Perl. Strawberry Perl is a very good distribution of Perl for Windows. Many of the nice modules you’d already want are available out of the box. See the list on the website for more details. There’s also ActiveState perl, more on that later.

Nevertheless, installing is easy

If you want to install it because you want a newer version or something like that, it’s pretty easy. Strawberry bundles the MySQL driver in [strawberry]/c/bin and ships with a batch file that acts as if it is mysql_config – this is very convenient.

The only problem is that the current version of DBD::mysql, 4.029, does not properly find it. You’d have to tell it where to look. This will be fixed in the next release of the driver. I already committed the fixes which was basically just deleting old code.

c:\> cpan
cpan> look DBD::mysql
> perl Makefile.PL --mysql_config=c:\strawberry\c\bin\mysql_config.bat
> dmake
> dmake install
> exit
cpan> quit

And that’s it!

Compiling against your own version of the libraries

Unfortunately, this is only really this easy because we compile against the version of the libraries bundled with Strawberry. Compiling against our own libraries, perhaps newer, or from MariaDB or so, is much more difficult at this point in time. Also, if you’d be using ActiveState Perl, you’d be out of luck. And of course, with ActiveState you can ppm install DBD-mysql but this only works if you’re on the current or previous-current perl because otherwise you’re behind a paywall and you’re out of luck.

I’ll be trying to make this easier (and promised it to Mithaldu) and will post more on that later.

As a bonus: RMSB8sZtJOIIAAL6fg

I was on Fosdem today where I met lots of nice people and had some interesting conversations. Here is a photo of RMS who was there handing out GNU slash Linux stickers.

PostgreSQL 9.4’s JSONB and perl DBD::Pg

540px-Postgresql_elephant.svgAs you’ll know by now, PostgreSQL 9.4 is out and it is better than ever. The most notable change is the new JSONB data type, where you can store ‘document type’ data in your relational database, basically allowing for supporting many use cases that previously required MongoDB or similar NoSQL solutions.

A couple of functions and operators have been added to support JSONB, one of these is the ? operator which can be used to check if a key exists. This caused a problem for Perl’s DBD::Pg module as it uses the ? character as a bind operator. Since version 3.5.0 and up, you can now escape a ? if you want to use it as an operator instead of a bind parameter. I’ll show briefly how you can leverage this.

Example: the most stupid schema ever

Because so many SQL examples are using books and authors, I’ll throw in the obligatory book table. But this time with JSONB!

Of course this is only an example, but one of the use cases of JSONB is that you can add as many attributes to an object as you want without the need of changing the schema and your database will still be fast. You can also create indexes on JSON attributes if needed.

So in my example we’ll create the most stupid database table layout:

  CREATE TEMPORARY TABLE book (
    title text not null,
    info jsonb
  );

So we’ll store the book title in a column and we’ll create a second column in the JSONB format to store attributes. For instance, let’s insert these values:

INSERT INTO book (title, info) VALUES
    ('Our Uncle',       '{"year": 1985, "category": "novel", "author": "Arnon Grunberg" }'),
    ('The Wild Things', '{"year": 1999, "author": "Dave Eggers"}');

Some SQL queries on the JSONB

This allows us to query on the book. For instance, this would return the author of ‘The Wild Things’:

SELECT info::jsonb->'author' FROM book WHERE title = 'The Wild Things';

But what if we want to get a list of all books that have a category set? This is where the ? operator comes into play. The ? operator can be used to check existence of a key. You can also use it to check if a value is part of a JSON array. But for our example, the query would be like this:

SELECT title FROM book WHERE info::jsonb ? 'category';

Apply that to perl!

Now let’s see what that would look like in perl.
You’ll need DBD::Pg 3.50 or newer to be able to use the \ character to be able to escape the operator. Remember, a regular ‘?’ is use as bind character when using Perl DBI. If you’d use an older version of DBD::Pg you’d still be able to leverage the JSONB features, just not the ? operator!

If you’d use it on a version of DBD::Pg that is prior to 3.5.0, you’d get:

DBD::Pg::db selectall_arrayref failed: called with 1 bind variables when 2 are needed at jsonb.pl line 32.

CPAN Pull Request Challenge for January: DateTime::Format::Epoch

About the challenge

About a week before Christmas I signed up for Neil Bowers 2015 CPAN Pull Request challenge. This is a challenge where each month you’d need to send a pull request to a module that would get assigned to you; and the idea is that you really need to make this one pull request each month. I thought it would be nice to see if I could pull it off.

Continue reading CPAN Pull Request Challenge for January: DateTime::Format::Epoch