DBD::mysql – even more utf8-issues fixed, super CI

In my previous post I explained how many UTF-8-related issues are now fixed in the latest DBD::mysql development release, and I asked our users for feedback and testing.

Even better utf-8 handling

I’m happy to inform you that, thanks to your feedback, we’ve now released yet another development version, with even more UTF-8 issues fixed. Tanabe Yoshinori reported that column names and database warnings were not properly encoded, and prolific contributor Pali Rohár fixed that issue and much more.

Serialization issue under taint mode

Another issue, reported by amavisd users, has to do with how perl stores values internally under taint mode, which is the default way amavisd is run. This is now fixed; for more information see https://github.com/perl5-dbi/DBD-mysql/issues/78.

Extreme CI testing setup

Pali also modified our Travis setup, so we now do continuous integration testing not only on many different perl versions, but also against many different MySQL and MariaDB versions. This uncovered many smaller and larger issues, and we can once again compile against MySQL versions all the way back to version 4, should you want that (multiple people have asked for this and filed bug reports about it in the past).

This also led to Pali discovering a use-after-free security issue (CVE-2017-3302) in libmysqlclient, which was fixed in MySQL 5.6 and up but is still present in 5.5 and also in MariaDB. For more information, see this thread on oss-security.

Your feedback is welcome!

Find the full change log below; when all is well on Wednesday 8th of March we’ll release the stable version 4.042, including all these changes, as well as the changes from the previous post.

You can leave your feedback via the DBI-users mailing list, or using our GitHub page.

2017-02-28 (change log of version 4.041_2)

DBD::mysql – all your UTF-8 bugs are belong to us!!

After a couple of years of more or less “maintenance mode” on DBD::mysql, during which we had a handful of people contributing occasional fixes and a whole slew of drive-by contributors, we now have a prolific contributor again: Pali Rohár.

It’s great to see some more long-standing issues taken care of!

This time around, in the new development release 4.041_01 that is on CPAN now (https://metacpan.org/release/MICHIELB/DBD-mysql-4.041_01), there are some important fixes for Unicode-related issues that I would like to point out. I have distilled the sections below from Pali’s descriptions.

Automatically converting to UTF-8 for bind parameters

Before this release, perl scalars (statements or bind parameters) without the UTF8 status flag were not encoded to UTF-8, even if mysql_enable_utf8 was enabled. As a result, perl scalars with an internal Latin1 encoding were sent to the mysql server as Latin1 despite that setting.

Now all statements and bind parameters which are not a DBI binary type (SQL_BIT, SQL_BLOB, SQL_BINARY, SQL_VARBINARY or SQL_LONGVARBINARY) are automatically encoded to UTF-8 when mysql_enable_utf8 is enabled.
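To make the difference concrete, here is a minimal pure-Perl sketch of the two internal forms a scalar can take, and of the bytes that should reach a UTF-8 connection in both cases. It does not touch DBD::mysql itself, and the helper name to_wire_bytes is made up for this illustration; the driver does the equivalent work in C.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBD::mysql): return the UTF-8 octets
# for a character string, however perl happens to store it internally.
sub to_wire_bytes {
    my ($string) = @_;
    my $octets = $string;     # work on a copy
    utf8::encode($octets);    # characters -> UTF-8 octets, in place
    return $octets;
}

my $latin1  = "\xE9";         # "é", stored as one byte, UTF8 flag off
my $flagged = "\xE9";
utf8::upgrade($flagged);      # same string, stored as two bytes, UTF8 flag on

# The flag differs, yet perl considers both strings equal.
printf "flag off/on: %d/%d, equal: %d\n",
    (utf8::is_utf8($latin1)  ? 1 : 0),
    (utf8::is_utf8($flagged) ? 1 : 0),
    ($latin1 eq $flagged     ? 1 : 0);

# Both forms yield the same two octets, C3 A9.
print unpack("H*", to_wire_bytes($latin1)),  "\n";
print unpack("H*", to_wire_bytes($flagged)), "\n";
```

The point of the sketch is that the UTF8 flag is an internal storage detail: two scalars that compare equal should produce the same bytes on the wire, which is what the new behaviour aims for.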

If mysql_enable_utf8 is not enabled and your statement or bind parameter contains a wide Unicode character, DBD::mysql shows a warning. If a binary parameter contains a wide Unicode character, DBD::mysql shows a warning too, similar to what the print function does without a :utf8 PerlIO layer (“Wide character in…”).
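The condition behind that warning can be pictured with a small check. A “wide” character here is one with a code point above 255, which has no single-byte (Latin1) representation. The function name has_wide_char is an assumption for illustration, not something DBD::mysql exposes.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string holds any character that
# cannot be stored in a single byte (code point above 0xFF).
sub has_wide_char {
    my ($string) = @_;
    return $string =~ /[^\x00-\xFF]/ ? 1 : 0;
}

print has_wide_char("caf\xE9"),  "\n";   # "é" (U+00E9) still fits in one byte
print has_wide_char("\x{263A}"), "\n";   # U+263A does not fit in one byte
```

A value that passes this check unchanged can be sent as raw bytes; a value that fails it cannot be represented byte-for-byte, which is why a warning is the only sensible outcome for binary parameters.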

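Perl’s SvPV() returns a char* for a perl scalar, and a following SvUTF8() call on that scalar tells us how to read it: true means the returned data is in UTF-8, false means it is bytes (Latin1). That flag is what lets the driver decide whether a conversion is needed before sending the data.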

Decoding of UTF-8 fields when mysql_enable_utf8 is enabled

For each fetched field, the mysql server tells us its charset id. Before this release, when mysql_enable_utf8 was enabled, DBD::mysql UTF-8 decoded all fields with a charset id other than 63 (which means binary).

Now DBD::mysql UTF-8 decodes only those fields which have their charset set to utf8 or utf8mb4. The mysql server sends data in the encoding specified by the SET NAMES command, which defaults to Latin1. So any received Latin1 data is no longer UTF-8 decoded.

The mysql server sends a charset id, not a charset name. Each combination of charset name and collation has its own charset id. A new function, charsetnr_is_utf8(), hardcodes all utf8 and utf8mb4 charset ids, taken from the mysql (up to 8.0.0) and mariadb (up to 10.2.2) source code. So far it looks like those ids have not changed since mysql 5.0; only new ones have been added.
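The decision described above can be sketched in a few lines of Perl. This is an illustration only, not the driver’s code: the real charsetnr_is_utf8() is written in C and covers the full list of ids, while the sketch below uses just four ids that, as far as I know, belong to utf8_general_ci (33), utf8_bin (83), utf8mb4_general_ci (45) and utf8mb4_bin (46). The function names is_utf8_charset_id and decode_field are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: a small, incomplete subset of utf8/utf8mb4 charset ids.
my %UTF8_CHARSET_IDS = map { $_ => 1 } (33, 83, 45, 46);

# Hypothetical stand-in for the driver's charsetnr_is_utf8().
sub is_utf8_charset_id {
    my ($charset_id) = @_;
    return exists $UTF8_CHARSET_IDS{$charset_id} ? 1 : 0;
}

# Decode a fetched field only if the server says it is utf8/utf8mb4;
# anything else (Latin1, binary, ...) is returned as raw bytes.
sub decode_field {
    my ($bytes, $charset_id) = @_;
    return is_utf8_charset_id($charset_id)
        ? decode('UTF-8', $bytes)
        : $bytes;
}

my $raw = "\xC3\xA9";   # the two octets of "é" in UTF-8

printf "utf8 field:   %d character(s)\n", length(decode_field($raw, 33));
printf "latin1 field: %d character(s)\n", length(decode_field($raw, 8));
printf "binary field: %d character(s)\n", length(decode_field($raw, 63));
```

With the old rule (“decode everything except id 63”) the Latin1 field with id 8 would have been decoded as well; with the new rule only the first field is, and the other two keep their two raw bytes.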

Conclusion

We hope these changes make DBD::mysql a lot more consistent for you. Since the changes are rather big, we’d urge you to test the development release 4.041_01 which is on CPAN and give feedback NOW; this allows us to make changes if needed before we create an actual stable release with these features.

And of course, if you test it with your software and all is good, we’d like to hear that as well!

You can leave your feedback via the DBI-users mailing list, or using our GitHub page.