Bugzilla: Upgrading MySQL Database from Latin1 to UTF8

IMPORTANT Update 2009-07-09: This information is already several years old! You should not use this information for modern versions of Bugzilla (3.2 and above), which will allow you to convert to UTF-8 using checksetup.pl.

In the present case, we have a Bugzilla database created with a charset of latin1. Unfortunately, now after updating MySQL to 4.1, error occured when trying to assign a new developer to a bug, indicating mismatch of collations (UTF-8 vs. Latin1). This is caused by the fact that there is now latin1-data stored in an UTF-8 table.

The following procedure can be used to upgrade the database to UTF-8, eliminating the problem:

  1. mysqldump -p --default_character-set=latin1 --skip-set-charset bugs > dump.sql
  2. mysql -p --execute="DROP DATABASE bugs; CREATE DATABASE bugs CHARACTER SET utf8 COLLATE utf8_general_ci;"
  3. mysql -p --default-character-set=utf8 bugs < dump.sql
  4. perl -pe 's/latin1_bin/utf8_general_ci/g; s/latin1/utf8/g' dump.sql > dump-utf8.sql
  5. mysql -p --default-character-set=utf8 bugs < dump-utf8.sql

You should of course always check if the Pearl-RegEx only replaced charset declarations and not some matches within the data.

Thanks to TextSnippets for the script.

2 thoughts on “Bugzilla: Upgrading MySQL Database from Latin1 to UTF8”

  1. I’d just like to note for anybody who comes across this blog post–do NOT do this for modern versions of Bugzilla (3.2 and later). checksetup.pl itself will convert your database correctly to UTF-8 if you have the utf8 parameter turned on, and you should follow its instructions, not these.

    -Max

  2. Thanks for the information. As you can see, the information is 3 years old, I expected changes to occur and I hope, nobody is upgrading from 4.1 today anymore…

    I will add your warning to the article.

    Best regards,
    Martin

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.