When you export from a MySQL Database to a .sql file, then import that into a new Database, you may find a bunch of weird characters. This post is not an exhaustive explanation of why the problem happens, more a note on how to avoid the issue when exporting and importing in MySQL.
The issue is essentially tied up with a mismatch between default collation in MySQL and the collation / character sets specified in the .sql file. To fix the issue, make sure they all match. If you’re using MySQL 4+, the default collation is probably UTF-8, so just make sure that the file has a default colltion set too. Putting the following line at the top of the .sql seems to do the trick:
/*!40101 SET NAMES utf8 */;
The reason seems to be with the importer: even though there is a default collation set on the database and on the tables, if there’s no default for the whole file each insert gets misinterpreted as a UTF-8 table that must be encapsulated in as latin1. Also check that the table creation statements in the .sql file are UTF-8 too, like this:
CREATE TABLE `wp_comments` (
...
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
For more detailed information, and info on how to fix existing databases, check out http://www.orthogonalthought.com/blog/index.php/2007/05/mysql-database-migration-and-special-characters/. There is a plugin for Wordpress that will convert the database, but this didn’t work for me.
How to prevent weird characters after MySQL export and import
When you export from a MySQL Database to a .sql file, then import that into a new Database, you may find a bunch of weird characters. This post is not an exhaustive explanation of why the problem happens, more a note on how to avoid the issue when exporting and importing in MySQL.
The issue is essentially tied up with a mismatch between default collation in MySQL and the collation / character sets specified in the .sql file. To fix the issue, make sure they all match. If you’re using MySQL 4+, the default collation is probably UTF-8, so just make sure that the file has a default colltion set too. Putting the following line at the top of the .sql seems to do the trick:
The reason seems to be with the importer: even though there is a default collation set on the database and on the tables, if there’s no default for the whole file each insert gets misinterpreted as a UTF-8 table that must be encapsulated in as latin1. Also check that the table creation statements in the .sql file are UTF-8 too, like this:
...
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
For more detailed information, and info on how to fix existing databases, check out http://www.orthogonalthought.com/blog/index.php/2007/05/mysql-database-migration-and-special-characters/. There is a plugin for Wordpress that will convert the database, but this didn’t work for me.