Brian Cryer

 

Cry MySQL How to...

Change the charset and collation order for a MySQL database


The default charset for MySQL is "Latin1", which is mostly sufficient for most Western European languages (and the default collation sequence is "latin1_swedish_ci").

I encountered an issue where a live MySQL database was using the default charset and collation, but problems were being encountered because it should have been created using the "utf8" charset and "utf8_general_ci" collation. For this reason these notes show how to set or change the charset to "utf8" and the collation to "utf8_general_ci" - if this is now what you want then you should be able to simply substitute instead your desired charset and collation order.

These notes cover how to change the charset and collation order for a single MySQL database. Unfortunately whilst changing the charset and collation order for a single table is straight forward, its a little more difficult to change it globally for an entire database but a script is provided for this.

These notes cover:


How to see the default charset and collation order MySQL will use for a newly created table

To see the default charset that MySQL will use for newly created tables use:

show variables like "character_set_database";

To see the default collation sequence that MySQL will use for newly created tables use:

show variables like "collation_database";


How to set the default charset and collation order for MySQL

To explicity set the default charset and collation order when first creating a database use:

create database db_name character set utf8 collate utf8_general_ci;

For an existing database the default charset and collation order is changed using:

alter database db_name character set utf8 collate utf8_general_ci;

Be aware that changing the default charset and collation order only affects tables that are subsequently created. It does not affect any of the existing tables.


How to see what charset and collation order a given table is using

To see what charset and collation a given table is using, use:

select character_set_name, collation_name
from information_schema.tables T
inner join information_schema.collation_character_set_applicability C on (C.collation_name=T.table_collation)
where table_schema=Database() and table_name='table_name'

being sure to substitute the name of the table you are interested in for "table_name" in the above.

If you are interested in a table which is in a database other than the current database then change: table_schema=Database() to table_schema='db-name' in the above, substituting the name of the database for 'db-name'.

Whilst its not used very often, be aware that each field can have its own charset and collation order. To view the charset and collation order of a table field use:

select character_set_name, collation_name from information_schema.`COLUMNS` C
where table_schema = Database()
and table_name = 'table_name'
and column_name = 'field_name'

Be sure to substitute the name of the table and field in the above. The above assumes the current database, so replace the call to Database() with the name of the database if the table is in a different database.


How to set the charset and collation order for a single table

To change the character set and collation order for a single table use:

alter table db_name.table_name convert to character set utf8 collate utf8_general_ci

Needless to say, whilst this is fine for a table or two its not very practical converting a table at a time if you have a large number of tables! 


How to set the charset and collation order for all tables

The following SQL creates a procedure to reset the charset and collation status for all tables that need it. The last line of the provided SQL calls the procedure.

-- Copyright (c) 2011 www.cryer.co.uk
-- Script is free to use provided this copyright header is included.
drop procedure if exists ResetCharsetAndCollation;
delimiter //

create procedure ResetCharsetAndCollation(
IN dbName tinytext,
IN chrSet tinytext,
IN colOdr tinytext)
begin
  -- Cycle through all the tables which don't have the
  -- specified charset or collation sequence.
  DECLARE done int default 0;
  DECLARE tableName tinytext;
  DECLARE table_cursor CURSOR FOR
    select table_name
    from information_schema.tables T
    inner join information_schema.collation_character_set_applicability C
    on (C.collation_name=T.table_collation)
    where table_schema=dbName
    and (character_set_name<>chrSet or collation_name<>colOdr);
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=1;
  OPEN table_cursor;
  FETCH table_cursor INTO tableName;
  while not done do
    -- Reset the charset and collation sequence for this table.
    set @ddl=CONCAT('alter table ',dbName,'.',tableName,
      ' convert to character set ',chrSet,
      ' collate ',colOdr);
    prepare stmt from @ddl;
    execute stmt;
    deallocate prepare stmt;
    FETCH table_cursor INTO tableName;
  end while;
  CLOSE table_cursor;
end;
//

delimiter ;

call ResetCharsetAndCollation(Database(),'utf8','utf8_general_ci');

Change the last line to specify a different charset or collation order.

Be aware that this may take a while to run, depending on the number (and size) of tables to be updated.

This SQL does not change the default charset and collation order for the database, so you might want to do that as a separate step.


These notes have been tested against MySQL version 5.1.



About the author: is a dedicated software developer and webmaster. For his day job he develops websites and desktop applications as well as providing IT services. He moonlights as a technical author and consultant.