Demystifying MySQL’s UTF8MB4: A Guide to Character Encoding in Databases with WordPress in GCP and Cloud SQL

UTF8MB4

UTF8MB4, introduced in MySQL version 5.5.3, extends MySQL's original utf8 character set. While the original utf8 can encode only the characters of the Basic Multilingual Plane (BMP, code points up to U+FFFF), UTF8MB4 can encode the full range of Unicode, about 1.1 million code points, including emojis and other characters outside the BMP.

  • utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.
  • utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.

In MySQL, utf8 is currently an alias for utf8mb3, which is deprecated and expected to be removed in a future MySQL release. At that point, utf8 is expected to become a reference to utf8mb4.

So rather than relying on this alias, you can deliberately set the encoding to utf8mb4 yourself.

UTF-8 is a variable-length encoding: storing one code point requires one to four bytes. However, MySQL’s encoding called “utf8” (alias of “utf8mb3”) stores a maximum of three bytes per code point, so it cannot hold any character that needs four.
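To make the byte counts concrete, here is a small Python sketch. The helper names (`utf8_len`, `fits_utf8mb3`, `supplementary_chars`) are made up for illustration and are not part of MySQL or WordPress. It relies only on the fact that characters above U+FFFF take four bytes in UTF-8.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every code point is in the BMP (at most 3 bytes in UTF-8)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def supplementary_chars(text: str) -> list:
    """Characters outside the BMP, i.e. the ones that need utf8mb4."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    # A Latin letter, an accented letter, the euro sign, and an emoji.
    for ch in ["A", "é", "€", "😀"]:
        print(f"U+{ord(ch):04X} takes {utf8_len(ch)} byte(s)")
    print(fits_utf8mb3("Price: 10 €"))  # BMP only
    print(fits_utf8mb3("Nice post 😀"))  # contains a 4-byte character
```

Running a check like `supplementary_chars` over sample content is one way to see whether existing text would be affected by the three-byte limit.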

The utf8mb4 character set is useful because nowadays we need support for storing not only language characters but also symbols, newly introduced emojis, and so on.

Cloud SQL – GCP

Google Cloud Platform (GCP) uses utf8 by default when you create a database, and that is aliased to utf8mb3, which is okay for most cases.

I tried this with WordPress, and it was not enough for me.

So I searched for a better option and found that utf8mb4 was the go-to solution.

The well-known, professional WordPress VIP also uses utf8mb4. So it is a good and important practice when it comes to databases and WordPress.
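As a rough sketch of what the switch can look like on an existing database, the Python helper below only builds the MySQL statements as strings; it does not connect to anything. The database name `wordpress`, the table names, and the `utf8mb4_unicode_ci` collation are assumptions for illustration, so substitute your own values.

```python
CHARSET = "utf8mb4"
COLLATION = "utf8mb4_unicode_ci"  # assumed choice; other utf8mb4 collations exist


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(database: str, tables: list) -> list:
    """Build ALTER statements that move a database and its tables to utf8mb4."""
    stmts = [
        f"ALTER DATABASE {quote_ident(database)} "
        f"CHARACTER SET = {CHARSET} COLLATE = {COLLATION};"
    ]
    for table in tables:
        stmts.append(
            f"ALTER TABLE {quote_ident(table)} "
            f"CONVERT TO CHARACTER SET {CHARSET} COLLATE {COLLATION};"
        )
    return stmts


if __name__ == "__main__":
    # Hypothetical names: a database called "wordpress" with two default tables.
    for stmt in conversion_statements("wordpress", ["wp_posts", "wp_comments"]):
        print(stmt)
```

Take a backup before running statements like these against real data. On the WordPress side, the connection character set comes from the `DB_CHARSET` constant in `wp-config.php`, so that value should be `utf8mb4` as well.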
