MySQL: which UTF-8 collation to use

If most or all applications use the same character set, specifying character settings at server startup or configuration time may be most convenient. For the per-database or server-startup techniques, the settings control the character set for data storage.

Applications that use the database should also configure their connection to the server each time they connect, for example by executing a SET NAMES statement after connecting. The statement can be used regardless of connection method (the mysql client, PHP scripts, and so forth).
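As a minimal sketch of the per-connection step, the helper below builds a SET NAMES statement that a client could send right after connecting. The function name build_set_names and the utf8mb4 values in the usage note are illustrative assumptions, not something the text above prescribes.

```python
import re

# Character set and collation names are plain identifiers. Restricting them
# to this pattern avoids building a malformed statement from bad input.
_NAME = re.compile(r"[A-Za-z0-9_]+")


def build_set_names(charset, collation=None):
    """Return a SET NAMES statement for the given character set.

    build_set_names is a hypothetical helper for illustration only.
    """
    if not _NAME.fullmatch(charset):
        raise ValueError(f"invalid character set name: {charset!r}")
    statement = f"SET NAMES '{charset}'"
    if collation is not None:
        if not _NAME.fullmatch(collation):
            raise ValueError(f"invalid collation name: {collation!r}")
        statement += f" COLLATE '{collation}'"
    return statement
```

For example, build_set_names("utf8mb4", "utf8mb4_unicode_ci") produces a statement that would be executed once per new connection, whatever client library opens it.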

In some cases, it may be possible to configure the connection to use the desired character set some other way. For more information about configuring client connections, see the MySQL Reference Manual.

In a stored routine, variables with character data types use the database defaults if the character set or collation is not specified explicitly.

To select a character set and collation at server startup, use the --character-set-server and --collation-server options. The same options can be given in an option file. These settings apply server-wide and serve as the defaults for databases created by any application, and for tables created in those databases.
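A hedged sketch of what such an option file section might look like, generated here from Python so the two settings stay in one place. The helper name option_file_section is hypothetical, and the utf8mb4 and utf8mb4_unicode_ci values in the usage note are example choices, not values taken from the text above.

```python
def option_file_section(charset, collation):
    """Return a [mysqld] option-file section setting the server defaults.

    The option names mirror the --character-set-server and
    --collation-server command-line options without the leading dashes.
    """
    lines = [
        "[mysqld]",
        f"character-set-server={charset}",
        f"collation-server={collation}",
    ]
    return "\n".join(lines) + "\n"
```

Calling option_file_section("utf8mb4", "utf8mb4_unicode_ci") returns three lines that could be pasted into the server's option file. The collation must belong to the chosen character set, otherwise the server will reject the combination.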

Alternatively, specify character settings at MySQL configuration time. In that case it is unnecessary to use --character-set-server and --collation-server to specify those defaults at server startup.

Regardless of how you configure the MySQL character set for application use, you must also consider the environment within which those applications execute.

For example, if you send statements using UTF-8 text taken from a file that you create in an editor, you should edit the file with the locale of your environment set to UTF-8 so that the file encoding is correct and so that the operating system handles it correctly. If you use the mysql client from within a terminal window, the window must be configured to use UTF-8 or characters may not display properly.
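The display problem described above can be reproduced without a database. This small sketch encodes a character as UTF-8 and then decodes the same bytes as Latin-1, which is roughly what happens when a terminal or editor is set to the wrong encoding. The function name misread_as_latin1 is an illustrative assumption.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1.

    This mimics a terminal or editor that is not configured for UTF-8.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode("latin-1")


original = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
garbled = misread_as_latin1(original)
print(len(original), len(garbled))  # one character becomes two
```

The stored bytes are identical in both cases. Only the interpretation differs, which is why the environment has to agree with the connection character set.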

For a script that executes in a Web environment, the script must handle character encoding properly for its interaction with the MySQL server, and it must generate pages that correctly indicate the encoding so that browsers know how to display the content of the pages.

The following scenario has been tested on MySQL 5. You can choose a character set that stores up to 4 bytes per character or one that stores 1 byte per character. Collations affect how data is sorted and how strings are compared to each other. That means you should use the collation that most of your users expect.
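To make the effect of a collation on comparison concrete, here is a rough Python illustration of the difference between a binary comparison and a case-insensitive, accent-insensitive one. This is not MySQL's collation algorithm. The comparison_key function is a hypothetical stand-in built on the standard unicodedata module.

```python
import unicodedata


def comparison_key(text):
    """Return a key that ignores case and accents (illustration only).

    Decompose characters, drop the combining marks, then case-fold.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def equal_binary(a, b):
    """Compare strings exactly, code point by code point."""
    return a == b


def equal_insensitive(a, b):
    """Compare strings while ignoring case and accent differences."""
    return comparison_key(a) == comparison_key(b)
```

Under the binary rule "Résumé" and "resume" are different strings, while under the insensitive rule they compare as equal. Which behaviour is right depends on what users expect from searches and sorted lists.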

The documentation for the Unicode character sets gives an example of this. So it depends on your expected user base and on how much you need correct sorting. In my opinion, as far as the database should be concerned, a string is still just a string. A string is a number of UTF-8 characters.

A character has a binary representation, so why does the database need to know the language you're using? Usually, people will be constructing databases for systems with scope for multilingual sites. This is the whole point of using UTF-8 as a character set. I'm a bit of a purist, but I think the bug risks heavily outweigh the slight advantage you may get on indexing. Any language-related rules should be handled at a much higher level than the DBMS.

For example, all client connections have not only a default charset (which makes sense to me) but also a default collation.

I no longer recommend the "utf8" character set on MySQL, and instead recommend the "utf8mb4" character set.

They match almost entirely, but "utf8mb4" allows for a lot more Unicode characters. Realistically, MySQL should have updated the "utf8" character set and respective collations to match the UTF-8 specification, but instead it added a separate character set and respective collations so as not to impact storage designation for those already using the incomplete "utf8" character set.
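The practical difference is the number of bytes per character: MySQL's "utf8" stores at most 3 bytes per character, while "utf8mb4" stores up to 4. The sketch below checks whether a given string contains characters that need the fourth byte. The function names are illustrative assumptions.

```python
def max_utf8_bytes(text):
    """Return the largest UTF-8 encoded size of any character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text):
    """Return True if text has a character outside the Basic Multilingual Plane.

    Such characters take 4 bytes in UTF-8, so they do not fit in a
    character set limited to 3 bytes per character.
    """
    return any(ord(ch) > 0xFFFF for ch in text)
```

Accented Latin letters and the euro sign fit in 3 bytes or fewer, so needs_utf8mb4 returns False for them. Most emoji sit above U+FFFF and return True.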

I found these collation charts helpful. They show which characters each collation treats as the same.

With utf8, a field will be truncated on insert starting with the first unsupported Unicode character. I wonder if we'll ever need 5 bytes for all those emojis.

Unless you know you'll never, ever have a requirement to do it, I'd simply go for UTF-8 now and have one less problem to worry about, assuming your language of preference is fully compliant, of course. But on a computer system, many programmers (especially the ones who were writing the code 10 years ago) simply accept the system default encoding, and there are as many of those, incompatible with one another, as stars in the sky.

The article I linked to in my question also states that UTF-8 is the new Internet standard.
