You are reading 'Varchar vs. char'. You can leave a comment or trackback to this post.
Newer»« Older| M | T | W | T | F | S | S |
|---|---|---|---|---|---|---|
| « Apr | Dec » | |||||
| 1 | 2 | 3 | ||||
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 11 | 12 | 13 | 14 | 15 | 16 | 17 |
| 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| 25 | 26 | 27 | 28 | 29 | 30 | 31 |
Posted on May 31st, 2009 at 3:55pm by Pi.
Categories: Development.
This week I’ve been on holidays, and although many people disagree with how I’ve spent this free time, I must admit that I’ve had a quite entertainment week programming again. I’m moving my old movies list entirely to MySQL/PHP, with lots of new stuff to allow me to better describe what I have movie-wise. Although small, this project has made me investigate some of the secrets of MySQL, including the long-time debate of using varchar versus char, or even versus tinytext/text fields.
First of all, let me explain a bit about how do I use MySQL. I totally dislike SQL; although a powerful language, I can’t get into it. I find it cumbersome, to be polite. However, I find that MySQL is a great tool, and it’s my choice for storage backend, knowing that it can scalate to any project size or complexity if I need it in the future. But for now I use just the basic stuff: select where, insert, and a few functions. I don’t use indexes, as my projects range from very small to absolutely tiny; I don’t use joins, as I’m more confortable processing stuff from the programming language I’m using MySQL from (in this case PHP). I don’t use transactions as I’ve not ever needed them. I don’t have views nor foreign keys (in the SQL sense), I only use MyISAM, and basically I just scratch the surface of SQL.
So, while making the scripts to import the old data into a new, shiny, cross-referenced multi-table schema, I wanted to optimize the thing a bit more, because my 1,200 movies were taking 47 seconds to get into the database. Since I’ve took my time to make a quite clean code, and rather optimize it, I looked into the schema structure itself, trying to use the best possible field types for each value. Should I use char(x) or varchar(x)? Is there any advantage between varchar() and tinytext? Looking into Google, I found a large variety of answers in the char vs. varchar debate, most of them insufficient, and others too technical.
There are many variables involving in the selection of the right field type. Most of the technical answers didn’t give a definite answers, because, for example, tinytext/text can’t be indexed; performance benchmarks between char() and varchar() were based in indexed fields; diferences in storage were compared with databases of millions of rows, etc. After trying to use the information I gathered from these pages, the time to import the 1,200 movies went from 47 seconds to 105. This made me discover that I should actually run my own benchmarks adapted to the way I use MySQL: as a simple storage layer ^_^
Thus, what are my needs? I don’t need to save the last byte of space, I’m using tables with thousands of rows, not multi-million row huge data warehousing. Basically I need straight-forward queries, with possibly the best speed for better performance and minimal server load, as I intend to deploy my projects in my host. If this is the way you use MySQL, my benchmark might be of help to you instead of long technical descriptions of minute differences between types of data. So, 47 seconds to insert 1,200 movies didn’t seem very quick, so here I go, let’s benchmark.
Basically, my benchmark generates a big array of random strings. Then inserts them in a table with only one field of a certain type. After that, I read them one by one, sequentially. Finally, I shuffle the array, and search each item into the database. Since this last step is about two magnitudes slower than writing or reading, I only do it with the first 10% of the array. Note that the generated array and the shuffled values are the same for each benchmark, so it stays consistent.
The fields are char(250), varchar(250), tinytext and text. No other fields in each table, so they don’t stain the benchmark. Not indexed, no fancy stuff. Now, this is the result of the benchmark using 35,000 strings between 10 and 250 characters:
field_char_w(35000): 1.7499628067
field_varchar_w(35000): 1.65307593346
field_tinytext_w(35000): 1.68957400322
field_text_w(35000): 1.65133500099
field_char_r(35000): 0.87783408165
field_varchar_r(35000): 0.864487886429
field_tinytext_r(35000): 0.865431070328
field_text_r(35000): 0.877004861832
field_char_s(3500): 149.076941967
field_varchar_s(3500): 37.0553221703
field_tinytext_s(3500): 33.6053481102
field_text_s(3500): 33.6214549541
Analysing this, it doesn’t show much difference in writing or direct reading. In all the tests I’ve run, I can tell you that char() is consistently slower than the other fields, although just marginally. So, where does lie the difference between these fields? In searching. Note that these strings aren’t indexed; if they were, the index would be performing the same regardless of char() or varchar(). The problem lies in searching in a certain field type, and as you can see, char() is 4 times slower than varchar() under the exact same circumstances: the same strings in the same order in the table, doing the same searches between them. The difference is slightly less if the strings are always close to the maximum size of the field, but it’s still 4-fold.
So what about all those discussions saying that a fixed size row is faster than a dynamic one? Books saying to use char over varchar whenever you can? People mentioning the overhead of having to look for the exact length of a varchar field? I don’t know, probably they’re right in some contexts; in mine, varchar wins: less space, much faster.
Additionally, I also thought that tinytext/text would be slower than varchar, but benchmarks show something different. They are faster! Not by much, but enough to do the transition between varchar(255) and tinytext. So, between varchar and tinytext, I would choose tinytext. Actually I should do it now, change my table structure to only use tinytext/text, and see what happens.
(A few minutes later…)
There, no varchar() nor long char() fields. 46 seconds. Impressive. One night and one morning of researching to save 1 second, I should congratulate myself ^_^
Note that I was comparing char(255) vs. varchar(255), and those against tinytext/text. For smaller strings, like char(10) vs. varchar(10), char is slightly faster. Using 50,000 strings of 1 to 10 chars, the results are different. In writing, not so different, as char is still marginally slower, with varchar being marginally faster. In reading, varchar is marginally slower, with char and tinytext/text being roughly the same. In searching, again surprises (somewhat expected), with char being the fastest with 31.2 seconds versus the 36.1 seconds of varchar.
So, my conclusion is that if you plan to use long (255 chars) strings, use varchar() or tinytext. If you plan to use short strings (16 chars) then use char(). Curiously enough, that was what I was using before all this started (with a few exceptions polished out which gave me that extra second). On the other side, if space is something to consider, if you do joins, views, indexes, if you go deeper in the SQL realm, then do your own benchmarks.
That is my conclusion. The conclusion of this article, is that theory is one thing, and real-world results in specific situations are something different. If you aren’t very good in things like database design and SQL, don’t rely in technical specs and expert theories, and think/benchmark by yourself.
no comments yet.
Comments can contain some xhtml. Names and emails are required (emails aren't displayed), url's are optional.
Pi in the Sky is powered by WordPress. Dressed with Vistered Little. Hosted at MochaHost.