Logo
Efficient MySQL - Selecting the Right Data Types

DB 2025-05-15

Efficient MySQL - Selecting the Right Data Types

Choosing the right data types in MySQL is a fundamental step towards building efficient, high-performing databases. While it might seem like a minor detail, selecting optimal data types directly impacts storage requirements, query speed, and overall system responsiveness. This article will guide you through the process of selecting the most appropriate data types for your MySQL tables, helping you unlock better performance and resource utilization.

While PostgreSQL is great, MySQL is often the preferred choice for applications anticipating significant growth, primarily due to solutions like Vitess. Vitess offers a robust database clustering system for MySQL, enabling horizontal scaling that can be more straightforward to implement than scaling PostgreSQL for massive workloads. This future-proofing against the complexities of expanding PostgreSQL, which can involve intricate sharding strategies and performance tuning for large-scale deployments, makes MySQL with Vitess a go-to default for many.

Beyond its scaling advantages with tools like Vitess, MySQL’s inherent simplicity is frequently cited as one of its key strengths. This simplicity can translate to easier setup, management, and a gentler learning curve, particularly for less complex applications or those with read-heavy workloads. While PostgreSQL boasts a rich feature set and advanced capabilities, MySQL’s streamlined nature is often favored for projects where rapid development and ease of use are paramount.

The trending SQLite db per entity/user/tenant can be straightforward for simpler, more isolated datasets. However, this model can lack maturity when dealing with applications requiring complex relationships and interactions between these entity/user/tenants. Managing potentially thousands or millions of individual database files introduces significant overhead in terms of backups, migrations, and ensuring data consistency across what are essentially siloed datasets. For applications where entities frequently interact or share common data, this separation can hinder performance and complicate development, as querying across these distinct databases becomes cumbersome and inefficient compared to operating within a single, unified database structure.

For these reasons I generally turn to MySQL and so in this article I will discuss how to choose your column data types wisely. To optimize MySQL performance when choosing data types, remember these key guidelines:

The process involves first determining the general class of type (numeric, string, temporal) and then selecting the specific type, considering its range, precision, storage requirements, and any special behaviors (e.g., DATETIME vs. TIMESTAMP). MySQL also uses aliases for some data types (like INTEGER for INT), which are for compatibility and don’t impact performance; MySQL will report the base type.

Whole Numbers

Whole numbers in MySQL are stored using integer data types: TINYINT (8 bits), SMALLINT (16 bits), MEDIUMINT (24 bits), INT (32 bits), and BIGINT (64 bits). Each type has a specific range of values it can store, determined by its bit size.

The UNSIGNED attribute can be applied to these types, which prevents negative values and roughly doubles the positive value limit (e.g., a TINYINT UNSIGNED stores 0 to 255, while a signed TINYINT stores -128 to 127). Signed and unsigned types use the same storage space and have similar performance.

Although data is stored according to the chosen integer type, most integer computations in MySQL are performed using 64-bit BIGINT integers, with some exceptions like certain aggregate functions that might use DECIMAL or DOUBLE.

Specifying a “width” for an integer type, like INT(11), does not affect the range of values that can be stored or how computations are done. It only influences the display width in MySQL’s interactive tools. For storage and calculations, INT(1) is the same as INT(20).

Real Numbers

Real numbers in MySQL represent numbers with fractional parts and can also store integers too large for BIGINT. MySQL offers exact (DECIMAL) and inexact (FLOAT, DOUBLE) types.

FLOAT and DOUBLE use standard floating-point math for approximate calculations. It’s generally advised to specify the type (FLOAT or DOUBLE) without defining precision, as precision specifiers are nonstandard and can cause MySQL to alter the data type or round values. FLOAT uses 4 bytes of storage, while DOUBLE uses 8 bytes, offering greater precision and range. Internally, MySQL uses DOUBLE for floating-point calculations.

DECIMAL is recommended for exact fractional results, such as financial data, despite its higher storage and computational costs. An alternative for high-volume financial data is to use BIGINT by multiplying currency amounts to store them as integers (e.g., multiplying dollar amounts by a million to store values to the ten-thousandth of a cent), thus avoiding floating-point imprecision and DECIMAL’s cost.

Strings

MySQL offers a variety of string data types, with VARCHAR and CHAR being the two primary ones for storing character values. Each string column can have its own character set and collation (sorting rules). The way these types are stored can vary by storage engine, with InnoDB being a common example.

VARCHAR

VARCHAR stores variable-length character strings and is the most frequently used string type. It’s efficient in terms of storage because it only uses the space needed for the actual string, plus 1 or 2 extra bytes to record the string’s length (1 byte if the maximum length is ≤255 bytes, 2 bytes if more). For instance, a VARCHAR(10) using the latin1 character set will use up to 11 bytes.

This space saving can improve performance. However, since rows are variable-length, updates can cause them to grow, potentially leading to extra work like page splitting in InnoDB if the row no longer fits in its original location. VARCHAR is generally best when:

CHAR

CHAR stores fixed-length character strings. MySQL allocates space for the specified number of characters for every CHAR value. When storing, trailing spaces are removed from the value. For comparisons, values are padded with spaces as needed. Upon retrieval, these padding spaces are typically stripped.

CHAR is useful for:

BINARY and VARBINARY

The sibling types for CHAR and VARCHAR are BINARY and VARBINARY, respectively. These store binary strings (byte strings) rather than character strings. MySQL pads BINARY values with \0 (the zero byte) instead of spaces, and this padding is not stripped on retrieval. These types are useful for storing binary data where byte-by-byte comparison is desired, which can be faster and simpler than character comparisons as it’s based on numeric byte values.

Efficient Space Allocation

It’s important to allocate only as much space as genuinely needed for string columns. Even if storing ‘hello’ takes the same disk space in a VARCHAR(5) and a VARCHAR(200), the larger column can consume significantly more memory. MySQL often allocates fixed-size memory chunks for internal operations like sorting or using temporary tables (in-memory or on-disk). Using excessively large column definitions can lead to inefficient memory use, negatively impacting performance.

BLOB and TEXT types

BLOB and TEXT are string data types in MySQL used for storing large amounts of binary data (BLOB) and character data (TEXT), respectively. Each is a family of types (e.g., TINYTEXT, TEXT, LONGTEXT and TINYBLOB, BLOB, LONGBLOB), with BLOB being a synonym for SMALLBLOB, and TEXT for SMALLTEXT.

MySQL treats each BLOB or TEXT value as an individual object, often storing large values externally to the row, requiring 1 to 4 bytes in the row itself plus the external storage for the data. The primary distinction between them is that BLOB types store binary data without a character set or collation, while TEXT types have an associated character set and collation.

Sorting BLOB and TEXT columns is unique: MySQL only sorts based on the first max_sort_length bytes, not the entire string. Furthermore, MySQL cannot index the full length of these data types and cannot use these indexes for sorting operations. It’s generally advised not to store large objects like images directly in the database as BLOBs due to performance issues with large data volumes, such as slower schema changes. Instead, it’s better to store them in a separate object data store and record their location or filename in the database.

Using ENUM as a string alternative

Using an ENUM column in MySQL instead of a traditional string type allows you to store a predefined set of distinct string values. MySQL stores these values very compactly, as integers (1 or 2 bytes depending on the number of options), which represent the position of the string in the defined list.

Key Points:

In summary, ENUMs offer space efficiency and can improve performance in specific join scenarios (ENUM to ENUM). However, they introduce complexities with sorting, potential performance hits when joining with string types, and operational challenges due to the need for schema changes when modifying the list of allowed values.

Date and Time

MySQL offers various data types for dates and times, with microsecond precision being the finest. While types like YEAR and DATE are straightforward, storing combined date and time presents a choice between DATETIME and TIMESTAMP.

DATETIME

TIMESTAMP

Storing Date and Time as an Integer

An increasingly popular alternative is storing the date and time as a Unix epoch (seconds since January 1, 1970, UTC) in an integer column.

Key Considerations for Choosing:

Ultimately, the best choice depends on specific application needs, and storing time as an integer is a viable alternative to consider.

Bit-Packed

MySQL offers several ways to store data compactly using individual bits, though these are technically string types.

BIT

Used to store one or more true/false values. BIT(N) stores N bits (up to 64). However, InnoDB stores BIT columns as the smallest integer type large enough, so space savings aren’t guaranteed. MySQL treats BIT as a string type. Retrieving a BIT(1) yields a binary string “0” or “1”. In a numeric context, it converts to the corresponding number. For example, b'00111001' (binary for 57) stored in a BIT(8) column retrieves as the character “9” (ASCII 57) in a string context, but as the number 57 in a numeric context (e.g., a + 0). This can be confusing, and caution is advised; avoiding this type is often better.

CHAR(0)

An alternative for a single true/false value is a nullable CHAR(0) column, storing NULL or an empty string. This works but can be unclear for users and complicate queries. TINYINT is generally recommended instead unless space is extremely critical.

SET

Suitable for storing many true/false values by combining them into one column. MySQL internally represents this as a packed set of bits, making it storage-efficient. Functions like FIND_IN_SET() simplify its use in queries.

In essence, the seemingly small decision of picking a TINYINT over an INT, or a VARCHAR with an appropriate length over a TEXT type, accumulates across your database, significantly influencing its agility and cost-effectiveness. By investing the time to understand and implement optimal data types, you’re laying the groundwork for a high-performing, scalable, and reliable MySQL database that can effectively support your application’s needs now and in the future.