MySQL Replication – Multi-Threaded Slaves (Parallel Event Execution)
January 7, 2013
If you aren’t familiar with MySQL replication, “Replication enables data from one MySQL database server (the master) to be replicated to one or more MySQL database servers (the slaves). Replication is asynchronous by default – slaves need not be connected permanently to receive updates from the master. This means that updates can occur over long-distance connections and even over temporary or intermittent connections such as a dial-up service. Depending on the configuration, you can replicate all databases, selected databases, or even selected tables within a database.” (From: http://dev.mysql.com/doc/refman/5.5/en/replication.html).
I use MySQL replication on my home office server. I don’t really have much data to store, but it is nice to have several replicated slaves for backup purposes and also for testing new replication features of MySQL. I also use my setup to demo MySQL Enterprise Monitor.
Prior to MySQL 5.6.3, replication slaves were single-threaded. There are three threads involved with replication, but only one of those threads writes the replicated data from the master to the slave database (more info about these threads may be found here). If you had a busy master server, with a high number of writes, these writes could get bottlenecked at the slave, as the slave could only apply each event one at a time, in the same order that the events were executed on the master. If you were using the slaves for your reads, then your data could be stale depending on how quickly (or slowly) the slaves could apply the writes from the master.
I was giving a presentation a couple of years ago about MySQL replication. I was trying to demonstrate that adding slaves doesn’t necessarily divide the workload equally among the number of servers that you have. In my example, I stated that we had a master server that had a max load (at 100%) of 10,000 events (reads and writes) in a given time frame. For these 10,000 events, we assumed that the master was performing 6,000 reads and 4,000 writes during this period. If we added three slaves, we couldn’t assume that the master would now be at 25% capacity (100% capacity divided by the new total of four servers). Each slave also has to perform the same number of writes as the master, so we were really only scaling the number of reads across four servers.
In order to try and get my point across, I then presented a formula that looked like this:
Master server alone: Max. Load = (6,000 reads + 4,000 writes) / 10,000 = 100%
Master server plus three slaves: Load = (6,000 reads + (4 x 4,000 writes)) / (4 servers x 10,000) = 22,000 / 40,000 = 55%
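The arithmetic in the formula can be checked with a few lines of Python. This is only a sketch of the simplified model from the presentation: the function name cluster_load and its parameters are made up for illustration, and it ignores the single-threaded slave issue discussed in the post.

```python
def cluster_load(reads, writes, servers, capacity_per_server):
    """Fraction of total capacity in use when reads are spread across
    all servers but every server has to apply every write."""
    total_work = reads + servers * writes
    total_capacity = servers * capacity_per_server
    return total_work / total_capacity


# Numbers taken from the example: 6,000 reads, 4,000 writes, 10,000 events max per server
master_only = cluster_load(6000, 4000, servers=1, capacity_per_server=10000)
with_three_slaves = cluster_load(6000, 4000, servers=4, capacity_per_server=10000)

print(f"Master alone: {master_only:.0%}")
print(f"Master plus three slaves: {with_three_slaves:.0%}")
```

Because the write term grows with the number of servers, adding more slaves lowers the load more and more slowly, which is the non-linear scaling the formula was meant to show.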
Someone in the audience then pointed out that this formula wouldn’t work exactly, as the slaves were single-threaded. Of course that was a true statement, but the point that I was trying to make is that scaling out with slaves isn’t exactly linear. Later, I tried to come up with an easy way to determine a formula that would take into account the fact that slaves were single-threaded, but I couldn’t find an easy way to do this. There were just too many factors involved. For future presentations, I just made sure that I added the caveat that slaves are single-threaded.
So, what are multi-threaded slaves? Multi-threaded slaves allow you to execute the replication events from a master across different databases in parallel. For best results, you should partition your data per database. In other words, instead of having one database with many tables, you would have a database for each table. If you have some tables that are read-only and you rarely write to these tables, you could include them in their own database or in another database. By splitting the data into multiple databases, MySQL replication is able to update each database separately, and the updates within a given database are applied in the same relative order in which they occurred on the master. There is a system variable named slave_parallel_workers, which sets the number of worker threads and should be set to the number of databases that you have. There are some concerns in using multi-threaded replication, as events for different databases might not be executed on the slave in the same order in which they were executed on the master. Click here for more information on MySQL multi-threaded replication (Parallel Event Execution).
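To picture how per-database parallelism keeps the order within a database while losing it across databases, here is a toy model in Python. It is not how MySQL implements its coordinator and worker threads; the dispatch function, the round-robin pinning of databases to workers, and the database names are all assumptions made for this illustration.

```python
from collections import defaultdict


def dispatch(events, workers):
    """Assign each (sequence, database) event to a worker queue.

    Each database is pinned to one worker the first time it is seen,
    so events for the same database keep their master order inside
    that worker's queue.
    """
    db_to_worker = {}
    queues = defaultdict(list)
    for seq, db in events:
        if db not in db_to_worker:
            # Hypothetical policy: hand out workers round-robin
            db_to_worker[db] = len(db_to_worker) % workers
        queues[db_to_worker[db]].append((seq, db))
    return dict(queues)


# Events in the order they were executed on the master
events = [(1, "sales"), (2, "hr"), (3, "sales"), (4, "inventory"), (5, "hr")]
queues = dispatch(events, workers=3)

for worker, queue in sorted(queues.items()):
    print(worker, queue)
```

Each queue is ordered by sequence number, but nothing stops the worker holding event 4 from applying it before another worker has applied event 1. That is the ordering concern mentioned above.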
Back in April 2012, Mat Keep wrote about Benchmarking MySQL Replication with Multi-Threaded Slaves, in which he demonstrated that multi-threaded slaves could improve slave performance by almost five times. Mat’s example involved inserting 10,000 rows into 10 different schemas on a single slave. Mat stated that there are three key variables that you need to set to achieve maximum performance with multi-threaded slaves:
binlog-format=STATEMENT
relay-log-info-repository=TABLE
master-info-repository=TABLE
(From http://dev.mysql.com/doc/refman/5.6/en/slave-logs-status.html: use --master-info-repository to have the master info log written to the mysql.slave_master_info table, and use --relay-log-info-repository to have the relay log info log written to the mysql.slave_relay_log_info table)
Mat also suggests that the variable slave_parallel_workers should be set to equal the number of schemas that you have. If you look at the data he provided, increasing this variable beyond the total number of schemas did not improve performance. Even though the slave is multi-threaded, each schema is still applied by a single thread, so having more worker threads than schemas didn’t really make a difference.
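If you want to keep these settings together, a small Python helper can render them as an option file fragment. This is just a sketch: the function name mts_settings is hypothetical, the values are the ones quoted above, and you should check them against the documentation for your MySQL version before using them.

```python
def mts_settings(schema_count):
    """Render a [mysqld] option file fragment with the three settings
    quoted above plus one worker thread per schema."""
    lines = [
        "[mysqld]",
        "binlog-format=STATEMENT",
        "relay-log-info-repository=TABLE",
        "master-info-repository=TABLE",
        # One worker per schema, following the suggestion above
        f"slave_parallel_workers={schema_count}",
    ]
    return "\n".join(lines)


# Ten schemas, as in the benchmark described above
fragment = mts_settings(10)
print(fragment)
```

The binlog-format line belongs on the master, while the other three settings apply to the slave, so in practice the fragment would be split between the two servers.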
While I don’t have the need at this point to switch my replication setup to use multi-threaded slaves, it is still a great new feature of 5.6. You may download a copy of MySQL at http://dev.mysql.com/downloads/mysql. As of this writing, 5.6 is under the Development Releases tab.
Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.