MySQL versus MongoDB for storing JSON documents in a doc-store database – which is faster?

MySQL added the ability to store JSON documents in a doc-store format with MySQL version 5.7 (general availability date – October 21, 2015). Although MySQL’s initial release of their doc-store feature did have some limitations, MySQL has made some really nice improvements in their version 8.0 release, and more recently, in their 8.0.19 release.

I came across a paper written by Erik Andersson and Zacharias Berggren titled “A Comparison Between MongoDB and MySQL Document Store Considering Performance“. I think the paper was written for a college project (and it might have been their senior thesis as they were in the bachelor’s program in computer science). In the paper, they compared MongoDB 3.4.4 against MySQL version 5.7.12, which was an early release of MySQL version 5.7. And, it wasn’t a surprise to see that MongoDB’s document store was much faster than MySQL.

But, with all of the improvements to MySQL’s document store in the latest release (version 8.0.19), I attempted to re-create their tests to see how much MySQL had improved since version 5.7.12. And, of course, both products should have improved since the original tests in 2017.

Disclaimer

I am a solutions engineer at MySQL, and I am not a MongoDB user. There may be ways of configuring both databases to be faster, but I did these tests without modifying any of the default variables which could affect performance. The only exception is that for MySQL, I did a separate test with the binary log disabled (I explain this later in the post).

There were two variables, unrelated to performance, that I did have to change. For MySQL, I had to set max_join_size=11000000 when running the queries against the largest database. For MongoDB, I had to set DBQuery.shellBatchSize=1000, or MongoDB would only return 20 rows at a time in the terminal window.

The Equipment

I ran the tests on my own server, and then on a Virtual Machine in Oracle’s Cloud Infrastructure (OCI). My server’s hardware was as follows:

  • Intel 8th Gen Core i7-8700K Processor (6 Cores / 12 Threads)
  • 32GB DDR4 DRAM 2666MHz
  • 500GB PC SSD SATA III 6 Gb/s M.2
  • Gigabyte Z370 AORUS motherboard
  • Mac OS 10.13.6

For OCI, I used the VM.Standard.B1.1 shape, which consisted of the following:

  • Single OCPU – 2.2 GHz Intel® Xeon® E5-2699 v4
  • 11.8 GB of Memory
  • 100 GB Remote Block Volume

Installation

NOTE: I am not going to go into detail about how to install MongoDB, NodeJS or MySQL, but I will provide the links.

I began by installing MongoDB via yum. Per the installation instructions, I created a MongoDB repo file (/etc/yum.repos.d/mongodb-org-4.2.repo) and added the following:

[mongodb-org-4.2]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/4.2/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc

Note: gpgcheck enables GNU Privacy Guard (GPG) signature checking, which helps to verify that you are installing an authentic package from MongoDB, and not from a third party.

I then used this command to install MongoDB via yum.

sudo yum install -y mongodb-org

I also had to install NodeJS. To install, I used the following commands:

curl -sL https://rpm.nodesource.com/setup_10.x | sudo bash -
sudo yum install nodejs

I downloaded the MySQL Yum repository RPM, and then installed the latest MySQL Community Server release (version 8.0.19) via yum:

sudo yum localinstall mysql80-community-release-el7-3.noarch.rpm -y
sudo yum install mysql-community-server -y

I started both MongoDB and MySQL:

sudo service mongod start
sudo service mysqld start

And I confirmed the status of each:

sudo service mongod status
sudo service mysqld status
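Note: On a fresh RPM installation, MySQL 8.0 generates a temporary password for root@localhost and writes it to the error log. If you are following along, you will need to retrieve it and set a new root password before connecting:

sudo grep 'temporary password' /var/log/mysqld.log
mysql_secure_installation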

Now that I had MongoDB and MySQL installed, I needed some data.

Data

I found some basic JSON data on the data.gov web site, which is a great source of generic JSON data. I needed a large data set, so I chose one consisting of a list of businesses in the state of Washington (USA). The file I downloaded contained 2.6 million records, and each JSON record looked like this:

{
"Ubi": "1234567890",
"Title": "GOVERNOR",
"FirstName": "CHRISTOPHER",
"MiddleName": "WALKEN",
"LastName": "KRAMER",
"Address": "324 SMITHY DR",
"City": "NOWHERE",
"State": "XA",
"Zip": "05252"
}

I know this example isn’t a large document, but it was the only dataset that I could find with millions of records.

Managing the Data

I took the 2.6 million (2,600,000) records and split the original single file into three files (using the split command). This produced two files containing one million records each and a third file containing 600,000 records. I discarded the third file.

I used one of the files containing one million records, and split it into ten files containing 100,000 records. I then took one of the files with 100,000 records, and split it into ten files with 10,000 records each. I did the same for the 10,000 record file, splitting it into ten 1,000 record files.
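I am only sketching the splitting here – assuming one JSON record per line (the split command works on line counts) and using illustrative file names:

# split 1,000,000 records into ten files of 100,000 records each
split -l 100000 businesses_1000k.json businesses_100k_
# split one of those files into ten files of 10,000 records each
split -l 10000 businesses_100k_aa businesses_10k_
# and split one of those into ten files of 1,000 records each
split -l 1000 businesses_10k_aa businesses_1k_

Depending on the layout of your source file, you may need some light reformatting so that each piece is valid input for the import commands shown later in this post.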

I used the same JSON files for both MongoDB and MySQL.

Note: I am not going to share the data I used. Even though the data is public, it does contain personal information, and as an Oracle employee, sharing it might violate the company’s privacy policy.

The Databases

I created the same databases on both platforms as well. The naming convention was as follows:

  • db_json_test10k = 10 x 1k records imported – 10k records total
  • db_json_test100k = 10 x 10k records imported – 100k records total
  • db_json_test1000k = 10 x 100k records imported – 1,000k records total
  • db_json_test10000k = 10 x 1000k records imported – 10,000k records total

When importing JSON documents, the databases are automatically created in MongoDB. With MySQL, you have to create them manually.

create database db_json_test10k;
create database db_json_test100k;
create database db_json_test1000k;
create database db_json_test10000k;

The collections inside the databases, however, are created automatically by both MongoDB and MySQL during the import.

The Tests

I did not add any configuration variables for either database – except for what I put in the MongoDB repo file – so the default variables were used for each. In MySQL, the binary log is enabled by default, so I ran the tests with the binary log turned on and with it turned off. The binary log records every transaction that could change data – in other words, all insert, update and delete transactions are written to a (binary) log on disk. Not surprisingly, running the tests with the binary log disabled was faster in MySQL.
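For reference, one way to run MySQL with the binary log disabled is to add the disable_log_bin (or skip-log-bin) option under the [mysqld] section of the server configuration file and restart the server – check the reference manual for your version:

# /etc/my.cnf
[mysqld]
disable_log_bin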

Each test consisted of the following:

  • Drop all databases (if required)
  • Create the MySQL databases
  • Import 1k, 10k, 100k and 1,000k records (10 times each with unique data in each import)
  • Create indexes – executed once per round per database
  • Perform 100 search queries x 10 different times
  • Perform 100 update queries x 10 different times
  • Perform 100 delete queries x 10 different times
  • Repeat four times, for a total of five tests

I recorded the time it took for each part of the test to be executed, and inserted the test results into a spreadsheet. I used the same data files and queries for both MongoDB and MySQL.

The graphs will show you the time for each action and the results are labeled Mongo, MySQL and MySQL-noBL (with the binary log disabled). For the search, update and delete tests, I performed the same 100 unique queries, 10 times against each database. Each series of 10 tests was executed five times, and the time needed to complete each of the individual 10 tests for each database size (10k, 100k, 1000k and 10000k records) was then averaged across all five tests to create the graphs. (Note: for this test, “k” equals 1,000 and not 1,024)
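A simple shell loop can drive one round of the tests and capture the timings. This is only a hypothetical sketch (not my exact harness) – the actual script names and paths appear in the sections below:

# run the ten 10k-record search scripts for one round and append the timings
# to a file (the output file name is just illustrative)
for i in 0 1 2 3 4 5 6 7 8 9
do
  { time mysql -uroot -p < search_mysql_100_10k_$i.sql > /dev/null ; } 2>> search_times_10k.txt
done
# note: -p prompts for the password on every run; a ~/.my.cnf file or login path avoids that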

While I was running the tests, the only applications I had open were a text editor, three terminal windows and a spreadsheet program. However, there were some small anomalies in the test results where a test would take a bit longer than the normal run time. I did not correct the data when an anomaly occurred.

Also, not all of the queries returned or modified the same number of rows. For example, one update query might change 10 rows while another changed 50, so the result times for one query are not directly comparable to those of another. But, the queries were executed in the same order each time, so the time it takes to run query #1 should be comparable across all five executions of the tests.

Importing Data

For each of the four databases, I imported the same set of records. This involved importing 1k, 10k, 100k and 1,000k records ten times each into their respective databases.

For MySQL, I used the MySQL Shell, which is a command-line client. Here is an example of the import command:

mysqlsh root:password@localhost/db_json_test10k --import /Volumes/HD1/0mongo_v_mysql/1000/1000-0 json_test10k

For MongoDB, I used the mongoimport utility. Here is an example of the import command:

mongoimport --jsonArray --db db_json_test10k --collection json_test10k --file /Volumes/HD1/0mongo_v_mysql/1000/1000-0m

Note: With MySQL, you have to be careful with the syntax of the import command when you import JSON documents – if the syntax is incorrect, the file may still import, but it will create a table instead of a collection.
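One way to double-check that the import created a collection (and not a plain table) is from within MySQL Shell. Here is a quick sketch, assuming an X Protocol connection:

mysqlsh root@localhost/db_json_test10k --js
db.getCollections()    // should list json_test10k
db.getTables()         // should be empty if the import created a collection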

Here are the results of the imports. The Y-axis (vertical line) on the graph represents the time in seconds to perform the action.

MongoDB was much faster than MySQL in importing data, even with the MySQL binary log turned off.

Creating Indexes

Whenever you import data into a collection, both MongoDB and MySQL will automatically create some indexes. After the initial import, I took a look at the indexes:

For MongoDB:

> use db_json_test10k
switched to db db_json_test10k
> db.json_test10k.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "db_json_test10k.json_test10k"
	}
]

For MySQL:

mysql> use db_json_test10k;
Database changed
mysql> SHOW INDEXES FROM json_test10k\G
*************************** 1. row ***************************
        Table: json_test10k
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: _id
    Collation: A
  Cardinality: 98627
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
      Visible: YES
   Expression: NULL
1 row in set (0.00 sec)

Both MongoDB and MySQL automatically create an index on the column _id (which is a primary key as well).

I needed to create an index on the Ubi field, as I was going to use this field for my searches, updates and deletes.

I would only need to create the indexes once for every series of tests, but I still tracked how long it took for both MongoDB and MySQL to create the indexes. Here are the commands to create an index for each one:

MySQL

use db_json_test10k
ALTER TABLE json_test10k ADD COLUMN UBI_2 INT GENERATED ALWAYS AS (doc->"$.Ubi");
ALTER TABLE json_test10k ADD INDEX (UBI_2);

MongoDB

use db_json_test10k
db.json_test10k.createIndex( { Ubi: 1 })

Note: MySQL indexes JSON documents via virtual columns. See this blog post for a detailed explanation of virtual columns.
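As an alternative to the SQL statements above (not what I used for these tests), an index can also be created directly on the collection through the X DevAPI in MySQL Shell. Here is a rough sketch – the index name and the TEXT length are just illustrative:

// in MySQL Shell, JavaScript mode, connected to db_json_test10k
db.getCollection('json_test10k').createIndex('ubi_idx',
    { fields: [ { field: '$.Ubi', type: 'TEXT(20)' } ] } )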

For the tests, I would be creating an index on all four databases for each instance. Here are the results of the index creation for each database. The Y-axis (vertical line) on the graph represents the time in seconds needed to perform the action.

MySQL was a bit faster than MongoDB in index creation.

Searches

The search test consisted of executing ten scripts, where each script performed 100 search queries. The test was executed against all four databases, each with its own set of search queries (there was a total of 40 separate scripts, each running 100 separate queries). The test was repeated five times, and the average times are shown in the graphs.

Here is an example of the search syntax: (Only the first three queries are shown)

MySQL

use db_json_test10k
select * from json_test10k where UBI_2 = '603013962'; 
select * from json_test10k where UBI_2 = '603598341'; 
select * from json_test10k where UBI_2 = '601574968';
...

MongoDB

use db_json_test10k
DBQuery.shellBatchSize = 1000
db.json_test10k.find({"Ubi" : "603013962"})
db.json_test10k.find({"Ubi" : "603598341"})
db.json_test10k.find({"Ubi" : "601574968"})
...

The search queries were placed in files – SQL files for MySQL, and JavaScript files for MongoDB – and each file was executed as follows: (Only the first three files are shown)

MySQL

time mysql -uroot -p < /Volumes/HD1/0mongo_v_mysql/3_search/mysql_search_sql_10k/search_mysql_100_10k_0.sql
time mysql -uroot -p < /Volumes/HD1/0mongo_v_mysql/3_search/mysql_search_sql_10k/search_mysql_100_10k_1.sql
time mysql -uroot -p < /Volumes/HD1/0mongo_v_mysql/3_search/mysql_search_sql_10k/search_mysql_100_10k_2.sql
...

MongoDB

time mongo < /Volumes/HD1/0mongo_v_mysql/3_search/mongo_search_10k/search_mongo_100_10k_00.js
time mongo < /Volumes/HD1/0mongo_v_mysql/3_search/mongo_search_10k/search_mongo_100_10k_02.js
time mongo < /Volumes/HD1/0mongo_v_mysql/3_search/mongo_search_10k/search_mongo_100_10k_04.js
...

Here are the results of the searches for each database. The Y-axis (vertical line) on the graph represents the time in seconds needed to perform the action.

The results were quite different from the original tests: MySQL was much faster than MongoDB in searches.
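If you want to confirm that the searches are using the new index rather than scanning every document, both databases can show you the query plan. For example, in MySQL:

use db_json_test10k;
EXPLAIN SELECT * FROM json_test10k WHERE UBI_2 = '603013962';
-- the key column of the output should show the UBI_2 index

The MongoDB equivalent is to append .explain() to the find() query.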

Update

The update test consisted of executing ten scripts, where each script contained 100 unique update queries. The test was executed against all four databases, each with its own set of update queries (there was a total of 40 separate scripts). The test was repeated five times, and the average times are shown in the graphs.

The update syntax was as follows:

MySQL

use db_json_test10k;
SET max_join_size=11000000;
SET SQL_SAFE_UPDATES=0;
UPDATE json_test1k SET doc = JSON_SET(doc, '$.State', 'KS') 
WHERE UBI_2 = '604052443';

MongoDB

use db_json_test10k
DBQuery.shellBatchSize = 1000
db.json_test1k.updateMany({"Ubi" : "604052443"}, {$set: { "State" : "KS"}});
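To spot-check that an update actually changed a document, you can read the modified field back. For example, mirroring the update above in MySQL:

SELECT doc->>'$.State' AS State FROM json_test1k WHERE UBI_2 = '604052443';

(In MongoDB, a find() on the same Ubi value with a projection on State does the same thing.)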

Here are the results of the updates for each database. The Y-axis (vertical line) on the graph represents the time in seconds needed to perform the action. Note that not all of the queries updated the same number of rows, so the results aren’t going to be the same for each group of queries.

Delete

The delete test consisted of executing ten scripts, where each script contained 100 unique delete queries. The test was executed against all four databases, each with its own set of delete queries (there was a total of 40 separate scripts). The test was repeated five times, and the average times are shown in the graphs. (I did have to apply the max_join_size and DBQuery.shellBatchSize changes explained earlier.)

The delete syntax was as follows: (Only the first three queries are shown)

MySQL

use db_json_test10k;
SET max_join_size=11000000;
SET SQL_SAFE_UPDATES=0;
DELETE FROM json_test1k WHERE UBI_2 = '603013962';
DELETE FROM json_test1k WHERE UBI_2 = '603598341';
DELETE FROM json_test1k WHERE UBI_2 = '601574968';
...

MongoDB

use db_json_test10k
DBQuery.shellBatchSize = 1000
db.json_test1k.deleteMany({"Ubi" : "603013962"});
db.json_test1k.deleteMany({"Ubi" : "603598341"});
db.json_test1k.deleteMany({"Ubi" : "601574968"});
... 
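Similarly, a quick count should return zero once the matching documents have been deleted. For example, in MySQL:

SELECT COUNT(*) FROM json_test1k WHERE UBI_2 = '603013962';

(In MongoDB, db.json_test1k.countDocuments({"Ubi" : "603013962"}) is the equivalent check.)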

Again, the results were quite different from the original tests: MySQL was much faster than MongoDB in deletions.


Oracle Cloud Infrastructure Test Results

Here are the results of running the same tests on Oracle Cloud Infrastructure. MongoDB’s imports were still much faster. On the queries, MongoDB performed better overall, and in some tests MySQL was competitive only with the binary log disabled.

Import

Create Index

Searches

Updates

Deletes


The results from running with only one CPU via OCI are surprising.

Overall, I think the latest version of MySQL is certainly a great alternative to MongoDB for storing JSON documents. If you are using MySQL version 5.7 for storing your JSON documents, I would recommend upgrading to the latest 8.0 version.

Finally – remember, I am not a MongoDB expert. If you have any suggestions on how to tune MongoDB to run faster, please leave a comment below.

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots 
Visit http://2044thebook.com for more information.
Tony is the editor/illustrator for NASA Graphics Standards Manual Remastered Edition 
Visit https://amzn.to/2oPFLI0 for more information.