Reading Between the Benchmarks: How MemSQL Designs for Speed and Durability

There’s been a lot of discussion recently about how MemSQL compares to MySQL and other databases. Prior to the recent controversy, we were planning to publish the post below, where we jump in and describe a workload that we ran against MemSQL, MySQL, and MongoDB, all configured for asynchronous durability, on a few different hardware scenarios. Before that, though, we’d like to address some of the comments.

First, we’ve built MemSQL with a sensible set of trade-offs around index performance and durability in mind. The customers that we’ve worked with and optimized for have indeed seen on average a 30 times improvement for their workloads. Of course, just the fact that their data now lives in memory is a major part of the boost. But an important point is that MemSQL’s ease-of-use is what gets them there in the first place. MemSQL helps to relieve the cost of engineering caching or NoSQL infrastructure that syncs back to a database like MySQL.

Of course, we must consider MySQL’s in-memory performance. Jelle, one of our interns, explores this below by tuning InnoDB to behave like MemSQL. But the short answer is that MemSQL is a lot faster. And that’s because the database has been engineered from the ground up to run in memory. Everything from the index data structures (lock-free skip lists and hash tables) to every detail of our durability (log and snapshot format, transaction buffer, etc.) was designed and built for the in-memory, archive-to-disk use case.

Even though MemSQL runs in memory, it logs transactions to disk as fast as the disk will write. MemSQL uses an in-memory transaction buffer that is flushed to disk in a separate log flusher thread. By the time a transaction commits in memory, it has been written to the in-memory buffer. If the amount of data in the buffer exceeds transaction-buffer MB, then writes block until there is space available. Both MongoDB and MySQL are often run with similar configurations.

We expose the performance of this component through the show status query as Transaction_buffer_wait_time. This value measures the cumulative amount of time transaction threads have blocked trying to insert into the transaction buffer. The default size of the transaction buffer, 128 MB, is a heuristic: most disks can easily sequentially write 128 MB/s, so you shouldn’t lose more than a second of data. Visit the durability page in our documentation for more information.

We haven’t yet encountered a client who has asked us to optimize MemSQL for synchronous durability. Nevertheless, we know how to improve it in our current design and can discuss this in a future post.

The post below is meant to be a seed for you to run your own experiments with MemSQL. Since we can’t cover every workload, hardware configuration, and database in a benchmark, we’ve built something simple that runs MemSQL, MySQL, and MongoDB and we hope it will enable the community to continue running its own experiments. The more data points we have, the more we can improve the product.

- MemSQL Engineering

When I joined MemSQL at the beginning of this summer, I decided to write a small benchmark to see on my own how fast MemSQL can be. One of MemSQL’s main strengths is its highly optimized lock-free skiplist implementation. This means that MemSQL should scale almost linearly on machines with many cores. To explore this, I wrote a benchmark comparing the performance of MemSQL, MongoDB, and MySQL. My benchmark simulates a simplified online multiplayer game, with the database responsible for tracking players, games and events. The code for the benchmark is available on github.com/memsql/bench and can easily be extended to other databases.

The databases

The two databases I compared against MemSQL are MySQL and MongoDB. MemSQL is wire compatible with MySQL, so a comparison is natural and very easy. MemSQL and MongoDB have radically different interfaces as MongoDB is a NoSQL database. However, MongoDB is a popular choice for social applications so considering its performance is worthwhile as well.

MemSQL is an in-memory database that stores all the contents of the database in RAM but backs up to disk. MongoDB and MySQL store their data on disk, though can be configured to cache the contents of the disk in RAM and asynchronously write changes back to disk. This fundamental difference influences exactly how MemSQL, MongoDB and MySQL store their data-structures: MemSQL uses lock-free skip lists and hash tables to store its data, whereas MongoDB and MySQL use disk-optimized B-trees.

Description of the benchmark

The benchmark tests performance of the database when used as the backend for an online turn based multiplayer game. It was inspired by mobile phone versions of games such as multiplayer chess. For the core game logic, the database stores players and games. To track statistics, the database also stores a log of all past game actions and average game lengths.

The database stores four kinds of objects:

Players. Every player has an ID, and two statistics: the number of games started and the number of games won.
Games. Each game has an ID, two players and a turn.
Game length statistics. We store a histogram of finished game lengths, composed of a game length and an integer counting the number of games.
Events. Finally, every action performed is stored as an up to 32-character description, together with a timestamp and a player ID.

The benchmark simulates many players simultaneously playing games. Several Python worker processes each simulate hundreds of players. Each worker continually picks a player and then simulates one player action. All actions are logged in the events table. This logic is implemented in benchmark.py, and executes according to the following diagram:

To support this simulation, the database needs to support several operations: find all games a player is currently playing, create a game, store statistics, etc. In the benchmark, these operations are described in base_database.py. Database-specific implementations can be found in sql_database.py and mongo_database.py. Since MySQL and MemSQL are protocol compatible, sql_database.py implements the interface for both MySQL and MemSQL. The MongoDB implementation can be found in mongo_database.py.

To support efficient queries, I added indices on all player ID fields. To allow efficient exploration of past events, I also added an index to the timestamp column in the events table. In MongoDB, the timestamp column is integrated in the object ID and is thus simpler. Furthermore, in MongoDB, embedded documents can be used as an alternative to seperate queries with indices. In this benchmark, I could have stored the list of active games in each player object. However, since each game is used by two players, I decided against this approach as it would move all writes to the player table. That would be an undesirable situation as MongoDB has a per collection write lock.

Settings used

By default, MySQL flushes after every transaction. To make MySQL behave like MongoDB and MemSQL, I set up MySQL to use a transaction buffer. The settings different from the default Ubuntu settings are the following:

default-storage-engine=INNODB
innodb_buffer_pool_size = 4GB
innodb_log_file_size = 128M
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit=2
innodb_thread_concurrency=0
innodb_flush_method=O_DIRECT
innodb_file_per_table

MemSQL and MongoDB are running with their default configurations, which means both are durable and use a transaction buffer. This means that all databases are configured to behave the same way. I used the latest stable version of each database (MongoDB 2.0.6, MySQL 5.5.25, MemSQL 1b).

Results

I ran the benchmark for 10 minutes on several machines. I first ran it on my old MacBook Pro (running Ubuntu natively), which has a dual core Intel(R) Core(TM) 2 Duo T9400. After that, I tried a server with 8 Intel(R) Xeon(R) cores, each running at 2.4GHz. Finally, I tried a 24 core machine with AMD(R) Opteron(TM) processors, each core clocking in at a 1.9GHz. For the two and eight core machines, the benchmark client was running on the same server as the database. For the 24 core machine the benchmark was running on a different server, directly connected over gigabit ethernet. The benchmark is configured to use 140 worker processes and 4000 players per worker process. This can be changed in config.py. I achieved the following number of actions (as described above) processed per second:

<?XML:NAMESPACE PREFIX = [default] http://www.w3.org/2000/svg NS = "http://www.w3.org/2000/svg" />Benchmark ResultsMongoDBMySqlMemSQL2 cores8 cores24 cores0k20k40k60k80kActions per secondHighcharts.com

In all of these configurations, MemSQL outperforms both MongoDB and MySQL. Going from 2 to 8 cores, MemSQL improves its performanced by a factor 5. MySQL is almost four times as fast, and MongoDB is almost three times as fast as in the 2 core scenario. When going from 8 to 24 cores, MemSQL performs more than 3 times faster. However, MongoDB and MySQL perform slower than before. To verify this effect, I limited MongoDB and MySQL to run on 8 of the 24 cores, and they performed about as well as they did running on all 24 cores.

In this benchmark, MemSQL is significantly faster on all fronts. On two cores, MemSQL benefits from its efficient query parsing and pre-compiled code per query. When scaling from two to eight cores, all three databases are able to take advantage of the faster hardware. However, MemSQL scales better thanks to its lock-free data-structures which allow all 8 cores to interact efficiently with the database at once. The 24 core machine has individual cores slower than the individual cores of the 8 core machine. MongoDB and MySQL are not capable of taking advantage of these extra cores; instead, their performance degrades due to the slower cores. MemSQL on the other hand has a speed increase, as it is able to take advantage of all the available cores.

As a final note, the performance of all databases does not change significantly over time. I sampled the number of actions per second every second over the 10 minute run on the 24 core machine, and performance doesn’t change:

Performance Over Timelogarithmic scaleMongoDBMySqlMemSQL0200400600Seconds since benchmark start1001,00010,000100,000Actions per secondHighcharts.com

Conclusion.

In the end, this simulation is just that: a simulation. But it demonstrates that MemSQL is very fast and scales well for write-heavy workloads like this game. We want this code to serve as a template for benchmarking MemSQL and other databases.

We made a strong effort to configure MySQL and MongoDB fairly. These tests can also be run on a variety of different hardware scenarios, network configurations, and modified workloads. There are even tune-able parameters within the provided benchmark (number of worker processes, number of players). We are as interested as you are to see your findings with MemSQL against other databases and in other benchmarks.

To explore MemSQL on your own, visit the MemSQL download page and get the developer edition. If you choose to set up a MemSQL AMI on Amazon EC2, you can get the database up and running on an 8 core machine in less than 5 minutes. The code for this benchmark is available on github. Happy Benchmarking!

source: MemSQL

If you liked this article, subscribe to the feed by clicking the image below to keep informed about new contents of the blog:

Reading Between the Benchmarks: How MemSQL Designs for Speed and Durability

The databases

Description of the benchmark

Settings used

Results

Conclusion.

0 commenti:

Post a Comment

Random Posts

Recent Posts

Popular Posts

Labels

Archive

About Me

Find Us on Flipboard

Find Us On Facebook

My Favorites in Instagram

Find Us On Twitter