/ linux

MySQL: 20 questions every sysadmin faces

MySQL: it's a great tool!  Starts off as a simple bucket you can throw data into, save it in between requests, and use SELECT to contort it into any view you need.  Hey, I'm using MySQL, my data problems are SOLVED.  Forever.


Forever?  You sure about that?  How do you take snapshots of your data?  Accidents happen, so it's prudent to prevent against them.

Ah, so you found mysqldump.  Cool!  That's great when your data set is a couple hundred megabytes.  Did you know that it'll lock your database server solid while taking backups when your data set outgrows that tiny limit?  Ever tried to restore a mysqldump backup?  Yeah, it takes almost the same amount of time to restore as it does to back up.

So pretty quickly mysqldump becomes an antipattern.  Aha, mysqlhotcopy to the rescue!  Wait, it's 2012 and you're using MyISAM tables?  I guess that's OK, if you don't care about ACID guarantees, transaction isolation and rollback, row level locking, or crash recovery.

mylvmbackup is a great tool if your data is stored on an LVM volume.  Beyond that, it's time to look at InnoDB backup tools (you're using those, right?) or using replication and taking the backup on the slave.

Oh, replication!  I see you nodding your head.  That's where the data on one server is copied to the other server instantly, right?  Welllll.... mostly.

Did you know that there's no consistency guarantee between master and slave?  Yep, the slave just reads whenever it's ready and does no deep analysis of the tables to make sure that what it's replicated is completely consistent with the master.  It should be always be consistent, right?  Hmm… I'm not sure.

It's time to learn about pt-table-checksum and find out how to compare your master and slave data to make sure nothing goofy is going on. Oh, and you can't trust the "seconds behind master" counter on the slave, sorry.  Better set up pt-heartbeat to make sure that your replication is working as well as you think.

Ah, so you found out about read-splitting to reduce the load on the master!  Very good, it's a great technique if you can use slightly out-of-date data in your application.  Your slaves are read only, right?  Yeah, MySQL won't stop you from making inconsistent data sets on different servers.  Again, see pt-table-checksum, this needs to be checked at least weekly and for goodness sakes set ALL of your servers to start up read_only.

Ah, so now you want durability in the face of a master failure.  Cool!  It's nice to have if you have a spare database server.  How's your Linux-HA and virtual interface knowledge?  Know how to promote a slave to a master?  Considered using HAproxy?  How are you going to make sure that the slave's read ALL of the master's binary log before it takes over?  In fact, what happens when it CAN'T read the master's log and you need to start it up?  How do you reconcile those "in flight" changes that haven't been replicated?  Did you also check out DRBD? MySQL Multi-Master Manager (MMMM good)?

Semi-synchronous replication, you say?  Great, you're using MySQL 5.5!  You've looked through the changelog and made sure all of your queries to be compatible with the new version of MySQL, right?  Done any testing on the upgrade to make sure that it'll still perform well on your dataset?  (Hint: pt-upgrade)

Ah, now you want to scale up with MySQL Cluster!  Yep, MySQL (or Percona) cluster is a great thing to have, and you get the consistency guarantees that plain MySQL replication doesn't make.  Make sure you've got at least 3 nodes and are prepared to fix problems as soon as one goes down.  You're monitoring all of this, right?

Oh, and you've totally got data warehousing covered, right?


These questions and their answers are the life of a DBA and systems administrator.  There are a thousand questions that no one ever thinks of when they choose a technology or application platform.  Hell, most of those you don't even know until you're entrenched and 3 months down the road scaling things up and out.

None of this is to knock MySQL, it's a great piece of software and it's got plenty of big-iron traction (c.f. Facebook).  Heck, Oracle liked so much they went out and bought themselves a shiny new MySQL AB.

If you can't tell, I've been entrenched in MySQL docs and blogs and tools for a week and my head's about to burst. :-)

This is why, when I get a chance, I dink with tiny little databases like Redis, and SQLite.

Redis is so very, very simple it's a beautiful thing to work with.  It's not a good fit for every data set, but there are niches where it just flies.  Message middleware, for example — would you rather have one tiny Redis server or a whole Stomp/AMQ stack and attached gear?

Why do we make these things so complex?  (Computers are hard, let's go shopping!)

Oh wait, now I remember.  Answering these questions are why I get paid the big bucks.