Friday, June 26, 2015

Changing Out a Zookeeper Instance - Rolling restart

We were recently tasked with replacing 3 of 5 Zookeeper instances without disrupting the customer's environment. I set up a test bed of 5 independent Zookeeper instances, with a simulated game running on another instance.



After stopping the Zookeeper instance being retired (see step 8), I changed the connection properties on my db servers and the game simulation. That made no difference to the running game simulation, which loads its connection settings at startup, but the change was in place for the next time the app restarted.

These are the steps that worked for me:

  1. Check the status of all running zookeepers to find the leader. Do the leader last!
  2. Spin up a new zk instance.
  3. Assign the new instance a myid higher than any myid in the current cluster.
  4. Copy the zoo.cfg from a running instance to the new one.
  5. Add the new instance to the zoo.cfg.
    e.g.
    server.9999=10.0.0.100:2888:3888
    ...
    server.9995=10.0.0.242:2888:3888
    # New zookeeper instance (zoo.cfg comments must be on their own line)
    server.10000=10.0.0.154:2888:3888
  6. Start zookeeper on the new machine.
  7. Ensure that it joins the quorum. Running "echo stat | nc 127.0.0.1 2181" on the new machine should show a Mode line such as "Mode: follower"; there are other ways to check as well.
  8. Stop the instance to be retired, or just stop zk on that instance.
  9. On the new instance, comment out the retired instance in zoo.cfg and restart zookeeper.
  10. Ensure that zookeeper is accepting connections (see #7 above) on all of the running servers.
  11. On the next instance, remove the retired zookeeper from zoo.cfg, add the new zookeeper, and restart. Repeat for each remaining instance, leader last.
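
The status checks (steps 1, 7, 10) and the zoo.cfg edits (steps 3, 5, 9, 11) could be scripted along these lines. This is only a rough Python sketch, not what I ran: the function names are made up for illustration, it assumes the default client port 2181 and the 2888:3888 peer ports from the example above, and newer Zookeeper releases may need the "stat" command whitelisted before it will answer.

```python
import re
import socket

# Matches active "server.<myid>=<host>:..." lines; commented-out lines do not match.
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*(\S+)")


def four_letter_word(host, port=2181, cmd=b"stat", timeout=5.0):
    """Rough equivalent of `echo stat | nc host 2181`: send cmd, return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line (e.g. 'leader', 'follower'), or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def server_entries(cfg_text):
    """Return (myid, host) pairs for every active server line in a zoo.cfg."""
    entries = []
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2).split(":")[0]))
    return entries


def next_myid(cfg_text):
    """Step 3: pick a myid higher than any already in the cluster."""
    return max(myid for myid, _ in server_entries(cfg_text)) + 1


def swap_server(cfg_text, retired_host, new_id, new_host):
    """Steps 9 and 11: comment out the retired server, add the new one if missing."""
    out = []
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line)
        if match and match.group(2).split(":")[0] == retired_host:
            out.append("#" + line)  # comment marker goes at the start of the line
        else:
            out.append(line)
    if new_host not in [host for _, host in server_entries(cfg_text)]:
        out.append("server.%d=%s:2888:3888" % (new_id, new_host))
    return "\n".join(out) + "\n"
```

For example, parse_mode(four_letter_word("127.0.0.1")) gives the mode of the local instance, or None if it is not serving requests yet. Treating 10.0.0.242 as the retired host purely for illustration, swap_server(cfg, "10.0.0.242", next_myid(cfg), "10.0.0.154") returns the edited zoo.cfg text; writing it back to disk and restarting zookeeper are still up to you.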
Hope that works for you.
