We were recently tasked with replacing 3 of 7 ZooKeeper instances without disrupting the customer's environment. I set up a test bed using 5 independent ZooKeeper instances, with a simulated game running on another instance.
After stopping the ZooKeeper instance being retired (see Step #8), I changed the connection properties on my DB servers and the game simulation. This didn't really help the running game simulation, since it loads that data at startup, but it meant the new settings were in place for the next time the app was restarted.
These are the steps that worked for me:
1. Check the status of all running ZooKeeper instances and note which one is the leader. Replace the leader last!
2. Spin up a new ZooKeeper instance.
3. Assign the new instance a myid higher than any currently in the cluster.
4. Copy the zoo.cfg from a running instance to the new one.
5. Add the new instance to that zoo.cfg. Keep the comment on its own line, since zoo.cfg is read as a Java properties file and a trailing comment becomes part of the value:
   # New ZooKeeper instance
   server.10000=10.0.0.154:2888:3888
6. Start ZooKeeper on the new machine.
7. Ensure that it joins the quorum. "echo stat | nc 127.0.0.1 2181" will show this, and there are other ways to check.
8. Stop the instance to be retired, or just stop ZooKeeper on that instance.
9. On the new instance, comment out the retired instance in zoo.cfg and restart ZooKeeper.
10. Ensure that ZooKeeper is accepting connections (see the stat check in #7 above) on all of the running servers.
11. On the next instance, remove the retired ZooKeeper from zoo.cfg, add the new one, and restart. Repeat for each remaining instance, leaving the leader for last.
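
The status checks in the steps above (finding the leader, confirming the new node joined the quorum) could be scripted roughly like this. It is only a sketch: the function names `parse_zk_mode` and `zk_mode`, the 2-second timeout, and every address in the usage comment other than 10.0.0.154 are my own placeholders, and it assumes the `stat` four-letter command is enabled on the servers and that `nc` is installed.

```shell
#!/bin/sh
# Sketch: report each ZooKeeper node's role so the leader can be left for last.
# Function names, timeout, and example hosts are illustrative assumptions.

# Read `stat` output on stdin and print the role from the "Mode:" line
# (for example leader, follower, or standalone).
parse_zk_mode() {
    awk -F': ' '/^Mode:/ { print $2 }'
}

# Query one node with the `stat` four-letter command and print "host role".
# Prints "unreachable" when no Mode line comes back.
zk_mode() {
    host=$1
    port=${2:-2181}
    mode=$(echo stat | nc -w 2 "$host" "$port" 2>/dev/null | parse_zk_mode)
    echo "$host ${mode:-unreachable}"
}

# Example usage (10.0.0.151 and 10.0.0.152 are hypothetical addresses):
# for h in 10.0.0.151 10.0.0.152 10.0.0.154; do zk_mode "$h"; done
```

Running the loop before and after each restart gives a quick view of which node is currently the leader and whether every node is answering, which is the same information the manual "echo stat | nc" check provides one host at a time.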