elasticsearch: How to reinitialize a node?
Elasticsearch 1.7.2 on CentOS.
We have a 3-node cluster that had been running fine. A networking problem caused the "b" node to lose network access. (It also turns out the "c" node had "minimum_master_nodes" set to 1, not 2.)
So each node was left poking along on its own.
We fixed the issues on the "b" and "c" nodes, but they refuse to come up and join the cluster. On "b" and "c":
# curl -XGET http://localhost:9200/_cluster/health?pretty=true
{
  "error" : "MasterNotDiscoveredException[waited for [30s]]",
  "status" : 503
}
The elasticsearch.yml follows. (The names on the "b" and "c" nodes are reflected in the node names on those systems; likewise, the IP addresses on each node point at the other 2 nodes. However, on the "c" node, index.number_of_replicas was mistakenly set to 1.)
cluster.name: elasticsearch-prod
node.name: "prod-node-3a"
node.master: true
index.number_of_replicas: 2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.3.100", "192.168.3.101"]
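For reference, the corrected settings on the "c" node would look something like the following. This is only a sketch: the node name and IP addresses here are hypothetical (following the pattern of the "a" config above); the actual fixes are minimum_master_nodes back to 2 (a majority of 3 master-eligible nodes) and number_of_replicas back to 2.

cluster.name: elasticsearch-prod
node.name: "prod-node-3c"
node.master: true
# was mistakenly set to 1 on this node; should match the rest of the cluster
index.number_of_replicas: 2
# majority of 3 master-eligible nodes = 2; prevents split-brain elections
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
# the other two nodes' IPs (placeholder addresses)
discovery.zen.ping.unicast.hosts: ["192.168.3.100", "192.168.3.102"]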
We have no idea why they won't join. They have network visibility to "a", and "a" can see them. Each node correctly has the other 2 defined in "discovery.zen.ping.unicast.hosts:".
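As a sanity check (not something we did in the original troubleshooting), one way to confirm the transport port really is reachable between 1.x nodes is to hit port 9300 over HTTP; a live transport port answers with a telltale message rather than a connection error. The IP below is a placeholder:

# from node "b", probe node "a"'s transport port
curl http://192.168.3.100:9300
# a reachable 1.x transport port replies: "This is not a HTTP port"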
On "b" and "c", the log is sparse and tells us nothing:
# cat elasticsearch.log
[2015-09-24 20:07:46,686][INFO ][node           ] [The Profile] version[1.7.2], pid[866], build[e43676b/2015-09-14T09:49:53Z]
[2015-09-24 20:07:46,688][INFO ][node           ] [The Profile] initializing ...
[2015-09-24 20:07:46,931][INFO ][plugins        ] [The Profile] loaded [], sites []
[2015-09-24 20:07:47,054][INFO ][env            ] [The Profile] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [148.7gb], net total_space [157.3gb], types [rootfs]
[2015-09-24 20:07:50,696][INFO ][node           ] [The Profile] initialized
[2015-09-24 20:07:50,697][INFO ][node           ] [The Profile] starting ...
[2015-09-24 20:07:50,942][INFO ][transport      ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.181.3.138:9300]}
[2015-09-24 20:07:50,983][INFO ][discovery      ] [The Profile] elasticsearch/pojoip-ztxufx_lxlwvdew
[2015-09-24 20:07:54,772][INFO ][cluster.service] [The Profile] new_master [The Profile][pojoip-ztxufx_lxlwvdew][elastic-search-3c-prod-centos-case-48307][inet[/10.181.3.138:9300]], reason: zen-disco-join (elected_as_master)
[2015-09-24 20:07:54,801][INFO ][http           ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.181.3.138:9200]}
[2015-09-24 20:07:54,802][INFO ][node           ] [The Profile] started
[2015-09-24 20:07:54,880][INFO ][gateway        ] [The Profile] recovered [0] indices into cluster_state
[2015-09-24 20:42:45,691][INFO ][node           ] [The Profile] stopping ...
[2015-09-24 20:42:45,727][INFO ][node           ] [The Profile] stopped
[2015-09-24 20:42:45,727][INFO ][node           ] [The Profile] closing ...
[2015-09-24 20:42:45,735][INFO ][node           ] [The Profile] closed
How do we bring this whole beast back to life?
- Rebooting "b" and "c" makes no difference at all.
- I am hesitant to cycle "a", as our app is hitting...
Well, we do not know exactly what brought it back to life; it kind of magically came up.
I believe a shard reroute (shown here: elasticsearch: did I lose data when 2 of 3 nodes went down?) caused the nodes to rejoin the cluster. Our theory is that node "a", the surviving node, was not a "healthy" master, because it knew that 1 shard (the "p" primary copy of shard 1, spelled out here: elasticsearch: did I lose data when 2 of 3 nodes went down?) was not allocated.
Since the master knew it was not intact, the other nodes declined to join the cluster, throwing "MasterNotDiscoveredException".
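For anyone in the same spot: one way to see the stuck primary is the cat shards API, which in 1.7 lists every shard copy and its state. The index name and output below are hypothetical, just to show the shape:

# list all shard copies; an unallocated primary shows up as UNASSIGNED
curl -XGET 'http://localhost:9200/_cat/shards?v'
# index    shard prirep state      docs store ip            node
# my-index 1     p      UNASSIGNED
# my-index 0     p      STARTED    1234 1mb   192.168.3.100 prod-node-3a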
once got "p" shards assigned surviving node, other nodes joined up, , did whole replicating dance.
However, data was lost by allocating the shard that way. We have set up a new cluster and are rebuilding the index (which takes several days).