Solr: SolrCloud

Posted on htsjk.com, 2019-08-24 23:41

Distributed indexing

Document shard assignment

A document is assigned to one and only one shard per collection. Solr uses a component called a document router to determine which shard a document should be assigned to. There are two basic document-routing strategies supported by SolrCloud: compositeId (default) and implicit.

With compositeId routing, Solr hashes each document's unique ID with the MurmurHash algorithm, because it's fast and creates an even distribution of hash values, which keeps the number of documents in each shard roughly balanced.
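The hash-based routing described above can be sketched as follows. This is a minimal illustration, not Solr's actual router: it assumes the 32-bit MurmurHash3 variant with seed 0 applied to the UTF-8 bytes of the document ID, and it splits the unsigned 32-bit hash range evenly across shards. The helper names `shard_ranges` and `route` are hypothetical; in a real cluster the hash ranges come from the cluster state.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit), returned as an unsigned 32-bit integer."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    n_blocks = len(data) // 4

    # Body: mix the input four bytes at a time (little-endian).
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (rotl((k * c1) & mask, 15) * c2) & mask
        h = (rotl(h ^ k, 13) * 5 + 0xE6546B64) & mask

    # Tail: mix the remaining one to three bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        h ^= (rotl((k * c1) & mask, 15) * c2) & mask

    # Finalization: force all bits of the hash to avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def shard_ranges(num_shards: int) -> list:
    """Hypothetical helper: split [0, 2**32 - 1] into equal hash ranges."""
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        low = i * size
        high = 2**32 - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((low, high))
    return ranges


def route(doc_id: str, ranges: list) -> int:
    """Return the index of the one shard whose range contains the ID's hash."""
    h = murmur3_32(doc_id.encode("utf-8"))
    for index, (low, high) in enumerate(ranges):
        if low <= h <= high:
            return index
    raise ValueError("hash ranges do not cover the full 32-bit space")
```

Calling `route("doc-42", shard_ranges(4))` always returns the same shard index for the same ID, which is what makes a document belong to one and only one shard.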

Adding documents

You can send update requests to any node in the cluster, and the request will be forwarded to the correct shard leader.

STEP 1: SEND THE UPDATE REQUEST USING CLOUDSOLRSERVER

STEP 2: ROUTE THE DOCUMENT TO THE CORRECT SHARD

STEP 3: LEADER ASSIGNS VERSION ID

STEP 4: FORWARD REQUEST TO REPLICAS

STEP 5: ACKNOWLEDGE WRITE SUCCESS
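The five steps above can be mimicked with a toy in-memory model. This is only a sketch under stated assumptions: the classes `Replica` and `Shard` and the function `send_update` are hypothetical, the router uses CRC32 as a stand-in for Solr's real hash, and a simple counter stands in for Solr's version IDs. No networking, ZooKeeper, or failure handling is modeled.

```python
import zlib
from itertools import count


class Replica:
    """Hypothetical in-memory stand-in for one Solr core."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc id -> (version, document)

    def apply(self, doc, version):
        self.docs[doc["id"]] = (version, doc)
        return True  # acknowledge the write


class Shard:
    """One shard: a leader plus zero or more replicas."""

    def __init__(self, name, leader, replicas):
        self.name = name
        self.leader = leader
        self.replicas = replicas
        self._versions = count(1)  # stand-in for Solr's version IDs

    def index(self, doc):
        # Step 3: the leader assigns a version to the update.
        version = next(self._versions)
        self.leader.apply(doc, version)
        # Step 4: the leader forwards the versioned update to its replicas.
        acks = [replica.apply(doc, version) for replica in self.replicas]
        # Step 5: report success only once every replica has acknowledged.
        return {"shard": self.name, "version": version, "acknowledged": all(acks)}


def send_update(shards, doc):
    # Steps 1 and 2: the client hands over the document and it is routed
    # to exactly one shard (CRC32 modulo shard count, as an assumption).
    shard = shards[zlib.crc32(doc["id"].encode("utf-8")) % len(shards)]
    return shard.index(doc)
```

After `send_update(shards, {"id": "doc-1"})`, the leader and every replica of the chosen shard hold the document under the same version, and the other shards never see it.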

Near real-time search

NRT search makes documents visible in search results within seconds of their being indexed, hence the use of the "near" qualifier. To allow documents to be visible in NRT, Solr provides a soft commit mechanism, which skips the costly aspects of hard commits, such as flushing documents stored in memory to disk.

Cache autowarming and warming queries must finish in less time than the interval between soft commits; otherwise the warming for one searcher is still running when the next soft commit arrives.

Although NRT search is a powerful feature, you do not have to use it with SolrCloud. It's perfectly acceptable to not use soft commits, and we recommend not using them unless you really need indexed documents to be visible in near real-time. One drawback of soft commits is that your caches are constantly being invalidated, because each soft commit opens a new searcher.
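To make the soft versus hard commit choice concrete, here is a small sketch that only builds the URL for an update request; it sends nothing. The request parameters `softCommit`, `commit`, and `commitWithin` are Solr update parameters, while the function name `update_url`, the host, and the collection name `mycollection` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def update_url(base_url, collection, *, soft_commit=False,
               hard_commit=False, commit_within_ms=None):
    """Build (but do not send) an update URL with the chosen commit options."""
    params = {}
    if soft_commit:
        params["softCommit"] = "true"   # make documents searchable quickly
    if hard_commit:
        params["commit"] = "true"       # flush to durable storage
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    url = f"{base_url.rstrip('/')}/{collection}/update"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Hypothetical host and collection, shown only to illustrate the shape.
nrt_url = update_url("http://localhost:8983/solr", "mycollection", soft_commit=True)
plain_url = update_url("http://localhost:8983/solr", "mycollection")
```

If you do not need NRT visibility, the plain URL without any commit parameter is enough, and commits can be left to whatever schedule the server is configured with.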

Node recovery process

SolrCloud supports two basic recovery scenarios: peer sync and snapshot replication. The recovery process for these two scenarios is differentiated by how many update requests (add, delete, update) the recovering node missed while it was offline.
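The choice between the two scenarios can be sketched as a simple threshold test. The cutoff of 100 missed updates used below is an assumption (it matches the default size of the recent-update window as far as I know, but check your version and configuration), and the names `RecoveryStrategy` and `choose_recovery` are hypothetical.

```python
from enum import Enum


class RecoveryStrategy(Enum):
    NONE = "none"
    PEER_SYNC = "peer sync"
    SNAPSHOT_REPLICATION = "snapshot replication"


def choose_recovery(missed_updates, peer_sync_limit=100):
    """Pick a recovery strategy from the number of updates a node missed.

    peer_sync_limit is an assumed cutoff: below it, the node can replay
    the missed updates from a peer; above it, copying a full snapshot
    of the index from the leader is the practical option.
    """
    if missed_updates < 0:
        raise ValueError("missed_updates must be non-negative")
    if missed_updates == 0:
        return RecoveryStrategy.NONE
    if missed_updates <= peer_sync_limit:
        return RecoveryStrategy.PEER_SYNC
    return RecoveryStrategy.SNAPSHOT_REPLICATION
```

For example, a node that was offline for a few seconds and missed a handful of updates would take the peer sync path, while a node that was down during a bulk load would fall through to snapshot replication.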

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

Distributed search

Once you shard your index, you have a new problem: you must query all shards to get a complete result set. Querying across all shards in a collection to create a unified result set is known as a distributed query. The distrib parameter determines if a query is distributed or local; when SolrCloud mode is enabled, distrib defaults to true.
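As an illustration of the distrib parameter, the sketch below builds a query URL and optionally sets `distrib=false` to restrict the query to the core that receives it. Only the URL is constructed; nothing is sent. The function name `select_url`, the host, and the collection name `mycollection` are assumptions.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, *, distrib=None, rows=10):
    """Build (but do not send) a query URL.

    distrib=None leaves the parameter out, so the server default applies
    (true in SolrCloud mode); False asks for a local, single-core query.
    """
    params = {"q": query, "rows": rows}
    if distrib is not None:
        params["distrib"] = "true" if distrib else "false"
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
distributed = select_url("http://localhost:8983/solr", "mycollection", "*:*")
local_only = select_url("http://localhost:8983/solr", "mycollection", "*:*", distrib=False)
```

Comparing the hit counts of the local query on each core with the distributed count is a handy way to see how documents are spread across shards.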

Multistage query process

Distributed queries work differently than nondistributed queries because Solr needs to gather results from all shards, then merge them into a single response to the client. Solr uses a multistage query process to execute distributed queries.

STEP 1: CLIENT SENDS QUERY TO ANY NODE

STEP 2: QUERY CONTROLLER RECEIVES REQUEST

STEP 3: QUERY STAGE

STEP 4: GET FIELDS STAGE
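The query stage and get fields stage above can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not Solr's implementation: the class `ShardIndex` and the function `distributed_query` are hypothetical, and each document carries a precomputed score instead of being scored against a real query.

```python
import heapq


class ShardIndex:
    """Hypothetical in-memory shard: doc id -> fields, including a score."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs

    def query_stage(self, rows):
        # Return only (score, id) pairs for this shard's top hits,
        # keeping the first round trip small.
        hits = [(fields["score"], doc_id) for doc_id, fields in self.docs.items()]
        return heapq.nlargest(rows, hits)

    def get_fields(self, ids, fl):
        # Return the requested stored fields for the given ids.
        return {i: {f: self.docs[i][f] for f in fl} for i in ids}


def distributed_query(shards, rows, fl):
    """Play the role of the query controller for one request."""
    # Query stage: collect each shard's top ids and scores, then merge.
    candidates = []
    for shard in shards:
        for score, doc_id in shard.query_stage(rows):
            candidates.append((score, doc_id, shard))
    top = heapq.nlargest(rows, candidates, key=lambda c: (c[0], c[1]))

    # Get fields stage: ask only the shards that own a winning document.
    ids_by_shard = {}
    for _, doc_id, shard in top:
        ids_by_shard.setdefault(shard.name, (shard, []))[1].append(doc_id)
    fetched = {}
    for shard, ids in ids_by_shard.values():
        fetched.update(shard.get_fields(ids, fl))

    return [{"id": doc_id, "score": score, **fetched[doc_id]}
            for score, doc_id, _ in top]
```

Splitting the work this way means full documents are fetched only for the final merged page of results, not for every candidate each shard returned.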

Distributed search limitations

Unfortunately, not all Solr query features work in distributed mode. Specifically, there are three main limitations you should be aware of: