Solr: SolrCloud
Distributed indexing
Document shard assignment
A document is assigned to one and only one shard per collection. Solr uses a component called a document router to determine which shard a document should be assigned to. There are two basic document-routing strategies supported by SolrCloud: compositeId (default) and implicit.
With the default compositeId router, Solr hashes each document's unique ID using the MurmurHash algorithm. MurmurHash is fast and distributes hash values evenly, which keeps the number of documents in each shard roughly balanced.
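The hash-based routing described above can be sketched in a few lines. This is an illustration, not Solr's router: it uses a pure-Python MurmurHash3 (32-bit x86 variant, seed 0) and a hypothetical helper, shard_for, and it assumes the 32-bit hash space is cut into equal contiguous ranges, one per shard. A real cluster reads its ranges from the cluster state, so actual shard assignments may differ.

```python
from collections import Counter


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3 (x86, 32-bit); returns an unsigned 32-bit int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    full = len(data) & ~3  # bytes covered by whole 4-byte blocks

    def mix(k: int) -> int:
        k = (k * c1) & mask
        k = _rotl32(k, 15)
        return (k * c2) & mask

    # Body: consume the input four bytes at a time (little-endian).
    for i in range(0, full, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))

    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index in [0, num_shards).

    Assumption: equal-width hash ranges, one per shard (hypothetical layout).
    """
    h = murmur3_32(doc_id.encode("utf-8"))
    return h * num_shards // 2**32


# Route 10,000 made-up document IDs across 4 shards and count them per shard.
counts = Counter(shard_for(f"doc-{i}", 4) for i in range(10_000))
print(sorted(counts.items()))
```

Because the shard is a pure function of the ID, the same document always lands on the same shard, and the per-shard counts printed at the end should come out close to one another.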
Adding documents
You can send update requests to any node in the cluster, and the request will be forwarded to the correct shard leader.
STEP 1: SEND THE UPDATE REQUEST USING CLOUDSOLRSERVER
STEP 2: ROUTE THE DOCUMENT TO THE CORRECT SHARD
STEP 3: LEADER ASSIGNS VERSION ID
STEP 4: FORWARD REQUEST TO REPLICAS
STEP 5: ACKNOWLEDGE WRITE SUCCESS
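From the client's side, the steps above start with a single update request addressed to any node. The notes use SolrJ's CloudSolrServer for this; the sketch below is a plain-HTTP stand-in that only builds the request and does not send it. The node URL, the collection name mycollection, the field title_s, and the helper build_update_request are all hypothetical.

```python
import json
from urllib.request import Request


def build_update_request(node_url: str, collection: str, docs: list) -> Request:
    """Build (but do not send) a JSON update request for one collection.

    node_url may point at any node in the cluster; routing to the shard
    leader and forwarding to replicas happen on the server side.
    """
    url = f"{node_url.rstrip('/')}/solr/{collection}/update"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical node, collection, and document, for illustration only.
req = build_update_request(
    "http://localhost:8983",
    "mycollection",
    [{"id": "doc-1", "title_s": "hello"}],
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Routing, version assignment, and replica forwarding all happen inside the cluster, so the client code looks the same whichever node it addresses.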
Near real-time search
NRT makes documents visible in search results within seconds of being indexed, hence the "near" qualifier. To make documents visible in near real-time, Solr provides a soft commit mechanism, which skips the costly aspects of hard commits, such as flushing documents stored in memory to disk.
Cache autowarming and warming queries must finish in less time than the interval between soft commits.
Although NRT search is a powerful feature, you do not have to use it with SolrCloud. It's perfectly acceptable not to use soft commits, and we recommend avoiding them unless you really need indexed documents to be visible in near real-time. One drawback of soft commits is that your caches are constantly being invalidated.
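If you do need NRT visibility, one way to request it is per update, through request parameters on the update handler. The sketch below only builds the URL. The softCommit and commitWithin parameters are Solr's; the helper update_url, the node URL, and the collection name are hypothetical. In practice, soft commits are often configured on the server instead.

```python
from urllib.parse import urlencode


def update_url(node_url: str, collection: str,
               soft_commit: bool = False, commit_within_ms=None) -> str:
    """Build an update URL, optionally asking for a soft commit.

    soft_commit=True adds softCommit=true; commit_within_ms adds
    commitWithin=<ms>. With neither set, no commit is requested.
    """
    params = {}
    if soft_commit:
        params["softCommit"] = "true"
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    base = f"{node_url.rstrip('/')}/solr/{collection}/update"
    return f"{base}?{urlencode(params)}" if params else base


# Hypothetical node and collection names.
print(update_url("http://localhost:8983", "mycollection", soft_commit=True))
print(update_url("http://localhost:8983", "mycollection", commit_within_ms=5000))
```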
Node recovery process
SolrCloud supports two basic recovery scenarios: peer sync and snapshot replication. Which one applies depends on how many update requests (add, delete, update) the recovering node missed while it was offline.
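The decision can be pictured as a simple threshold check. The sketch below is a toy model, not Solr's recovery code: the helper choose_recovery is hypothetical, and its default limit of 100 missed updates is an illustrative assumption that the notes above do not state.

```python
def choose_recovery(missed_updates: int, peer_sync_limit: int = 100) -> str:
    """Pick a recovery strategy from the number of missed update requests.

    peer_sync_limit is an assumed cutoff: at or below it the node replays
    the missed updates from a peer; above it the node copies a full index
    snapshot instead.
    """
    if missed_updates < 0:
        raise ValueError("missed_updates must be non-negative")
    if missed_updates <= peer_sync_limit:
        return "peer sync"
    return "snapshot replication"


for missed in (3, 100, 25_000):
    print(missed, "->", choose_recovery(missed))
```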
-------------------------------------------------
Distributed search
Once you shard your index, you have a new problem: you must query all shards to get a complete result set. Querying across all shards in a collection to create a unified result set is known as a distributed query. The distrib parameter determines if a query is distributed or local; when SolrCloud mode is enabled, distrib defaults to true.
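To see the distrib parameter in context, the sketch below builds query URLs for the select handler. It only constructs strings and sends nothing. The helper select_url, the node URL, and the collection name are hypothetical; distrib and q are the real parameter names.

```python
from urllib.parse import urlencode


def select_url(node_url: str, collection: str, q: str, distrib=None) -> str:
    """Build a query URL; distrib=None leaves the server default in place."""
    params = {"q": q}
    if distrib is not None:
        params["distrib"] = "true" if distrib else "false"
    return f"{node_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


# Distributed by default in SolrCloud mode: no distrib parameter needed.
print(select_url("http://localhost:8983", "mycollection", "*:*"))
# Local-only query against the index on the node that receives the request.
print(select_url("http://localhost:8983", "mycollection", "*:*", distrib=False))
```

Setting distrib=false is mainly useful for inspecting what a single shard holds, for example when checking how evenly documents are spread.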
Multistage query process
Distributed queries work differently than nondistributed queries because Solr needs to gather results for all shards, then merge the results into a single response to the client. Solr uses a multistage query process to execute distributed queries.
STEP 1: CLIENT SENDS QUERY TO ANY NODE
STEP 2: QUERY CONTROLLER RECEIVES REQUEST
STEP 3: QUERY STAGE
STEP 4: GET FIELDS STAGE
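The stages above can be mimicked with a small in-memory simulation. This is a toy model under stated assumptions, not Solr's implementation: the shards are plain dictionaries, relevance scores are made up, and the names SHARDS, query_stage, and distributed_query are hypothetical. What it does show is the two-pass shape: first collect only IDs and sort values from every shard, then fetch stored fields for the merged winners.

```python
import heapq

# Hypothetical shards: doc id -> (score, stored fields).
SHARDS = {
    "shard1": {"a": (0.90, {"title": "Alpha"}), "b": (0.40, {"title": "Beta"})},
    "shard2": {"c": (0.70, {"title": "Gamma"}), "d": (0.95, {"title": "Delta"})},
}


def query_stage(shard: dict, rows: int) -> list:
    """Return one shard's top `rows` (score, id) pairs, without stored fields."""
    pairs = ((score, doc_id) for doc_id, (score, _fields) in shard.items())
    return heapq.nlargest(rows, pairs)


def distributed_query(shards: dict, rows: int) -> list:
    """Run the query stage on every shard, merge, then fetch fields."""
    # Query stage: each shard reports only ids and sort values.
    candidates = []
    for name, shard in shards.items():
        for score, doc_id in query_stage(shard, rows):
            candidates.append((score, doc_id, name))

    # Merge: the controller keeps the global top `rows`.
    top = heapq.nlargest(rows, candidates)

    # Get-fields stage: fetch stored fields only for the winners,
    # and only from the shard that owns each document.
    results = []
    for score, doc_id, name in top:
        fields = shards[name][doc_id][1]
        results.append({"id": doc_id, "score": score, **fields})
    return results


for doc in distributed_query(SHARDS, rows=2):
    print(doc)
```

Each shard must return its own top `rows` candidates because the controller cannot know in advance which shard holds the best matches. Stored fields are fetched only for the documents that survive the merge, which keeps the first pass cheap.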
Distributed search limitations
Unfortunately, not all Solr query features work in distributed mode. Specifically, there are three main limitations you should be aware of: