Elasticsearch multi-field search, Elasticsearch fields

htsjk.Com, 2019-08-07 05:04

Multi-field search

Multiple query strings

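The "best" value for the boost parameter is most easily found by trial and error: set a value, run test queries, and repeat. A reasonable range is between 1 and 10, perhaps up to 15. Values higher than that have little additional effect on the final score, because scores are normalized.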

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": {
            "title":  {
              "query": "War and Peace",
              "boost": 2
        }}},
        { "match": {
            "author":  {
              "query": "Leo Tolstoy",
              "boost": 2
        }}},
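        { "bool":  {            # nested one level down so that the translator clauses together contribute one third of the score; at the top level each of the four clauses would contribute one quarter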
            "should": [
              { "match": { "translator": "Constance Garnett" }},
              { "match": { "translator": "Louise Maude"      }}
            ]
        }}
      ]
    }
  }
}
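
The effect of the nesting can be checked with a little arithmetic. What follows is a minimal Python sketch, not Elasticsearch code: it assumes every clause matches with the same raw score and ignores boost, and clause_shares is a hypothetical helper that splits the score evenly among sibling clauses.

```python
from fractions import Fraction

def clause_shares(clauses, total=Fraction(1)):
    """Split `total` evenly among sibling clauses, recursing into nested lists.

    A string is a leaf clause (named after its field); a list is a nested bool.
    Assumption: all clauses match equally and no boost is applied.
    """
    shares = {}
    per_clause = total / len(clauses)
    for clause in clauses:
        if isinstance(clause, list):
            # A nested bool receives one sibling's share and divides it again.
            for name, share in clause_shares(clause, per_clause).items():
                shares[name] = shares.get(name, Fraction(0)) + share
        else:
            shares[clause] = shares.get(clause, Fraction(0)) + per_clause
    return shares

# Translator clauses nested in their own bool, as in the query above.
nested = clause_shares(["title", "author", ["translator_1", "translator_2"]])
# All four clauses at the same level.
flat = clause_shares(["title", "author", "translator_1", "translator_2"])

print(nested)
print(flat)
```

Under these assumptions the two translator clauses share one third of the score when nested, while in the flat layout they would take half of it between them.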

Best fields

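The dis_max query (Disjunction Max Query) is an "or" query: it returns every document that matches any of its sub-queries, but uses only the score of the best-matching sub-query as the document's score. A small experiment makes this concrete.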

DELETE my_index

PUT /my_index/my_type/1
{
    "title": "Quick brown fox rabbits",
    "body":  "Brown eats rabbits are commonly seen."
}

PUT /my_index/my_type/2
{
    "title": "Keeping pets healthy",
    "body":  "My quick brown fox eats rabbits on a regular basis."
}

GET  /my_index/my_type/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": "Brown fox eats" }},
                { "match": { "body":  "Brown fox eats" }}
            ]
        }
    }
}

Result: the document with id 1 ranks first.

GET  /my_index/my_type/_search
{
    "query": {
        "dis_max": {            # dis_max: a document's score is the score of its best-matching clause
            "queries": [
                { "match": { "title": "Brown fox eats" }},
                { "match": { "body":  "Brown fox eats" }}
            ]
        }
    }
}
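
The difference between the two queries comes down to how the per-clause scores are combined. Here is a minimal Python sketch of that idea; the scores are hypothetical numbers chosen for illustration (not real Elasticsearch output), and the sketch ignores any coordination factor or normalization.

```python
def bool_should_score(clause_scores):
    """Sum the scores of the matching clauses (None means no match)."""
    return sum(s for s in clause_scores if s is not None)

def dis_max_score(clause_scores):
    """Keep only the score of the best-matching clause."""
    matched = [s for s in clause_scores if s is not None]
    return max(matched) if matched else 0.0

# Hypothetical per-field scores as (title, body).
doc_scores = {
    "doc1": (0.5, 0.4),   # partial matches in both fields
    "doc2": (None, 0.8),  # one strong match, in body only
}

bool_ranking = sorted(
    doc_scores, key=lambda d: bool_should_score(doc_scores[d]), reverse=True
)
dis_max_ranking = sorted(
    doc_scores, key=lambda d: dis_max_score(doc_scores[d]), reverse=True
)

print(bool_ranking)
print(dis_max_ranking)
```

With these made-up numbers, summing favors the document with weak matches in two fields, while taking the maximum favors the document with one strong match.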

Tuning best-fields queries

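tie_breaker: a plain dis_max query uses only the single best-matching field and ignores all other matches. Setting the tie_breaker parameter takes the scores of the other matching clauses into account as well.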

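The tie_breaker parameter offers a compromise between dis_max and bool. It takes a value in the range [0, 1], and 0.1 to 0.4 is the recommended range: 0 gives plain dis_max behavior, where only the best-matching clause counts, and 1 treats all matching clauses as equally important. The score is computed as follows: take the _score of the best-matching clause, multiply the _score of every other matching clause by tie_breaker, and add the results together.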
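
As a rough illustration, the tie_breaker score can be written as the best clause score plus tie_breaker times the sum of the other matching clause scores. The Python sketch below uses hypothetical scores and leaves out any normalization Elasticsearch may apply.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others

# Hypothetical scores of two matching clauses.
scores = [0.8, 0.2]

for tb in (0.0, 0.3, 1.0):
    print(tb, dis_max_score(scores, tb))
```

At 0 the function behaves like plain dis_max (the maximum), and at 1 it behaves like a plain sum of the clause scores.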

multi_match query

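The multi_match query is a convenient way to run the same query against several fields. It comes in several types, three of which happen to correspond to the three scenarios described in "Know Your Data": best_fields, most_fields, and cross_fields. The default type is best_fields, which generates a match query for each field and wraps them in a dis_max query. For example, this dis_max query: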

GET /my_index/my_type/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": {
              "query": "Quick brown fox",
              "minimum_should_match": "30%"
            }
          }
        },
        {
          "match": {
            "body": {
              "query": "Quick brown fox",
              "minimum_should_match": "30%"
            }
          }
        }
      ],
      "tie_breaker": 0.3
    }
  }
}

is equivalent to this multi_match query:

GET /my_index/my_type/_search
{
  "query": {
    "multi_match": {
      "query": "Quick brown fox",
      "type": "best_fields",  # best_fields is the default and may be omitted
      "fields": [
        "title",
        "body"
      ],
      "tie_breaker": 0.3,
      "minimum_should_match": "30%"  # parameters like this are passed on to the generated match queries
    }
  }
}
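
The rewrite from multi_match to dis_max can be sketched as a small transformation on the request body. The Python function below is a hypothetical helper written for illustration, not Elasticsearch's own implementation; it handles only plain field names and the two parameters used in this example.

```python
def expand_best_fields(multi_match):
    """Rewrite a best_fields multi_match body as the equivalent dis_max body."""
    if multi_match.get("type", "best_fields") != "best_fields":
        raise ValueError("this sketch only handles best_fields")

    # Options that are passed down to every generated match query.
    per_field_options = {"query": multi_match["query"]}
    if "minimum_should_match" in multi_match:
        per_field_options["minimum_should_match"] = multi_match["minimum_should_match"]

    dis_max = {
        "queries": [
            {"match": {field: dict(per_field_options)}}
            for field in multi_match["fields"]
        ]
    }
    # tie_breaker stays on the enclosing dis_max query.
    if "tie_breaker" in multi_match:
        dis_max["tie_breaker"] = multi_match["tie_breaker"]
    return {"dis_max": dis_max}

body = expand_best_fields({
    "query": "Quick brown fox",
    "fields": ["title", "body"],
    "tie_breaker": 0.3,
    "minimum_should_match": "30%",
})
print(body)
```

The printed dictionary has the same shape as the dis_max request shown earlier: one match clause per field, with tie_breaker kept at the dis_max level.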

Wildcards in field names

Field names can be given as wildcard patterns:

{
    "multi_match": {
        "query":  "Quick brown fox",
        "fields": "*_title"
    }
}

Boosting individual fields

An individual field can be boosted with the caret syntax: append ^boost to the field name.

{
    "multi_match": {
        "query":  "Quick brown fox",
        "fields": [ "*_title", "chapter_title^2" ]
    }
}
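
To make the field syntax concrete, here is a minimal Python sketch of how such a fields list could be read. Everything in it is an assumption for illustration: the field names in known are hypothetical, fnmatchcase supports more pattern syntax than a plain * wildcard, and the way Elasticsearch combines the boosts of a field matched by more than one entry is not modelled.

```python
from fnmatch import fnmatchcase

def parse_field_spec(spec):
    """Split 'name^boost' into (name, boost); boost defaults to 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0

def resolve_fields(specs, known_fields):
    """Expand wildcard patterns against known_fields, keeping each spec's boost."""
    resolved = []
    for spec in specs:
        pattern, boost = parse_field_spec(spec)
        for field in known_fields:
            if fnmatchcase(field, pattern):
                resolved.append((field, boost))
    return resolved

# Hypothetical field names in the index mapping.
known = ["book_title", "chapter_title", "body"]
resolved = resolve_fields(["*_title", "chapter_title^2"], known)
print(resolved)
```

The sketch lists chapter_title twice, once from the wildcard and once from the boosted entry, and deliberately does not merge the two.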

Multi-field mappings

A multi-field mapping indexes the same field twice: once in stemmed form and once in unstemmed form.
Add the multi-field mapping:

DELETE /my_index

PUT /my_index
{
    "settings": { "number_of_shards": 1 },
    "mappings": {
        "my_type": {
            "properties": {
                "title": {
                    "type":     "string",
                    "analyzer": "english",
                    "fields": {
                        "std":   {
                            "type":     "string",
                            "analyzer": "standard"
                        }
                    }
                }
            }
        }
    }
}

Index two documents:

PUT /my_index/my_type/1
{ "title": "My rabbit jumps" }

PUT /my_index/my_type/2
{ "title": "Jumping jack rabbits" }

Query the title field:


GET /my_index/_search
{
   "query": {
        "match": {
            "title": "jumping rabbits"
        }
    }
}

# Result: 2 hits

Query the title.std field:

GET /my_index/_search
{
   "query": {
        "match": {
            "title.std": "jumping rabbits"
        }
    }
}
# Result: 1 hit
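
Why the two fields return different hit counts can be reproduced with toy analyzers. The Python sketch below is only an approximation: standard_analyze and naive_stem are hypothetical stand-ins, far cruder than the real standard and english analyzers, and matching means sharing at least one term with the query.

```python
def standard_analyze(text):
    """Toy stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def naive_stem(token):
    """Very rough suffix stripping; a real English stemmer is far more careful."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token

def english_analyze(text):
    """Toy stand-in for the english analyzer: standard tokens, then stemming."""
    return [naive_stem(t) for t in standard_analyze(text)]

docs = {1: "My rabbit jumps", 2: "Jumping jack rabbits"}

def hits(query, analyze):
    """Ids of documents sharing at least one analyzed term with the query."""
    query_terms = set(analyze(query))
    return [doc_id for doc_id, title in docs.items()
            if query_terms & set(analyze(title))]

print(hits("jumping rabbits", english_analyze))   # stemmed field (title)
print(hits("jumping rabbits", standard_analyze))  # unstemmed field (title.std)
```

Stemming reduces "jumps" and "jumping" to the same term, so both documents match on the stemmed field, while the unstemmed field only matches the document containing the exact words.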

A most_fields query combines the scores from both indexed forms of the field, with boosts applied:

GET /my_index/_search
{
   "query": {
        "multi_match": {
            "query":  "jumping rabbits",
            "type":   "most_fields",
            "fields": [ "title^10", "title.std" ]
        }
    }
}

Cross-field entity search

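When several attributes together identify a single entity, you can use a multi_match query, which queries each field in turn and adds up the scores of the matching fields. For example,
the following fields describe a person's address: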

{
    "street":   "5 Poland Street",
    "city":     "London",
    "country":  "United Kingdom",
    "postcode": "W1V 3DG"
}

It can be queried like this:

{
  "query": {
    "bool": {
      "should": [
        { "match": { "street":    "Poland Street W1V" }},
        { "match": { "city":      "Poland Street W1V" }},
        { "match": { "country":   "Poland Street W1V" }},
        { "match": { "postcode":  "Poland Street W1V" }}
      ]
    }
  }
}

or, equivalently:

{
  "query": {
    "multi_match": {
      "query":       "Poland Street W1V",
      "type":        "most_fields",     # combine the scores of all matching fields
      "fields":      [ "street", "city", "country", "postcode" ]
    }
  }
}

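most_fields has its own problems here. It is field-centric, so it favors documents in which many fields match any of the words rather than documents that match the most words across all fields, and parameters such as operator and minimum_should_match are applied per field rather than across fields.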

Custom _all fields

The copy_to parameter lets you build a custom _all field from selected fields:

PUT /my_index
{
    "mappings": {
        "person": {
            "properties": {
                "first_name": {
                    "type":     "string",
                    "copy_to":  "full_name"
                },
                "last_name": {
                    "type":     "string",
                    "copy_to":  "full_name"
                },
                "full_name": {
                    "type":     "string"
                }
            }
        }
    }
}
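
Conceptually, copy_to adds a field's value to the target field at index time while the original document stays unchanged. The Python sketch below models that idea with plain dictionaries; index_with_copy_to and the rules mapping are hypothetical names for illustration and do not correspond to any Elasticsearch API.

```python
def index_with_copy_to(document, copy_to):
    """Return field -> list of indexed values, applying copy_to rules.

    `copy_to` maps a source field to the name of the field that also
    receives its value. The original document is left unchanged.
    """
    indexed = {field: [value] for field, value in document.items()}
    for source, target in copy_to.items():
        if source in document:
            indexed.setdefault(target, []).append(document[source])
    return indexed

# Mirrors the mapping above: both name fields are copied into full_name.
rules = {"first_name": "full_name", "last_name": "full_name"}
person = {"first_name": "Peter", "last_name": "Smith"}
indexed = index_with_copy_to(person, rules)
print(indexed)
```

A search against full_name then sees both values, which is what lets a single match query cover the whole name.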

For a deeper look at _all, see http://blog.csdn.net/jiao_fuyou/article/details/49800969

cross_fields queries

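A custom _all field is a good solution, provided the mapping is set up before any documents are indexed. The same problem can also be handled at search time with a multi_match query of type cross_fields.
cross_fields takes a term-centric approach, which is very different from the field-centric approach used by best_fields and most_fields.
Field-centric: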

GET /_validate/query?explain
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "most_fields",
            "operator":    "and",
            "fields":      [ "first_name", "last_name" ]
        }
    }
}

For a document to match, both peter and smith must appear in the same field, either first_name or last_name:

(+first_name:peter +first_name:smith)
(+last_name:peter  +last_name:smith)

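Term-centric: both peter and smith must be present, but they may appear in any of the fields. The cross_fields type first analyzes the query string into a list of terms, then searches for each term in all of the fields: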

GET /_validate/query?explain
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "cross_fields",
            "operator":    "and",
            "fields":      [ "first_name", "last_name" ]
        }
    }
}
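
The difference between the two approaches with operator set to and can be expressed as two small predicates. This Python sketch is a simplification: it tokenizes by lowercasing and splitting on whitespace, ignores scoring entirely, and the sample document is hypothetical.

```python
def tokens(doc, field):
    """Lowercased whitespace tokens of a field (empty if the field is missing)."""
    return doc.get(field, "").lower().split()

def field_centric_and(doc, terms, fields):
    """Field-centric with operator=and: a single field must contain every term."""
    return any(all(t in tokens(doc, f) for t in terms) for f in fields)

def term_centric_and(doc, terms, fields):
    """Term-centric with operator=and: every term must appear in some field."""
    return all(any(t in tokens(doc, f) for f in fields) for t in terms)

person = {"first_name": "Peter", "last_name": "Smith"}
terms = ["peter", "smith"]
fields = ["first_name", "last_name"]

print(field_centric_and(person, terms, fields))
print(term_centric_and(person, terms, fields))
```

A person whose first name is Peter and last name is Smith fails the field-centric test, because no single field holds both words, but passes the term-centric one.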

For the cross_fields query to work optimally, all of the fields must use the same analyzer.

One advantage of the cross_fields query over a custom _all field is that individual fields can be boosted at search time:

GET /books/_search
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "cross_fields",
            "fields":      [ "title^2", "description" ]
        }
    }
}

Avoid using not_analyzed fields in multi_match queries.