Elasticsearch object type and dynamic mapping

htsjk.Com, 2019-09-17 01:32

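The last native JSON datatype we need to discuss is the object, known in other languages as a hash, hashmap, dictionary, or associative array.

Inner objects are often used to embed one entity or object inside another. For example, instead of keeping user_name and user_id fields in the tweet document, we could write it like this: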

{
    "tweet":            "Elasticsearch is very flexible",
    "user": {
        "id":           "@johnsmith",
        "gender":       "male",
        "age":          26,
        "name": {
            "full":     "John Smith",
            "first":    "John",
            "last":     "Smith"
        }
    }
}

Mapping for inner objects

Elasticsearch detects the fields of new objects dynamically, maps them as type object, and lists each inner field under properties:

{
  "gb": {
    "tweet": { <1>
      "properties": {
        "tweet":            { "type": "string" },
        "user": { <2>
          "type":             "object",
          "properties": {
            "id":           { "type": "string" },
            "gender":       { "type": "string" },
            "age":          { "type": "long"   },
            "name":   { <3>
              "type":         "object",
              "properties": {
                "full":     { "type": "string" },
                "first":    { "type": "string" },
                "last":     { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}

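<1> Root object.

<2><3> Inner objects.

The mappings for the user and name fields have a structure very similar to the mapping for the tweet type itself. In fact, the type mapping is just a special kind of object mapping, which we call the root object. It is exactly the same as any other object, except that it has some special top-level fields such as _source and _all.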
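The recursive shape of the mapping above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not Elasticsearch's actual detection logic: the function name `infer_mapping` is hypothetical, and it handles only objects, booleans, numbers, and strings (no date detection, arrays, or null values).

```python
def infer_mapping(value):
    """Return a simplified mapping fragment for one JSON value."""
    if isinstance(value, dict):
        # An object becomes type "object" with one entry per inner field.
        return {
            "type": "object",
            "properties": {k: infer_mapping(v) for k, v in value.items()},
        }
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}  # the type name used in the mapping above
    raise TypeError(f"unsupported value in this sketch: {value!r}")


tweet_doc = {
    "tweet": "Elasticsearch is very flexible",
    "user": {
        "id": "@johnsmith",
        "gender": "male",
        "age": 26,
        "name": {"full": "John Smith", "first": "John", "last": "Smith"},
    },
}

# In the mapping shown above, the root object lists only "properties".
root_mapping = {"properties": infer_mapping(tweet_doc)["properties"]}
```

The root object and the inner objects go through the same code path, which mirrors the point that a type mapping is just a special case of an object mapping.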

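How inner objects are indexed

Lucene doesn't understand inner objects. A Lucene document consists of a flat list of key-value pairs. So that Elasticsearch can index inner objects usefully, it converts our document into something like this: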

{
    "tweet":            [elasticsearch, flexible, very],
    "user.id":          [@johnsmith],
    "user.gender":      [male],
    "user.age":         [26],
    "user.name.full":   [john, smith],
    "user.name.first":  [john],
    "user.name.last":   [smith]
}

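Inner fields can be referred to by name, for example first. To distinguish between two fields that have the same name, we can use the full path, for example user.name.first, or even the type name plus the path: tweet.user.name.first.

Note: in the flattened document above, there is no field called user and no field called user.name. Lucene indexes only scalar or simple values, not complex data structures.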
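The flattening step can be sketched in Python as follows. The helper name `flatten` is hypothetical, and the sketch reproduces only the dotted field paths: it keeps the raw values and does not imitate analysis (the lowercasing and tokenizing visible in the listing above).

```python
def flatten(obj, prefix=""):
    """Collapse nested dicts into a flat dict keyed by dotted field paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Inner objects contribute their leaves only, never a key of their own.
            flat.update(flatten(value, path))
        else:
            # Every leaf holds a list of values, as in the listing above.
            flat[path] = value if isinstance(value, list) else [value]
    return flat


tweet_doc = {
    "tweet": "Elasticsearch is very flexible",
    "user": {
        "id": "@johnsmith",
        "gender": "male",
        "age": 26,
        "name": {"full": "John Smith", "first": "John", "last": "Smith"},
    },
}

flat_doc = flatten(tweet_doc)
```

Here `flat_doc` has a key for every leaf, such as `user.name.first`, but none for `user` or `user.name`, which matches the note above.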

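Dynamic mapping

When Elasticsearch encounters a previously unknown field in a document, it uses dynamic mapping to determine the field's datatype and automatically adds the field to the type mapping.

Sometimes this is the desired behavior and sometimes it isn't. Perhaps you don't know which fields will be added to your documents later, but you want them to be indexed automatically. Perhaps you just want to ignore them. Or, especially if you are using Elasticsearch as your primary data store, you may want unknown fields to throw an exception that alerts you to the problem.

Fortunately, you can control this behavior with the dynamic setting, which accepts the following options:

true: add new fields automatically (the default)

false: ignore new fields

strict: throw an exception when an unknown field is encountered

The dynamic setting can be applied to the root object or to any field of type object. You could set dynamic to strict by default, and enable it only for a specific inner object: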

PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic":      "strict", <1>
            "properties": {
                "title":  { "type": "string"},
                "stash":  {
                    "type":     "object",
                    "dynamic":  true <2>
                }
            }
        }
    }
}

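<1> The my_type object throws an exception when it encounters an unknown field.

<2> The stash object creates new fields automatically.

With this mapping, you can add a new searchable field to the stash object: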

PUT /my_index/my_type/1
{
    "title":   "This doc adds a new field",
    "stash": { "new_field": "Success!" }
}

But trying to do the same at the top level fails:

PUT /my_index/my_type/1
{
    "title":     "This throws a StrictDynamicMappingException",
    "new_field": "Fail!"
}

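Note: setting dynamic to false doesn't alter the contents of the _source field at all. The _source still contains the whole JSON document that you indexed. However, unknown fields are not added to the mapping and are not searchable.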
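To make the three options concrete, here is a small Python simulation of how a mapping might react to the two documents above. It is a sketch under stated assumptions, not Elasticsearch code: `apply_dynamic` and `StrictDynamicMappingError` are hypothetical names, type detection is left out (every new leaf is recorded as string), and inner objects are assumed to inherit the setting from their parent unless they override it.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for the StrictDynamicMappingException named above."""


def apply_dynamic(mapping, doc, inherited="true", path=""):
    """Update `mapping` in place for `doc`, honouring the dynamic setting."""
    # Accept both the boolean True and the string "true", as in the mapping above.
    dynamic = str(mapping.get("dynamic", inherited)).lower()
    props = mapping.setdefault("properties", {})
    for key, value in doc.items():
        full = f"{path}.{key}" if path else key
        if key not in props:
            if dynamic == "strict":
                raise StrictDynamicMappingError(f"unknown field [{full}]")
            if dynamic == "false":
                continue  # ignored: not mapped, so not searchable
            # dynamic == "true": add the field (type detection is out of scope here)
            props[key] = {"type": "object" if isinstance(value, dict) else "string"}
        if isinstance(value, dict) and props[key].get("type") == "object":
            apply_dynamic(props[key], value, dynamic, full)


my_type = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "string"},
        "stash": {"type": "object", "dynamic": True},
    },
}

# Accepted: the new field lives inside stash, where dynamic is true.
apply_dynamic(my_type, {
    "title": "This doc adds a new field",
    "stash": {"new_field": "Success!"},
})

# Rejected: the same field at the top level, where dynamic is strict.
try:
    apply_dynamic(my_type, {
        "title": "This throws a StrictDynamicMappingException",
        "new_field": "Fail!",
    })
    rejected = False
except StrictDynamicMappingError:
    rejected = True
```

With dynamic set to false the function simply skips the unknown field, which corresponds to the note above: the document is stored unchanged, but nothing is added to the mapping.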