Learning Elasticsearch Indexing

jopen, 9 years ago

Creating an index

Specify the number of shards when creating the index:

    http put :9200/indexsetting number_of_shards=1 number_of_replicas=1

    {
        "acknowledged": true
    }
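The same request body can be prepared from a script. The following is a minimal Python sketch that only builds the JSON body and sends nothing, so it runs without a cluster. It assumes the conventional `settings` wrapper of the create-index API; the helper name `build_index_settings` is hypothetical.

```python
import json


def build_index_settings(shards: int, replicas: int) -> dict:
    """Return a create-index request body with explicit shard settings."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


# Body that could be sent with PUT to :9200/indexsetting
body = build_index_settings(shards=1, replicas=1)
print(json.dumps(body, indent=2))
```

Validating the counts before the request is made keeps an obviously wrong value from ever reaching the server.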

Mapping configuration

Before we configure any mapping by hand, Elasticsearch can guess the type of each field in a document from its JSON. For example:

    http post :9200/test/auto field1='20' field:=10

    {
        "_id": "AVHNbr0WRh7yMB73pVgC",
        "_index": "test",
        "_shards": {
            "failed": 0,
            "successful": 1,
            "total": 2
        },
        "_type": "auto",
        "_version": 1,
        "created": true
    }

    http :9200/test/auto/_mapping

    {
        "test": {
            "mappings": {
                "auto": {
                    "properties": {
                        "field": {
                            "type": "long"
                        },
                        "field1": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }

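As you can see, field is mapped as long because it was sent as a JSON number, while field1 was sent as the string '20' and is mapped as string. We can also set the numeric_detection parameter to true in the mapping to enable more aggressive detection, so that strings containing numbers are mapped as numeric types.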

    # create the mapping for type notauto
    http put :9200/test/_mapping/notauto notauto:='{"numeric_detection":true}'

    {
        "acknowledged": true
    }

    # add a document
    http post :9200/test/notauto f1='10' f2='20'

    {
        "_id": "AVHNeiW1Rh7yMB73pVgG",
        "_index": "test",
        "_shards": {
            "failed": 0,
            "successful": 1,
            "total": 2
        },
        "_type": "notauto",
        "_version": 1,
        "created": true
    }

    # check the field types
    http :9200/test/notauto/_mapping

    {
        "test": {
            "mappings": {
                "notauto": {
                    "numeric_detection": true,
                    "properties": {
                        "f1": {
                            "type": "long"
                        },
                        "f2": {
                            "type": "long"
                        }
                    }
                }
            }
        }
    }

One limitation remains: boolean values cannot be inferred from text, so boolean fields must be defined explicitly in the mapping.
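The guessing rules described so far can be summed up in a toy model. The Python sketch below is not Elasticsearch's implementation; it is a simplified illustration in which Python's own number parsing stands in for the server's, and the function name `guess_type` is hypothetical.

```python
def guess_type(value, numeric_detection=False):
    """Simplified model of dynamic type guessing for one JSON value."""
    # bool is a subclass of int in Python, so it has to be checked first
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        if numeric_detection:
            # Try the narrower interpretation (integer) before the wider one
            for parser, type_name in ((int, "long"), (float, "double")):
                try:
                    parser(value)
                    return type_name
                except ValueError:
                    continue
        # Text is never turned into a boolean, even when it reads "true"
        return "string"
    raise TypeError(f"unsupported JSON value: {value!r}")


print(guess_type(10))                            # JSON number
print(guess_type("20"))                          # string, detection off
print(guess_type("20", numeric_detection=True))  # string, detection on
print(guess_type("true", numeric_detection=True))
```

The last call shows the limitation: the text "true" stays a string, which is why a boolean field has to be declared in the mapping.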

Dates are another such type: we can specify "dynamic_date_formats" : ["yyyy-MM-dd hh:mm"], and this parameter accepts an array of formats.
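As a rough sketch of where this setting could go, the Python snippet below builds a mapping body that puts `dynamic_date_formats` at the root of a type mapping, in the same position `numeric_detection` occupies in the earlier example. That placement is an assumption here, and both the type name `doc` and the helper name `date_detection_mapping` are hypothetical.

```python
import json


def date_detection_mapping(type_name, formats):
    """Build a type mapping body that lists the date formats to detect."""
    if not isinstance(formats, (list, tuple)) or not formats:
        raise ValueError("formats must be a non-empty list of patterns")
    return {type_name: {"dynamic_date_formats": list(formats)}}


mapping = date_detection_mapping("doc", ["yyyy-MM-dd hh:mm"])
print(json.dumps(mapping, indent=2))
```

Passing a list even for a single pattern matches the array the parameter expects.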

Disabling field type guessing

To stop fields from being added automatically, set the dynamic property to false.

    http put :9200/test/_mapping/my my:='{"dynamic":false,"properties":{"ff1":{"type":"string"},"ff2":{"type":"string"}}}'

    {
        "acknowledged": true
    }

    http :9200/test/my/_mapping

    {
        "test": {
            "mappings": {
                "my": {
                    "dynamic": "false",
                    "properties": {
                        "ff1": {
                            "type": "string"
                        },
                        "ff2": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }
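To illustrate the effect, here is a small Python model of what happens to a document once dynamic is false: fields that are not in the mapping are left out of the index instead of being added to the mapping. It is a simplified sketch, not server code; the helper name `split_by_mapping` and the extra field `ff3` are made up for the example.

```python
def split_by_mapping(doc, mapped_fields):
    """Separate a document into indexed fields and ignored field names."""
    indexed = {name: value for name, value in doc.items() if name in mapped_fields}
    ignored = sorted(name for name in doc if name not in mapped_fields)
    return indexed, ignored


# ff1 and ff2 are declared in the mapping above; ff3 is not
doc = {"ff1": "a", "ff2": "b", "ff3": "c"}
indexed, ignored = split_by_mapping(doc, {"ff1", "ff2"})
print("indexed:", indexed)
print("ignored:", ignored)
```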

Index structure mapping

For example:

    cat posts.json

    {
        "mappings": {
            "post": {
                "properties": {
                    "id": {
                        "type": "long",
                        "store": "yes",
                        "precision_step": "0"
                    },
                    "name": {
                        "type": "string",
                        "store": "yes",
                        "index": "analyzed"
                    },
                    "published": {
                        "type": "date",
                        "store": "yes",
                        "precision_step": "0"
                    },
                    "contents": {
                        "type": "string",
                        "store": "no",
                        "index": "analyzed"
                    }
                }
            }
        }
    }

    http put :9200/posts < posts.json

    {
        "acknowledged": true
    }

    http :9200/posts/_mapping

    {
        "posts": {
            "mappings": {
                "post": {
                    "properties": {
                        "contents": {
                            "type": "string"
                        },
                        "id": {
                            "precision_step": 1,
                            "store": true,
                            "type": "long"
                        },
                        "name": {
                            "store": true,
                            "type": "string"
                        },
                        "published": {
                            "format": "strict_date_optional_time||epoch_millis",
                            "precision_step": 1,
                            "store": true,
                            "type": "date"
                        }
                    }
                }
            }
        }
    }

Core types

string
number
date
boolean
binary

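Common attributes of every type

index_name: the name under which the field is stored in the index; if it is not defined, the field's own name is used.
index: can be set to analyzed or no; string fields can also be set to not_analyzed. With analyzed, the field is analyzed and indexed so that it can be searched. With no, the field cannot be searched. The default is analyzed. A string field set to not_analyzed is indexed as-is without analysis, so a search has to match the whole value exactly.
store: yes or no; whether the field's value is stored in the index.
boost: defaults to 1. Defines how important the field is within the document; the higher the value, the more important the field.
null_value: the value to write to the index when the field is absent from the document being indexed. By default such a field is ignored.
copy_to: names a target field; all values of this field are copied into that target field.
include_in_all: whether the field should be included in the _all field. By default every field is included in _all.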
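The rule for the index attribute can be written as a small check. The Python sketch below encodes the rule exactly as this list states it (not_analyzed for string fields only) and does not consult a server; the helper name `field_definition` is hypothetical.

```python
def field_definition(field_type, index="analyzed"):
    """Return a field definition after checking the index option."""
    allowed = {"analyzed", "no"}
    if field_type == "string":
        # Only string fields may skip analysis yet still be indexed
        allowed.add("not_analyzed")
    if index not in allowed:
        raise ValueError(f"index={index!r} is not valid for type {field_type!r}")
    return {"type": field_type, "index": index}


print(field_definition("string", "not_analyzed"))
print(field_definition("long", "no"))
```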

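String type

String fields also accept the following attributes:

term_vector: can be set to no, yes, with_offsets, with_positions, or with_positions_offsets. Defines whether Lucene term vectors are computed for the field; if you use highlighting, these term vectors need to be computed.
omit_norms: can be set to true or false. The default is false for analyzed string fields and true for string fields that are indexed but not analyzed. When it is true, Lucene's weighting computation is disabled for the field.
analyzer: the name of the analyzer used for both indexing and searching.
index_analyzer: the name of the analyzer used at index time.
search_analyzer: the name of the analyzer used at query time.
norms.enabled: whether the weighting baseline (norms) is kept for the field. The default is true, and false for fields that are not analyzed.
norms.loading: can be set to eager or lazy. eager means the norms for this field are always loaded; lazy means they are loaded only when needed.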

Numeric types

byte
short
integer
long
float
double
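When choosing among the integer types, the deciding factor is the range of values the field has to hold. The Python sketch below picks the narrowest type for a value, assuming the usual signed widths of 8, 16, 32, and 64 bits for byte, short, integer, and long; the helper name `smallest_integer_type` is hypothetical.

```python
# Signed bit widths, narrowest first (assumed to follow the usual Java sizes)
INTEGER_BITS = [("byte", 8), ("short", 16), ("integer", 32), ("long", 64)]


def smallest_integer_type(value):
    """Return the narrowest integer type whose range contains value."""
    for name, bits in INTEGER_BITS:
        low = -(1 << (bits - 1))
        high = (1 << (bits - 1)) - 1
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in a 64-bit long")


for sample in (127, 128, 40000, 2**31):
    print(sample, smallest_integer_type(sample))
```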

IP address type

A field can be given the ip type to hold IP address data.
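One way to picture an ip field is as an address kept in numeric form, which makes range comparisons straightforward. The Python sketch below shows that conversion for IPv4 using the standard `ipaddress` module. Treating the field as a number this way is an assumption about the Elasticsearch versions this article covers, and the helper names are hypothetical.

```python
import ipaddress


def ipv4_to_number(address):
    """Convert dotted IPv4 text to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


def number_to_ipv4(number):
    """Convert a 32-bit integer back to dotted IPv4 text."""
    return str(ipaddress.IPv4Address(number))


low = ipv4_to_number("192.168.1.1")
high = ipv4_to_number("192.168.1.255")
print(low, high, low < high)
print(number_to_ipv4(low))
```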

Bulk operations

    cat bulk.json

    {"index":{"_index":"test", "_type":"bulk"}}
    {"name":"rcx", "age":14}
    {"index":{"_index":"test", "_type":"bulk"}}
    {"name":"rcx1", "age":28}

    http post :9200/test/bulk/_bulk < bulk.json

    {
        "errors": false,
        "items": [
            {
                "create": {
                    "_id": "AVHOPSjBRh7yMB73pVgS",
                    "_index": "test",
                    "_shards": {
                        "failed": 0,
                        "successful": 1,
                        "total": 2
                    },
                    "_type": "bulk",
                    "_version": 1,
                    "status": 201
                }
            },
            {
                "create": {
                    "_id": "AVHOPSjBRh7yMB73pVgT",
                    "_index": "test",
                    "_shards": {
                        "failed": 0,
                        "successful": 1,
                        "total": 2
                    },
                    "_type": "bulk",
                    "_version": 1,
                    "status": 201
                }
            }
        ],
        "took": 23
    }
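The bulk file alternates an action line with a document line, one JSON object per line. A file like bulk.json can be generated rather than written by hand; the Python sketch below is one minimal way to do so, with the helper name `build_bulk_body` being hypothetical.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body: one action line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline after the last line
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(
    "test",
    "bulk",
    [{"name": "rcx", "age": 14}, {"name": "rcx1", "age": 28}],
)
print(bulk_body, end="")
```

Serializing each object with `json.dumps` keeps it on a single line, which matters because the line breaks are what separate the entries.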

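Index internals

Every document has its own identifier and type. A document has two internal identifiers.

_uid: the unique identifier of the document within the index, made up of the document's identifier and its type. This field does not need to be set and is always indexed.
_id: the actual identifier, usually passed in when the document is created; if it is not passed in, one is generated automatically.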
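The relationship between the two identifiers can be sketched in a few lines of Python. The snippet assumes the `type#id` layout for _uid, which the article itself does not spell out, so treat the separator as an assumption; the helper names are hypothetical.

```python
def make_uid(doc_type, doc_id):
    """Combine a document's type and id into a single unique identifier."""
    return f"{doc_type}#{doc_id}"


def split_uid(uid):
    """Recover (type, id) from a uid; split on the first '#' only."""
    doc_type, doc_id = uid.split("#", 1)
    return doc_type, doc_id


uid = make_uid("bulk", "AVHOPSjBRh7yMB73pVgS")
print(uid)
print(split_uid(uid))
```

Because the type is part of the value, two documents of different types can share the same _id without clashing.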

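**_type field**

By default the document's type is also indexed, but it is neither analyzed nor stored.

**_all field**

Elasticsearch uses the _all field to hold the data from the other fields so that it is easy to search. It is useful when you want a simple search across all the data without having to think about field names. By default _all is enabled. The _all field can also be disabled entirely, or individual fields can be excluded from it. To disable it, change the mapping as follows: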

    {
        "book" : {
            "_all" : {
                "enabled" : "false"
            },
            "properties" : {
                ...
            }
        }
    }
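As an illustration of the idea behind _all, the Python sketch below joins the values of a document's fields into one searchable text and lets individual fields be left out. It is a deliberately simplified model, not how the server builds the field; the helper name `build_all_text` and the sample document are made up.

```python
def build_all_text(doc, excluded=()):
    """Join the values of all non-excluded fields into one text."""
    return " ".join(
        str(value) for name, value in doc.items() if name not in excluded
    )


book = {"title": "Elasticsearch Server", "pages": 400, "isbn": "n/a"}
print(build_all_text(book))
print(build_all_text(book, excluded={"isbn"}))
```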

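**_source field**

This field stores the original JSON document. It is enabled by default. If you do not need it, it can be disabled in the same way as _all.

**_index field**

Stores which index the document belongs to.

**_size field**

Disabled by default. When enabled, it automatically indexes the original size of the _source field and stores it together with the document.

**_timestamp field**

**_ttl field**

Time to live: lets you define a document's lifetime, after which the document is deleted automatically. This field is disabled by default.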
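The lifetime rule comes down to simple arithmetic: a document expires once its time to live has elapsed since it was indexed. The Python sketch below models only that comparison, using millisecond timestamps, and says nothing about when the server actually carries out the deletion; the helper name `is_expired` is hypothetical.

```python
def is_expired(indexed_at_ms, ttl_ms, now_ms):
    """Return True once ttl_ms has elapsed since the document was indexed."""
    return now_ms >= indexed_at_ms + ttl_ms


ONE_HOUR_MS = 60 * 60 * 1000
indexed_at = 1_000_000

print(is_expired(indexed_at, ONE_HOUR_MS, indexed_at + 1000))
print(is_expired(indexed_at, ONE_HOUR_MS, indexed_at + ONE_HOUR_MS))
```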

[References]

Elasticsearch服务器开发

Original article: http://renchx.com/Elasticsearch2/