Elasticsearch 2.2.0 Getting Started: Aggregations
Source: http://my.oschina.net/secisland/blog/614127
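Aggregations let you group documents and compute statistics over them. An aggregation is similar to GROUP BY in a relational database, and in Elasticsearch a single aggregation request can also take the result of one aggregation and aggregate it further. This is a very useful feature: one request returns the results of several aggregations at once, which avoids repeated requests and reduces the load on both the network and the server.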
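Data preparation: first we insert a few documents.
Request: POST localhost:9200/customer/external/?pretty
Parameters (send each document below as the body of its own request):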
{"name": "secisland","age":25,"state":"open","gender":"woman","balance":87 }
{"name": "zhangsan","age":32,"state":"close","gender":"man","balance":95 }
{"name": "zhangsan1","age":33,"state":"close","gender":"man","balance":91 }
{"name": "lisi","age":34,"state":"open","gender":"woman","balance":99 }
{"name": "wangwu","age":46,"state":"close","gender":"woman","balance":78 }
These five documents serve as our test data.
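Sending five separate POST requests works, but the same documents can also be indexed in a single call with the _bulk API, whose body is newline-delimited JSON: an action line followed by a source line for each document, ending with a final newline. The Python sketch below (standard library only) builds that body and an unsent urllib request for it. It assumes a node at localhost:9200 and the ES 2.x-style customer/external URL used above; the helper names build_bulk_body and build_bulk_request are hypothetical.

```python
import json
import urllib.request

# The five test documents from the article.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]


def build_bulk_body(docs):
    """Build a _bulk body: one action line plus one source line per document.

    An empty {"index": {}} action lets Elasticsearch generate each _id.
    The body must end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def build_bulk_request(docs, base_url="http://localhost:9200"):
    """Return an unsent POST request to /customer/external/_bulk (assumed host)."""
    body = build_bulk_body(docs).encode("utf-8")
    return urllib.request.Request(
        base_url + "/customer/external/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


req = build_bulk_request(DOCS)
# Sending it requires a running node, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Leaving the action as {"index": {}} without an _id gives the same result as the individual POST requests above: Elasticsearch assigns the document IDs.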
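With the data in place, we can test aggregations.
Example: group all customers by state, then return the top 10 states (the default), sorted by document count (also the default):
Request: POST http://localhost:9200/customer/_search?pretty
Parameters: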
{ "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state" } } } }
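Setting "size": 0 means no search hits are returned, only the aggregation results. The query is similar to GROUP BY in a relational database:
SELECT state, COUNT(*) FROM customer GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10
Response: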
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "close", "doc_count" : 3 }, { "key" : "open", "doc_count" : 2 } ] } } }
From this we can see that 3 customers have the state close and 2 have the state open.
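To check these numbers, the following Python sketch reproduces locally what this terms aggregation computes on the five test documents: a document count per state, sorted by count descending and cut to the top 10 buckets. It is only an emulation for illustration, not how Elasticsearch executes aggregations (on a real multi-shard index, terms counts can be approximate, which is what doc_count_error_upper_bound reports). Breaking ties by key ascending is an assumption of this sketch, since the test data has no ties, and terms_count is a hypothetical helper name.

```python
from collections import Counter

# The five test documents inserted earlier.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]


def terms_count(docs, field, size=10):
    """Emulate the buckets of a terms aggregation on `field`.

    Buckets are ordered by doc_count descending, ties broken by key
    ascending (an assumption). Only the first `size` buckets are kept,
    mirroring the default size of 10.
    """
    counts = Counter(doc[field] for doc in docs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


print(terms_count(DOCS, "state"))
# [{'key': 'close', 'doc_count': 3}, {'key': 'open', 'doc_count': 2}]
```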
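Next, building on the example above, we also compute the average balance of each state while counting customers per state.
The request is the same as before; only the parameters change: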
{ "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } }
The query returns:
{ "took" : 16, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "close", "doc_count" : 3, "average_balance" : { "value" : 88.0 } }, { "key" : "open", "doc_count" : 2, "average_balance" : { "value" : 93.0 } } ] } } }
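Note how the average_balance aggregation is nested inside the group_by_state aggregation. This is a common pattern: sub-aggregations can be nested inside a bucket aggregation to compute further results on any field within each bucket.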
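The nested avg aggregation can be emulated the same way: group the documents by state, then average the balance field inside each group. This hypothetical Python helper (terms_with_avg) only mirrors the shape and numbers of the response above for illustration; it is not Elasticsearch's implementation.

```python
from collections import defaultdict

# The five test documents inserted earlier.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]


def terms_with_avg(docs, group_field, metric_field, agg_name="average_balance"):
    """Group docs by `group_field`, then average `metric_field` in each group.

    Buckets are ordered by doc_count descending, then key ascending.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), agg_name: {"value": sum(values) / len(values)}}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


for bucket in terms_with_avg(DOCS, "state", "balance"):
    print(bucket["key"], bucket["doc_count"], bucket["average_balance"]["value"])
# close 3 88.0
# open 2 93.0
```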
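Now look at the next example, where we sort the buckets from the previous result by average balance in descending order:
The request is the same as before.
Parameters: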
{ "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state", "order": { "average_balance": "desc" } }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } }
The query returns:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "open", "doc_count" : 2, "average_balance" : { "value" : 93.0 } }, { "key" : "close", "doc_count" : 3, "average_balance" : { "value" : 88.0 } } ] } } }
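Ordering by a sub-aggregation sorts the buckets on that metric's value instead of on doc_count. A minimal Python sketch of the effect, assuming the bucket list from the earlier average_balance response as input (order_by_metric is a hypothetical name, and this only illustrates the ordering, not how Elasticsearch performs it):

```python
# Buckets copied from the earlier response (terms on state + avg on balance).
BUCKETS = [
    {"key": "close", "doc_count": 3, "average_balance": {"value": 88.0}},
    {"key": "open", "doc_count": 2, "average_balance": {"value": 93.0}},
]


def order_by_metric(buckets, agg_name, descending=True):
    """Return the buckets sorted by the value of the sub-aggregation `agg_name`."""
    return sorted(buckets, key=lambda b: b[agg_name]["value"], reverse=descending)


print([b["key"] for b in order_by_metric(BUCKETS, "average_balance")])
# ['open', 'close']
```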
This article is original content by secisland (赛克蓝德); please credit the author and source when republishing.
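The next example is more involved. It groups customers by age range (20-29, 30-39 and 40-49), then by gender within each range, and finally returns the average account balance for each age range and gender. In a range aggregation, from is inclusive and to is exclusive, so "from": 20, "to": 30 covers ages 20 through 29. The request is the same as before; the parameters are: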
{ "size": 0, "aggs": { "group_by_age": { "range": { "field": "age", "ranges": [ { "from": 20, "to": 30 }, { "from": 30, "to": 40 }, { "from": 40, "to": 50 } ] }, "aggs": { "group_by_gender": { "terms": { "field": "gender" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } } }
The query returns:
{ "took" : 15, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "group_by_age" : { "buckets" : [ { "key" : "20.0-30.0", "from" : 20.0, "from_as_string" : "20.0", "to" : 30.0, "to_as_string" : "30.0", "doc_count" : 1, "group_by_gender" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "woman", "doc_count" : 1, "average_balance" : { "value" : 87.0 } } ] } }, { "key" : "30.0-40.0", "from" : 30.0, "from_as_string" : "30.0", "to" : 40.0, "to_as_string" : "40.0", "doc_count" : 3, "group_by_gender" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "man", "doc_count" : 2, "average_balance" : { "value" : 93.0 } }, { "key" : "woman", "doc_count" : 1, "average_balance" : { "value" : 99.0 } } ] } }, { "key" : "40.0-50.0", "from" : 40.0, "from_as_string" : "40.0", "to" : 50.0, "to_as_string" : "50.0", "doc_count" : 1, "group_by_gender" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "woman", "doc_count" : 1, "average_balance" : { "value" : 78.0 } } ] } } ] } } }
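A local Python emulation of this three-level aggregation, assuming the five test documents above, makes the range semantics explicit: each range includes its from value and excludes its to value, so age 25 lands in the first bucket, ages 32, 33 and 34 in the second, and 46 in the third. The helper range_terms_avg is hypothetical and only meant for checking the numbers in the response; gender buckets are ordered by count descending, then key ascending.

```python
from collections import defaultdict

# The five test documents inserted earlier.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]
RANGES = [(20, 30), (30, 40), (40, 50)]


def range_terms_avg(docs, ranges):
    """Bucket docs by age range, then by gender, then average the balance."""
    result = []
    for lo, hi in ranges:
        # from is inclusive, to is exclusive
        in_range = [d for d in docs if lo <= d["age"] < hi]
        by_gender = defaultdict(list)
        for d in in_range:
            by_gender[d["gender"]].append(d["balance"])
        genders = sorted(by_gender.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        result.append({
            "key": f"{float(lo)}-{float(hi)}",  # e.g. "20.0-30.0", as in the response
            "doc_count": len(in_range),
            "group_by_gender": [
                {"key": g, "doc_count": len(v), "average_balance": sum(v) / len(v)}
                for g, v in genders
            ],
        })
    return result


for bucket in range_terms_avg(DOCS, RANGES):
    for g in bucket["group_by_gender"]:
        print(bucket["key"], g["key"], g["doc_count"], g["average_balance"])
# 20.0-30.0 woman 1 87.0
# 30.0-40.0 man 2 93.0
# 30.0-40.0 woman 1 99.0
# 40.0-50.0 woman 1 78.0
```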
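As these examples show, Elasticsearch aggregations are very powerful: a single request can group documents, nest further aggregations inside each group, and compute statistics at every level.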
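secisland (赛克蓝德) will continue to publish analyses of the features in the latest Elasticsearch releases, so stay tuned. You are also welcome to follow the secisland WeChat official account.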