Elasticsearch bulk operations

goby1220, 9 years ago

Source: https://segmentfault.com/a/1190000004426546


This post records how to perform Elasticsearch (ES) bulk operations with curl.

Bulk request

Prepare the data

vim documents.json
{ "index": {"_index": "library", "_type": "book", "_id": "1"}}
{ "title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true, "section" : 3}
{ "index": {"_index": "library", "_type": "book", "_id": "2"}}
{ "title": "Catch-22","author": "Joseph Heller","year": 1961,"characters": ["John Yossarian", "Captain Aardvark", "Chaplain Tappman", "Colonel Cathcart", "Doctor Daneeka"],"tags": ["novel"],"copies": 6, "available" : false, "section" : 1}
{ "index": {"_index": "library", "_type": "book", "_id": "3"}}
{ "title": "The Complete Sherlock Holmes","author": "Arthur Conan Doyle","year": 1936,"characters": ["Sherlock Holmes","Dr. Watson", "G. Lestrade"],"tags": [],"copies": 0, "available" : false, "section" : 12}
{ "index": {"_index": "library", "_type": "book", "_id": "4"}}
{ "title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true}
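The file above follows the bulk format: one action line, then one source line per document, each on its own line, with a newline after the last one. As a minimal sketch of generating such a body programmatically (the function name `build_bulk_body` and the shortened sample documents are assumptions for illustration, not part of the original post):

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return a bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        # ensure_ascii=False keeps characters such as "ä" readable in the output
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the final line to end with a newline as well
    return "\n".join(lines) + "\n"


# Shortened, illustrative versions of the documents above
books = [
    (1, {"title": "All Quiet on the Western Front", "year": 1929}),
    (2, {"title": "Catch-22", "year": 1961}),
]
print(build_bulk_body("library", "book", books), end="")
```

Writing the returned string to documents.json would give a file that can be sent with --data-binary, which preserves the newlines.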

Disable refresh

curl -XPUT '192.168.99.100:9200/library' -d '
{
    "settings":{
        "refresh_interval":"-1"
    }
}
'

Send the request

curl -s -XPOST '192.168.99.100:9200/_bulk' --data-binary @documents.json
{"took":2603,"errors":false,"items":[{"index":{"_index":"library","_type":"book","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"status":201}},{"index":{"_index":"library","_type":"book","_id":"2","_version":2,"_shards":{"total":2,"successful":2,"failed":0},"status":200}},{"index":{"_index":"library","_type":"book","_id":"3","_version":2,"_shards":{"total":2,"successful":2,"failed":0},"status":200}},{"index":{"_index":"library","_type":"book","_id":"4","_version":2,"_shards":{"total":2,"successful":2,"failed":0},"status":200}}]}
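The bulk response reports a top-level errors flag plus one result per action, each with its own status, so a single request can partly succeed. A small sketch of pulling out the failed items follows; the helper name `failed_items` and the sample response are hypothetical and only mirror the shape of the response above:

```python
import json


def failed_items(response_text):
    """Return (action, _id, status) for every item whose status is 300 or above."""
    response = json.loads(response_text)
    failures = []
    for item in response.get("items", []):
        # Each item is a single-key object such as {"index": {...}}
        for action, result in item.items():
            if result.get("status", 0) >= 300:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures


# Hypothetical response, made up for illustration (not output from a real cluster)
sample = (
    '{"took":5,"errors":true,"items":['
    '{"index":{"_id":"1","status":201}},'
    '{"create":{"_id":"2","status":409}}]}'
)
print(failed_items(sample))
```

Checking each item rather than only the HTTP status of the whole request is what catches partial failures.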

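Refresh

Change the setting back so that in-memory segments are refreshed to the filesystem cache every 1s. Because the index already exists at this point, the change goes through the _settings endpoint:

curl -XPUT '192.168.99.100:9200/library/_settings' -d '
{
    "index":{
        "refresh_interval":"1s"
    }
}
'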

Or trigger one manual refresh

curl -XPOST '192.168.99.100:9200/_refresh'
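The overall sequence (disable refresh, send the bulk request, restore the interval, refresh once) can be sketched as a single helper. This is only an illustration under stated assumptions: the index already exists, so the settings are changed through `/<index>/_settings`, and `send(method, path, body)` is a hypothetical transport callable supplied by the caller, not a real client API.

```python
import json


def bulk_load_with_refresh_paused(send, index, bulk_body, restore_interval="1s"):
    """Disable refresh, send the bulk body, then restore the interval and refresh once.

    `send(method, path, body)` is a hypothetical callable that performs the HTTP request.
    """
    settings_path = "/%s/_settings" % index
    send("PUT", settings_path, json.dumps({"index": {"refresh_interval": "-1"}}))
    try:
        send("POST", "/_bulk", bulk_body)
    finally:
        # Restore periodic refresh even if the bulk request raised an exception
        send("PUT", settings_path, json.dumps({"index": {"refresh_interval": restore_interval}}))
        send("POST", "/%s/_refresh" % index, "")


# Dry run: record the calls instead of talking to a server
calls = []
bulk_load_with_refresh_paused(lambda m, p, b: calls.append((m, p)), "library", "{}\n")
for method, path in calls:
    print(method, path)
```

Passing the transport in as a parameter keeps the ordering logic testable without a running cluster; in real use `send` would wrap whatever HTTP client is at hand.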

Installing the head plugin

cd /usr/share/elasticsearch
./bin/plugin install mobz/elasticsearch-head

Restart ES

cd /etc/init.d
./elasticsearch restart

A basic query_string search on the title field:

{
    "query": {
        "query_string": {
            "query": "title:crime"
        }
    }
}

To return version information as well:

{
    "version": true,
    "query": {
        "query_string": {
            "query": "title:crime"
        }
    }
}

To return only specific fields:

{
    "fields": ["title","year"],
    "query": {
        "query_string": {
            "query": "title:crime"
        }
    }
}
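The three request bodies above differ only in the optional version and fields keys. A minimal sketch of assembling them from one function (the name `build_search_body` is an assumption; the keys are the ones used in the queries above):

```python
import json


def build_search_body(query_string, version=False, fields=None):
    """Build a query_string search body, optionally asking for versions or specific fields."""
    body = {}
    if version:
        body["version"] = True
    if fields:
        body["fields"] = list(fields)
    body["query"] = {"query_string": {"query": query_string}}
    return body


print(json.dumps(build_search_body("title:crime", version=True, fields=["title", "year"]), indent=4))
```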

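About flush

A refresh only moves the in-memory segment into the filesystem cache (once a segment is in the filesystem cache, Lucene can search it); the data has not reached disk yet. While ES writes data to the in-memory buffer, it also writes an entry to the translog, and a refresh leaves the translog untouched.
A flush is what actually writes the segments to disk: it updates the commit file (which records all segments in the index) and then clears the translog. By default a flush is triggered every 30 minutes, or whenever the translog grows beyond 512 MB.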
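To make the difference concrete, here is a toy model of the behaviour described above. It is a deliberately simplified sketch, not how Elasticsearch or Lucene are implemented; the class `ToyShard` and all of its attributes are invented for illustration.

```python
class ToyShard:
    """Toy model of refresh vs. flush; not the real Elasticsearch implementation."""

    def __init__(self):
        self.buffer = []          # documents indexed but not yet searchable
        self.cache_segments = []  # segments in the filesystem cache (searchable)
        self.disk_segments = []   # segments committed to disk (searchable)
        self.translog = []        # operations recorded since the last flush

    def index(self, doc):
        # Every write goes to the in-memory buffer and to the translog
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a searchable segment; the translog is untouched
        if self.buffer:
            self.cache_segments.append(list(self.buffer))
            self.buffer = []

    def flush(self):
        # Persist all segments, then clear the translog
        self.refresh()
        self.disk_segments.extend(self.cache_segments)
        self.cache_segments = []
        self.translog = []

    def searchable(self):
        return [doc for seg in self.disk_segments + self.cache_segments for doc in seg]


shard = ToyShard()
shard.index("doc-1")
shard.refresh()
print(shard.searchable(), len(shard.translog))
shard.flush()
print(shard.searchable(), len(shard.translog))
```

In this model the document becomes searchable after the refresh while the translog still holds one entry, and only the flush empties the translog.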

References