Logstash+ElasticSearch处理mysql慢查询日志

yeyou 8年前
   <p>遇到一个需求, 需要查询某些业务的慢查询日志. 结果DBA平台那边提供的慢查询日志不能解决实际的业务场景(上报的字段补全), 无奈, 自己挽起袖子上</p>    <p>参考了 <a href="/misc/goto?guid=4959673684841117832" rel="nofollow,noindex">这篇文章</a> , 不过自己根据需求做了较多的变更</p>    <p>开始吧</p>    <h2>1. 找到日志的位置</h2>    <p>先确认是否开启了, 然后找到日志文件的位置</p>    <pre>  > show variables like '%slow%';  +---------------------+-------------------------------------+  | Variable_name       | Value                               |  +---------------------+-------------------------------------+  | log_slow_queries    | ON                                  |  | slow_launch_time    | 2                                   |  | slow_query_log      | ON                                  |  | slow_query_log_file | /data/mysqllog/20000/slow-query.log |  +---------------------+-------------------------------------+</pre>    <h2>2. 慢查询日志</h2>    <p>格式基本是如下, 当然, 格式如果有差异, 需要根据具体格式进行小的修改</p>    <pre>  # Time: 160524  5:12:29  # User@Host: user_a[xxxx] @  [10.166.140.109]  # Query_time: 1.711086  Lock_time: 0.000040 Rows_sent: 385489  Rows_examined: 385489  use dbname;  SET timestamp=1464037949;  SELECT 1 from dbname;</pre>    <h2>3. 使用 logstash 采集</h2>    <p>采集, 无非是用 multiline 进行多行解析</p>    <p>但是, 需要处理的</p>    <p>第一个是, 去除掉没用的信息</p>    <p>第二个, 慢查询sql, 是会反复出现的, 所以, 执行次数成了一个很重要的指标. 我们要做的, 就是 降噪 (将参数去掉, 涉及带引号的内容+数字), 将参数类信息过滤掉, 留下核心的sql, 然后计算出一个hash, 这样就可以在查询, 根据这个字段进行聚合. 这里用到了 <a href="/misc/goto?guid=4959673684923410287" rel="nofollow,noindex">mutate</a> 以及 <a href="/misc/goto?guid=4959673685011459713" rel="nofollow,noindex">checksum</a></p>    <pre>  # calculate unique hash    mutate {      add_field => {"sql_for_hash" => "%{sql}"}    }    mutate {      gsub => [          "sql_for_hash", "'.+?'", "",          "sql_for_hash", "-?\d*\.{0,1}\d+", ""      ]    }    checksum {      algorithm => "md5"      keys => ["sql_for_hash"]    }</pre>    <p>最后算出来的md5, 放入了 logstash_checksum</p>    <p>完整的logstash配置文件(具体使用可能需要根据自身日志格式做些小调整) 注意, 里面的pattern ALLWORD [\s\S]*</p>    <pre>  input {    file {      path => ["/data/mysqllog/20000/slow-query.log"]      sincedb_path => "/data/LogNew/logstash/sincedb/mysql.sincedb"      type => "mysql-slow-log"      add_field => ["env", "PRODUCT"]      codec => multiline {        pattern => "^# User@Host:"        negate => true        what => previous      }    }  }  filter {    grok {      # User@Host: logstash[logstash] @ localhost [127.0.0.1]      # User@Host: logstash[logstash] @  [127.0.0.1]      match => [ "message", "^# User@Host: %{ALLWORD:user}\[%{ALLWAORD}\] @ %{ALLWORD:dbhost}? \[%{IP:ip}\]" ]    }    grok {      # Query_time: 102.413328  Lock_time: 0.000167 Rows_sent: 0  Rows_examined: 1970      match => [ "message", "^# Query_time: %{NUMBER:duration:float}%{SPACE}Lock_time: %{NUMBER:lock_wait:float}%{SPACE}Rows_sent: %{NUMBER:results:int}%{SPACE}Rows_examined:%{SPACE}%{NUMBER:scanned:int}%{ALLWORD:sql}"]    }      // remove useless data    mutate {      gsub => [          "sql", "\nSET timestamp=\d+?;\n", "",          "sql", "\nuse [a-zA-Z0-9\-\_]+?;", "",          "sql", "\n# Time: \d+\s+\d+:\d+:\d+", "",          "sql", "\n/usr/local/mysql/bin/mysqld.+$", "",          "sql", "\nTcp port:.+$", "",          "sql", "\nTime .+$", ""      ]    }      # Capture the time the query happened    grok {      match => [ "message", "^SET timestamp=%{NUMBER:timestamp};" ]    }    date {      match => [ "timestamp", "UNIX" ]    }        # calculate unique hash    mutate {      add_field => {"sql_for_hash" => "%{sql}"}    }    mutate {      gsub => [          "sql_for_hash", "'.+?'", "",          "sql_for_hash", "-?\d*\.{0,1}\d+", ""      ]    }    checksum {      algorithm => "md5"      keys => ["sql_for_hash"]    }      # Drop the captured timestamp field since it has been moved to the time of the event    mutate {      # TODO: remove the message field      remove_field => ["timestamp", "message", "sql_for_hash"]    }  }  output {      #stdout{      #    codec => rubydebug      #}      #if ("_grokparsefailure" not in [tags]) {      #    stdout{      #        codec => rubydebug      #    }      #}      if ("_grokparsefailure" not in [tags]) {          elasticsearch {            hosts => ["192.168.1.1:9200"]            index => "logstash-slowlog"          }      }  }</pre>    <p>采集进去的内容</p>    <pre>  {             "@timestamp" => "2016-05-23T21:12:59.000Z",               "@version" => "1",                   "tags" => [          [0] "multiline"      ],                   "path" => "/Users/ken/tx/elk/logstash/data/slow_sql.log",                   "host" => "Luna-mac-2.local",                   "type" => "mysql-slow",                    "env" => "PRODUCT",                   "user" => "dba_bak_all_sel",                     "ip" => "10.166.140.109",               "duration" => 28.812601,              "lock_wait" => 0.000132,                "results" => 749414,                "scanned" => 749414,                    "sql" => "SELECT /*!40001 SQL_NO_CACHE */ * FROM `xxxxx`;",      "logstash_checksum" => "3e3ccb89ee792de882a57e2bef6c5371"  }</pre>    <h2>4. 写查询</h2>    <p>查询, 我们需要按 logstash_checksum 进行聚合, 然后按照次数由多到少降序展示, 同时, 每个 logstash_checksum 需要有一条具体的sql进行展示</p>    <p>通过 es 的 <a href="/misc/goto?guid=4959673685091387237" rel="nofollow,noindex">Top hits Aggregation</a> 可以完美地解决这个查询需求</p>    <p>查询的query</p>    <pre>  body = {      "from": 0,      "size": 0,      "query": {          "filtered": {              "query": {                  "match": {                      "user": "test"                  }              },              "filter": {                  "range": {                      "@timestamp": {                          "gte": "now-1d",                          "lte": "now"                      }                  }              }          }      },      "aggs": {          "top_errors": {              "terms": {                  "field": "logstash_checksum",                  "size": 20              },              "aggs": {                  "top_error_hits": {                      "top_hits": {                          "sort": [                              {                                  "@timestamp":{                                      "order": "desc"                                  }                              }                          ],                          "_source": {                              "include": [                                 "user" , "sql", "logstash_checksum", "@timestamp", "duration", "lock_wait", "results", "scanned"                              ]                          },                          "size" : 1                      }                  }              }          }      }  }</pre>    <p>跟这个写法相关的几个参考链接: <a href="/misc/goto?guid=4959673685172226558" rel="nofollow,noindex">Terms Aggregation</a> / <a href="/misc/goto?guid=4959673685249542871" rel="nofollow,noindex">Elasticsearch filter document group by field</a></p>    <h2>5. 渲染页面</h2>    <p>python的后台, 使用 sqlparse 包, 将sql进行格式化(换行/缩进/大小写), 再往前端传. <a href="/misc/goto?guid=4959673685334648149" rel="nofollow,noindex">sqlparse</a></p>    <pre>  >>> sql = 'select * from foo where id in (select id from bar);'  >>> print sqlparse.format(sql, reindent=True, keyword_case='upper')  SELECT *  FROM foo  WHERE id IN    (SELECT id     FROM bar);</pre>    <p>然后在页面上, 使用js进行语法高亮 <a href="/misc/goto?guid=4959673685416650046" rel="nofollow,noindex">code-prettify</a></p>    <p> </p>    <p>来自: <a href="/misc/goto?guid=4959673685494300937" rel="nofollow">http://www.wklken.me/posts/2016/05/24/elk-mysql-slolog.html</a></p>    <p> </p>