ELK 搭建与使用 (5.4)

ELK 是什么?

ElasticSearch + logstash + kibana 是一套开源的日志管理检索方案

我采用的采集流程

+-----------+
|           |
| filebeat1 +------+
|           |      |    +-------+    +-------------+     +-----------------+    +----------+
+-----------+      |    |       |    |             |     |                 |    |          |
                   +--> | redis +--> |  logstash   +---> |  ElasticSearch  +--->+  kibana  |
+-----------+      |    |       |    |             |     |                 |    |          |
|           |      |    +-------+    +-------------+     +-----------------+    +----------+
| filebeat2 +------+
|           |
+-----------+

RPUSH BLPOP

filebeat 采集

一般, filebeat只做简单的采集, 与简单的聚合, 例如多行聚合, 剩下的数据筛选之类的, 都交给后面一层专业的 logstash 处理.

config

filebeat.prospectors:
- input_type: log
  paths:
    - /alidata/log/tomcat-*/catalina.out
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} \['
  multiline.negate: true
  multiline.match: after

output.redis:
  hosts: "192.168.1.61:6379"
  password: "redispwd"
  key: "tomcat-catalina-log"
  db: 0
  timeout: 5

ref: redis-output

redis

保存在redis上是以队列的形式, 也就是list, 用命令查出来是这样的 lpop tomcat-catalina-log

{
  "@timestamp": "2017-06-21T03:40:42.312Z",
  "beat": {
    "hostname": "front-end",
    "name": "front-end",
    "version": "5.4.2"
  },
  "input_type": "log",
  "message": "2017-06-21 11:40:33,572 [ERROR] xxx.xxxxxxx.Logger:87 - org.springframework.dao.DataIntegrityViolationException: SqlMapClient operation; SQL []; Column 'FID' cannot be null; nested exception is java.sql.BatchUpdateException: Column 'FID' cannot be null\\n\\tat org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:102)\\n\\tat org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)\\n\\tat org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)\\n\\tat org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)\\n\\tat org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:204)\\n\\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\\n\\tat java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)\\n\\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)\\n\\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\\n\\tat java.lang.Thread.run(Thread.java:745)\\n",
  "offset": 29117034,
  "source": "/alidata/log/tomcat-xxx/catalina.out",
  "type": "log"
}

logstash

  • 首先, 不要尝试自己写正则了, java日志与tomcat日志都是很常见的
  • 导入配置 Github-patterns

    JAVACLASS (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*
    #Space is an allowed character to match special cases like 'Native Method' or 'Unknown Source'
    JAVAFILE (?:[A-Za-z0-9_. -]+)
    #Allow special <init>, <clinit> methods
    JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
    #Line number is optional in special cases 'Native method' or 'Unknown source'
    JAVASTACKTRACEPART %{SPACE}at %{JAVACLASS:class}\.%{JAVAMETHOD:method}\(%{JAVAFILE:file}(?::%{NUMBER:line})?\)
    # Java Logs
    JAVATHREAD (?:[A-Z]{2}-Processor[\d]+)
    JAVACLASS (?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$]+
    JAVAFILE (?:[A-Za-z0-9_.-]+)
    JAVALOGMESSAGE (.*)
    # MMM dd, yyyy HH:mm:ss eg: Jan 9, 2014 7:13:13 AM
    CATALINA_DATESTAMP %{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM)
    # yyyy-MM-dd HH:mm:ss,SSS ZZZ eg: 2014-01-09 17:32:25,527 -0800
    TOMCAT_DATESTAMP 20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})
    CATALINALOG %{CATALINA_DATESTAMP:timestamp} %{JAVACLASS:class} %{JAVALOGMESSAGE:logmessage}
    # 2014-01-09 20:03:28,269 | ERROR | com.example.service.ExampleService - something compeletely unexpected happened...
    TOMCATLOG %{TOMCAT_DATESTAMP:timestamp} \| %{LOGLEVEL:level} \| %{JAVACLASS:class} - %{JAVALOGMESSAGE:logmessage}
    

日志sample

2017-06-21 16:07:35,995 [INFO ] org.apache.logging.log4j.LogManager:409 - 延时加载 key:rights,耗时:142

上面的配置, 我把TOMCAT_DATESTAMP的timezone去掉了, 因为我们自己的没有时区 logstash 配置文件

匹配失败 tag_on_failure

%{IP:client}                  用 pattern IP(自带的或者自定义的) 匹配到的命名为 client
(?<log_msg>.*)                用 pattern .* 匹配到的命名为log_msg
input {
	redis {
		data_type => "list"
		host => "192.168.1.61"
		port => 6379
		password => "redispwd"
		db => 0
		key => "tomcat-catalina-log"
	}
}

filter {
	grok {
		patterns_dir => ["/root/elk/patterns"]
		match => {
		    "message" => "(?<log_time>(^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})) \[(?<log_level>([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?))\s?\] (?<log_class>(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.*)*[a-zA-Z$_][a-zA-Z$_0-9]*):[0-9]* - (?<log_msg>.*)"
		}
		match => {
			"source" => "/alidata/log/(?<log_from>.*)/catalina\.out"
		}
		break_on_match => false
		remove_field => ["beat"]
	}
	date {
		match => ["log_time", "yyyy-MM-dd HH:mm:ss,SSS"]
		timezone => "Asia/Shanghai"
	}

	if "_grokparsefailure" in [tags] {
		grok {
			match => {
				"message" => "(?<log_class>(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.*)*[a-zA-Z$_][a-zA-Z$_0-9]*)"
			}
			add_field => {
				"log_level" => "ERROR"
				"log_time" => "%{@timestamp}"
				"log_msg" => "%{message}"
			}
		}
	}
}

output {
	elasticsearch {
		hosts => ["127.0.0.1:9200"]
		index => "logstash-%{type}-%{+YYYY.MM.dd}"
	}
}

grok 语法测试工具 https://grokdebug.herokuapp.com/

非常遗憾的是, input redis 的 key 不支持正则, https://github.com/logstash-plugins/logstash-input-redis/issues/49

ElasticSearch

./config/elasticsearch.yml

path.data: /root/elk/data/es/data
path.logs: /root/elk/data/es/logs
network.host: 192.168.1.61
http.port: 9200

其他配置项

https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html 包含日志保存几天等

Note

因为es可以执行代码, 所以, 不能以root用户启动, 要新建一个专用的用户

groupadd elsearch
useradd elsearch -g elsearch -p elasticsearch

生产部署指引

https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html 总结一下,就是 1. 大内存(建议不超过64G, 不低于8G) 2. 普通CPU(双核就差不多了) 3. 高速硬盘

kibana

server.port: 5601
server.host: "192.168.1.61"
elasticsearch.url: "http://localhost:9200"

常用的搜索语法

  • 简单的搜索是使用 lucene 的语法 syntax
  • 也可以使用 es 的搜索语法 es-dsl
简单搜索
  1. 直接输入查询关键字key, 返回所有可以索引字段值中包含key的文档
  2. 限定字段全文搜索field:value, 限定字段精确查找field:"value"
  3. 使用正则表达式field:/regex/
  4. 逻辑操作符AND, OR +: 搜索结果包含, -: 搜索结果不含
  5. 特殊字符 + - && || ! () {} [] ^" ~ * ? : \
使用es-dsl搜索

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-dsl-query-string-query.html#query-string-syntax

查看上下文

以前kibana是不支持查看上下文的, 在某一个版本中加上了 kibana-9198

  • 先检索到具体的某一条
  • 展开详情 此处应该有图
  • View surrounding documents 此处应该有图

参考资料

https://www.elastic.co/guide/en/beats/filebeat/current/index.html https://www.elastic.co/guide/en/beats/filebeat/current/redis-output.html https://www.elastic.co/guide/en/logstash/current/index.html https://www.elastic.co/guide/en/kibana/current/index.html https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html https://lucene.apache.org/core/2_9_4/queryparsersyntax.html