ELK
ELK 搭建与使用 (5.4)
ELK 是什么?
ElasticSearch + logstash + kibana 是一套开源的日志管理检索方案
我采用的采集流程
+-----------+
| |
| filebeat1 +------+
| | | +-------+ +-------------+ +-----------------+ +----------+
+-----------+ | | | | | | | | |
+--> | redis +--> | logstash +---> | ElasticSearch +--->+ kibana |
+-----------+ | | | | | | | | |
| | | +-------+ +-------------+ +-----------------+ +----------+
| filebeat2 +------+
| |
+-----------+
RPUSH BLPOP
filebeat 采集
一般, filebeat只做简单的采集, 与简单的聚合, 例如多行聚合, 剩下的数据筛选之类的, 都交给后面一层专业的 logstash 处理.
config
filebeat.prospectors:
- input_type: log
paths:
- /alidata/log/tomcat-*/catalina.out
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} \['
multiline.negate: true
multiline.match: after
output.redis:
hosts: "192.168.1.61:6379"
password: "redispwd"
key: "tomcat-catalina-log"
db: 0
timeout: 5
ref: redis-output
redis
保存在redis上是以队列的形式, 也就是list, 用命令查出来是这样的 lpop tomcat-catalina-log
{
"@timestamp": "2017-06-21T03:40:42.312Z",
"beat": {
"hostname": "front-end",
"name": "front-end",
"version": "5.4.2"
},
"input_type": "log",
"message": "2017-06-21 11:40:33,572 [ERROR] xxx.xxxxxxx.Logger:87 - org.springframework.dao.DataIntegrityViolationException: SqlMapClient operation; SQL []; Column 'FID' cannot be null; nested exception is java.sql.BatchUpdateException: Column 'FID' cannot be null\\n\\tat org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:102)\\n\\tat org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)\\n\\tat org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)\\n\\tat org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)\\n\\tat org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:204)\\n\\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\\n\\tat java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)\\n\\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)\\n\\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\\n\\tat java.lang.Thread.run(Thread.java:745)\\n",
"offset": 29117034,
"source": "/alidata/log/tomcat-xxx/catalina.out",
"type": "log"
}
logstash
- 首先, 不要尝试自己写正则了, java日志与tomcat日志都是很常见的
导入配置 Github-patterns
JAVACLASS (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]* #Space is an allowed character to match special cases like 'Native Method' or 'Unknown Source' JAVAFILE (?:[A-Za-z0-9_. -]+) #Allow special <init>, <clinit> methods JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*) #Line number is optional in special cases 'Native method' or 'Unknown source' JAVASTACKTRACEPART %{SPACE}at %{JAVACLASS:class}\.%{JAVAMETHOD:method}\(%{JAVAFILE:file}(?::%{NUMBER:line})?\) # Java Logs JAVATHREAD (?:[A-Z]{2}-Processor[\d]+) JAVACLASS (?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$]+ JAVAFILE (?:[A-Za-z0-9_.-]+) JAVALOGMESSAGE (.*) # MMM dd, yyyy HH:mm:ss eg: Jan 9, 2014 7:13:13 AM CATALINA_DATESTAMP %{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM) # yyyy-MM-dd HH:mm:ss,SSS ZZZ eg: 2014-01-09 17:32:25,527 -0800 TOMCAT_DATESTAMP 20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) CATALINALOG %{CATALINA_DATESTAMP:timestamp} %{JAVACLASS:class} %{JAVALOGMESSAGE:logmessage} # 2014-01-09 20:03:28,269 | ERROR | com.example.service.ExampleService - something compeletely unexpected happened... TOMCATLOG %{TOMCAT_DATESTAMP:timestamp} \| %{LOGLEVEL:level} \| %{JAVACLASS:class} - %{JAVALOGMESSAGE:logmessage}
日志sample
2017-06-21 16:07:35,995 [INFO ] org.apache.logging.log4j.LogManager:409 - 延时加载 key:rights,耗时:142
上面的配置, 我把TOMCAT_DATESTAMP的timezone去掉了, 因为我们自己的没有时区 logstash 配置文件
匹配失败 tag_on_failure
grok 语法 LINK
%{IP:client} 用 pattern IP(自带的或者自定义的) 匹配到的命名为 client
(?<log_msg>.*) 用 pattern .* 匹配到的命名为log_msg
input {
redis {
data_type => "list"
host => "192.168.1.61"
port => 6379
password => "redispwd"
db => 0
key => "tomcat-catalina-log"
}
}
filter {
grok {
patterns_dir => ["/root/elk/patterns"]
match => {
"message" => "(?<log_time>(^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})) \[(?<log_level>([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?))\s?\] (?<log_class>(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.*)*[a-zA-Z$_][a-zA-Z$_0-9]*):[0-9]* - (?<log_msg>.*)"
}
match => {
"source" => "/alidata/log/(?<log_from>.*)/catalina\.out"
}
break_on_match => false
remove_field => ["beat"]
}
date {
match => ["log_time", "yyyy-MM-dd HH:mm:ss,SSS"]
timezone => "Asia/Shanghai"
}
if "_grokparsefailure" in [tags] {
grok {
match => {
"message" => "(?<log_class>(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.*)*[a-zA-Z$_][a-zA-Z$_0-9]*)"
}
add_field => {
"log_level" => "ERROR"
"log_time" => "%{@timestamp}"
"log_msg" => "%{message}"
}
}
}
}
output {
elasticsearch {
hosts => ["127.0.0.1:9200"]
index => "logstash-%{type}-%{+YYYY.MM.dd}"
}
}
grok 语法测试工具 https://grokdebug.herokuapp.com/
非常遗憾的是, input redis 的 key 不支持正则, https://github.com/logstash-plugins/logstash-input-redis/issues/49
ElasticSearch
./config/elasticsearch.yml
path.data: /root/elk/data/es/data
path.logs: /root/elk/data/es/logs
network.host: 192.168.1.61
http.port: 9200
其他配置项
https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html 包含日志保存几天等
Note
因为es可以执行代码, 所以, 不能以root用户启动, 要新建一个专用的用户
groupadd elsearch
useradd elsearch -g elsearch -p elasticsearch
生产部署指引
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html 总结一下,就是 1. 大内存(建议不超过64G, 不低于8G) 2. 普通CPU(双核就差不多了) 3. 高速硬盘
kibana
server.port: 5601
server.host: "192.168.1.61"
elasticsearch.url: "http://localhost:9200"
常用的搜索语法
简单搜索
- 直接输入查询关键字
key
, 返回所有可以索引字段值中包含key的文档 - 限定字段全文搜索
field:value
, 限定字段精确查找field:"value"
- 使用正则表达式
field:/regex/
- 逻辑操作符
AND
,OR
+
: 搜索结果包含,-
: 搜索结果不含 - 特殊字符
+ - && || ! () {} [] ^" ~ * ? : \
使用es-dsl搜索
查看上下文
以前kibana是不支持查看上下文的, 在某一个版本中加上了 kibana-9198
- 先检索到具体的某一条
- 展开详情
此处应该有图 - View surrounding documents
此处应该有图
参考资料
https://www.elastic.co/guide/en/beats/filebeat/current/index.html https://www.elastic.co/guide/en/beats/filebeat/current/redis-output.html https://www.elastic.co/guide/en/logstash/current/index.html https://www.elastic.co/guide/en/kibana/current/index.html https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html https://lucene.apache.org/core/2_9_4/queryparsersyntax.html