Efficient way to stream a remote log file

I'm looking for a way to read a fast-growing logfile on a remote Unix host.
The logfile occasionally gets a logswitch (e.g. it starts again from 0 bytes). The reason why I can't process the logfile directly on the remote host is that the processor puts too much load on the host, which must not happen. So I need to have the processing and the reading on two different hosts.

Since I'm not at home in the Java world, I'd like to ask for advice on how this can best be achieved.

My thoughts so far:
Have the local logfile processor (localhost) scp a logfilereader (a Java binary) to the remote host and start it (via an ssh connection opened by the local logfile processor). The logfilereader then starts reading/tailing the logfile and serves it as a TCP stream (which can then be read by the local logfile processor).
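For illustration, here is a minimal sketch of such a logfilereader (not a finished implementation): it polls the file length, streams any newly appended bytes to a single TCP client, and treats a shrinking file as a logswitch. The path, port and poll interval are made-up placeholders, and error handling is omitted.

import java.io.*;
import java.net.*;

// Minimal sketch of a remote logfilereader: tails a logfile and
// serves appended data over TCP. Path and port are placeholders.
public class LogFileReader {
    public static void main(String[] args) throws Exception {
        File log = new File("/var/log/app.log");
        try (ServerSocket server = new ServerSocket(4444);
             Socket client = server.accept();
             OutputStream out = client.getOutputStream()) {
            long offset = 0;
            while (true) {
                long len = log.length();
                if (len < offset) {
                    offset = 0; // logswitch: file starts from 0 bytes again
                }
                if (len > offset) {
                    // Copy the newly appended bytes to the client.
                    try (RandomAccessFile raf = new RandomAccessFile(log, "r")) {
                        raf.seek(offset);
                        byte[] buf = new byte[8192];
                        int n;
                        while ((n = raf.read(buf)) > 0) {
                            out.write(buf, 0, n);
                        }
                        offset = raf.getFilePointer();
                    }
                    out.flush();
                }
                Thread.sleep(1000); // poll interval, arbitrary
            }
        }
    }
}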

I'm pretty sure there are more elegant Java-style approaches. Thanks for any hints.

If you can run ssh on your remote host, then you could use

ssh <remote host> "tail -f <remote log file name>" > <local log file name>

This will redirect anything written to the remote log file to the local file. If the remote file gets erased, you get a message saying that the remote file was truncated.
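If you would rather consume that stream in Java than redirect it into a file, a rough sketch using ProcessBuilder could look like the following. Host name and file path are placeholders; on GNU tail, -F instead of -f would also re-open the file after a logswitch.

import java.io.*;

// Sketch: run "ssh <host> tail -f <file>" and read its output line by line.
// "remotehost" and the path are placeholders.
public class SshTail {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "ssh", "remotehost", "tail", "-f", "/var/log/app.log");
        pb.redirectErrorStream(true); // so tail's truncation notice is seen too
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // hand each line to the local processor
            }
        }
    }
}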

If you need to read the log file online (i.e. as the messages come in), I suggest examining ways to offer the messages via TCP instead of (or in addition to) writing them into a file.

If the remote app uses a logging framework, then this is usually just a few lines in the configuration.
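To give a concrete (hypothetical) example: if the remote app happened to use java.util.logging, a logging.properties along these lines would send log records to a TCP collector instead of a file. Host and port are placeholders, and frameworks such as Logback or Log4j offer similar socket appenders.

handlers = java.util.logging.SocketHandler
java.util.logging.SocketHandler.host = logcollector.example.com
java.util.logging.SocketHandler.port = 4560
java.util.logging.SocketHandler.formatter = java.util.logging.SimpleFormatter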

This will also reduce the load on the remote host since it doesn't have to write any data to disk anymore. But that's usually only a problem when the remote process accesses the disk a lot to do its work. If the remote process talks a lot with a database, this can be counterproductive since the log messages will compete with the DB queries for network resources.

On the positive side, this makes it easier to be sure you process each log message at most once (you might lose some if your local listener is restarted).

If that's not possible, run tail -f <logfile> via ssh (as Vicent suggested in the other answer). See this question for SSH libraries for Java if you don't want to use ProcessBuilder.
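As an illustration with one such library, JSch, an exec channel hands you the output of tail as a plain InputStream. User, host, key and path below are placeholders, and host key checking is disabled only for the sake of the sketch.

import com.jcraft.jsch.*;
import java.io.*;

// Sketch: run "tail -F <file>" over SSH with JSch and read the stream.
public class JschTail {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity("/home/me/.ssh/id_rsa"); // placeholder key file
        Session session = jsch.getSession("me", "remotehost", 22);
        session.setConfig("StrictHostKeyChecking", "no"); // demo only
        session.connect();
        ChannelExec channel = (ChannelExec) session.openChannel("exec");
        channel.setCommand("tail -F /var/log/app.log");
        BufferedReader r = new BufferedReader(
                new InputStreamReader(channel.getInputStream()));
        channel.connect();
        String line;
        while ((line = r.readLine()) != null) {
            System.out.println(line); // local processing goes here
        }
        channel.disconnect();
        session.disconnect();
    }
}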

When you read the files, the hard task is to make sure that you process each log message exactly once (i.e. you don't miss any and you don't process them twice). Depending on how the log rotation works and how your remote process creates log files, you might lose a couple of messages when they are switched.
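One pragmatic approach is to remember how far you got: persist the byte offset of the last processed message and treat a shrinking file as a logswitch. A hedged sketch, with made-up paths; a real version would also need atomic offset writes and handling of a partially written last line. Note that messages written between the last read and the switch are still lost, which is the limitation described above.

import java.io.*;
import java.nio.file.*;
import java.util.List;

// Sketch: resume reading a log from a persisted offset, detect logswitches.
public class OffsetTracker {
    static final Path OFFSET_FILE = Paths.get("/var/tmp/app.log.offset");

    static long loadOffset() throws IOException {
        if (!Files.exists(OFFSET_FILE)) return 0L;
        return Long.parseLong(Files.readAllLines(OFFSET_FILE).get(0).trim());
    }

    static long process(File log, long offset) throws IOException {
        if (log.length() < offset) {
            offset = 0; // file shrank: a logswitch happened, start over
        }
        try (RandomAccessFile raf = new RandomAccessFile(log, "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                // handle one message; ideally make this step idempotent
            }
            offset = raf.getFilePointer();
        }
        Files.write(OFFSET_FILE, List.of(Long.toString(offset)));
        return offset;
    }
}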

If you don't need online processing (i.e. seeing yesterday's messages is enough), try rsync to copy the remote folder. rsync is very good at avoiding duplicate transfers and it works over ssh. That will give you a local copy of all log files which you can process. Of course, rsync is too expensive to handle the active log file, so that's the file which you can't examine (hence the limitation that this is only possible if you don't need online processing).
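For example (paths are placeholders; -a preserves file attributes, -z compresses the transfer):

rsync -az -e ssh remotehost:/var/log/app/ /local/logcopy/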

One final tip: try to avoid transmitting useless log messages. It's often possible to reduce the load considerably by filtering the log files with a very simple script before you transfer them.
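For example, the filter can run on the remote side so that uninteresting lines never cross the network (pattern and path are placeholders; --line-buffered is a GNU grep option that keeps the stream from stalling):

ssh remotehost "tail -f /var/log/app.log | grep --line-buffered -E 'WARN|ERROR'" > filtered.log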
