
How to map a log file in Java using Spark?

I have to monitor a log file that records the usage history of an app. The log file is formatted this way:

<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
... about 800000 rows

AppId is always the same, because the log refers to only one app; date is expressed in the format dd/mm/yyyy hh/mm; cpuUsage and memoryUsage are expressed in %. So, for example:

<3ghffh3t482age20304,230720142245,0.2,3.5>

So, specifically, I have to check the CPU usage and memory usage percentages of this application, monitoring them with Spark and the map-reduce algorithm.

The output should print an alert when CPU or memory usage reaches 100%.

Any suggestion on how I can start?

Thanks in advance.

The idea is to declare a case class and map each line onto a Scala object.

Let's declare the case class as follows:

case class App(name: String, date: String, cpuUsage: Double, memoryUsage: Double)

Then initialize the SparkContext and create an RDD from the text file containing the data:

import org.apache.spark.{SparkConf, SparkContext}

// Local mode so the snippet is self-contained; adjust for your cluster
val sparkConf = new SparkConf().setAppName("AppLogMonitor").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
val inFile = sc.textFile("log.txt")
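
As a side note, on Spark 2.x and later the usual entry point is a SparkSession rather than a bare SparkContext. A minimal sketch, assuming the same local setup (the app name "AppLogMonitor" is a placeholder, not from the answer):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("AppLogMonitor")
  .master("local[*]")
  .getOrCreate()
val inFile = spark.sparkContext.textFile("log.txt")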

Then parse each line and map it into an (AppId, App) pair, so the later range check works on typed fields:

// Key each record by AppId (the first comma-separated field)
val mappedLines = inFile.map(x => (x.split(",")(0), parse(x)))

where the parse(x) method is defined as follows:

def parse(x: String): App = {
  val splitArr = x.split(",")
  App(splitArr(0),
      splitArr(1),
      splitArr(2).toDouble,
      splitArr(3).toDouble)
}
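
The parse above assumes every row is well formed. A slightly more defensive variant (my own sketch, not part of the answer) skips malformed rows and also validates the timestamp, assuming the ddMMyyyyHHmm layout seen in the sample row:

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// ddMMyyyyHHmm matches the sample value 230720142245 -> 23/07/2014 22:45
val tsFormat = DateTimeFormatter.ofPattern("ddMMyyyyHHmm")

def parseSafe(x: String): Option[App] = {
  val f = x.split(",")
  Try {
    LocalDateTime.parse(f(1), tsFormat)          // validate the timestamp
    App(f(0), f(1), f(2).toDouble, f(3).toDouble)
  }.toOption                                     // None for malformed rows
}

// flatMap drops the Nones, keeping only the parseable records
val safeLines = inFile.flatMap(x => parseSafe(x).map(app => (app.name, app)))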

Note that I have assumed the input looks like this (just to give you the idea, not the entire program):

ffh3t482age20304,230720142245,0.2,100.5

Then apply a filter transformation in which you perform the check and report the anomaly conditions:

val anomalyLines = mappedLines.filter(doCheckCPUAndMemoryUtilization)
anomalyLines.count()  // count() is an action: it forces the lazy filter to run

where the doCheckCPUAndMemoryUtilization function is defined as follows:

def doCheckCPUAndMemoryUtilization(x: (String, App)): Boolean = {
  val app = x._2
  if (app.cpuUsage >= 100.0 || app.memoryUsage >= 100.0) {
    // Note: this println runs on the executors, so on a cluster the
    // message lands in the executor logs, not on the driver console
    println("App name -> " + app.name + " exceeded the limit")
    true
  } else {
    false
  }
}
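
Because of that executor-side printing, an alternative (my own sketch, not from the answer) is to keep the predicate pure and report on the driver after collecting the matches. This assumes the flagged subset is small enough to fit in driver memory:

// Pure predicate: no side effects inside the filter
val anomalies = mappedLines.filter { case (_, app) =>
  app.cpuUsage >= 100.0 || app.memoryUsage >= 100.0
}

// Bring the (presumably small) flagged subset to the driver and alert there
anomalies.collect().foreach { case (_, app) =>
  println(s"ALERT: ${app.name} at cpu=${app.cpuUsage}%, mem=${app.memoryUsage}%")
}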

Note: this is only batch processing, not real-time processing.
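
If you need something closer to real time, one option (an assumption on my part, not covered by the answer) is Spark Streaming's textFileStream, which picks up new files dropped into a monitored directory. A minimal sketch, reusing the parseSafe helper sketched above ("/logs/incoming" is a placeholder path):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Reuse the existing SparkContext; process new files every 10 seconds
val ssc = new StreamingContext(sc, Seconds(10))
ssc.textFileStream("/logs/incoming")
   .flatMap(parseSafe)
   .filter(app => app.cpuUsage >= 100.0 || app.memoryUsage >= 100.0)
   .foreachRDD(rdd => rdd.collect().foreach(app => println(s"ALERT: ${app.name}")))
ssc.start()
ssc.awaitTermination()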
