How do I get the next lines in a file whenever the word ERROR appears in a line in PySpark?

I have a log file in which I need to check each line. Whenever the word "ERROR" appears in a line, I need to take the two lines that follow it. I have to do this in PySpark.

For example, input log file:

line 1

line 2

line...ERROR... 3

line 4

line 5

line 6

The output will be:

line 4

line 5

I have created an RDD from the log file and am using map() to traverse each line, but I am not sure how to proceed from there.

Thanks in advance.

What about something like:

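# Plain Python, not Spark. "your_log_file.log" is a placeholder path.
with open("your_log_file.log") as f:
    lines = f.readlines()

for i, line in enumerate(lines):
    if "ERROR" in line:
        # Slicing avoids an IndexError when ERROR is on one of the last two lines.
        for following in lines[i + 1:i + 3]:
            print(following, end="")
        # Exit or something you want to do.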
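
If the log is too large to load with readlines(), the same idea can be written as a generator that streams the lines and keeps only a countdown of how many are still to be emitted. This is a minimal plain-Python sketch, not a Spark solution. The name `lines_after_error` and its `n` parameter are made up for illustration. Unlike the loop above, it emits each line at most once when ERROR lines fall close together.

```python
from typing import Iterable, Iterator


def lines_after_error(lines: Iterable[str], n: int = 2) -> Iterator[str]:
    """Yield the n lines that follow each line containing "ERROR".

    Hypothetical helper, not part of any library. An ERROR line that falls
    inside a pending window is emitted and also restarts the countdown.
    """
    remaining = 0  # how many upcoming lines still need to be emitted
    for line in lines:
        if remaining > 0:
            yield line
            remaining -= 1
        if "ERROR" in line:
            remaining = n  # (re)start the window after this line


sample = ["line 1", "line 2", "line...ERROR... 3", "line 4", "line 5", "line 6"]
result = list(lines_after_error(sample))
print(result)
```

With a real file, an open file object can be passed directly, for example `for line in lines_after_error(f): ...`, since iterating a file yields its lines one at a time.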

Here is a method using window functions:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# set up DF with an explicit line index (idx); ordering by the text itself
# only works by accident, e.g. "line10" sorts before "line2"
rdd = sc.parallelize(["line1", "line2", "line3..ERROR", "line4", "line5"])
df = rdd.zipWithIndex().toDF(['col', 'idx'])

# running count of ERROR lines: each line shares cum_error with the latest error at or before it
# (a window with no partitionBy moves all rows to a single partition)
win1 = Window.orderBy('idx')
df = df.withColumn('hit_error', F.expr("case when col like '%ERROR%' then 1 else 0 end"))
df = df.withColumn('cum_error', F.sum('hit_error').over(win1))

# number the lines within each error group; the ERROR line itself is row 1
win2 = Window.partitionBy('cum_error').orderBy('idx')
df = df.withColumn('rownum', F.row_number().over(win2))

# the lines we want are rows 2 and 3 of each group
df.filter("cum_error > 0 and rownum in (2, 3)").orderBy('idx').select("col").show(10)
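
To see what the two windows compute, here is a plain-Python trace of the same columns (`hit_error`, `cum_error`, `rownum`) over the sample rows. It only illustrates the logic, assuming rows are processed in their original file order. It does not use Spark, and the function name `trace_windows` is hypothetical.

```python
def trace_windows(lines):
    """Mimic the hit_error / cum_error / rownum columns in plain Python.

    Illustrative only: rows are taken in list order, which stands in for
    the ordering used by the window specifications.
    """
    rows = []
    cum_error = 0  # running sum of hit_error, like F.sum(...).over(win1)
    rownum = 0     # position within the current cum_error group
    for col in lines:
        hit_error = 1 if "ERROR" in col else 0
        if hit_error:
            cum_error += 1
            rownum = 0  # a new group starts at each ERROR line
        rownum += 1
        rows.append((col, hit_error, cum_error, rownum))
    return rows


sample = ["line1", "line2", "line3..ERROR", "line4", "line5"]
rows = trace_windows(sample)
for row in rows:
    print(row)

# Same filter as the Spark version: skip rows before the first error,
# then keep rows 2 and 3 of each group (row 1 is the ERROR line itself).
selected = [col for col, _, cum_error, rownum in rows
            if cum_error > 0 and rownum in (2, 3)]
print(selected)
```

Note that if a second ERROR appears within two lines of the first, it starts a new group, so the first group contributes fewer than two following lines.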
