
NiFi + ExecuteScript: count lines in a file using Python

Hi, I need to get the number of lines in a CSV file, excluding the first line, which holds the headers. I need to store that count in an attribute and pass the flow file on untouched to the next processor.

I was thinking of using ExtractText, but I don't think a regular expression can do this.

So the next step would be an ExecuteScript processor. I was thinking of a Python script based on the following template:

flowFile = session.get() 
if (flowFile != None):
# All processing code starts at this indent
attrMap = ['numberOflines': '1', 'myAttr2': Integer.toString(2)]
flowFile = session.get()
if(!flowFile) return
#Do something to get numbers of lines in the flow file
i =0;
    for line in flowfile
        i+=1

flowFile = session.putAttribute(flowFile, 'attribute_numberOfLines', i)
if errorOccurred:
    session.transfer(flowFile, REL_FAILURE)
else:
    session.transfer(flowFile, REL_SUCCESS)

# implicit return at the end

This does not run.

Try a SplitText processor with Line Split Count set to a number higher than the largest number of lines your files could contain (such as 1 million). You can also set Header Line Count to 1 if you want the total number of lines minus the header. You'll get the same flow file(s) out, but with an attribute text.line.count that contains the number of lines.
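If you would rather stay with ExecuteScript, note that the template in the question mixes Groovy and Python: the `['key': 'value']` map literal, `if(!flowFile) return` and `Integer.toString(2)` are Groovy, the `for` line is missing its colon, `flowfile` and `flowFile` are different names, and a flow file cannot be iterated directly because its content has to be read through `session.read`. Below is a minimal sketch of a Jython version, not a definitive implementation. It assumes UTF-8 content, LF or CRLF line endings, and files small enough to hold in memory. The helper `count_data_lines` and the class `LineCounter` are names made up for this example; the attribute name `attribute_numberOfLines` is taken from the question. The import guard is only there so the helper can be tried outside NiFi; routing to `REL_FAILURE` is left out for brevity.

```python
def count_data_lines(text, header_lines=1):
    """Return the number of lines in text, not counting the header lines."""
    total = text.count("\n")
    # A last line without a trailing newline still counts as a line.
    if text and not text.endswith("\n"):
        total += 1
    return max(total - header_lines, 0)


try:
    # These imports only resolve inside NiFi's Jython script engine.
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import InputStreamCallback
except ImportError:
    InputStreamCallback = None  # running outside NiFi

if InputStreamCallback is not None:

    class LineCounter(InputStreamCallback):
        """Reads the flow file content and remembers the line count."""

        def __init__(self):
            self.count = 0

        def process(self, inputStream):
            # Assumption: the content is UTF-8 text that fits in memory.
            text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            self.count = count_data_lines(text)

    # session and REL_SUCCESS are variables that ExecuteScript provides.
    flowFile = session.get()
    if flowFile is not None:
        counter = LineCounter()
        # session.read only reads, so the content is passed on unchanged.
        session.read(flowFile, counter)
        # Attribute values must be strings.
        flowFile = session.putAttribute(
            flowFile, 'attribute_numberOfLines', str(counter.count))
        session.transfer(flowFile, REL_SUCCESS)
```

Counting newline characters treats a quoted CSV field that contains a line break as two lines, so this gives a physical line count rather than a record count. For very large files the SplitText approach avoids loading the whole content into the script.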
