
Need to find line patterns in another file and get the sum of the corresponding values in a particular column using Python

I have two files. File 1 contains the lines shown below, and File 2 contains lines like the following, with several million records. I want to search File 2 for each File 1 entry and write a report to a new file that shows, for each entry, the sum of the 2nd column next to the matching pattern.

File 1 entries:

/dataset1
/dataset2

File 2 entries:

12 5 /opt/dataset1
 6 0 /opt/dataset2
 5 8 /dataset1

Expected output, with the sum of the 2nd-column values next to each pattern:

13 /dataset1
 0 /dataset2

Thank you, CS

I would first process File 1 and create a regex with the following format (using \d+ rather than \d so that multi-digit numbers also match):

\d+\s+(\d+)\s+\S*(\/dataset1|\/dataset2)

After creating the regex, use re.findall to extract every (value, pattern) pair from File 2, then sum the values for each pattern. It should be fairly easy.
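As a rough sketch of that step, assuming the three sample lines from the question and a fixed pattern for the two File 1 entries, re.findall returns (value, pattern) pairs that can be accumulated per pattern. The names sample and totals are illustrative only, and the \d+ quantifiers are an assumption so that multi-digit numbers match.

```python
import re
from collections import defaultdict

# Sample File 2 content taken from the question (illustrative only).
sample = """12 5 /opt/dataset1
 6 0 /opt/dataset2
 5 8 /dataset1
"""

# Fixed pattern for the two File 1 entries; \d+ allows multi-digit numbers.
pattern = re.compile(r"\d+\s+(\d+)\s+\S*(/dataset1|/dataset2)")

# Accumulate the captured 2nd-column value under the captured entry.
totals = defaultdict(int)
for value, name in pattern.findall(sample):
    totals[name] += int(value)

for name, total in totals.items():
    print(total, name)
```

Note that this unanchored pattern would also match a path such as /dataset10 for the entry /dataset1, so real data may need a stricter pattern.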

Of course, the regex is not fixed; you need to generate it from the lines of the first file. Something like this:

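import re

def generate_regex(file1_lines):
    # Escape each non-empty File 1 entry so it is matched literally
    patterns = [re.escape(line.strip()) for line in file1_lines if line.strip()]
    # \d+ (not \d) so that multi-digit numbers are matched
    return r"\d+\s+(\d+)\s+\S*(" + "|".join(patterns) + ")"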
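Putting the pieces together for a File 2 with millions of records, one possible approach is to read it line by line instead of loading it all into memory. The sketch below is not the answer's exact method: the helper names build_pattern, sum_second_column and write_report are hypothetical, and it assumes each File 2 line has the form "number number path" and that a line counts as a match only when its path ends with a File 1 entry.

```python
import re
from collections import defaultdict


def build_pattern(entries):
    """Compile one regex matching any File 1 entry at the end of the path."""
    alternatives = "|".join(re.escape(e) for e in entries)
    return re.compile(r"^\s*\d+\s+(\d+)\s+\S*(" + alternatives + r")\s*$")


def sum_second_column(entries, lines):
    """Sum the 2nd column of every line whose path ends with an entry."""
    pattern = build_pattern(entries)
    totals = defaultdict(int)
    for line in lines:
        match = pattern.match(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    # Entries that never matched are reported with a total of 0.
    return {entry: totals[entry] for entry in entries}


def write_report(file1_path, file2_path, report_path):
    """Read both files and write '<sum> <entry>' lines to the report."""
    with open(file1_path) as f1:
        entries = [line.strip() for line in f1 if line.strip()]
    with open(file2_path) as f2:  # iterated lazily, one line at a time
        totals = sum_second_column(entries, f2)
    with open(report_path, "w") as out:
        for entry, total in totals.items():
            out.write(f"{total} {entry}\n")
```

With hypothetical file names, a call such as write_report("file1.txt", "file2.txt", "report.txt") would write one line per File 1 entry to the report.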
