Need to find line patterns in another file and get the sum of the corresponding values in a specific column using Python

Question

I have 2 files: file 1 contains the lines below, and file 2 contains lines in the format below, with several million records. I want to search for each file 1 entry in file 2 and write a report to a new file that shows the sum of the 2nd column next to the corresponding entry.

File 1 entries:

/dataset1
/dataset2

File 2 entries:

12 5 /opt/dataset1
 6 0 /opt/dataset2
 5 8 /dataset1

Looking for the sum of the 2nd-column values, with the pattern next to each sum:

13 /dataset1
 0 /dataset2

Thank you, CS
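The computation the question asks for can be sketched in plain Python without a regex. This is a minimal illustration that assumes a file 2 line "matches" a file 1 entry when its path (3rd column) ends with that entry; the sample data is copied from the question, and the variable names are hypothetical. For /dataset1 it adds 5 + 8 = 13, and for /dataset2 it adds 0.

```python
# Assumption: a file 2 line matches a file 1 entry when its path ends with that entry
file1 = ["/dataset1", "/dataset2"]
file2 = ["12 5 /opt/dataset1", " 6 0 /opt/dataset2", " 5 8 /dataset1"]

totals = {entry: 0 for entry in file1}  # every entry is reported, even with no hits
for line in file2:
    _, count, path = line.split()  # split() also drops the leading spaces
    for entry in file1:
        if path.endswith(entry):
            totals[entry] += int(count)

for entry in file1:
    print(totals[entry], entry)  # prints "13 /dataset1" then "0 /dataset2"
```

The nested loop is fine for a handful of file 1 entries; with many entries, a single compiled regex (as in the answer below) avoids testing every entry against every line.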

Answer 1

I would first process File 1 and build a regex of the following form:

\d+\s+(\d+)\s+\S*(/dataset1|/dataset2)

Once the regex is built, use re.findall to collect every (2nd-column value, pattern) pair, then add up the values for each pattern. That part should be straightforward.
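The findall-and-sum step can be sketched as follows. This assumes the two file 1 entries are hard-coded into the pattern and that the 2nd-column group is widened to \d+ so multi-digit values are captured; sum_by_pattern is a hypothetical name. Each match is a (count, path) tuple, and the counts are accumulated per path in a defaultdict.

```python
import re
from collections import defaultdict

# Group 1 = 2nd column (\d+ so multi-digit values work), group 2 = matched file 1 entry
PATTERN = re.compile(r"\d+\s+(\d+)\s+\S*(/dataset1|/dataset2)")

def sum_by_pattern(text):
    """Sum the 2nd-column values for each file 1 entry found in text."""
    totals = defaultdict(int)
    for count, path in PATTERN.findall(text):  # findall returns (group1, group2) tuples
        totals[path] += int(count)
    return dict(totals)

file2_text = "12 5 /opt/dataset1\n 6 0 /opt/dataset2\n 5 8 /dataset1\n"
totals = sum_by_pattern(file2_text)
for path in ("/dataset1", "/dataset2"):
    print(totals.get(path, 0), path)  # entries with no match are reported as 0
```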

Of course, the regex isn't fixed; you need to generate it from the lines of the first file. Something like this:

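import re

def generate_regex(file1_lines):
    # re.escape makes each file 1 entry match literally; blank lines are skipped
    patterns = [re.escape(line.strip()) for line in file1_lines if line.strip()]
    # Group 1 captures the 2nd column, group 2 the file 1 entry that matched
    return r"\d+\s+(\d+)\s+\S*(" + "|".join(patterns) + ")"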
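Because file 2 has millions of records, it may be preferable to read it line by line instead of running re.findall over the whole file at once. A minimal sketch, assuming a hypothetical helper named build_report and caller-supplied file paths: it builds the alternation the same way generate_regex does, but anchors each match to the end of the line so that /dataset1 does not also count a path such as /dataset10, and writes the report in file 1 order.

```python
import re
from collections import defaultdict

def build_report(file1_path, file2_path, report_path):
    """Hypothetical helper: sum file 2's 2nd column per file 1 entry, one line at a time."""
    with open(file1_path) as f1:
        patterns = [line.strip() for line in f1 if line.strip()]
    # Anchored per line: the trailing \s*$ keeps "/dataset1" from matching "/dataset10"
    regex = re.compile(
        r"\s*\d+\s+(\d+)\s+\S*(" + "|".join(map(re.escape, patterns)) + r")\s*$"
    )
    totals = defaultdict(int)
    with open(file2_path) as f2:
        for line in f2:  # iterating the file object streams it; the file is never fully loaded
            m = regex.match(line)
            if m:
                totals[m.group(2)] += int(m.group(1))
    with open(report_path, "w") as out:
        for p in patterns:  # file 1 order; entries with no hits are written as 0
            out.write(f"{totals[p]} {p}\n")
    return dict(totals)
```

With the question's sample files, the report would contain "13 /dataset1" and "0 /dataset2".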

Answer 1 0 2022-01-31 20:28:54