Counting the jump (number of lines) between the first two occurrences of a "string" in a file

Question

I have a huge data file containing a specific string that repeats after a fixed number of lines.

I want to count the jump between the first two occurrences of "Rank". For example, the file looks like this:

  1 5 6 8 Rank                     line-start
  2 4 8 5
  7 5 8 6
  5 4 6 4
  1 5 7 4 Rank                     line-end  
  4 8 6 4
  2 4 8 5
  3 6 8 9
  5 4 6 4 Rank

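As you can see, the string Rank repeats every 4 lines (with 3 lines in between), so for the example above the number of lines in a block is 4. My question is how to get this line count using Python's readline().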

This is what I am currently doing:

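data = open(filename).readlines()  # reads the whole file into memory
count = 0
for j in range(len(data)):
  if(data[j].find('Rank') != -1):
    if count == 0: line1 = j  # index of the first 'Rank' line
    count = count +1
  if(count == 2):
    no_of_lines = j - line1  # index difference = lines per block
    break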
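The loop above boils down to subtracting the indices of the first two matching lines. A minimal sketch of the same idea that streams the input instead of calling readlines(); `rank_positions` is a hypothetical helper name, and `io.StringIO` stands in for `open(filename)`:

```python
import io
from itertools import islice

def rank_positions(lines, marker="Rank"):
    """Yield the 0-based index of every line that contains `marker`."""
    for index, line in enumerate(lines):
        if marker in line:
            yield index

# The example from the question (without the annotations), as an in-memory file.
sample = io.StringIO(
    "1 5 6 8 Rank\n"
    "2 4 8 5\n"
    "7 5 8 6\n"
    "5 4 6 4\n"
    "1 5 7 4 Rank\n"
    "4 8 6 4\n"
    "2 4 8 5\n"
    "3 6 8 9\n"
    "5 4 6 4 Rank\n"
)

# islice stops after the second match, so no further lines are iterated.
# Unpacking raises ValueError if there are fewer than two 'Rank' lines.
line1, line2 = islice(rank_positions(sample), 2)
no_of_lines = line2 - line1
print(no_of_lines)  # -> 4
```

With a real file, pass the file object instead: `with open(filename) as f: line1, line2 = islice(rank_positions(f), 2)`.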

Any improvements or suggestions are welcome.

Answer 1

Don't use .readlines() when a simple generator expression counting the lines without Rank is enough:

count = sum(1 for l in open(filename) if 'Rank' not in l)

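'Rank' not in l is enough to test whether the string 'Rank' is absent from a line. Looping over an open file loops over all of its lines. The sum() function adds up the 1s generated for each line that does not contain Rank, giving you the number of lines without Rank in them.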
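Note that this one-liner counts every line without Rank in the whole file, not only the lines between the first two occurrences. A quick check on the question's example, with `io.StringIO` standing in for `open(filename)`:

```python
import io

sample = io.StringIO(
    "1 5 6 8 Rank\n"
    "2 4 8 5\n"
    "7 5 8 6\n"
    "5 4 6 4\n"
    "1 5 7 4 Rank\n"
    "4 8 6 4\n"
    "2 4 8 5\n"
    "3 6 8 9\n"
    "5 4 6 4 Rank\n"
)

# Counts the non-'Rank' lines across all blocks, not the block size.
count = sum(1 for l in sample if 'Rank' not in l)
print(count)  # -> 6
```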

If you need to count the lines from one Rank line to the next, you need a little itertools.takewhile magic:

import itertools
with open(filename) as f:
    # skip until we reach `Rank`; takewhile() is lazy, so it must be iterated to consume lines
    for _ in itertools.takewhile(lambda l: 'Rank' not in l, f):
        pass
    # takewhile has now read (and discarded) the first line with `Rank`
    # count the lines *without* `Rank` between the first two `Rank` lines
    count = sum(1 for l in itertools.takewhile(lambda l: 'Rank' not in l, f))
    count += 1  # add one for the `Rank` line that starts the block
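itertools.takewhile() returns a lazy iterator: calling it without iterating over the result reads nothing from the file, while exhausting it also consumes the first line that fails the test (the Rank line itself). A small demonstration on an in-memory file; the helper name `not_rank` is only for illustration:

```python
import io
import itertools

def not_rank(line):
    return 'Rank' not in line

data = "a\nRank\nb\nc\nRank\n"

# Only building the iterator does not advance the file.
f1 = io.StringIO(data)
itertools.takewhile(not_rank, f1)
after_lazy = f1.readline()  # 'a\n': nothing was skipped

# Exhausting the iterator reads up to and including the first 'Rank' line.
f2 = io.StringIO(data)
skipped = list(itertools.takewhile(not_rank, f2))            # ['a\n']
between = sum(1 for _ in itertools.takewhile(not_rank, f2))  # 'b', 'c' -> 2
print(repr(after_lazy), skipped, between)
```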

Answer 2

To count the jump between the first two occurrences of 'Rank':

def find_jumps(filename):
    first = True
    count = 0
    with open(filename) as f:
        for line in f:
            if 'Rank' in line:
                if first:
                    count = 0 
                    #set this to 1 if you want to include one of the 'Rank' lines.
                    first = False                    
                else:
                    return count
            else:
                count += 1
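find_jumps() returns the number of lines strictly between the first two 'Rank' lines (3 for the example), or None if there are fewer than two. A minimal sketch of a hypothetical variant, `find_jumps_in`, that takes any iterable of lines instead of a file name, which makes it easy to try on in-memory data; the `marker` and `include_marker` parameters are assumptions added for illustration:

```python
import io

def find_jumps_in(lines, marker="Rank", include_marker=False):
    """Lines strictly between the first two lines containing `marker`
    (plus one if include_marker is True), or None if there are fewer than two."""
    count = None  # None until the first marker line is seen
    for line in lines:
        if marker in line:
            if count is None:
                count = 1 if include_marker else 0
            else:
                return count
        elif count is not None:
            count += 1
    return None

SAMPLE = (
    "1 5 6 8 Rank\n"
    "2 4 8 5\n"
    "7 5 8 6\n"
    "5 4 6 4\n"
    "1 5 7 4 Rank\n"
    "4 8 6 4\n"
    "2 4 8 5\n"
    "3 6 8 9\n"
    "5 4 6 4 Rank\n"
)

print(find_jumps_in(io.StringIO(SAMPLE)))                       # -> 3
print(find_jumps_in(io.StringIO(SAMPLE), include_marker=True))  # -> 4
```

With a real file: `with open(filename) as f: jumps = find_jumps_in(f)`.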

Answer 3

7 lines of code:

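count = 0
for line in open("yourfile.txt"):
    if "Rank" in line:
        if count > 0: break  # second 'Rank' line: the block has ended
        count = 1            # first 'Rank' line starts the block
    elif count > 0: count += 1  # line inside the block
print(count)  # -> 4 for the example file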

Answer 4

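I assume you want to find the number of lines in a block, where each block starts with a line containing 'Rank'. For example, there are 3 blocks in your example: the 1st has 4 lines, the 2nd has 4 lines, and the 3rd has 1 line: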

from itertools import groupby

def block_start(line, start=[None]):
    if 'Rank' in line:
        start[0] = not start[0]
    return start[0]

with open(filename) as file:
    block_sizes = [sum(1 for line in block) # find number of lines in a block
                   for _, block in groupby(file, key=block_start)] # group
print(block_sizes)
# -> [4, 4, 1]
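block_start() keeps its state in a mutable default argument (start=[None]) so the toggle survives between calls. A minimal sketch of an alternative key function that keeps the state local to each call by numbering the blocks with a closure counter instead of toggling; `block_sizes` as a function name and the `marker` parameter are assumptions for illustration:

```python
import io
from itertools import groupby

def block_sizes(lines, marker="Rank"):
    """Sizes of consecutive blocks, where each line containing `marker`
    starts a new block. Lines before the first marker line (if any)
    form a separate leading block."""
    block_number = 0

    def key(line):
        nonlocal block_number
        if marker in line:
            block_number += 1  # a marker line opens the next block
        return block_number

    return [sum(1 for _ in group) for _, group in groupby(lines, key=key)]

SAMPLE = (
    "1 5 6 8 Rank\n"
    "2 4 8 5\n"
    "7 5 8 6\n"
    "5 4 6 4\n"
    "1 5 7 4 Rank\n"
    "4 8 6 4\n"
    "2 4 8 5\n"
    "3 6 8 9\n"
    "5 4 6 4 Rank\n"
)

print(block_sizes(io.StringIO(SAMPLE)))  # -> [4, 4, 1]
```

The key returned for each line is its block number, so groupby() starts a new group exactly at each 'Rank' line.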

If all blocks have the same number of lines, or you only want the number of lines in the first block that starts with 'Rank':

count = None
with open(filename) as file:
    for line in file:
        if 'Rank' in line:
            if count is None: # found the start of the 1st block
                count = 1
            else: # found the start of the 2nd block
                break
        elif count is not None: # inside the 1st block
            count += 1
print(count) # -> 4

4 answers

Answer 1: score 4, 2012-12-03 09:59:23
Answer 2: score 2, 2012-12-03 10:12:45
Answer 3: score 1, 2012-12-03 10:25:02
Answer 4: score 1, accepted, 2012-12-03 10:25:20