
Count consecutive occurrences of values in a .txt file

I have a .txt file that contains two words, one per line, each repeating in consecutive runs.

Here is an example (the actual file is about 80,000 lines long):

ANS
ANS
ANS
AUT
AUT
AUT
AUT
ANS
ANS
ANS
ANS
ANS

I am trying to write some Python code that counts how many consecutive lines hold the same value and returns the length of each run. For this example I would like to write [3, 4, 5] to another .txt file.

word = "100011010"
count = 1
length = ""

for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length += word[i-1] + " repeats " + str(count) + ", "
        count = 1

length += "and " + word[i] + " repeats " + str(count)
print(length)

The concept is similar to the above code for a string. Is there a way to do this with a list?

You can read the entire file like this:

content = []
with open('/path/to/file.txt', 'r') as file:
    content = file.readlines()
    # Maybe you want to strip the trailing newlines:
    # content = [line.strip() for line in content]

Now you have a list with all the lines of the file.

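def count_consecutive_lines(lines):
    if not lines:
        return ''
    counter = 1
    output = ''
    for index in range(1, len(lines)):
        if lines[index] != lines[index-1]:
            # The run ended at the previous line, so report that value.
            output += '{} repeats {} times.\n'.format(lines[index-1], counter)
            counter = 1
        else:
            counter += 1
    # Report the final run, which the loop itself never closes.
    output += '{} repeats {} times.\n'.format(lines[-1], counter)
    return output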

And call it like this:

print(count_consecutive_lines(content))
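
The same run counting can also be sketched with itertools.groupby from the standard library, which groups adjacent equal items. The function name run_lengths is an illustrative choice rather than something from the answer above, and the sketch assumes the lines have already been stripped of their newlines.

```python
from itertools import groupby


def run_lengths(lines):
    """Return (value, count) pairs, one per run of adjacent equal items."""
    # groupby yields one group per run of consecutive equal values.
    return [(value, sum(1 for _ in group)) for value, group in groupby(lines)]


# Inline sample standing in for the stripped file contents.
sample = ["ANS"] * 3 + ["AUT"] * 4 + ["ANS"] * 5
print([count for _, count in run_lengths(sample)])  # [3, 4, 5]
```

Keeping the value next to each count makes it possible to produce either the plain list of counts or a "value repeats N times" report from the same result.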

An answer that doesn't load the whole file into memory:

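last = None
count = 0
result = []

with open('sample.txt', 'r') as f:
    for line in f:  # iterate lazily, one line at a time
        line = line.strip()
        if line == last:
            count = count + 1
        else:
            # A new run starts; store the length of the previous one.
            if count > 0:
                result.append(count)
            count = 1
            last = line

# Store the final run (skipped for an empty file, where count is still 0).
if count > 0:
    result.append(count)
print(result)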

Result:

[3, 4, 5]

UPDATE

The list contains integers, but you can only join strings, so you have to convert each item first:

outFile.write('\n'.join(str(n) for n in result))
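
As a minimal sketch of the whole write step, the conversion can be wrapped in a small helper that opens the output file itself. The name write_counts and the file name counts.txt are placeholders chosen for illustration, and writing one count per line is an assumption about the desired layout.

```python
import os
import tempfile


def write_counts(result, path):
    """Write each integer in result on its own line of the file at path."""
    with open(path, "w") as out_file:
        # str.join only accepts strings, so convert each count first.
        out_file.write("\n".join(str(n) for n in result))


# Demonstrate in a temporary directory so no real file is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "counts.txt")
    write_counts([3, 4, 5], out_path)
    with open(out_path) as check:
        print(check.read().splitlines())  # ['3', '4', '5']
```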

You can convert the file data into a list and count how often each distinct value occurs. Note that this gives the total per value, not the lengths of the consecutive runs:

with open("./sample.txt", 'r') as fl:
    # Strip newlines so they do not end up in the printed values.
    fl_list = [line.strip() for line in fl]
    unique_data = set(fl_list)
    for unique in unique_data:
        print("%s - count: %s" % (unique, fl_list.count(unique)))

#output:
ANS - count: 8
AUT - count: 4
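
If overall totals are what you need, collections.Counter from the standard library computes them in a single pass, whereas calling list.count once per distinct value rescans the list each time. A small sketch, with the list built inline as a stand-in for the stripped file contents:

```python
from collections import Counter

# Inline sample standing in for the stripped lines of the file.
lines = ["ANS"] * 3 + ["AUT"] * 4 + ["ANS"] * 5

totals = Counter(lines)  # maps each distinct value to its total count
for value, total in sorted(totals.items()):
    print("%s - count: %s" % (value, total))
```

Sorting the items gives a stable print order, which iterating over a plain set does not guarantee.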

Open your file and read it to count:

l = []
last = None
with open('data.txt', 'r') as f:
    for line in f:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == last:
            l[-1] += 1
        else:
            # A new run starts: record it and remember its value.
            l.append(1)
            last = words[0]

print(l)

Here is your expected output :)

with open("./sample.txt", 'r') as fl:
    # Strip newlines so a last line without one still matches its run.
    word = [line.strip() for line in fl]
    count = 1
    length = []
    for i in range(1, len(word)):
        if word[i-1] == word[i]:
            count += 1
        else:
            length.append(count)
            count = 1
    length.append(count)
    print(length)

#output as you expect:
[3, 4, 5]
