
How to perform this sort operation in Python

I am creating a module to analyse frequencies of patterns of tokens and delimiters in a given text split up into sentences.

I have a class "SequencePattern" which identifies one element (token or delimiter) in a set of tokenised sentences, where each SequencePattern has a list attribute "occurrences" consisting of tuples (n_sentence, n_element) recording where this particular element actually occurs. The class also has a class-level field, seq_patterns (a set), where all the individual SequencePattern instances are stored.

At this stage in the processing I only have single-element SequencePatterns, and have weeded out all such SequencePatterns having < 2 occurrences. But SequencePattern is a subclass of tuple and the idea is now to find the "two element" SequencePatterns.
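For readers who want something concrete to experiment with, the layout described above could look roughly like the sketch below. This is only a guess at the structure, not the real class: the `__new__` signature and the automatic registration in `seq_patterns` are assumptions made for illustration.

```python
class SequencePattern(tuple):
    # Class-level registry holding the created patterns (assumed behaviour).
    seq_patterns = set()

    def __new__(cls, *elements):
        # Build the underlying tuple from the given elements.
        self = super().__new__(cls, elements)
        # (n_sentence, n_element) positions where this pattern occurs.
        self.occurrences = []
        cls.seq_patterns.add(self)
        return self


# A single-element pattern seen in sentence 0 and in sentence 2.
the = SequencePattern("the")
the.occurrences.extend([(0, 0), (2, 3)])
```

Because the class subclasses tuple, a two-element pattern would simply be `SequencePattern("of", "the")` in this sketch.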

The next thing I need to do is to go through all the one-element SequencePatterns which remain after weeding, identifying spots where two (or more) occurrences are adjacent in the same sentence, i.e. where n_sentence is the same and n_element differs by 1.

So I need to do something along these lines:

occurrences_by_text_order = sorted( SequencePattern.seq_patterns.occurrences )

... but of course this doesn't work: I get

AttributeError: 'set' object has no attribute 'occurrences'

Somehow I need to iterate over all the SequencePatterns in seq_patterns and, for each one, do a "nested" iteration over its occurrences, then pass the resulting stream of (n_sentence, n_element) tuples to the sorted function.

I'm not an experienced Pythonista but I have a suspicion this is a job for a generator (?). Can anyone help?

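def get_occurrences():
    # Yield every (n_sentence, n_element) tuple from every stored pattern.
    for seq_patt in SequencePattern.seq_patterns:
        for occurrence in seq_patt.occurrences:
            yield occurrence

# sorted() accepts any iterable, so the generator can be passed to it directly.
occurrences_by_text_order = sorted( get_occurrences() )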
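The same flattening can also be written without a dedicated generator function, using `itertools.chain.from_iterable` from the standard library. The sketch below is one possible variant, not the only way to do it; the `SimpleNamespace` objects are hypothetical stand-ins for SequencePattern instances, kept in a list because they are not hashable.

```python
from itertools import chain
from types import SimpleNamespace

# Hypothetical stand-ins: only the `occurrences` attribute matters here.
seq_patterns = [
    SimpleNamespace(occurrences=[(1, 4), (0, 2)]),
    SimpleNamespace(occurrences=[(0, 3), (2, 0)]),
]

# Chain all the per-pattern lists into one stream, then sort the tuples.
occurrences_by_text_order = sorted(
    chain.from_iterable(p.occurrences for p in seq_patterns)
)
print(occurrences_by_text_order)
# → [(0, 2), (0, 3), (1, 4), (2, 0)]
```

Tuples compare element by element, so the result is ordered by n_sentence first and by n_element within each sentence.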

The following then prints every pair of adjacent occurrences, i.e. every candidate two-element sequence which may occur more than once (since all single elements with < 2 occurrences have been weeded out, there is no possibility of a two-element sequence with frequency > 1 occurring anywhere else):

prev_occurrence = None
# Walk the occurrences in text order, comparing each one with its predecessor.
for occurrence in sorted( occurrence for seq_patt in SequencePattern.seq_patterns for occurrence in seq_patt.occurrences ):
    # Adjacent means: same sentence, element index exactly one higher.
    if prev_occurrence and ( occurrence[ 0 ] == prev_occurrence[ 0 ] ) and ( occurrence[ 1 ] - prev_occurrence[ 1 ] == 1 ):
        print( '# prev_occurrence %s occurrence: %s' % ( prev_occurrence, occurrence, ))
    prev_occurrence = occurrence
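If the adjacent pairs are needed for further processing rather than just printed, the comparison with the previous item can be expressed by zipping the sorted list with itself shifted by one. This is a sketch under that assumption; `adjacent_pairs` is a hypothetical helper name and the sample data is made up.

```python
def adjacent_pairs(occurrences):
    """Return (previous, current) pairs that sit side by side in one sentence."""
    ordered = sorted(occurrences)
    return [
        (prev, cur)
        # Pair each occurrence with the one that follows it in text order.
        for prev, cur in zip(ordered, ordered[1:])
        if prev[0] == cur[0] and cur[1] - prev[1] == 1
    ]


sample = [(0, 2), (1, 4), (0, 3), (2, 0), (1, 5), (1, 6)]
pairs = adjacent_pairs(sample)
print(pairs)
# → [((0, 2), (0, 3)), ((1, 4), (1, 5)), ((1, 5), (1, 6))]
```

A run of three adjacent elements, such as (1, 4), (1, 5), (1, 6), shows up as two overlapping pairs, which is what the "two (or more)" case in the question calls for.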
