
Python Class Method Not Used?

Looking at this block of code below, I don't see the expand method ever called downstream.

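import re

import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

class ReadWordsFromText(beam.PTransform):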
    
    def __init__(self, file_pattern):
        self._file_pattern = file_pattern
    
    def expand(self, pcoll):
        return (pcoll.pipeline
                | beam.io.ReadFromText(self._file_pattern)
                | beam.FlatMap(lambda line: re.findall(r'[\w\']+', line.strip(), re.UNICODE)))
    
p = beam.Pipeline(InteractiveRunner())

words = p | 'read' >> ReadWordsFromText('gs://apache-beam-samples/shakespeare/kinglear.txt')

counts = (words 
          | 'count' >> beam.combiners.Count.PerElement())

lower_counts = (words
                | "lower" >> beam.Map(lambda word: word.lower())
                | "lower_count" >> beam.combiners.Count.PerElement())

Does it automatically get triggered when the words instance is created? (I'm trying to understand Python in general in the context of Apache Beam in this case)

words = p | 'read' >> ReadWordsFromText('gs://apache-beam-samples/shakespeare/kinglear.txt')

This class inherits from the beam.PTransform class, which is not shown.

To support custom operators the way they are used in your code, that superclass has to do a lot of work. The only methods called by the language itself are those whose names carry a __ prefix and suffix, such as the methods that implement operators like | (__or__) and >> (__rshift__), along with their reflected variants such as __ror__ and __rrshift__ (see the complete magic method listing in the Data Model document). The superclass has to implement these, and any of them might be calling the expand method.
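To make that dispatch concrete, here is a minimal sketch in plain Python. ToyPipeline, ToyTransform and ReadWords are hypothetical stand-ins invented for illustration; they are not Beam's actual implementation, which does considerably more. The sketch only shows how an operator expression can end up calling expand without the user ever calling it directly.

```python
class ToyTransform:
    """Hypothetical stand-in for beam.PTransform (not Beam's real code)."""

    def __init__(self):
        self.label = None

    def __rrshift__(self, label):
        # Called for `'read' >> transform`: str defines no __rshift__, so
        # Python falls back to the right operand's reflected method.
        self.label = label
        return self

    def expand(self, source):
        raise NotImplementedError


class ToyPipeline:
    """Hypothetical stand-in for beam.Pipeline."""

    def __init__(self):
        self.applied = []

    def __or__(self, transform):
        # Called for `pipeline | transform`; expand() is invoked from here.
        self.applied.append(transform.label)
        return transform.expand(self)


class ReadWords(ToyTransform):
    def __init__(self, text):
        super().__init__()
        self._text = text

    def expand(self, source):
        return self._text.split()


p = ToyPipeline()
# >> binds tighter than |, so this is p | ('read' >> ReadWords(...)).
words = p | 'read' >> ReadWords('to be or not to be')
print(p.applied)
print(words)
```

Because >> has higher precedence than |, the label is attached first and the pipeline's __or__ then receives the labelled transform and calls its expand.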

A transform's expand() method should be called as soon as one writes input | transform. I am surprised you're not seeing it get called when writing words = p | 'read' >> ReadWordsFromText(...) (it does for me; try printing something out in that method).

Note, however, that actually executing the pipeline (e.g. doing the Read and the subsequent FlatMap on the resulting data) is deferred. You need to call p.run() or, if you're trying to use things interactively, interactive_beam.collect() to trigger execution.
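The difference between building the pipeline and running it can be sketched with a toy model in plain Python. LazyCollection and its collect method are hypothetical names used only for illustration (the method name loosely echoes interactive_beam.collect(), but this is not how Beam implements it). Applying a step only records what to do, and the work happens when collect is called.

```python
class LazyCollection:
    """Hypothetical deferred collection: stores a recipe, not data."""

    def __init__(self, compute):
        self._compute = compute

    def __or__(self, fn):
        # Construction time: wrap the previous step; nothing runs yet.
        return LazyCollection(lambda: [fn(x) for x in self._compute()])

    def collect(self):
        # Execution time: only now is the chain of steps evaluated.
        return self._compute()


log = []


def read():
    # Stands in for the expensive read step.
    log.append('read executed')
    return ['King', 'Lear']


words = LazyCollection(read)
lower = words | str.lower
log_before = list(log)      # still empty: the read has not happened

result = lower.collect()    # triggers the read and the lowercasing
print(log_before, log, result)
```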
