
How to visualize a saved statistics artifact?

I know of two ways to run a TFX pipeline. The first is to use a Jupyter notebook in the browser with InteractiveContext:

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext


context = InteractiveContext(pipeline_root=_pipeline_data_folder)

example_gen = tfx.components.ImportExampleGen(input_base=_dataset_folder)
context.run(example_gen, enable_cache=True)

statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen, enable_cache=True)

context.show(statistics_gen.outputs['statistics'])

This way, I can see the statistics artifact rendered in the browser.

The second way to run the same pipeline is with a Python script (no browser involved):

from tfx import v1 as tfx

example_gen = tfx.components.ImportExampleGen(input_base=_dataset_folder)
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])

components = [
    example_gen,
    statistics_gen,
]

pipeline = tfx.dsl.Pipeline(
    pipeline_name='sample_pipeline',
    pipeline_root=_pipeline_data_folder,
    metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
        f'{_pipeline_data_folder}/metadata.db'),
    components=components)

tfx.orchestration.LocalDagRunner().run(pipeline)

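I understand that since the second approach involves no browser, asking it for a visualization makes no sense. But the same artifacts created by the first approach are also created by the second. So my question is: after the second pipeline finishes, how can I visualize the statistics artifact it created?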

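It took me a whole day to figure this out, and I had to read the TFX code to do so (there is almost no documentation). The TFX documentation describes an older way to meet the same need, but it is outdated and does not work with the latest version of TFX. I am sure that even this solution will be short-lived and will soon stop working. But for now: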

from tfx import types
from tfx import v1 as tfx
from tfx.orchestration.metadata import Metadata
from tfx.orchestration.experimental.interactive import visualizations
from tfx.orchestration.experimental.interactive import standard_visualizations
standard_visualizations.register_standard_visualizations()


sqlite_path = './pipeline_data/metadata.db'
pipeline_name = 'sample_pipeline'  # must match the pipeline_name passed to tfx.dsl.Pipeline
component_name = 'StatisticsGen'
type_name = 'ExampleStatistics'
metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(sqlite_path)

with Metadata(metadata_connection_config) as metadata:
    # Look up the node context, named '<pipeline_name>.<component_name>'.
    context = metadata.store.get_context_by_type_and_name('node', f'{pipeline_name}.{component_name}')
    artifacts = metadata.store.get_artifacts_by_context(context.id)
    artifact_type = metadata.store.get_artifact_type(type_name)
    # Keep only ExampleStatistics artifacts and pick the most recently updated one.
    latest_artifact = max(
        (a for a in artifacts if a.type_id == artifact_type.id),
        key=lambda a: a.last_update_time_since_epoch)
    # Wrap the MLMD record in a TFX Artifact so the registered visualization can render it.
    artifact = types.Artifact(artifact_type)
    artifact.set_mlmd_artifact(latest_artifact)
    visualization = visualizations.get_registry().get_visualization(artifact.type_name)
    visualization.display(artifact)
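The selection step in the `max(...)` call above can be tried in isolation. The following is only a minimal sketch that uses no TFX or MLMD at all: `ArtifactRecord` and `latest_of_type` are hypothetical names, the dataclass is a stand-in that mirrors only the fields read from the real MLMD artifact, and the sample records are made up for illustration. Unlike a bare `max(...)`, it returns `None` instead of raising `ValueError` when no artifact of the requested type exists.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ArtifactRecord:
    """Hypothetical stand-in for an MLMD artifact (only the fields used here)."""
    id: int
    type_id: int
    uri: str
    last_update_time_since_epoch: int


def latest_of_type(artifacts: Iterable[ArtifactRecord], type_id: int) -> Optional[ArtifactRecord]:
    """Return the most recently updated artifact of the given type, or None."""
    candidates = [a for a in artifacts if a.type_id == type_id]
    return max(candidates, key=lambda a: a.last_update_time_since_epoch, default=None)


# Made-up records: two statistics artifacts (type 7) and one of another type.
records = [
    ArtifactRecord(1, 7, "./pipeline_data/StatisticsGen/statistics/3", 1000),
    ArtifactRecord(2, 5, "./pipeline_data/ImportExampleGen/examples/15", 3000),
    ArtifactRecord(3, 7, "./pipeline_data/StatisticsGen/statistics/16", 2000),
]

latest = latest_of_type(records, type_id=7)
print(latest.uri if latest else "no matching artifact")
```

Filtering by type before taking the maximum matters here: the record with the largest timestamp belongs to a different artifact type and is correctly ignored.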

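Disclaimer: the MLMD query above displays the latest artifact of the statistics component of one specific pipeline. Alternatively, if you prefer, you can point to the artifact by its folder path (URI):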

from tfx import types
from tfx import v1 as tfx
from tfx.orchestration.metadata import Metadata
from tfx.orchestration.experimental.interactive import visualizations
from tfx.orchestration.experimental.interactive import standard_visualizations
standard_visualizations.register_standard_visualizations()

sqlite_path = './pipeline_data/metadata.db'
uri = './pipeline_data/StatisticsGen/statistics/16'
type_name = 'ExampleStatistics'
metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(sqlite_path)

with Metadata(metadata_connection_config) as metadata:
    # Look up the artifact records registered under this exact URI.
    artifacts = metadata.store.get_artifacts_by_uri(uri)
    artifact_type = metadata.store.get_artifact_type(type_name)
    # Keep only ExampleStatistics artifacts and pick the most recently updated one.
    latest_artifact = max(
        (a for a in artifacts if a.type_id == artifact_type.id),
        key=lambda a: a.last_update_time_since_epoch)
    # Wrap the MLMD record in a TFX Artifact so the registered visualization can render it.
    artifact = types.Artifact(artifact_type)
    artifact.set_mlmd_artifact(latest_artifact)
    visualization = visualizations.get_registry().get_visualization(type_name)
    visualization.display(artifact)
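The URI above hardcodes the run folder `16`. If, as that path suggests, each run writes into a subfolder named with an integer, the newest folder could be located rather than typed by hand. This is a sketch under that assumption only: `latest_numbered_subdir` is a hypothetical helper, not part of TFX, and it treats the largest number as the newest run. The demo builds a throwaway directory tree instead of touching real pipeline data.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_numbered_subdir(base: Path) -> Optional[Path]:
    """Return the subdirectory of `base` with the largest integer name, or None."""
    if not base.is_dir():
        return None
    numbered = [p for p in base.iterdir() if p.is_dir() and p.name.isdecimal()]
    # Compare as integers: a string comparison would rank '9' above '16'.
    return max(numbered, key=lambda p: int(p.name), default=None)


# Demo on a temporary tree that imitates the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    stats_dir = Path(tmp) / "StatisticsGen" / "statistics"
    for name in ("3", "9", "16", ".system"):
        (stats_dir / name).mkdir(parents=True)
    newest = latest_numbered_subdir(stats_dir)
    print(newest.name)  # 16
```

Note that `get_artifacts_by_uri` matches the string stored in the metadata database, so a path found this way may need to be rewritten into the same form (for example with the leading `./`) before it is used as `uri`.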

Finally, there may be a better way to do this that I have missed.

