How to visualize a saved statistics artifact?

Question

I know there are two ways to run a TFX pipeline. The first is in the browser, using a Jupyter notebook with InteractiveContext:

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext


context = InteractiveContext(pipeline_root=_pipeline_data_folder)

example_gen = tfx.components.ImportExampleGen(input_base=_dataset_folder)
context.run(example_gen, enable_cache=True)

statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen, enable_cache=True)

context.show(statistics_gen.outputs['statistics'])

This way, I can see the statistics artifact rendered in the browser.

The second way to run the same pipeline is with a Python script (no browser involved):

from tfx import v1 as tfx

example_gen = tfx.components.ImportExampleGen(input_base=_dataset_folder)
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])

components = [
    example_gen,
    statistics_gen,
]

pipeline = tfx.dsl.Pipeline(
    pipeline_name='sample_pipeline',
    pipeline_root=_pipeline_data_folder,
    metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
        f'{_pipeline_data_folder}/metadata.db'),
    components=components)

tfx.orchestration.LocalDagRunner().run(pipeline)

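I understand that since the second approach does not involve a browser, asking it to visualize anything makes no sense. But the same artifacts created by the first approach are also created by the second. So my question is: after the second pipeline finishes, how can I visualize the statistics artifact it created?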

Answer 1

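It took me a whole day to figure this out, and I had to read the TFX source code to do it (there is almost no documentation). The TFX docs describe an older way of meeting the same need, but it is outdated and does not work with the latest version of TFX. I am sure even this solution will be short-lived and will stop working soon. But for now: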

from tfx import types
from tfx import v1 as tfx
from tfx.orchestration.metadata import Metadata
from tfx.orchestration.experimental.interactive import visualizations
from tfx.orchestration.experimental.interactive import standard_visualizations
standard_visualizations.register_standard_visualizations()


sqlite_path = './pipeline_data/metadata.db'
# Must match the pipeline_name the pipeline was run with ('sample_pipeline' in the question).
pipeline_name = 'sample_pipeline'
component_name = 'StatisticsGen'
type_name = 'ExampleStatistics'
metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(sqlite_path)

with Metadata(metadata_connection_config) as metadata:
    # Look up the node context, named '<pipeline_name>.<component_name>'.
    context = metadata.store.get_context_by_type_and_name('node', f'{pipeline_name}.{component_name}')
    artifacts = metadata.store.get_artifacts_by_context(context.id)
    artifact_type = metadata.store.get_artifact_type(type_name)
    # Keep only artifacts of the requested type and take the most recently updated one.
    latest_artifact = max(
        [a for a in artifacts if a.type_id == artifact_type.id],
        key=lambda a: a.last_update_time_since_epoch)
    artifact = types.Artifact(artifact_type)
    artifact.set_mlmd_artifact(latest_artifact)
    visualization = visualizations.get_registry().get_visualization(artifact.type_name)
    visualization.display(artifact)
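
The selection step in the snippet above (keep only artifacts of the wanted type, then take the most recently updated one) can be illustrated without TFX installed. This is a minimal sketch, not TFX code: StubArtifact and pick_latest are hypothetical names made up for this illustration, and only the field names type_id and last_update_time_since_epoch mirror the ones used above. It also fails with a clear message when nothing matches, since max() raises ValueError on an empty list.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StubArtifact:
    """Hypothetical stand-in for a metadata-store artifact record (illustration only)."""
    id: int
    type_id: int
    last_update_time_since_epoch: int


def pick_latest(artifacts: Iterable[StubArtifact], type_id: int) -> StubArtifact:
    """Return the most recently updated artifact that has the given type_id."""
    matching = [a for a in artifacts if a.type_id == type_id]
    if not matching:
        raise LookupError(f"no artifact with type_id={type_id}")
    return max(matching, key=lambda a: a.last_update_time_since_epoch)


artifacts = [
    StubArtifact(id=1, type_id=7, last_update_time_since_epoch=1_000),
    StubArtifact(id=2, type_id=9, last_update_time_since_epoch=5_000),  # different type
    StubArtifact(id=3, type_id=7, last_update_time_since_epoch=3_000),
]
latest = pick_latest(artifacts, type_id=7)
print(latest.id)
```

With the real store, the same guard could be placed before the max(...) call so that a wrong pipeline or component name produces a readable error.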

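Disclaimer: this code displays the latest artifact of the statistics component of a specific pipeline. Alternatively, if you prefer, you can point to the artifact by its folder path (uri):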

from tfx import types
from tfx import v1 as tfx
from tfx.orchestration.metadata import Metadata
from tfx.orchestration.experimental.interactive import visualizations
from tfx.orchestration.experimental.interactive import standard_visualizations
standard_visualizations.register_standard_visualizations()

sqlite_path = './pipeline_data/metadata.db'
# Folder path of the artifact to display, as recorded in the metadata store.
uri = './pipeline_data/StatisticsGen/statistics/16'
type_name = 'ExampleStatistics'
metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(sqlite_path)

with Metadata(metadata_connection_config) as metadata:
    artifacts = metadata.store.get_artifacts_by_uri(uri)
    artifact_type = metadata.store.get_artifact_type(type_name)
    latest_artifact = max([a for a in artifacts if a.type_id == artifact_type.id], key=lambda a: a.last_update_time_since_epoch)
    artifact = types.Artifact(artifact_type)
    artifact.set_mlmd_artifact(latest_artifact)
    visualization = visualizations.get_registry().get_visualization(type_name)
    visualization.display(artifact)

Finally, there may be a better way to do this that I have missed.
