How to visualize a saved statistics artifact?
I know of two ways to run a TFX pipeline. The first is in a Jupyter notebook in the browser, using InteractiveContext:
from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
context = InteractiveContext(pipeline_root=_pipeline_data_folder)
example_gen = tfx.components.ImportExampleGen(input_base=_dataset_folder)
context.run(example_gen, enable_cache=True)
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen, enable_cache=True)
context.show(statistics_gen.outputs['statistics'])
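This way, I can see the statistics artifact rendered in the browser.
The second way to run the same pipeline is with a Python script (no browser involved):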
from tfx import v1 as tfx

example_gen = tfx.components.ImportExampleGen(input_base=_dataset_folder)
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])
components = [
    example_gen,
    statistics_gen,
]
pipeline = tfx.dsl.Pipeline(
    pipeline_name='sample_pipeline',
    pipeline_root=_pipeline_data_folder,
    # ML Metadata is stored in a SQLite file under the pipeline root.
    metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
        f'{_pipeline_data_folder}/metadata.db'),
    components=components)
tfx.orchestration.LocalDagRunner().run(pipeline)
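I understand that since the second method does not involve a browser, asking for a visualization during the run makes no sense. But the same artifacts created by the first method are also created by the second. So my question is: after the second pipeline has finished, how can I visualize the statistics artifact it created?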
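It took me a whole day to figure this out, and I had to read the TFX source code to do it (there is almost no documentation). The TFX documentation describes an older way of meeting the same need, but it is outdated and does not work with the latest version of TFX. I am sure that even this solution will be short-lived and will break soon. But for now: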
from tfx import types
from tfx import v1 as tfx
from tfx.orchestration.metadata import Metadata
from tfx.orchestration.experimental.interactive import visualizations
from tfx.orchestration.experimental.interactive import standard_visualizations

# Register the built-in visualizations (statistics, schema, anomalies, ...).
standard_visualizations.register_standard_visualizations()

sqlite_path = './pipeline_data/metadata.db'
# Must match the pipeline_name passed to tfx.dsl.Pipeline
# (the script in the question uses 'sample_pipeline').
pipeline_name = 'simple_pipeline'
component_name = 'StatisticsGen'
type_name = 'ExampleStatistics'

metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(sqlite_path)
with Metadata(metadata_connection_config) as metadata:
    # Each component run is attached to a 'node' context named '<pipeline>.<component>'.
    context = metadata.store.get_context_by_type_and_name('node', f'{pipeline_name}.{component_name}')
    artifacts = metadata.store.get_artifacts_by_context(context.id)
    artifact_type = metadata.store.get_artifact_type(type_name)
    # Keep only the statistics artifacts and pick the most recently updated one.
    latest_artifact = max([a for a in artifacts if a.type_id == artifact_type.id], key=lambda a: a.last_update_time_since_epoch)
    # Wrap the raw MLMD record in a TFX Artifact so the visualization can read it.
    artifact = types.Artifact(artifact_type)
    artifact.set_mlmd_artifact(latest_artifact)
    visualization = visualizations.get_registry().get_visualization(artifact.type_name)
    visualization.display(artifact)
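As a side note on the "pick the latest artifact" step, here is a minimal sketch of that selection logic using only the standard library. `FakeArtifact` and `latest_of_type` are hypothetical names invented for illustration, not part of TFX or ML Metadata; the stand-in carries only the fields the selection reads. The one behavioural difference is that it returns `None` rather than raising `ValueError` when nothing matches, which is what a bare `max()` does on an empty list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeArtifact:
    """Hypothetical stand-in for an MLMD artifact record (only the fields used here)."""
    id: int
    type_id: int
    last_update_time_since_epoch: int


def latest_of_type(artifacts: Iterable[FakeArtifact], type_id: int) -> Optional[FakeArtifact]:
    """Return the most recently updated artifact of the given type, or None."""
    matching = [a for a in artifacts if a.type_id == type_id]
    # default=None avoids the ValueError that max() raises on an empty sequence.
    return max(matching, key=lambda a: a.last_update_time_since_epoch, default=None)


records = [
    FakeArtifact(id=1, type_id=7, last_update_time_since_epoch=1_000),
    FakeArtifact(id=2, type_id=7, last_update_time_since_epoch=3_000),
    FakeArtifact(id=3, type_id=9, last_update_time_since_epoch=5_000),
]
newest = latest_of_type(records, type_id=7)
print(newest.id if newest else "no matching artifact")
```

Checking for `None` before wrapping the record in a TFX artifact would give a clearer error when the pipeline or component name does not match anything in the metadata store.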
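Disclaimer: the MLMD lookup above displays only the latest artifact of the statistics component of one specific pipeline. Alternatively, if you prefer, you can point at the artifact by its folder path (uri):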
from tfx import types
from tfx import v1 as tfx
from tfx.orchestration.metadata import Metadata
from tfx.orchestration.experimental.interactive import visualizations
from tfx.orchestration.experimental.interactive import standard_visualizations

# Register the built-in visualizations (statistics, schema, anomalies, ...).
standard_visualizations.register_standard_visualizations()

sqlite_path = './pipeline_data/metadata.db'
# Folder of the artifact to display.
uri = './pipeline_data/StatisticsGen/statistics/16'
type_name = 'ExampleStatistics'

metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(sqlite_path)
with Metadata(metadata_connection_config) as metadata:
    # Look the artifact up by its uri instead of by pipeline and component name.
    artifacts = metadata.store.get_artifacts_by_uri(uri)
    artifact_type = metadata.store.get_artifact_type(type_name)
    latest_artifact = max([a for a in artifacts if a.type_id == artifact_type.id], key=lambda a: a.last_update_time_since_epoch)
    # Wrap the raw MLMD record in a TFX Artifact so the visualization can read it.
    artifact = types.Artifact(artifact_type)
    artifact.set_mlmd_artifact(latest_artifact)
    visualization = visualizations.get_registry().get_visualization(type_name)
    visualization.display(artifact)
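The uri above ends in a hard-coded number (16). As a rough sketch, the folder with the highest number could be located with the standard library instead of typed by hand. This assumes that the numbered subfolders correspond to successive runs, so the largest number is the newest one; that is my reading of the directory layout, not something confirmed here. `latest_artifact_dir` is a hypothetical helper name, and the demo runs on a temporary directory tree rather than real pipeline output.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_artifact_dir(output_dir: str) -> Optional[Path]:
    """Return the numerically highest subdirectory of output_dir, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    numbered = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    # Compare as integers: as strings, '9' would sort after '16'.
    return max(numbered, key=lambda p: int(p.name), default=None)


# Demo on a throwaway tree that mimics the folder layout used in the answer.
with tempfile.TemporaryDirectory() as tmp:
    stats_dir = Path(tmp) / "StatisticsGen" / "statistics"
    for run_id in ("2", "9", "16"):
        (stats_dir / run_id).mkdir(parents=True)
    demo_latest = latest_artifact_dir(str(stats_dir)).name

print(demo_latest)
```

Note that the string passed to `get_artifacts_by_uri` presumably has to match the uri recorded in the metadata store, so a path discovered this way may need to be written in the same relative or absolute form the pipeline used.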
Finally, there may well be a better way to do this that I have missed.