<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.0.0">Jekyll</generator><link href="https://amygdala.github.io/gcp_blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://amygdala.github.io/gcp_blog/" rel="alternate" type="text/html" /><updated>2021-06-23T11:16:25-05:00</updated><id>https://amygdala.github.io/gcp_blog/feed.xml</id><title type="html">Amy on GCP</title><subtitle>Using Google Cloud Platform</subtitle><entry><title type="html">Use Vertex Pipelines to build an AutoML Classification end-to-end workflow</title><link href="https://amygdala.github.io/gcp_blog/mlops/pipelines/vertex/kfp/2021/06/22/automl_tab_beans.html" rel="alternate" type="text/html" title="Use Vertex Pipelines to build an AutoML Classification end-to-end workflow" /><published>2021-06-22T00:00:00-05:00</published><updated>2021-06-22T00:00:00-05:00</updated><id>https://amygdala.github.io/gcp_blog/mlops/pipelines/vertex/kfp/2021/06/22/automl_tab_beans</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/mlops/pipelines/vertex/kfp/2021/06/22/automl_tab_beans.html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;This post shows how you can use &lt;a href=&quot;https://cloud.google.com/vertex-ai/docs/pipelines&quot;&gt;Vertex Pipelines&lt;/a&gt; to build an end-to-end ML workflow that trains a custom model using AutoML; evaluates the accuracy of the trained model; and if the model is sufficiently accurate, deploys it to Vertex AI for serving.&lt;/p&gt;

&lt;h3 id=&quot;vertex-ai-and-vertex-pipelines&quot;&gt;Vertex AI and Vertex Pipelines&lt;/h3&gt;

&lt;p&gt;The recently-launched &lt;a href=&quot;https://cloud.google.com/vertex-ai/&quot;&gt;Vertex AI&lt;/a&gt;  is a unified MLOps platform to help data scientists and ML engineers increase their rate of experimentation, deploy models faster, and manage models more effectively.  It brings AutoML and AI Platform together, in conjunction with some new MLOps-focused products, into a unified API, client library, and user interface.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://cloud.google.com/vertex-ai/docs/pipelines&quot;&gt;Vertex Pipelines&lt;/a&gt; is part of &lt;a href=&quot;https://cloud.google.com/vertex-ai/&quot;&gt;Vertex AI.&lt;/a&gt;  It helps you to automate, monitor, and govern your ML systems by orchestrating your ML workflows.  It is automated, scalable, serverless, and cost-effective: you pay only for what you use.  Vertex Pipelines is the backbone of the Vertex AI ML Ops story, and makes it easy to build and run  ML workflows using any ML framework.  Because it is serverless, and has seamless integration with GCP and Vertex AI tools and services, you can focus on just building and running your pipelines without worrying about infrastructure or cluster maintenance.&lt;/p&gt;

&lt;p&gt;Vertex Pipelines automatically logs metadata to track artifacts, lineage, metrics, and execution across your ML workflows, supports step execution caching, and provides support for enterprise security controls like &lt;a href=&quot;https://cloud.google.com/iam&quot;&gt;Cloud IAM&lt;/a&gt;, &lt;a href=&quot;https://cloud.google.com/vpc-service-controls&quot;&gt;VPC-SC&lt;/a&gt;, and &lt;a href=&quot;https://cloud.google.com/kms/docs/cmek&quot;&gt;CMEK&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Vertex Pipelines supports two OSS Python SDKs: TFX (&lt;a href=&quot;https://www.tensorflow.org/tfx&quot;&gt;TensorFlow Extended&lt;/a&gt;) and KFP  (&lt;a href=&quot;https://www.kubeflow.org/docs/components/pipelines/&quot;&gt;Kubeflow Pipelines&lt;/a&gt;).  The &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/official/pipelines/automl_tabular_classification_beans.ipynb&quot;&gt;example Vertex pipeline&lt;/a&gt; highlighted in this post uses the KFP SDK, and includes use of the &lt;strong&gt;&lt;a href=&quot;https://github.com/kubeflow/pipelines/tree/master/components/google-cloud&quot;&gt;Google Cloud Pipeline Components&lt;/a&gt;&lt;/strong&gt;, which support easy access to Vertex AI services. Vertex Pipelines requires v2 of the KFP SDK.   Soon, it will be possible to use the &lt;a href=&quot;https://www.kubeflow.org/docs/components/pipelines/sdk/v2/v2-compatibility/#current-caveats&quot;&gt;KFP v2 ‘compatibility mode’&lt;/a&gt; to run KFP V2 examples like this on OSS KFP as well.&lt;/p&gt;

&lt;h2 id=&quot;an-end-to-end-automl-workflow-with-vertex-pipelines&quot;&gt;An end-to-end AutoML Workflow with Vertex Pipelines&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://cloud.google.com/vertex-ai/docs/start/automl-model-types#tabular&quot;&gt;Vertex AI’s AutoML Tabular service&lt;/a&gt; lets you bring your own structured data to train a model, without needing to build the model architecture yourself. For this example, I’ll use the UCI Machine Learning ‘Dry beans dataset’&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.  The task is a classification task: predict the type of a bean given some information about its characteristics.&lt;/p&gt;

&lt;p&gt;Vertex Pipelines makes it very straightforward to construct a workflow to support building, evaluating, and deploying such models.   We’ll build a pipeline that looks like this:&lt;/p&gt;
&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/mp/beans.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/mp/beans.png&quot; width=&quot;95%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;The DAG for the AutoML classification workflow.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;You can see that the model deployment step is wrapped by a conditional: the model will only be deployed if the evaluation step indicates that it is sufficiently accurate.&lt;/p&gt;

&lt;p&gt;For this &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/official/pipelines/automl_tabular_classification_beans.ipynb&quot;&gt;example&lt;/a&gt;, nearly all the &lt;em&gt;components&lt;/em&gt; (steps) in the pipeline are prebuilt &lt;a href=&quot;https://github.com/kubeflow/pipelines/tree/master/components/google-cloud&quot;&gt;Google Cloud Pipeline Components&lt;/a&gt;.  This means that we (mostly) just need to specify how the pipeline is put together using pre-existing building blocks.
However, I’ll add one Python function-based &lt;em&gt;custom component&lt;/em&gt; for model evaluation and metrics visualization.
The pipeline definition looks as follows (with a bit of detail elided):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;automl-tab-beans-training-v2&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                  &lt;span class=&quot;n&quot;&gt;pipeline_root&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PIPELINE_ROOT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bq_source&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;bq://aju-dev-demos.beans.beans1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DISPLAY_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PROJECT_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcp_region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;us-central1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;us-central1-aiplatform.googleapis.com&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;thresholds_dict_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'{&quot;auRoc&quot;: 0.95}'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dataset_create_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcc_aip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TabularDatasetCreateOp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bq_source&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bq_source&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;training_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcc_aip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AutoMLTabularTrainingJobRunOp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;optimization_prediction_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;classification&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;optimization_objective&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;minimize-log-loss&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;budget_milli_node_hours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;column_transformations&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;numeric&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;column_name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Area&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}},&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;numeric&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;column_name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Perimeter&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}},&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;numeric&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;column_name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;MajorAxisLength&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}},&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;categorical&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;column_name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Class&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}},&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_create_op&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;dataset&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;target_column&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Class&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model_eval_task&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;classif_model_eval_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;gcp_region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;thresholds_dict_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;training_op&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;model_eval_task&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;dep_decision&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;true&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;deploy_decision&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;deploy_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcc_aip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ModelDeployOp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# noqa: F841
&lt;/span&gt;            &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;training_op&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;machine_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;n1-standard-4&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We first create a &lt;a href=&quot;https://cloud.google.com/vertex-ai/docs/datasets/create-dataset-console&quot;&gt;&lt;em&gt;Dataset&lt;/em&gt;&lt;/a&gt; from a BigQuery table that holds the training data. Then, we use AutoML to train a tabular classification model.  The &lt;code class=&quot;highlighter-rouge&quot;&gt;dataset&lt;/code&gt; arg to the training step gets its value from the output of the Dataset step (&lt;code class=&quot;highlighter-rouge&quot;&gt;dataset=dataset_create_op.outputs[&quot;dataset&quot;]&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;After the model is trained, its evaluation metrics are checked against given ‘threshold’ information, to decide whether it’s accurate enough to deploy.
The next section goes into more detail about how this custom ‘eval metrics’ component is defined.  It takes as one of its inputs an output of the training step (&lt;code class=&quot;highlighter-rouge&quot;&gt;training_op.outputs[&quot;model&quot;]&lt;/code&gt;)— which points to the trained model.&lt;/p&gt;

&lt;p&gt;Then, a KFP &lt;em&gt;conditional&lt;/em&gt; uses an output of the eval step to decide whether to proceed with the deployment:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;model_eval_task&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;dep_decision&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;true&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;deploy_decision&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the model is sufficiently accurate, the prebuilt deployment component is called.  This step creates an &lt;a href=&quot;https://cloud.google.com/vertex-ai/docs/general/deployment&quot;&gt;&lt;em&gt;Endpoint&lt;/em&gt;&lt;/a&gt; and deploys the trained model to that endpoint for serving.&lt;/p&gt;

&lt;h2 id=&quot;defining-a-custom-component&quot;&gt;Defining a custom component&lt;/h2&gt;

&lt;p&gt;Most of the steps in the pipeline above are drawn from pre-built components; building blocks that make it easy to construct an ML workflow.  But I’ve defined one custom component to parse the trained model’s evaluation metrics, render some metrics visualizations, and determine— based on given ‘threshold’ information— whether the model is good enough to be deployed.  This custom component is defined as a Python function with a &lt;code class=&quot;highlighter-rouge&quot;&gt;@kfp.v2.dsl.component&lt;/code&gt; decorator.  When this function is evaluated, it is compiled to a task ‘factory function’ that can be used in a pipeline specification. The KFP SDK makes it very straightforward to define new pipeline components in this way.&lt;/p&gt;

&lt;p&gt;Below is the custom component definition, with some detail elided.  The &lt;code class=&quot;highlighter-rouge&quot;&gt;@component&lt;/code&gt; decorator specifies three optional args: the base container image to use; any packages to install; and the &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; file to which to write the component specification.&lt;/p&gt;

&lt;p&gt;The component function, &lt;code class=&quot;highlighter-rouge&quot;&gt;classif_model_eval_metrics&lt;/code&gt;, has some input parameters of note.  The &lt;code class=&quot;highlighter-rouge&quot;&gt;model&lt;/code&gt; parameter is an input &lt;code class=&quot;highlighter-rouge&quot;&gt;kfp.v2.dsl.Model&lt;/code&gt; artifact.  As you may remember from the pipeline specification above, here this input will be provided by an output of the training step.&lt;/p&gt;

&lt;p&gt;The last two function args, &lt;code class=&quot;highlighter-rouge&quot;&gt;metrics&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;metricsc&lt;/code&gt; , are component &lt;code class=&quot;highlighter-rouge&quot;&gt;Output&lt;/code&gt;s, in this case of type &lt;code class=&quot;highlighter-rouge&quot;&gt;Metrics&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;ClassificationMetrics&lt;/code&gt;.  They’re not explicitly passed as inputs to the component step, but rather are automatically instantiated and can be used in the component. E.g, in the function below, we’re calling &lt;code class=&quot;highlighter-rouge&quot;&gt;metricsc.log_roc_curve()&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;metricsc.log_confusion_matrix()&lt;/code&gt; to render these visualizations in the Pipelines UI.  These &lt;code class=&quot;highlighter-rouge&quot;&gt;Output&lt;/code&gt; params become component outputs when the component is compiled, and can be consumed by other pipeline steps.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;NamedTuple&lt;/code&gt; outputs are another type of component output.  Here we’re returning a string that indicates whether or not to deploy the model.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;component&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;base_image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;gcr.io/deeplearning-platform-release/tf2-cpu.2-3:latest&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;output_component_file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;tables_eval_component.yaml&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;packages_to_install&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;google-cloud-aiplatform&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;classif_model_eval_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# &quot;us-central1&quot;,
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# &quot;us-central1-aiplatform.googleapis.com&quot;,
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;thresholds_dict_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;metricsc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ClassificationMetrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NamedTuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Outputs&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;dep_decision&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]):&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Return parameter.
&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;json&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;

    &lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.cloud&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aiplatform&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Function to fetch model eval info
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_eval_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Use the given metrics threshold(s) to determine whether the model is
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;# accurate enough to deploy.
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;classification_thresholds_check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;thresholds_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Generate pipeline annotations and visualizations for the metrics info
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;log_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metricsc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
		&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;metricsc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log_roc_curve&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fpr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;metricsc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log_confusion_matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;annotations&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;test_confusion_matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;rows&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;aiplatform&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# extract the model resource name from the input Model Artifact
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;model_resource_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uri&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;replace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;aiplatform://v1/&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;model path: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_resource_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;client_options&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;api_endpoint&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Initialize client that will be used to create and send requests to Vertex AI.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aiplatform&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gapic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ModelServiceClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client_options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;eval_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metrics_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metrics_str_list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_eval_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_resource_path&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;log_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metricsc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;thresholds_dict&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;thresholds_dict_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;deploy&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;classification_thresholds_check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;thresholds_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deploy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dep_decision&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;true&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dep_decision&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;false&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;deployment decision is &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dep_decision&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dep_decision&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When this function is evaluated, we can use the generated factory function to define a pipeline step as part of a pipeline definition, as we saw in the previous section:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;model_eval_task&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;classif_model_eval_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcp_region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;thresholds_dict_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;training_op&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;model&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/official/pipelines/automl_tabular_classification_beans.ipynb&quot;&gt;example notebook&lt;/a&gt; has the full component definition.&lt;/p&gt;

&lt;h3 id=&quot;sharing-component-specifications&quot;&gt;Sharing component specifications&lt;/h3&gt;

&lt;p&gt;When the component is compiled, we can also indicate that a &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; component specification be generated.  We did this via the optional  &lt;code class=&quot;highlighter-rouge&quot;&gt;output_component_file=&quot;tables_eval_component.yaml&quot;&lt;/code&gt; arg passed to the &lt;code class=&quot;highlighter-rouge&quot;&gt;@component&lt;/code&gt; decorator.
The &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; format allows the component specification to be put under version control and shared with others.&lt;/p&gt;

&lt;p&gt;Then, the component can be used in other pipelines by calling the &lt;a href=&quot;https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#using-your-component-in-a-pipeline&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;kfp.components.load_component_from_url&lt;/code&gt; function&lt;/a&gt; (and other variants like &lt;code class=&quot;highlighter-rouge&quot;&gt;load_component_from_file&lt;/code&gt;).&lt;/p&gt;

&lt;h2 id=&quot;running-a-pipeline-job-on-vertex-pipelines&quot;&gt;Running a pipeline job on Vertex Pipelines&lt;/h2&gt;

&lt;p&gt;Once a pipeline is defined, the next step is to &lt;em&gt;compile&lt;/em&gt; it — which generates a json job spec file— then submit and run it on Vertex Pipelines.  When you submit a pipeline job, you can specify values for pipeline input parameters, overriding their defaults.
The &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/official/pipelines/automl_tabular_classification_beans.ipynb&quot;&gt;example notebook&lt;/a&gt; shows the details of how to do this.&lt;/p&gt;

&lt;p&gt;Once a pipeline is running, you can view its details in the Cloud Console, including the pipeline run and lineage graphs shown above, as well as pipeline step logs and pipeline Artifact details.
You can also submit pipeline job specs via the Cloud Console UI, and the UI makes it easy to clone pipeline runs. The json pipeline specification file may also be put under version control and shared with others.&lt;/p&gt;

&lt;h3 id=&quot;leveraging-pipeline-step-caching-to-develop-and-debug&quot;&gt;Leveraging Pipeline step caching to develop and debug&lt;/h3&gt;

&lt;p&gt;Vertex Pipelines supports step caching, and this helps with iterating on pipeline development— when you rerun a pipeline, if a component’s inputs have not changed, its cached execution results can be reused. If you run this pipeline more than once, you might notice this feature in action.&lt;/p&gt;

&lt;p&gt;If you’re playing along, try making a small change to the &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/official/pipelines/automl_tabular_classification_beans.ipynb&quot;&gt;example notebook&lt;/a&gt; cell that holds the custom component definition (the &lt;code class=&quot;highlighter-rouge&quot;&gt;classif_model_eval_metrics&lt;/code&gt; function in the “Define a metrics eval custom component” section) by uncommenting this line:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c1&quot;&gt;# metrics.metadata[&quot;model_type&quot;] = &quot;AutoML Tabular classification&quot;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then re-compile the component, recompile the pipeline &lt;strong&gt;without changing the &lt;code class=&quot;highlighter-rouge&quot;&gt;DISPLAY_NAME&lt;/code&gt; value&lt;/strong&gt;, and run it again. When you do so, you should see that Vertex Pipelines can leverage the cached executions for the upstream steps— as their inputs didn’t change— and only needs to re-execute from the changed component.  The pipeline DAG for the new run should look as follows, with the ‘recycle’ icon on some of the steps indicating that their cached executions were used.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/mp/beans_cached.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/mp/beans_cached.png&quot; width=&quot;60%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Leveraging step caching with the AutoML classification workflow.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: Step caching is on by default, but if you want to disable it, you can pass the &lt;code class=&quot;highlighter-rouge&quot;&gt;enable_caching=False&lt;/code&gt; arg to the &lt;code class=&quot;highlighter-rouge&quot;&gt;create_run_from_job_spec&lt;/code&gt; function when you submit a pipeline run.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;lineage-tracking&quot;&gt;Lineage tracking&lt;/h3&gt;

&lt;p&gt;If you click on an Artifact in a pipeline graph, you’ll see a “VIEW LINEAGE” button.  This tracks how the artifacts are connected by step executions. So it’s kind of the inverse of the pipeline DAG, and can include multiple executions that consumed the same artifact (this can happen with cache hits, for example).
The tracking information shown is not necessarily  just for a single pipeline run, but for any pipeline execution that has used the given artifact.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/mp/beans_lineage_tracker.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/mp/beans_lineage_tracker.png&quot; width=&quot;60%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Lineage tracking.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;This post introduced Vertex Pipelines, and the prebuilt &lt;a href=&quot;https://github.com/kubeflow/pipelines/tree/master/components/google-cloud&quot;&gt;Google Cloud Pipeline Components&lt;/a&gt;, which allow easy access to Vertex AI services.   The Pipelines &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/official/pipelines/automl_tabular_classification_beans.ipynb&quot;&gt;example&lt;/a&gt; in this post uses the AutoML Tabular service, showing how straightforward it is to bring your own data to train a model. It showed a pipeline that creates a Dataset, trains a model using that dataset, obtains the model’s evaluation metrics, and decides whether or not to deploy the model to Vertex AI for serving.&lt;/p&gt;

&lt;p&gt;For next steps, check out &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/tree/master/ai-platform-unified/notebooks/official/pipelines&quot;&gt;other Vertex Pipelines example notebooks&lt;/a&gt; as well as a &lt;a href=&quot;https://codelabs.developers.google.com/vertex-pipelines-intro?hl=en&amp;amp;continue=https%3A%2F%2Fcodelabs.developers.google.com%2F%3Fcat%3Dcloud#0&quot;&gt;codelab&lt;/a&gt; based in part on the pipeline in this post.
You  can also find other Vertex AI notebook examples &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/tree/master/ai-platform-unified/notebooks/official&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://github.com/GoogleCloudPlatform/ai-platform-samples/tree/master/ai-platform-unified/notebooks/unofficial&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;from: KOKLU, M. and OZKAN, I.A., (2020) “Multiclass Classification of Dry Beans Using Computer Vision and Machine Learning Techniques.”In Computers and Electronics in Agriculture, 174, 105507. &lt;a href=&quot;https://doi.org/10.1016/j.compag.2020.105507&quot;&gt;DOI&lt;/a&gt; &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">Introduction</summary></entry><entry><title type="html">Event-triggered Kubeflow Pipeline runs, and using TFDV to detect data drift</title><link href="https://amygdala.github.io/gcp_blog/ml/kfp/mlops/tfdv/gcf/2021/02/26/kfp_tfdv_event_triggered.html" rel="alternate" type="text/html" title="Event-triggered Kubeflow Pipeline runs, and using TFDV to detect data drift" /><published>2021-02-26T00:00:00-06:00</published><updated>2021-02-26T00:00:00-06:00</updated><id>https://amygdala.github.io/gcp_blog/ml/kfp/mlops/tfdv/gcf/2021/02/26/kfp_tfdv_event_triggered</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/ml/kfp/mlops/tfdv/gcf/2021/02/26/kfp_tfdv_event_triggered.html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;With ML workflows,  it is often insufficient to train and deploy a given model just once.  Even if the model has desired accuracy initially, this can change if the data used for making prediction requests becomes— perhaps over time— sufficiently different from the data used to originally train the model.&lt;/p&gt;

&lt;p&gt;When new data becomes available, which could be used for retraining a model, it can be helpful to apply techniques for analyzing &lt;em&gt;data ‘drift’&lt;/em&gt;, and determining whether the drift is sufficiently anomalous to warrant retraining yet.
It can also be useful to trigger such an analysis— and potential re-run of your training pipeline— &lt;em&gt;automatically&lt;/em&gt;, upon arrival of new data.&lt;/p&gt;

&lt;p&gt;This blog post highlights an &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/notebook_examples/hosted_kfp/event_triggered_kfp_pipeline_bw.ipynb&quot;&gt;example notebook&lt;/a&gt; that shows how to set up such a scenario with &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/&quot;&gt;Kubeflow Pipelines&lt;/a&gt; (KFP).
It shows how to build a pipeline that checks for statistical drift across successive versions of a dataset and uses that information to make a decision on whether to (re)train a model&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;; and how to configure event-driven deployment of pipeline jobs when new data arrives.&lt;/p&gt;

&lt;p&gt;The notebook builds on an example highlighted in a &lt;a href=&quot;https://amygdala.github.io/gcp_blog/ml/kfp/mlops/keras/hp_tuning/2020/10/26/metrics_eval_component.html&quot;&gt;previous blog post&lt;/a&gt; — which shows a KFP training and serving pipeline— and introduces two primary new concepts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the example demonstrates use of the &lt;a href=&quot;https://www.tensorflow.org/tfx/guide/tfdv&quot;&gt;TensorFlow Data Validation (TFDV)&lt;/a&gt; library to build pipeline components that derive &lt;strong&gt;dataset statistics&lt;/strong&gt; and detect &lt;strong&gt;drift&lt;/strong&gt; between older and newer dataset versions, and shows how to use drift information to decide whether to retrain a model on newer data.&lt;/li&gt;
  &lt;li&gt;the example shows how to support &lt;strong&gt;event-triggered&lt;/strong&gt; launch of Kubeflow Pipelines runs from a &lt;a href=&quot;https://cloud.google.com/functions/docs/&quot;&gt;Cloud Functions&lt;/a&gt; (GCF) function, where the Function run is triggered by addition of a file to a given &lt;a href=&quot;https://cloud.google.com/storage&quot;&gt;Cloud Storage&lt;/a&gt; (GCS) bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The machine learning task uses a tabular dataset that joins London bike rental information with weather data, and trains a Keras model to predict rental duration. See &lt;a href=&quot;https://amygdala.github.io/gcp_blog/ml/kfp/kubeflow/keras/tensorflow/hp_tuning/2020/10/19/keras_tuner.html&quot;&gt;this&lt;/a&gt; and &lt;a href=&quot;https://amygdala.github.io/gcp_blog/ml/kfp/mlops/keras/hp_tuning/2020/10/26/metrics_eval_component.html&quot;&gt;this&lt;/a&gt; blog post and associated &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;README&lt;/a&gt; for more background on the dataset and model architecture.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/CleanShot%202021-02-26%20at%2011.30.13%402x.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/CleanShot%202021-02-26%20at%2011.30.13%402x.png&quot; width=&quot;80%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;A pipeline run using TFDV-based components to detect 'data drift'.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;running-the-example-notebook&quot;&gt;Running the example notebook&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/notebook_examples/hosted_kfp/event_triggered_kfp_pipeline_bw.ipynb&quot;&gt;example notebook&lt;/a&gt; requires a &lt;a href=&quot;https://cloud.google.com/&quot;&gt;Google Cloud Platform (GCP)&lt;/a&gt; account and project, ideally with quota for using GPUs, and— as detailed in the notebook— an installation of &lt;strong&gt;AI Platform Pipelines (Hosted Kubeflow Pipelines)&lt;/strong&gt; (that is, an installation of KFP on &lt;a href=&quot;https://cloud.google.com/kubernetes-engine&quot;&gt;Google Kubernetes Engine (GKE)&lt;/a&gt;), with a few additional configurations once installation is complete.&lt;/p&gt;

&lt;p&gt;The notebook can be run using either &lt;a href=&quot;https://colab.research.google.com/&quot;&gt;Colab&lt;/a&gt; (&lt;a href=&quot;https://colab.research.google.com/github/amygdala/code-snippets/blob/master/ml/notebook_examples/hosted_kfp/event_triggered_kfp_pipeline_bw.ipynb&quot;&gt;open directly&lt;/a&gt;) or &lt;a href=&quot;https://cloud.google.com/ai-platform-notebooks&quot;&gt;AI Platform Notebooks&lt;/a&gt; (&lt;a href=&quot;https://console.cloud.google.com/ai-platform/notebooks/deploy-notebook?download_url=https://raw.githubusercontent.com/amygdala/code-snippets/master/ml/notebook_examples/hosted_kfp/event_triggered_kfp_pipeline_bw.ipynb&quot;&gt;open directly&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id=&quot;creating-tfdv-based-kfp-components&quot;&gt;Creating TFDV-based KFP components&lt;/h2&gt;

&lt;p&gt;Our first step is to build the TFDV components that we want to use in our pipeline.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: For this example, our training data is in GCS, in CSV-formatted files.  So, we can take advantage of TFDV’s ability to process CSV files.  The TFDV libraries can also process files in &lt;code class=&quot;highlighter-rouge&quot;&gt;TFRecords&lt;/code&gt; format.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We’ll define both TFDV KFP pipeline &lt;em&gt;components&lt;/em&gt; as &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/sdk/python-function-components/&quot;&gt;‘lightweight’ Python-function-based components&lt;/a&gt;. For each component, we define a function, then call &lt;code class=&quot;highlighter-rouge&quot;&gt;kfp.components.func_to_container_op()&lt;/code&gt; on that function to build a &lt;strong&gt;reusable&lt;/strong&gt; component in &lt;code class=&quot;highlighter-rouge&quot;&gt;.yaml&lt;/code&gt; format.
Let’s take a closer look at how this works (details are in the &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/notebook_examples/hosted_kfp/event_triggered_kfp_pipeline_bw.ipynb&quot;&gt;notebook&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Below is the Python function we’ll use to generate TFDV statistics from a collection of &lt;code class=&quot;highlighter-rouge&quot;&gt;csv&lt;/code&gt; files.  The function— and the component we’ll create from it— outputs the path to the generated stats file.  When we define a pipeline that uses this component, we’ll use this step’s output as input to another pipeline step.
TFDV uses a &lt;a href=&quot;https://beam.apache.org/&quot;&gt;Beam&lt;/a&gt; pipeline— not to be confused with KFP Pipelines— to implement the stats generation. Depending upon configuration, the component can use either the Direct (local) runner or the &lt;a href=&quot;https://cloud.google.com/dataflow#section-5&quot;&gt;Dataflow&lt;/a&gt; runner.  Running the Beam pipeline on Dataflow rather than locally can make sense with large datasets.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;typing&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NamedTuple&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;generate_tfdv_stats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;use_dataflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;project_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcs_temp_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcs_staging_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requirements_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'requirements.txt'&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NamedTuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Outputs'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'stats_path'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]):&lt;/span&gt;

  &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;
  &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;time&lt;/span&gt;

  &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tensorflow_data_validation&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfdv&lt;/span&gt;
  &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tensorflow_data_validation.statistics.stats_impl&lt;/span&gt;
  &lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;apache_beam.options.pipeline_options&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PipelineOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GoogleCloudOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StandardOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SetupOptions&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getLogger&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setLevel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INFO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;output path: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Building pipeline options&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;# Create and set your PipelineOptions.
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PipelineOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;use_dataflow&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'true'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;using Dataflow&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;warning&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'tfdv whl file required with dataflow runner.'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;nb&quot;&gt;exit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;google_cloud_options&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view_as&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GoogleCloudOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;google_cloud_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;project_id&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;google_cloud_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'{}-{}'&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;google_cloud_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staging_location&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcs_staging_location&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;google_cloud_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;temp_location&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcs_temp_location&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;google_cloud_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;region&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view_as&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StandardOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runner&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'DataflowRunner'&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;setup_options&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;view_as&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SetupOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;setup_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;extra_packages&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;setup_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;requirements_file&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'requirements.txt'&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;tfdv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generate_statistics_from_csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;data_location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;pipeline_options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To turn this function into a KFP &lt;em&gt;component&lt;/em&gt;, we’ll call &lt;code class=&quot;highlighter-rouge&quot;&gt;kfp.components.func_to_container_op()&lt;/code&gt;.  We’re passing it a base container image to use: &lt;code class=&quot;highlighter-rouge&quot;&gt;gcr.io/google-samples/tfdv-tests:v1&lt;/code&gt;.  This base image has the TFDV libraries already installed, so that we don’t need to install them ‘inline’ when we run a pipeline step based on this component.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;components&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func_to_container_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generate_tfdv_stats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;output_component_file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'tfdv_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;base_image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'gcr.io/google-samples/tfdv-tests:v1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We’ll take the same approach to build a second TFDV-based component, one which detects &lt;em&gt;drift&lt;/em&gt; between datasets by comparing their stats.  The TFDV library makes this straightforward.  We’re using a drift comparator appropriate for a &lt;em&gt;regression&lt;/em&gt; model— as used in the example pipeline— and looking for drift on a given set of fields (in this case, for example purposes, just one).
The &lt;code class=&quot;highlighter-rouge&quot;&gt;tensorflow_data_validation.validate_statistics()&lt;/code&gt; call will then tell us whether the drift anomaly for that field is over the specified threshold. See the &lt;a href=&quot;https://www.tensorflow.org/tfx/data_validation/get_started#checking_data_skew_and_drift&quot;&gt;TFDV docs&lt;/a&gt; for more detail.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;n&quot;&gt;schema1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfdv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;infer_schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;statistics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;tfdv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_feature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schema1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'duration'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;drift_comparator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jensen_shannon_divergence&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;threshold&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.01&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;drift_anomalies&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfdv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;validate_statistics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;statistics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schema1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previous_statistics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(The details of this second component definition are in the example notebook).&lt;/p&gt;

&lt;h2 id=&quot;defining-a-pipeline-that-uses-the-tfdv-components&quot;&gt;Defining a pipeline that uses the TFDV components&lt;/h2&gt;

&lt;p&gt;After we’ve defined both TFDV components— one to generate stats for a dataset, and one to detect drift between datasets— we’re ready to build a Kubeflow Pipeline that uses these components, in conjunction with previously-built components for a training &amp;amp; serving workflow.&lt;/p&gt;

&lt;h3 id=&quot;instantiate-pipeline-ops-from-the-components&quot;&gt;Instantiate pipeline &lt;em&gt;ops&lt;/em&gt; from the components&lt;/h3&gt;

&lt;p&gt;KFP components in &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; format are shareable and reusable.  We’ll build our pipeline by starting with some already-built components— (described in more detail &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;here&lt;/a&gt;)— that support our basic ‘train/evaluate/deploy’ workflow.&lt;/p&gt;

&lt;p&gt;We’ll instantiate some pipeline ops from these pre-existing components like this, by loading them via URL:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp.components&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# pre-existing components
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'https://raw.githubusercontent.com/amygdala/code-snippets/master/ml/kubeflow-pipelines/keras_tuner/components/train_component.yaml'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;etc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;… then create our TFDV ops from the new components we just built:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;span class=&quot;n&quot;&gt;tfdv_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'tfdv_component.yaml'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tfdv_drift_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'tfdv_drift_component.yaml'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, we define a KFP pipeline from the defined ops.  We’re not showing the pipeline in full here— see the notebook for details.
Two pipeline steps are based on the &lt;code class=&quot;highlighter-rouge&quot;&gt;tfdv_op&lt;/code&gt;, which generates the stats.  &lt;code class=&quot;highlighter-rouge&quot;&gt;tfdv1&lt;/code&gt; generates stats for the test data, and &lt;code class=&quot;highlighter-rouge&quot;&gt;tfdv2&lt;/code&gt; for the training data.
In the following, you can see that the &lt;code class=&quot;highlighter-rouge&quot;&gt;tfdv_drift&lt;/code&gt; step (based on the &lt;code class=&quot;highlighter-rouge&quot;&gt;tfdv_drift_op&lt;/code&gt;) takes as input the output from the &lt;code class=&quot;highlighter-rouge&quot;&gt;tfdv2&lt;/code&gt; (stats for training data) step.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bikes_weather_tfdv'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Model bike rental duration given weather'&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;bikes_weather_tfdv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pipeline&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;params&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://YOUR/GCS/PATH'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://aju-dev-demos-codelabs/bikes_weather/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;project_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'YOUR-PROJECT-ID'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'us-central1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;requirements_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'requirements.txt'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'test'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'tensorflow_data_validation-0.26.0-cp37-cp37m-manylinux2010_x86_64.whl'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;use_dataflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;stats_older_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://aju-dev-demos-codelabs/bikes_weather_chronological/evaltrain1.pb'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;tfdv1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfdv_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# TFDV stats for the test data
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;input_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;stest-*.csv'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/tfdv_expers/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/eval/evaltest.pb'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUN_ID_PLACEHOLDER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s-1'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;use_dataflow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;use_dataflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcs_temp_location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/tfdv_expers/tmp'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcs_staging_location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/tfdv_expers'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requirements_file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;requirements_file&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;tfdv2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfdv_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# TFDV stats for the training data
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;input_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;strain-*.csv'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# output_path='%s/%s/eval/evaltrain.pb' % (output_path, dsl.RUN_ID_PLACEHOLDER),
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/tfdv_expers/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/eval/evaltrain.pb'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUN_ID_PLACEHOLDER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s-2'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;use_dataflow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;use_dataflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcs_temp_location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/tfdv_expers/tmp'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcs_staging_location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/tfdv_expers'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;whl_location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requirements_file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;requirements_file&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# compare generated training data stats with stats from a previous version
&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# of the training data set.
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;tfdv_drift&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfdv_drift_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats_older_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tfdv2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'stats_path'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# proceed with training if drift is detected (or if no previous stats were provided)
&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tfdv_drift&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'drift'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'true'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval_metrics_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;serve&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serve_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;While not all pipeline details are shown, you can see that this pipeline definition includes some conditional expressions; parts of the pipeline will run only if an output of an ‘upstream’ step meets the given conditions.  We start the model training step if drift anomalies were detected.  (And, once training is completed, we’ll deploy the model for serving only if its evaluation metrics meet certain thresholds).&lt;/p&gt;

&lt;p&gt;Here’s the &lt;a href=&quot;https://en.wikipedia.org/wiki/Directed_acyclic_graph&quot;&gt;DAG&lt;/a&gt; for this pipeline.  You can see the conditional expressions reflected; and can see that the step to generate stats for the test dataset provides no downstream dependencies, but the stats on the training set are used as input for the drift detection step.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/bw_tfdv_pipeline.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/bw_tfdv_pipeline.png&quot; width=&quot;80%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;The pipeline DAG&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Here’s a pipeline run in progress:&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/bw_tfdv_pipeline_run.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/bw_tfdv_pipeline_run.png&quot; width=&quot;80%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;A pipeline run in progress.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;See the &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/notebook_examples/hosted_kfp/event_triggered_kfp_pipeline_bw.ipynb&quot;&gt;example notebook&lt;/a&gt; for more details on how to run this pipeline.&lt;/p&gt;

&lt;h2 id=&quot;event-triggered-pipeline-runs&quot;&gt;Event-triggered pipeline runs&lt;/h2&gt;

&lt;p&gt;Once you have defined this pipeline, a next useful step is to automatically run it when an update to the dataset is available, so that each dataset update triggers an analysis of data drift and potential model (re)training.&lt;/p&gt;

&lt;p&gt;We’ll show how to do this using &lt;a href=&quot;https://cloud.google.com/functions/docs/&quot;&gt;Cloud Functions (GCF)&lt;/a&gt;, by setting up a function that is triggered when new data is added to a &lt;a href=&quot;https://cloud.google.com/storage&quot;&gt;GCS&lt;/a&gt; bucket.&lt;/p&gt;

&lt;h3 id=&quot;set-up-a-gcf-function-to-trigger-a-pipeline-run-when-a-dataset-is-updated&quot;&gt;Set up a GCF function to trigger a pipeline run when a dataset is updated&lt;/h3&gt;

&lt;p&gt;We’ll define and deploy a &lt;a href=&quot;https://cloud.google.com/functions/docs/&quot;&gt;Cloud Functions (GCF)&lt;/a&gt; function that launches a run of this pipeline when new training data becomes available, as triggered by the creation or modification of a file in a ‘trigger’ bucket on GCS.&lt;/p&gt;

&lt;p&gt;In most cases, you don’t want to launch a new pipeline run for every new file added to a dataset— since typically, the dataset will be comprised of a collection of files, to which you will add/update multiple files in a batch. So, you don’t want the ‘trigger bucket’ to be the dataset bucket (if the data lives on GCS)— as that will trigger unwanted pipeline runs.
Instead, we’ll trigger a pipeline run after the upload of a &lt;em&gt;batch&lt;/em&gt; of new data has completed.&lt;/p&gt;

&lt;p&gt;To do this, we’ll use an approach where the ‘trigger’ bucket is different from the bucket used to store dataset files. ‘Trigger files’ uploaded to that bucket are expected to contain the path of the updated dataset as well as the path to the data stats file generated for the last model trained.
A trigger file is uploaded once the new data upload has completed, and that upload triggers a run of the GCF function, which in turn reads info on the new data path from the trigger file and launches the pipeline job.&lt;/p&gt;

&lt;h4 id=&quot;define-the-gcf-function&quot;&gt;Define the GCF function&lt;/h4&gt;

&lt;p&gt;To set up this process, we’ll first define the GCF function in a file called &lt;code class=&quot;highlighter-rouge&quot;&gt;main.py&lt;/code&gt;, as well as an accompanying requirements file in the same directory that specifies the libraries to load prior to running the function.  The requirements file will indicate to install the KFP SDK:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The code looks like this (with some detail removed); we parse the trigger file contents and use that information to launch a pipeline run. The code uses the values of several environment variables that we will set when uploading the GCF function.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;os&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compiler&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;components&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.cloud&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;storage&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;PIPELINE_PROJECT_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getenv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'PIPELINE_PROJECT_ID'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;etc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;read_trigger_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;storage_client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;Read the contents of the trigger file and return as string.
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;....&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bucket&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;storage_client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_bucket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bucket'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;blob&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bucket&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_blob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'name'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;trigger_file_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blob&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;download_as_string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'trigger file contents: {}'&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trigger_file_string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trigger_file_string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'UTF-8'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;gcs_update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;Background Cloud Function to be triggered by Cloud Storage.
    &quot;&quot;&quot;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;storage_client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;storage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# get the contents of the trigger file
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;trigger_file_string&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;read_trigger_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;storage_client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;trigger_file_info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trigger_file_string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# then run the pipeline using the given job spec, passing the trigger file contents
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;# as parameter values.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'running pipeline with id &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s...'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PIPELINE_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# create the client object
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PIPELINE_HOST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# deploy the pipeline run
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;run&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;run_pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EXP_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'bw_tfdv_gcf'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pipeline_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PIPELINE_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                          &lt;span class=&quot;n&quot;&gt;params&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'working_dir'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WORKING_DIR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                  &lt;span class=&quot;s&quot;&gt;'project_id'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PIPELINE_PROJECT_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                  &lt;span class=&quot;s&quot;&gt;'use_dataflow'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;USE_DATAFLOW&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                  &lt;span class=&quot;s&quot;&gt;'data_dir'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trigger_file_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                                  &lt;span class=&quot;s&quot;&gt;'stats_older_path'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trigger_file_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]})&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'job response: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then we’ll deploy the GCF function as follows. Note that we’re indicating to use the &lt;code class=&quot;highlighter-rouge&quot;&gt;gcs_update&lt;/code&gt; definition (from &lt;code class=&quot;highlighter-rouge&quot;&gt;main.py&lt;/code&gt;), and specifying the trigger bucket.   Note also how we’re setting environment vars as part of the deployment.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;gcloud functions deploy gcs_update &lt;span class=&quot;nt&quot;&gt;--set-env-vars&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nv&quot;&gt;PIPELINE_PROJECT_ID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;={&lt;/span&gt;PROJECT_ID&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;,WORKING_DIR&lt;span class=&quot;o&quot;&gt;={&lt;/span&gt;WORKING_DIR&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;,PIPELINE_SPEC&lt;span class=&quot;o&quot;&gt;={&lt;/span&gt;PIPELINE_SPEC&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;,PIPELINE_ID&lt;span class=&quot;o&quot;&gt;={&lt;/span&gt;PIPELINE_ID&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;,PIPELINE_HOST&lt;span class=&quot;o&quot;&gt;={&lt;/span&gt;PIPELINE_HOST&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;,EXP_ID&lt;span class=&quot;o&quot;&gt;={&lt;/span&gt;EXP_ID&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;,USE_DATAFLOW&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--runtime&lt;/span&gt; python37 &lt;span class=&quot;nt&quot;&gt;--trigger-resource&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;TRIGGER_BUCKET&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--trigger-event&lt;/span&gt; google.storage.object.finalize
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;trigger-a-pipeline-run-when-new-data-becomes-available&quot;&gt;Trigger a pipeline run when new data becomes available&lt;/h3&gt;

&lt;p&gt;Once the GCF function is set up, it will run when a file is added to (or modified in) the trigger bucket.  For this simple example, the GCF function expects trigger files of the following format, where the first line is the path to the updated dataset, and the second line is the path to the TFDV stats for the dataset used for the previously-trained model.  More generally, such a trigger file can contain whatever information is necessary to determine how to parameterize the pipeline run.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;gs://path/to/new/or/updated/dataset/
gs://path/to/stats/from/previous/dataset/stats.pb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;This blog post showed how to build Kubeflow Pipeline components, using the TFDV libraries, to analyze datasets and detect data drift. Then, it showed how to support event-triggered pipeline runs via Cloud Functions.&lt;/p&gt;

&lt;p&gt;The post didn’t include use of TFDV to visualize and explore the generated stats, but &lt;a href=&quot;https://colab.sandbox.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/data_validation/tfdv_basic.ipynb&quot;&gt;this example notebook&lt;/a&gt; shows how you can do that.
You can also &lt;a href=&quot;https://github.com/kubeflow/pipelines/tree/master/samples&quot;&gt;explore the samples&lt;/a&gt; in the Kubeflow Pipelines GitHub repo.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In this example, we show full model retraining on a new dataset.  An alternate scenario— not covered here— could involve &lt;em&gt;tuning&lt;/em&gt; an existing model with new data. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">Introduction</summary></entry><entry><title type="html">Keras Tuner KFP example, part II— creating a lightweight component for metrics evaluation</title><link href="https://amygdala.github.io/gcp_blog/ml/kfp/mlops/keras/hp_tuning/2020/10/26/metrics_eval_component.html" rel="alternate" type="text/html" title="Keras Tuner KFP example, part II— creating a lightweight component for metrics evaluation" /><published>2020-10-26T00:00:00-05:00</published><updated>2020-10-26T00:00:00-05:00</updated><id>https://amygdala.github.io/gcp_blog/ml/kfp/mlops/keras/hp_tuning/2020/10/26/metrics_eval_component</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/ml/kfp/mlops/keras/hp_tuning/2020/10/26/metrics_eval_component.html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://amygdala.github.io/gcp_blog/ml/kfp/kubeflow/keras/tensorflow/hp_tuning/2020/10/19/keras_tuner.html&quot;&gt;blog post&lt;/a&gt; and accompanying &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;tutorial&lt;/a&gt; walked through how to build a &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/&quot;&gt;Kubeflow Pipelines&lt;/a&gt; (KFP) pipeline that uses the &lt;a href=&quot;https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html&quot;&gt;Keras Tuner&lt;/a&gt; to build a hyperparameter-tuning workflow that uses distributed HP search.&lt;/p&gt;

&lt;p&gt;That pipeline does HP tuning, then runs full training on the N best parameter sets identified from the HP search, then deploys the full models to &lt;a href=&quot;https://www.tensorflow.org/tfx/guide/serving&quot;&gt;TF-serving&lt;/a&gt;.&lt;br /&gt;
One thing that was missing from that pipeline was any check on the quality of the trained models prior to deployment to TF-Serving.&lt;/p&gt;

&lt;p&gt;This post is based on an &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/notebooks/metrics_eval_component.ipynb&quot;&gt;example notebook&lt;/a&gt;, and is a follow-on to that tutorial.  You can follow along in the notebook instead if you like (see below).&lt;/p&gt;

&lt;p&gt;Here, we’ll show how you can create a KFP “lightweight component”, built from a python function, to do a simple threshold check on some of the model metrics in order to decide whether to deploy the model. (This is a pretty simple approach, that we’re using for illustrative purposes; for production models you’d probably want to do more sophisticated analyses. The &lt;a href=&quot;https://www.tensorflow.org/tfx/model_analysis/get_started&quot;&gt;TFMA library&lt;/a&gt; might be of interest).
We’ll also show how to use the KFP SDK to define and run pipelines from a notebook.&lt;/p&gt;

&lt;h2 id=&quot;setup&quot;&gt;Setup&lt;/h2&gt;

&lt;p&gt;This example assumes that you’ve &lt;strong&gt;done the setup indicated in the &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;README&lt;/a&gt;&lt;/strong&gt;, and have an AI Platform Pipelines (Hosted KFP) installation, with GPU node pools added to the cluster.&lt;/p&gt;

&lt;h3 id=&quot;create-an-ai-platform-notebooks-instance&quot;&gt;Create an AI Platform Notebooks instance&lt;/h3&gt;

&lt;p&gt;In addition, create an AI Platform Notebooks instance on which to run &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/notebooks/metrics_eval_component.ipynb&quot;&gt;the example notebook&lt;/a&gt; on which this post is based. See setup instructions &lt;a href=&quot;https://cloud.google.com/ai-platform/notebooks/docs&quot;&gt;here&lt;/a&gt;. (You can run this notebook in other environments, including, locally, but that requires additional auth setup that we won’t go into here).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once your notebook instance is set up, you should be able to use &lt;a href=&quot;https://console.cloud.google.com/ai-platform/notebooks/deploy-notebook?name=Create%20a%20new%20KFP%20component%20from%20a%20notebook&amp;amp;download_url=https%3A%2F%2Fraw.githubusercontent.com%2Famygdala%2Fcode-snippets%2Fmaster%2Fml%2Fkubeflow-pipelines%2Fkeras_tuner%2Fnotebooks%2Fmetrics_eval_component.ipynb&amp;amp;url=https%3A%2F%2Fgithub.com%2Famygdala%2Fcode-snippets%2Fblob%2Fmaster%2Fml%2Fkubeflow-pipelines%2Fkeras_tuner%2Fnotebooks%2Fmetrics_eval_component.ipynb&quot;&gt;this link&lt;/a&gt; to upload and run the notebook.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 id=&quot;install-the-kfp-sdk&quot;&gt;Install the KFP SDK&lt;/h3&gt;

&lt;p&gt;Next, we’ll install the KFP SDK.  In a notebook, you may need to restart the kernel so it’s available for import.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;err&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pip&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;user&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;U&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;server&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;api&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, we’ll do some imports:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# the Pipelines SDK. 
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compiler&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp.dsl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp.gcp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gcp&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp.components&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;defining-a-new-lightweight-component-based-on-a-python-function&quot;&gt;Defining a new ‘lightweight component’ based on a python function&lt;/h2&gt;

&lt;p&gt;‘Lightweight’ KFP python components allow you to create a component from a python function definition, and do not require you to build a new container image for every code change. They’re helpful for fast iteration in a notebook environment. You can read more &lt;a href=&quot;https://github.com/kubeflow/pipelines/blob/master/samples/core/lightweight_component/lightweight_component.ipynb&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this section, we’ll create a lightweight component that uses training metrics info to decide whether to deploy a model.
We’ll pass a “threshold” dict as a component arg, and compare those thresholds to the metrics values, and use that info to decide whether or not to deploy.  Then we’ll output a string indicating the decision.&lt;/p&gt;

&lt;p&gt;(As mentioned above, for production models you’d probably want to do a more substantial analysis. The &lt;a href=&quot;https://www.tensorflow.org/tfx/model_analysis/get_started&quot;&gt;TFMA library&lt;/a&gt; might be of interest. Stay tuned for a follow-on post that uses TFMA).&lt;/p&gt;

&lt;p&gt;Then we’ll define a pipeline that uses the new component. In the pipeline spec, we’ll make the ‘serve’ step conditional on the “metrics” op output.&lt;/p&gt;

&lt;p&gt;First, we’ll define the component function, &lt;code class=&quot;highlighter-rouge&quot;&gt;eval_metrics&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;typing&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NamedTuple&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;eval_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NamedTuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Outputs'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]):&lt;/span&gt;

  &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;json&lt;/span&gt;
  &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;regression_threshold_check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;thresholds_dict&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;items&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'k {}, v {}'&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'root_mean_squared_error'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'mae'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metrics_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'{} &amp;gt; {}; returning False'&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
          &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'False'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getLogger&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setLevel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INFO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;thresholds_dict&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'thresholds dict: {}'&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;thresholds_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'metrics: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;metrics_dict&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;got metrics info: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metrics_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regression_threshold_check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metrics_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'deploy decision: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To keep things simple, we’re comparing only RMSE and MAE with given threshold values.  (This function is tailored for our Keras regression model). Lower is better, so if a threshold value is higher than the associated model metric, we won’t deploy.&lt;/p&gt;

&lt;p&gt;Next, we’ll create a ‘container op’ from the &lt;code class=&quot;highlighter-rouge&quot;&gt;eval_metrics&lt;/code&gt; function definition, via the &lt;code class=&quot;highlighter-rouge&quot;&gt;funct_to_container_op&lt;/code&gt; method. As one of the method args, we specify the base container image that will run the function. 
Here, we’re using one of the &lt;a href=&quot;https://cloud.google.com/ai-platform/deep-learning-containers/docs/&quot;&gt;Deep Learning Container images&lt;/a&gt;.  (This container image installs more than is necessary for this simple function, but these DL images can be useful for many ML-related components).&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;eval_metrics_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func_to_container_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;base_image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'gcr.io/deeplearning-platform-release/tf2-cpu.2-3:latest'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;define-a-pipeline-that-uses-the-new-metrics-op&quot;&gt;Define a pipeline that uses the new “metrics” op&lt;/h2&gt;

&lt;p&gt;Now, we can define a new pipeline that uses the new op and makes the model serving conditional on the results.&lt;/p&gt;

&lt;p&gt;The new &lt;code class=&quot;highlighter-rouge&quot;&gt;eval_metrics_op&lt;/code&gt; takes as an input one of the &lt;code class=&quot;highlighter-rouge&quot;&gt;train_op&lt;/code&gt; outputs, which outputs a final metrics dict. (We “cheated” a bit, as the training component was already designed to output this info; in other cases you might end up defining a new version of such an op that outputs the new info you need).&lt;/p&gt;

&lt;p&gt;Then, we’ll wrap the serving op in a &lt;em&gt;conditional&lt;/em&gt;; we won’t set up a TF-serving service unless the &lt;code class=&quot;highlighter-rouge&quot;&gt;eval_metrics&lt;/code&gt; op has certified that it is okay.&lt;/p&gt;

&lt;p&gt;Note that this new version of the pipeline also has a new input parameter— the &lt;code class=&quot;highlighter-rouge&quot;&gt;thresholds&lt;/code&gt; dict.&lt;/p&gt;

&lt;p&gt;To keep things simple, we’ll first define a pipeline that skips the HP tuning part of the pipeline used &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;here&lt;/a&gt;.  This will make it easier to test your new op with a pipeline that takes a shorter time to run.&lt;/p&gt;

&lt;p&gt;Then in a following section we’ll show how to augment the full HP tuning pipeline to include the new op.&lt;/p&gt;

&lt;p&gt;We’ll first instantiate the other pipeline ops from their &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/sdk/component-development/&quot;&gt;reusable components&lt;/a&gt; definitions.  (And we’ve defined the &lt;code class=&quot;highlighter-rouge&quot;&gt;eval_metrics_op&lt;/code&gt; above).&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;span class=&quot;n&quot;&gt;train_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'https://raw.githubusercontent.com/amygdala/code-snippets/master/ml/kubeflow-pipelines/keras_tuner/components/train_component.yaml'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;serve_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'https://raw.githubusercontent.com/amygdala/code-snippets/master/ml/kubeflow-pipelines/keras_tuner/components/serve_component.yaml'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;tb_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/tensorflow/tensorboard/prepare_tensorboard/component.yaml'&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, we’ll define the pipeline itself.  You might notice that this pipeline has a new parameter, &lt;code class=&quot;highlighter-rouge&quot;&gt;thresholds&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This pipeline first sets up a TensorBoard visualization for monitoring the training run. Then it starts the training. Once training is finished, the new op checks whether the trained model’s final metrics are above the given threshold(s). 
If so (using the KFP &lt;code class=&quot;highlighter-rouge&quot;&gt;dsl.Condition&lt;/code&gt; construct), TF-serving is used to set up a prediction service on the Pipelines GKE cluster.&lt;/p&gt;

&lt;p&gt;You can see that data is being passed between the pipeline ops. &lt;a href=&quot;https://gist.github.com/amygdala/bfa0f599a4814b3261367f558a852bfe&quot;&gt;Here’s a tutorial&lt;/a&gt; that goes into how that works in more detail.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bikes_weather_metrics'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Model bike rental duration given weather'&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;bikes_weather_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;train_epochs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://YOUR/GCS/PATH'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# for the full training jobs
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://aju-dev-demos-codelabs/bikes_weather/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;steps_per_epoch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# if -1, don't override normal calcs based on dataset size
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;hptune_params&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'[{&quot;num_hidden_layers&quot;: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s, &quot;learning_rate&quot;: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s, &quot;hidden_size&quot;: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s}]'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'{&quot;root_mean_squared_error&quot;: 2000}'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# create TensorBoard viz for the parent directory of all training runs, so that we can
&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# compare them.
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;tb_viz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tb_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;log_dir_uri&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUN_ID_PLACEHOLDER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;workdir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tb_viz&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'log_dir_uri'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tb_dir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tb_viz&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'log_dir_uri'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;epochs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train_epochs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;steps_per_epoch&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;steps_per_epoch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;hp_idx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;hptune_results&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hptune_params&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval_metrics_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;metrics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'metrics_output_path'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# conditional serving
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;serve&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serve_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;model_path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'train_output_path'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;model_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bikesw'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;namespace&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'default'&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_gpu_limit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we can run the pipeline from the notebook.  First create a client object to talk to your KFP installation. Using that client, create (or get) an &lt;em&gt;Experiment&lt;/em&gt; (which lets you create semantic groupings of pipeline runs).&lt;/p&gt;

&lt;p&gt;You’ll need to set the correct host endpoint for your pipelines installation when you create the client.  Visit the &lt;a href=&quot;https://console.cloud.google.com/ai-platform/pipelines/clusters&quot;&gt;Pipelines panel in the Cloud Console&lt;/a&gt; and click on the &lt;strong&gt;SETTINGS&lt;/strong&gt; gear for the desired installation to get its endpoint.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# CHANGE THIS with the info for your KFP cluster installation
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'xxxxxxxx-dot-us-centralx.pipelines.googleusercontent.com'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_experiment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bw_expers'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# this is a 'get or create' call
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(If the &lt;code class=&quot;highlighter-rouge&quot;&gt;create_experiment&lt;/code&gt; call failed, double check your host endpoint value).&lt;/p&gt;

&lt;p&gt;Now, we can compile and then run the pipeline.  We’ll set some vars with pipeline params:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;WORKING_DIR&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://YOUR_GCS/PATH'&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TRAIN_EPOCHS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we’ll compile and run the pipeline.&lt;/p&gt;

&lt;p&gt;Note that this pipeline is configured to use a GPU node for the training step, so make sure that you have set up a GPU node pool for the cluster that your KFP installation is running on, as described in this &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;README&lt;/a&gt;. Note also that GPU nodes are more expensive.&lt;br /&gt;
If you want, you can comment out the &lt;code class=&quot;highlighter-rouge&quot;&gt;train.set_gpu_limit(2)&lt;/code&gt; line in the pipeline definition above to run training on a CPU node.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;compiler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Compiler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bikes_weather_metrics&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'bikes_weather_metrics.tar.gz'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;run_pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'bw_metrics_test'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'bikes_weather_metrics.tar.gz'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                          &lt;span class=&quot;n&quot;&gt;params&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'working_dir'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WORKING_DIR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'train_epochs'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TRAIN_EPOCHS&lt;/span&gt;
                                 &lt;span class=&quot;c1&quot;&gt;# 'thresholds': THRESHOLDS
&lt;/span&gt;                                 &lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you’ve kicked off the run, click the generated link to see the pipeline run in the Kubeflow Pipelines dashboard of your pipelines installation. (See the last section for more info on how to use your trained and deployed model for prediction).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: It’s also possible to start a pipeline run directly from the pipeline function definition, skipping the local compilation, like this:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kfp_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_run_from_pipeline_func&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pipeline_function_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;arguments&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;use-the-new-metrics-op-with-the-full-keras-tuner-pipeline&quot;&gt;Use the new “metrics” op with the full Keras Tuner pipeline&lt;/h2&gt;

&lt;p&gt;To keep things simple, the pipeline above didn’t do an HP tuning search.
Below is how the full pipeline from &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;this tutorial&lt;/a&gt; would be redefined to use this new op.&lt;/p&gt;

&lt;p&gt;This definition assumes that you’ve run the cells above that instantiated the ops from their component specs. This new definition includes an additional &lt;code class=&quot;highlighter-rouge&quot;&gt;hptune&lt;/code&gt; op (defined “inline” using &lt;code class=&quot;highlighter-rouge&quot;&gt;dsl.ContainerOp()&lt;/code&gt;) that deploys the distributed HP tuning job and then waits for the results.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Important note&lt;/strong&gt;: this example may take a long time to run, and &lt;strong&gt;incur significant charges&lt;/strong&gt; in its use of GPUs, depending upon how its parameters are configured.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bikes_weather_keras_tuner'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Model bike rental duration given weather, use Keras Tuner'&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;bikes_weather_hptune&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;tune_epochs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;train_epochs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;num_tuners&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bucket_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'YOUR_BUCKET_NAME'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# used for the HP dirs; don't include the 'gs://'
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;tuner_dir_prefix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'hptest'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;tuner_proj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'p1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;max_trials&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://YOUR/GCS/PATH'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# for the full training jobs
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'gs://aju-dev-demos-codelabs/bikes_weather/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;steps_per_epoch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# if -1, don't override normal calcs based on dataset size
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;num_best_hps&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# the N best parameter sets for full training
&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# the indices to the best param sets; necessary in addition to the above param because of
&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# how KFP loops work currently.  Must be consistent with the above param.
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;num_best_hps_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'{&quot;root_mean_squared_error&quot;: 2000}'&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;hptune&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ContainerOp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'ktune'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'gcr.io/google-samples/ml-pipeline-bikes-dep:b97ee76'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;arguments&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'--epochs'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tune_epochs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'--num-tuners'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_tuners&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;'--tuner-dir'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tuner_dir_prefix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUN_ID_PLACEHOLDER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;'--tuner-proj'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuner_proj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'--bucket-name'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bucket_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'--max-trials'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_trials&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;'--namespace'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'default'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'--num-best-hps'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_best_hps&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'--executions-per-trial'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;'--deploy'&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;file_outputs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'hps'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'/tmp/hps.json'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# create TensorBoard viz for the parent directory of all training runs, so that we can
&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# compare them.
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;tb_viz&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tb_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;log_dir_uri&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;working_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUN_ID_PLACEHOLDER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ParallelFor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_best_hps_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# start the full training runs in parallel
&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;workdir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tb_viz&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'log_dir_uri'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;tb_dir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tb_viz&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'log_dir_uri'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;epochs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train_epochs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;steps_per_epoch&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;steps_per_epoch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;hp_idx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hptune_results&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hptune&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'hps'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval_metrics_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;thresholds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;metrics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'metrics_output_path'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# conditional serving
&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;serve&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serve_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;model_path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'train_output_path'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;model_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bikesw'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;namespace&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'default'&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;train&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_gpu_limit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you want, you can compile and run this pipeline the same way as was done in the previous section. You can also find this pipeline in the example repo &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/example_pipelines/bw_ktune_metrics.py&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;more-detail-on-the-code-and-requesting-predictions-from-your-model&quot;&gt;More detail on the code, and requesting predictions from your model&lt;/h2&gt;

&lt;p&gt;This example didn’t focus on the details of the pipeline component (step) implementations.  The training component uses a Keras model (TF 2.3). The serving component uses &lt;a href=&quot;https://www.tensorflow.org/tfx/guide/serving&quot;&gt;TF-serving&lt;/a&gt;: once the serving service is up and running, you can send prediction requests to your trained model.&lt;/p&gt;

&lt;p&gt;You can find more detail on these components, and an example of sending a prediction request, &lt;a href=&quot;https://github.com/amygdala/code-snippets/tree/master/ml/kubeflow-pipelines/keras_tuner&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><summary type="html">Introduction</summary></entry><entry><title type="html">Running a distributed Keras HP Tuning search using Kubeflow Pipelines</title><link href="https://amygdala.github.io/gcp_blog/ml/kfp/kubeflow/keras/tensorflow/hp_tuning/2020/10/19/keras_tuner.html" rel="alternate" type="text/html" title="Running a distributed Keras HP Tuning search using Kubeflow Pipelines" /><published>2020-10-19T00:00:00-05:00</published><updated>2020-10-19T00:00:00-05:00</updated><id>https://amygdala.github.io/gcp_blog/ml/kfp/kubeflow/keras/tensorflow/hp_tuning/2020/10/19/keras_tuner</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/ml/kfp/kubeflow/keras/tensorflow/hp_tuning/2020/10/19/keras_tuner.html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;The performance of a machine learning model is often crucially dependent on the choice of good &lt;a href=&quot;https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)&quot;&gt;hyperparameters&lt;/a&gt;. For models of any complexity, relying on trial and error to find good values for these parameters does not scale. This tutorial shows how to use &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines&quot;&gt;&lt;strong&gt;Cloud AI Platform Pipelines&lt;/strong&gt;&lt;/a&gt;  in conjunction with the &lt;a href=&quot;https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html&quot;&gt;&lt;strong&gt;Keras Tuner&lt;/strong&gt;&lt;/a&gt; libraries to build a &lt;strong&gt;hyperparameter-tuning workflow&lt;/strong&gt; that runs a distributed HP search on GKE.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://cloud.google.com/ai-platform/pipelines/docs&quot;&gt;Cloud AI Platform Pipelines&lt;/a&gt;, currently in Beta, provides a way to deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility, and gives you an easy-to-install, secure execution environment for your ML workflows. AI Platform Pipelines is based on &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/&quot;&gt;Kubeflow Pipelines&lt;/a&gt; (KFP) installed on a &lt;a href=&quot;https://cloud.google.com/kubernetes-engine&quot;&gt;Google Kubernetes Engine (GKE)&lt;/a&gt; cluster, and can run pipelines specified via both the KFP and TFX SDKs. See &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines&quot;&gt;this blog post&lt;/a&gt; for more detail on the Pipelines tech stack.
You can create an AI Platform Pipelines installation with just a few clicks. After installing, you access AI Platform Pipelines by visiting the AI Platform Panel in the &lt;a href=&quot;https://console.cloud.google.com/&quot;&gt;Cloud Console&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html&quot;&gt;Keras Tuner&lt;/a&gt; is a distributable hyperparameter optimization framework. Keras Tuner makes it easy to define a search space and leverage included algorithms to find the best hyperparameter values. It comes with several search  algorithms built-in, and is also designed to be easy for researchers to extend in order to experiment with new search algorithms.  It is straightforward to run the Keras Tuner in &lt;a href=&quot;https://keras-team.github.io/keras-tuner/tutorials/distributed-tuning/&quot;&gt;distributed search mode&lt;/a&gt;, which we’ll leverage for this example.&lt;/p&gt;

&lt;p&gt;The intent of a HP tuning search is typically not to do full training for each parameter combination, but to find the best starting points.  The number of epochs run in the HP search trials are typically smaller than that used in the full training. So, an HP tuning-based ML workflow could include these steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;perform a distributed HP tuning search, and obtain the results&lt;/li&gt;
  &lt;li&gt;do concurrent full training runs for each of the best &lt;code class=&quot;highlighter-rouge&quot;&gt;N&lt;/code&gt; parameter configurations, and export (save) the model for each&lt;/li&gt;
  &lt;li&gt;serve (some of) the resultant models, often after model evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As indicated above, a Cloud AI Platform (KFP) Pipeline runs under the hood on a GKE cluster.  This makes it straightforward to implement this workflow— leveraging GKE for the distributed HP search and model serving— so that you just need to launch a pipeline job to kick it off.&lt;/p&gt;

&lt;p&gt;This post highlights an &lt;a href=&quot;https://github.com/amygdala/code-snippets/tree/master/ml/kubeflow-pipelines/keras_tuner&quot;&gt;example pipeline&lt;/a&gt; that does that. The example also shows how to use &lt;strong&gt;preemptible&lt;/strong&gt; GPU-enabled VMS for the HP search, to reduce costs; and how to use &lt;a href=&quot;https://www.tensorflow.org/tfx/guide/serving&quot;&gt;TF-serving&lt;/a&gt; to deploy the trained model(s) on the same cluster for serving. As part of the process, we’ll see how GKE provides a scalable, resilient platform with easily-configured use of accelerators.&lt;/p&gt;

&lt;h2 id=&quot;about-the-dataset-and-modeling-task&quot;&gt;About the dataset and modeling task&lt;/h2&gt;

&lt;h3 id=&quot;the-dataset&quot;&gt;The dataset&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://cloud.google.com/bigquery/public-data/&quot;&gt;Cloud Public Datasets Program&lt;/a&gt; makes available public datasets that are useful for experimenting with machine learning. Just as with this
“&lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;Explaining model predictions on structured data&lt;/a&gt;” post, we’ll use data that is essentially a join of two public datasets stored in &lt;a href=&quot;https://cloud.google.com/bigquery/&quot;&gt;BigQuery&lt;/a&gt;: &lt;a href=&quot;https://console.cloud.google.com/bigquery?p=bigquery-public-data&amp;amp;d=london_bicycles&amp;amp;page=dataset&quot;&gt;London Bike rentals&lt;/a&gt; and &lt;a href=&quot;https://console.cloud.google.com/bigquery?p=bigquery-public-data&amp;amp;d=noaa_gsod&amp;amp;page=dataset&quot;&gt;NOAA weather data&lt;/a&gt;, with some additional processing to clean up outliers and derive additional GIS and day-of-week fields.&lt;/p&gt;

&lt;h3 id=&quot;the-modeling-task-and-keras-model&quot;&gt;The modeling task and Keras model&lt;/h3&gt;

&lt;p&gt;We’ll use this dataset to build a &lt;a href=&quot;https://keras.io/&quot;&gt;Keras&lt;/a&gt; &lt;em&gt;regression model&lt;/em&gt; to predict the &lt;strong&gt;duration&lt;/strong&gt; of a bike rental based on information about the start and end stations, the day of the week, the weather on that day, and other data. If we were running a bike rental company, for example, these predictions—and their explanations—could help us anticipate demand and even plan how to stock each location.&lt;/p&gt;

&lt;p&gt;We’ll build a parameterizable model architecture for this task (similar in structure to a &lt;a href=&quot;https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html&quot;&gt;“wide and deep”&lt;/a&gt; model), then use the Keras Tuner package to do an HP search using this model structure, as well as doing full model training using the best HP set(s).&lt;/p&gt;

&lt;h2 id=&quot;keras-tuner-in-distributed-mode-on-gke-with-preemptible-vms&quot;&gt;Keras tuner in distributed mode on GKE with preemptible VMs&lt;/h2&gt;

&lt;p&gt;With the Keras Tuner, you set up a HP tuning search along these lines (the code is from the &lt;a href=&quot;https://github.com/amygdala/code-snippets/tree/master/ml/kubeflow-pipelines/keras_tuner&quot;&gt;example&lt;/a&gt;; other search algorithms are supported in addition to ‘random’):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;tuner&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RandomSearch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;create_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;objective&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'val_mae'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;max_trials&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_trials&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;distribution_strategy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRATEGY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;executions_per_trial&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;executions_per_trial&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;directory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tuner_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tuner_proj&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;…where in the above, the &lt;code class=&quot;highlighter-rouge&quot;&gt;create_model&lt;/code&gt; call takes takes an argument &lt;code class=&quot;highlighter-rouge&quot;&gt;hp&lt;/code&gt; from which you can sample hyperparameters.  For this example, we’re varying number of hidden layers, number of nodes per hidden layer, and learning rate in the HP search. There are many other hyperparameters that you might also want to vary in your search.&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sparse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;real&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bwmodel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_layers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bwmodel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wide_and_deep_classifier&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;linear_feature_columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sparse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;dnn_feature_columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;real&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;num_hidden_layers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'num_hidden_layers'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;dnn_hidden_units1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'hidden_size'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;step&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;learning_rate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Choice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'learning_rate'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1e-1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, call &lt;code class=&quot;highlighter-rouge&quot;&gt;tuner.search(...)&lt;/code&gt;.  See the Keras Tuner docs for more.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://keras-team.github.io/keras-tuner/&quot;&gt;Keras Tuner&lt;/a&gt; supports running this search in &lt;a href=&quot;https://keras-team.github.io/keras-tuner/tutorials/distributed-tuning/&quot;&gt;&lt;strong&gt;distributed mode&lt;/strong&gt;&lt;/a&gt;.
&lt;a href=&quot;https://cloud.google.com/kubernetes-engine&quot;&gt;Google Kubernetes Engine (GKE)&lt;/a&gt; makes it straightforward to configure and run a distributed HP tuning search.  GKE is  a good fit not only because it lets you easily distribute the HP tuning workload, but because you can leverage autoscaling to boost node pools for a large job, then scale down when the resources are no longer needed.  It’s also easy to deploy trained models for serving onto the same GKE cluster, using &lt;a href=&quot;https://www.tensorflow.org/tfx/guide/serving&quot;&gt;TF-serving&lt;/a&gt;.  In addition, the Keras Tuner works well with &lt;a href=&quot;https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms&quot;&gt;&lt;strong&gt;preemptible VMs&lt;/strong&gt;&lt;/a&gt;, making it even cheaper to run your workloads.&lt;/p&gt;

&lt;p&gt;With the Keras Tuner’s distributed config, you specify one node as the ‘chief’, which coordinates the search, and ‘tuner’ nodes that do the actual work of running model training jobs using a given param set (the &lt;em&gt;trials&lt;/em&gt;).  When you set up an HP search, you indicate the max number of trials to run, and how many &lt;em&gt;executions&lt;/em&gt; to run per trial. The Kubeflow Pipeline allows dynamic specification of the number of tuners to use for a given HP search— this determines how many trials you can run concurrently— as well as the max number of trials and number of executions.&lt;/p&gt;

&lt;p&gt;We’ll define the tuner components as Kubernetes &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/controllers/job/&quot;&gt;&lt;em&gt;jobs&lt;/em&gt;&lt;/a&gt;, each specified to have  1 &lt;em&gt;replica&lt;/em&gt;.   This means that if a tuner job pod is terminated for some reason prior to job completion, Kubernetes will start up another replica.
Thus, the Keras Tuner’s HP search is a good fit for use of &lt;a href=&quot;https://cloud.google.com/preemptible-vms&quot;&gt;preemptible VMs&lt;/a&gt;.  Because the HP search bookkeeping— orchestrated by the tuner &lt;code class=&quot;highlighter-rouge&quot;&gt;chief&lt;/code&gt;, via an ‘oracle’ file— tracks the state of the trials,  the configuration is robust to a tuner pod terminating unexpectedly— say, due to a preemption— and a new one being restarted.  The new job pod will get its instructions from the ‘oracle’ and continue running &lt;em&gt;trials&lt;/em&gt;.
The example uses GCS for the tuners’ shared file system.&lt;/p&gt;

&lt;p&gt;Once the HP search has finished, any of the tuners can obtain information on the &lt;code class=&quot;highlighter-rouge&quot;&gt;N&lt;/code&gt; best parameter sets (as well as export the best model(s)).&lt;/p&gt;

&lt;h2 id=&quot;defining-the-hp-tuning--training-workflow-as-a-pipeline&quot;&gt;Defining the HP Tuning + training workflow as a pipeline&lt;/h2&gt;

&lt;p&gt;The definition of the pipeline itself is &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/keras_tuner2/ml/kubeflow-pipelines/keras_tuner/example_pipelines/bw_ktune.py&quot;&gt;here&lt;/a&gt;, specified using the KFP SDK.  It’s then compiled to an archive file and uploaded to AI Platforms Pipelines. (To compile it yourself, you’ll need to have the &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/#install-the-kubeflow-pipelines-sdk&quot;&gt;KFP SDK installed&lt;/a&gt;).  Pipeline steps are container-based, and you can find the Dockerfiles and underlying code for the steps under the example’s &lt;a href=&quot;https://github.com/amygdala/code-snippets/tree/keras_tuner2/ml/kubeflow-pipelines/keras_tuner/components&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;components&lt;/code&gt;&lt;/a&gt; directory.&lt;/p&gt;

&lt;p&gt;The example pipeline first runs a distributed HP tuning search using a specified number of tuner workers,  then obtains the best &lt;code class=&quot;highlighter-rouge&quot;&gt;N&lt;/code&gt; parameter sets—by default, it grabs the best two.  The pipeline step itself does not do the heavy lifting, but rather launches all the tuner &lt;a href=&quot;https://cloud.google.com/kubernetes-engine/docs/how-to/jobs&quot;&gt;&lt;em&gt;jobs&lt;/em&gt;&lt;/a&gt; on GKE, which run concurrently, and monitors for their completion. (Unsurprisingly, this stage of the pipeline may run for quite a long time, depending upon how many HP search trials were specified and how many tuners are used for the distributed search).&lt;/p&gt;

&lt;p&gt;Concurrently to the Keras Tuner runs, the pipeline sets up a &lt;a href=&quot;https://github.com/kubeflow/pipelines/blob/master/components/tensorflow/tensorboard/prepare_tensorboard/component.yaml&quot;&gt;TensorBoard visualization component&lt;/a&gt;, its log directory set to the GCS path under which the full training jobs are run.&lt;/p&gt;

&lt;p&gt;The pipeline then runs full training jobs, concurrently, for each of the &lt;code class=&quot;highlighter-rouge&quot;&gt;N&lt;/code&gt; best parameter sets.  It does this via the KFP &lt;a href=&quot;https://github.com/kubeflow/pipelines/tree/master/samples/core/loop_parameter&quot;&gt;loop&lt;/a&gt; construct, allowing the pipeline to support dynamic specification of &lt;code class=&quot;highlighter-rouge&quot;&gt;N&lt;/code&gt;.  The training jobs can be monitored and compared using TensorBoard, both while they’re running and after they’ve completed.&lt;/p&gt;

&lt;p&gt;Then, the trained models are deployed for serving for serving on the GKE cluster, using &lt;a href=&quot;https://www.tensorflow.org/tfx/guide/serving&quot;&gt;TF-serving&lt;/a&gt;.  Each deployed model has its own cluster service endpoint.
(While not included in this example, one could insert a step for model evaluation before making the decision about whether to deploy to TF-serving.)&lt;/p&gt;

&lt;p&gt;For example, here is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Directed_acyclic_graph&quot;&gt;DAG&lt;/a&gt; for a pipeline execution that did training and then deployed prediction services for the two best parameter configurations.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/pl_dag.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/kf-pls/pl_dag.png&quot; width=&quot;80%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;The DAG for keras tuner pipeline execution.  Here the two best parameter configurations were used for full training.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;running-the-example-pipeline&quot;&gt;Running the example pipeline&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: this example may take a long time to run, and &lt;strong&gt;incur significant charges&lt;/strong&gt; in its use of GPUs, depending upon how its parameters are configured.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To run the example yourself, and for more detail on the KFP pipeline’s components, see the example’s &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/kubeflow-pipelines/keras_tuner/README.md&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;README&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;One obvious next step in development of the workflow would be to add components that evaluate each full model after training, before determining whether to deploy it. One approach could be to use
&lt;a href=&quot;https://www.tensorflow.org/tfx/model_analysis/get_started&quot;&gt;TensorFlow Model Analysis&lt;/a&gt;  (TFMA).  Stay tuned for some follow-on posts that explore how to do that using KFP.
(Update: one such post is &lt;a href=&quot;https://amygdala.github.io/gcp_blog/ml/kfp/mlops/keras/hp_tuning/2020/10/26/metrics_eval_component.html&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;While not covered in this example, it would alternatively have been straightforward to deploy and serve the trained model(s) using &lt;a href=&quot;https://cloud.google.com/ai-platform/prediction/docs&quot;&gt;AI Platform Prediction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Another alternative would be to use &lt;a href=&quot;https://cloud.google.com/ai-platform/optimizer/docs/overview&quot;&gt;AI Platform Vizier&lt;/a&gt; for hyperparameter tuning search instead of the Keras Tuner libraries. Stay tuned for a post on that as well.&lt;/p&gt;</content><author><name></name></author><summary type="html">Introduction</summary></entry><entry><title type="html">Training an AutoML Tables model with BigQuery ML</title><link href="https://amygdala.github.io/gcp_blog/ml/automl/bigquery/2020/07/14/bqml_tables.html" rel="alternate" type="text/html" title="Training an AutoML Tables model with BigQuery ML" /><published>2020-07-14T00:00:00-05:00</published><updated>2020-07-14T00:00:00-05:00</updated><id>https://amygdala.github.io/gcp_blog/ml/automl/bigquery/2020/07/14/bqml_tables</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/ml/automl/bigquery/2020/07/14/bqml_tables.html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://cloud.google.com/bigquery-ml/docs&quot;&gt;BigQuery ML&lt;/a&gt; (BQML) enables users to create and execute machine learning models in &lt;a href=&quot;https://cloud.google.com/bigquery/docs&quot;&gt;BigQuery&lt;/a&gt; by using SQL queries.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://cloud.google.com/automl-tables/docs/&quot;&gt;AutoML Tables&lt;/a&gt; lets you automatically build, analyze, and deploy state-of-the-art machine learning models using your own structured data, and &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;explain prediction results&lt;/a&gt;. It’s useful for a wide range of machine learning tasks, such as asset valuations, fraud detection, credit risk analysis, customer retention prediction, analyzing item layouts in stores, &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/how-kaggle-solved-a-spam-problem-using-automl&quot;&gt;solving comment section spam problems&lt;/a&gt;, &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/using-ai-to-scale-audio-content-categorization&quot;&gt;quickly categorizing audio content&lt;/a&gt;, &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;predicting rental demand&lt;/a&gt;, and more.
(&lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/new-automl-features-and-end-to-end-workflows-on-ai-platform-pipelines&quot;&gt;This blog post&lt;/a&gt; gives more detail on many of its capabilities).&lt;/p&gt;

&lt;p&gt;Recently, BQML added &lt;a href=&quot;https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-automl#limitations&quot;&gt;support for AutoML Tables models&lt;/a&gt; (in Beta).  This makes it easy to train Tables models on your BigQuery data using standard SQL, directly from the BigQuery UI (or API), and to evaluate and use the models for prediction directly via SQL as well.&lt;/p&gt;

&lt;p&gt;In this post, we’ll take a look at how to do this, and show a few tips as well.&lt;/p&gt;

&lt;h2 id=&quot;about-the-dataset-and-modeling-task&quot;&gt;About the dataset and modeling task&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://cloud.google.com/bigquery/public-data/&quot;&gt;Cloud Public Datasets Program&lt;/a&gt; makes available public datasets that are useful for experimenting with machine learning. We’ll use data that is essentially a join of two public datasets stored in &lt;a href=&quot;https://cloud.google.com/bigquery/&quot;&gt;BigQuery&lt;/a&gt;: &lt;a href=&quot;https://console.cloud.google.com/bigquery?p=bigquery-public-data&amp;amp;d=london_bicycles&amp;amp;page=dataset&amp;amp;_ga=2.10177653.-1341725502.1591817317&quot;&gt;London Bike rentals&lt;/a&gt; and &lt;a href=&quot;https://console.cloud.google.com/bigquery?p=bigquery-public-data&amp;amp;d=noaa_gsod&amp;amp;page=dataset&amp;amp;_ga=2.208160978.-1341725502.1591817317&quot;&gt;NOAA weather data&lt;/a&gt;, with some additional processing to clean up outliers and derive additional GIS and day-of-week fields.  The table we’ll use is here: &lt;code class=&quot;highlighter-rouge&quot;&gt;aju-dev-demos.london_bikes_weather.bikes_weather&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Using this dataset, we’ll build a &lt;em&gt;regression&lt;/em&gt; model to predict the &lt;code class=&quot;highlighter-rouge&quot;&gt;duration&lt;/code&gt; of a bike rental based on information about the start and end rental stations, the day of the week, the weather on that day, and other data. (If we were running a bike rental company, we could use these predictions—and their explanations—to help us anticipate demand and even plan how to stock each location).&lt;/p&gt;

&lt;h2 id=&quot;specifying-training-eval-and-test-datasets&quot;&gt;Specifying training, eval, and test datasets&lt;/h2&gt;

&lt;p&gt;AutoML Tables will split the data you send it into its own training/test/validation sets.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: it’s also possible to specify the data split column as a BQML model creation option: &lt;code class=&quot;highlighter-rouge&quot;&gt;DATA_SPLIT_COL = string_value&lt;/code&gt;.
&lt;code class=&quot;highlighter-rouge&quot;&gt;string_value&lt;/code&gt; is one of the columns in the training data and should be either a timestamp or string column. See the &lt;a href=&quot;https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-automl#create_model_syntax&quot;&gt;documentation&lt;/a&gt; for more detail.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For BQML, we’ll split the data into a set to use for the training process, and reserve a ‘test’ dataset that AutoML training never sees. We don’t want to just grab a sequential slice of the table for each. There’s an easy way to accomplish this in a repeatable manner by using the &lt;a href=&quot;https://github.com/google/farmhash&quot;&gt;Farm Hash algorithm&lt;/a&gt;, implemented as the &lt;code class=&quot;highlighter-rouge&quot;&gt;FARM_FINGERPRINT&lt;/code&gt; BigQuery SQL function.  We’d like to create a 90/10 split.&lt;/p&gt;

&lt;p&gt;So, the query to generate training data will include a clause like this:&lt;/p&gt;
&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ABS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;MOD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FARM_FINGERPRINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Similarly, the query to generate the ‘test’ set will use this clause:&lt;/p&gt;
&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;ABS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;MOD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FARM_FINGERPRINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Using this approach, we can build &lt;code class=&quot;highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clauses with reproducible results that give us datasets with the split proportions we want.&lt;/p&gt;

&lt;h2 id=&quot;tables-schema-configuration-and-bqml&quot;&gt;Tables schema configuration and BQML&lt;/h2&gt;

&lt;p&gt;If you’ve used AutoML Tables, you may have noticed that after a dataset is ingested, it’s possible to adjust the inferred field (column) schema information. For example, you might have some fields with numeric values that you’d like to treat as &lt;em&gt;categorical&lt;/em&gt; when you train your custom model.  This is the case for our dataset, where we’d like to treat the numeric rental station IDs as categorical.&lt;/p&gt;

&lt;p&gt;With BQML, it’s not currently possible to explicitly specify the schema for the model inputs, but for numerics that we want to treat as categorical, we can provide a strong hint by casting them to strings; then Tables will decide whether to treat such values as ‘text’ or ‘categorical’.  So, in the SQL below, you’ll see that the &lt;code class=&quot;highlighter-rouge&quot;&gt;day_of_week&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;start_station_id&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;end_station_id&lt;/code&gt; columns are all cast to strings.&lt;/p&gt;

&lt;h2 id=&quot;training-the-automl-tables-model-via-bqml&quot;&gt;Training the AutoML Tables model via BQML&lt;/h2&gt;

&lt;p&gt;To train the Tables model, we’ll pick the &lt;code class=&quot;highlighter-rouge&quot;&gt;AUTOML_REGRESSOR&lt;/code&gt; model type (since we want to predict &lt;code class=&quot;highlighter-rouge&quot;&gt;duration&lt;/code&gt;, a numeric value.  In the query, we’ll specify &lt;code class=&quot;highlighter-rouge&quot;&gt;duration&lt;/code&gt; in the &lt;code class=&quot;highlighter-rouge&quot;&gt;input_label_cols&lt;/code&gt; list (thus indicating the “label” column”, and set &lt;code class=&quot;highlighter-rouge&quot;&gt;budget_hours&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;1&lt;/code&gt;, meaning that we’re budgeting one hour of training time. (The training process, which includes setup and teardown, etc., will typically take longer).&lt;/p&gt;

&lt;p&gt;Here’s the resultant BigQuery query (to run it yourself, substitute your project id, dataset, and model name in the first line, then paste the query into the BigQuery UI query window in the &lt;a href=&quot;https://console.cloud.google.com/bigquery&quot;&gt;Cloud Console&lt;/a&gt;):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;REPLACE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODEL&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`YOUR-PROJECT-ID.YOUR-DATASET.YOUR-MODEL-NAME`&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;OPTIONS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'AUTOML_REGRESSOR'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_label_cols&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'duration'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;budget_hours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;day_of_week&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;end_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;euclidean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loc_cross&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prcp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dewp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start_station_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ssid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end_station_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;esid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`aju-dev-demos.london_bikes_weather.bikes_weather`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ABS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;MOD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FARM_FINGERPRINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note the casts to &lt;code class=&quot;highlighter-rouge&quot;&gt;STRING&lt;/code&gt; and the use of &lt;code class=&quot;highlighter-rouge&quot;&gt;FARM_FINGERPRINT&lt;/code&gt; as discussed above.&lt;/p&gt;

&lt;p&gt;(If you’ve taken a look at the &lt;code class=&quot;highlighter-rouge&quot;&gt;bikes_weather&lt;/code&gt; table, you might notice that the &lt;code class=&quot;highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause does not include the &lt;code class=&quot;highlighter-rouge&quot;&gt;bike_id&lt;/code&gt; column. A previously-run AutoML Tables analysis of the &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/evaluate#evaluation_metrics_for_regression_models&quot;&gt;global feature importance of the dataset fields&lt;/a&gt; indicated that &lt;code class=&quot;highlighter-rouge&quot;&gt;bike_id&lt;/code&gt; had negligible impact, so we won’t use it for this model).&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/global_feature_impt.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/global_feature_impt.png&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Global feature importance rankings.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;evaluating-your-trained-custom-model&quot;&gt;Evaluating your trained custom model&lt;/h2&gt;

&lt;p&gt;After the training has completed, you can view the evaluation metrics for your custom model, and also run your own evaluation query yourself.  You can view the evaluation metrics generated during the training process by clicking on the model name in the BigQuery UI, then click on the “Evaluation” tab in the central panel.  It will look something like this:&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_training_eval.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_training_eval.png&quot; width=&quot;40%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Model evaluation metrics generated during the training process&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;(At time of writing, some of this data is incomplete, but that will change soon).&lt;/p&gt;

&lt;p&gt;The BigQuery SQL to run your own evaluation query for the trained model looks like this (again, substitute your own project, dataset, and model name):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ML&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EVALUATE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MODEL&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`YOUR-PROJECT-ID.YOUR-DATASET.YOUR-MODEL-NAME`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;day_of_week&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;end_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;euclidean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loc_cross&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prcp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dewp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start_station_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ssid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end_station_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;esid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`aju-dev-demos.london_bikes_weather.bikes_weather`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;ABS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;MOD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FARM_FINGERPRINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that via the &lt;code class=&quot;highlighter-rouge&quot;&gt;FARM_FINGERPRINT&lt;/code&gt; function, we’re using a different dataset for evaluation than we used for training.
The evaluation results should look something like the following.  The metrics will be a bit different from those above, since we’re using different data than AutoML Tables used for its eval split.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_model_eval.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_model_eval.png&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Model evaluation results.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;did-our-schema-hints-help&quot;&gt;Did our schema hints help?&lt;/h3&gt;

&lt;p&gt;It’s interesting to check whether the schema hints (casting some of the numeric fields to strings) made a difference in model accuracy.
To try this yourself, create another differently-named model as shown in the training section above, but for the &lt;code class=&quot;highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause, don’t include the casts to &lt;code class=&quot;highlighter-rouge&quot;&gt;STRING&lt;/code&gt;, e.g.:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;day_of_week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;end_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;euclidean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loc_cross&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prcp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dewp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_station_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end_station_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, evaluate this second model (again substituting the details for your own project):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ML&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EVALUATE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MODEL&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`YOUR-PROJECT-ID.YOUR-DATASET.YOUR-MODEL-NAME2`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;day_of_week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;end_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;euclidean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loc_cross&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prcp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dewp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_station_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end_station_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`aju-dev-demos.london_bikes_weather.bikes_weather`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;ABS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;MOD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FARM_FINGERPRINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When I evaluated this second model, which kept the station IDs and day of week as numerics, the results showed that this model was somewhat less accurate:&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_model_2_eval.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_model_2_eval.png&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Evaluation results for the model that did not cast categorical numerics to strings.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;using-your-bqml-automl-tables-model-for-prediction&quot;&gt;Using your BQML AutoML Tables model for prediction&lt;/h2&gt;

&lt;p&gt;Once your model is trained, and you’ve ascertained it’s accurate enough, you can use it for prediction.
Here’s an example of how to do that.  Via the &lt;code class=&quot;highlighter-rouge&quot;&gt;FARM_FINGERPRINT&lt;/code&gt; function, we’re drawing from our “test” split, but because the resultant dataset is large, we’re just grabbing a few rows (200) for the query below:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ML&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PREDICT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MODEL&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`YOUR-PROJECT-ID.YOUR-DATASET.YOUR-MODEL-NAME`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;err&quot;&gt; &lt;/span&gt; &lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;day_of_week&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;end_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;euclidean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loc_cross&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prcp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dewp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start_station_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ssid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end_station_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;esid&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;`aju-dev-demos.london_bikes_weather.bikes_weather`&lt;/span&gt; 
 &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ABS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;MOD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FARM_FINGERPRINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;limit&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The prediction results will look something like this (click to see larger version):&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_prediction_results.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/amygdala/gcp_blog/master/images/bqml_prediction_results.png&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Model prediction results.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Note that while the query above is “standalone”, you can of course access model prediction results as part of a larger BigQuery query too.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;In this post, we showed how to train an AutoML Tables model using BQML, evaluate the model, and then use it for prediction— all from BigQuery.
The &lt;a href=&quot;https://cloud.google.com/bigquery-ml/docs&quot;&gt;BQML documentation&lt;/a&gt; has more detail on getting started, and resources for the other model types available through BQML.&lt;/p&gt;</content><author><name></name></author><summary type="html">Introduction</summary></entry><entry><title type="html">Using Google Cloud Functions to support event-based triggering of Cloud AI Platform Pipelines</title><link href="https://amygdala.github.io/gcp_blog/ml/pipelines/mlops/kfp/gcf/2020/05/12/hosted_kfp_gcf.html" rel="alternate" type="text/html" title="Using Google Cloud Functions to support event-based triggering of Cloud AI Platform Pipelines" /><published>2020-05-12T00:00:00-05:00</published><updated>2020-05-12T00:00:00-05:00</updated><id>https://amygdala.github.io/gcp_blog/ml/pipelines/mlops/kfp/gcf/2020/05/12/hosted_kfp_gcf</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/ml/pipelines/mlops/kfp/gcf/2020/05/12/hosted_kfp_gcf.html">&lt;!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: _notebooks/2020-05-12-hosted_kfp_gcf.ipynb
--&gt;

&lt;div class=&quot;container&quot; id=&quot;notebook-container&quot;&gt;
        
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;This example shows how you can run a &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines&quot;&gt;Cloud AI Platform Pipeline&lt;/a&gt; from a &lt;a href=&quot;https://cloud.google.com/functions/docs/&quot;&gt;Google Cloud Function&lt;/a&gt;, thus providing a way for Pipeline runs to be triggered by events (in the interim before this is supported by Pipelines itself).&lt;/p&gt;
&lt;p&gt;In this example, the function is triggered by the addition of or update to a file in a &lt;a href=&quot;https://cloud.google.com/storage/&quot;&gt;Google Cloud Storage&lt;/a&gt; (GCS) bucket, but Cloud Functions can have other triggers too (including &lt;a href=&quot;https://cloud.google.com/pubsub/docs/&quot;&gt;Pub/Sub&lt;/a&gt;-based triggers).&lt;/p&gt;
&lt;p&gt;The example is Google Cloud Platform (GCP)-specific, and requires a &lt;a href=&quot;https://cloud.google.com/ai-platform/pipelines/docs&quot;&gt;Cloud AI Platform Pipelines&lt;/a&gt; installation using Pipelines version &amp;gt;= 0.4.  To run this example as a notebook, click on one of the badges at the top of the page or see &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/notebook_examples/functions/hosted_kfp_gcf.ipynb&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(If you are instead interested in how to do this with a Kubeflow-based pipelines installation, see &lt;a href=&quot;https://github.com/amygdala/kubeflow-examples/blob/cookbook/cookbook/pipelines/notebooks/gcf_kfp_trigger.ipynb&quot;&gt;this notebook&lt;/a&gt;).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Setup&quot;&gt;Setup&lt;a class=&quot;anchor-link&quot; href=&quot;#Setup&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;h3 id=&quot;Create-a-Cloud-AI-Platform-Pipelines-installation&quot;&gt;Create a Cloud AI Platform Pipelines installation&lt;a class=&quot;anchor-link&quot; href=&quot;#Create-a-Cloud-AI-Platform-Pipelines-installation&quot;&gt; &lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Follow the instructions in the &lt;a href=&quot;https://cloud.google.com/ai-platform/pipelines/docs&quot;&gt;documentation&lt;/a&gt; to create a Cloud AI Platform Pipelines installation.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h3 id=&quot;Identify-(or-create)-a-Cloud-Storage-bucket-to-use-for-the-example&quot;&gt;Identify (or create) a Cloud Storage bucket to use for the example&lt;a class=&quot;anchor-link&quot; href=&quot;#Identify-(or-create)-a-Cloud-Storage-bucket-to-use-for-the-example&quot;&gt; &lt;/a&gt;&lt;/h3&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;&lt;strong&gt;Before executing the next cell&lt;/strong&gt;, edit it to &lt;strong&gt;set the &lt;code&gt;TRIGGER_BUCKET&lt;/code&gt; environment variable&lt;/strong&gt; to a Google Cloud Storage bucket (&lt;a href=&quot;https://console.cloud.google.com/storage/browser&quot;&gt;create a bucket first&lt;/a&gt; if necessary). Do &lt;em&gt;not&lt;/em&gt; include the &lt;code&gt;gs://&lt;/code&gt; prefix in the bucket name.&lt;/p&gt;
&lt;p&gt;We'll deploy the GCF function so that it will trigger on new and updated files (blobs) in this bucket.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;env&lt;/span&gt; TRIGGER_BUCKET=REPLACE_WITH_YOUR_GCS_BUCKET_NAME
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h3 id=&quot;Give-Cloud-Function's-service-account-the-necessary-access&quot;&gt;Give Cloud Function's service account the necessary access&lt;a class=&quot;anchor-link&quot; href=&quot;#Give-Cloud-Function's-service-account-the-necessary-access&quot;&gt; &lt;/a&gt;&lt;/h3&gt;&lt;p&gt;First, make sure the Cloud Function API &lt;a href=&quot;https://console.cloud.google.com/apis/library/cloudfunctions.googleapis.com?q=functions&quot;&gt;is enabled&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Cloud Functions uses the project's 'appspot' acccount for its service account.  It will have the form: 
&lt;code&gt;PROJECT_ID@appspot.gserviceaccount.com&lt;/code&gt;. (This is also the project's App Engine service account).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Go to your project's &lt;a href=&quot;https://console.cloud.google.com/iam-admin/serviceaccounts&quot;&gt;IAM - Service Account page&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Find the &lt;code&gt;PROJECT_ID@appspot.gserviceaccount.com&lt;/code&gt; account and copy its email address.&lt;/li&gt;
&lt;li&gt;Find the project's Compute Engine (GCE) default service account (this is the default account used for the Pipelines installation).  It will have a form like this: &lt;code&gt;PROJECT_NUMBER@developer.gserviceaccount.com&lt;/code&gt;.
Click the checkbox next to the GCE service account, and in the 'INFO PANEL' to the right, click &lt;strong&gt;ADD MEMBER&lt;/strong&gt;. Add the Functions service account (&lt;code&gt;PROJECT_ID@appspot.gserviceaccount.com&lt;/code&gt;) as a &lt;strong&gt;Project Viewer&lt;/strong&gt; of the GCE service account. &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/kfp-deploy/hosted_kfp_setup1.png&quot; alt=&quot;Add the Functions service account as a project viewer of the GCE service account&quot; /&gt;&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Next, configure your &lt;code&gt;TRIGGER_BUCKET&lt;/code&gt; to allow the Functions service account access to that bucket.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Navigate in the console to your list of buckets in the &lt;a href=&quot;https://console.cloud.google.com/storage/browser&quot;&gt;Storage Browser&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Click the checkbox next to the &lt;code&gt;TRIGGER_BUCKET&lt;/code&gt;.  In the 'INFO PANEL' to the right, click &lt;strong&gt;ADD MEMBER&lt;/strong&gt;.  Add the service account (&lt;code&gt;PROJECT_ID@appspot.gserviceaccount.com&lt;/code&gt;) with &lt;code&gt;Storage Object Admin&lt;/code&gt; permissions. (While not tested, giving both Object view and create permissions should also suffice).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/kfp-deploy/hosted_kfp_setup2.png&quot; alt=&quot;add the app engine service account to the trigger bucket with view and edit permissions&quot; /&gt;&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Create-a-simple-GCF-function-to-test-your-configuration&quot;&gt;Create a simple GCF function to test your configuration&lt;a class=&quot;anchor-link&quot; href=&quot;#Create-a-simple-GCF-function-to-test-your-configuration&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;First we'll generate and deploy a simple GCF function, to test that the basics are properly configured.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;%%bash
mkdir -p functions
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;We'll first create a &lt;code&gt;requirements.txt&lt;/code&gt; file, to indicate what packages the GCF code requires to be installed. (We won't actually need &lt;code&gt;kfp&lt;/code&gt; for this first 'sanity check' version of a GCF function, but we'll need it below for the second  function we'll create, that deploys a pipeline).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%%writefile&lt;/span&gt; functions/requirements.txt
&lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Next, we'll create a simple GCF function in the &lt;code&gt;functions/main.py&lt;/code&gt; file:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%%writefile&lt;/span&gt; functions/main.py
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;gcs_test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Background Cloud Function to be triggered by Cloud Storage.&lt;/span&gt;
&lt;span class=&quot;sd&quot;&gt;     This generic function logs relevant data when a file is changed.&lt;/span&gt;

&lt;span class=&quot;sd&quot;&gt;  Args:&lt;/span&gt;
&lt;span class=&quot;sd&quot;&gt;      data (dict): The Cloud Functions event payload.&lt;/span&gt;
&lt;span class=&quot;sd&quot;&gt;      context (google.cloud.functions.Context): Metadata of triggering event.&lt;/span&gt;
&lt;span class=&quot;sd&quot;&gt;  Returns:&lt;/span&gt;
&lt;span class=&quot;sd&quot;&gt;      None; the output is written to Stackdriver Logging&lt;/span&gt;
&lt;span class=&quot;sd&quot;&gt;  &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Event ID: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Event type: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Data: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Bucket: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;bucket&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;File: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;file_uri&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;gs://&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;bucket&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Using file uri: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;file_uri&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Metageneration: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;metageneration&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Created: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;timeCreated&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Updated: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;updated&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Deploy the GCF function as follows. (You'll need to &lt;strong&gt;wait a moment or two for output of the deployment to display in the notebook&lt;/strong&gt;).  You can also run this command from a notebook terminal window in the &lt;code&gt;functions&lt;/code&gt; subdirectory.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;%%bash
&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; functions
gcloud functions deploy gcs_test --runtime python37 --trigger-resource &lt;span class=&quot;si&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;TRIGGER_BUCKET&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt; --trigger-event google.storage.object.finalize
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;After you've deployed, test your deployment by adding a file to the specified &lt;code&gt;TRIGGER_BUCKET&lt;/code&gt;. You can do this easily by visiting the &lt;strong&gt;Storage&lt;/strong&gt; panel in the Cloud Console, clicking on the bucket in the list, and then clicking on &lt;strong&gt;Upload files&lt;/strong&gt; in the bucket details view.&lt;/p&gt;
&lt;p&gt;Then, check in the logs viewer panel (&lt;a href=&quot;https://console.cloud.google.com/logs/viewer&quot;&gt;https://console.cloud.google.com/logs/viewer&lt;/a&gt;) to confirm that the GCF function was triggered and ran correctly.  You can select 'Cloud Function' in the first pulldown menu to filter on just those log entries.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Deploy-a-Pipeline-from-a-GCF-function&quot;&gt;Deploy a Pipeline from a GCF function&lt;a class=&quot;anchor-link&quot; href=&quot;#Deploy-a-Pipeline-from-a-GCF-function&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Next, we'll create a GCF function that deploys an AI Platform Pipeline when triggered. First, preserve your existing main.py in a backup file:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;%%bash
&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; functions
mv main.py main.py.bak
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Then, &lt;strong&gt;before executing the next cell&lt;/strong&gt;, &lt;strong&gt;edit the &lt;code&gt;HOST&lt;/code&gt; variable&lt;/strong&gt; in the code below.  You'll replace &lt;code&gt;&amp;lt;your_endpoint&amp;gt;&lt;/code&gt; with the correct value for your installation.&lt;/p&gt;
&lt;p&gt;To find this URL, visit the &lt;a href=&quot;https://console.cloud.google.com/ai-platform/pipelines/&quot;&gt;Pipelines panel&lt;/a&gt; in the Cloud Console.&lt;br /&gt;
From here, you can find the URL by clicking on the &lt;strong&gt;SETTINGS&lt;/strong&gt; link for the Pipelines installation you want to use, and copying the 'host' string displayed in the client example code (prepend &lt;code&gt;https://&lt;/code&gt; to that string in the code below).&lt;br /&gt;
You can alternately click on &lt;strong&gt;OPEN PIPELINES DASHBOARD&lt;/strong&gt; for the Pipelines installation, and copy that URL, removing the &lt;code&gt;/#/pipelines&lt;/code&gt; suffix.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%%writefile&lt;/span&gt; functions/main.py
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;datetime&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;logging&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;time&lt;/span&gt;
 
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp.compiler&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;compiler&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp.dsl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;dsl&lt;/span&gt;
 
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;requests&lt;/span&gt;
 
&lt;span class=&quot;c1&quot;&gt;# TODO: replace with your Pipelines endpoint URL&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;HOST&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;https://&amp;lt;your_endpoint&amp;gt;.pipelines.googleusercontent.com&amp;#39;&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;@dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Sequential&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;A pipeline with two sequential steps.&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;sequential_pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;gs://ml-pipeline-playground/shakespeare1.txt&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;A pipeline with two sequential steps.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;op1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ContainerOp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;filechange&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;library/bash:4.4.23&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;sh&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;-c&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;arguments&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;echo &amp;quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;quot; &amp;gt; /tmp/results.txt&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;file_outputs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;newfile&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;/tmp/results.txt&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;op2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ContainerOp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;echo&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;library/bash:4.4.23&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;sh&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;-c&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;arguments&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;echo &amp;quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;quot;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;newfile&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_access_token&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token&amp;#39;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;headers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Metadata-Flavor&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;Google&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;raise_for_status&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;access_token&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;access_token&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;access_token&lt;/span&gt;
 
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;hosted_kfp_test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Event ID: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Event type: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Data: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Bucket: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;bucket&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;File: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;file_uri&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;gs://&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;bucket&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Using file uri: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;file_uri&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Metageneration: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;metageneration&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Created: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;timeCreated&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;Updated: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;updated&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
  
  &lt;span class=&quot;n&quot;&gt;token&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_access_token&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;attempting to launch pipeline run.&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;utcnow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HOST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;existing_token&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;token&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;compiler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Compiler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sequential_pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;/tmp/sequential.tar.gz&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_experiment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;gcstriggered&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# this is a &amp;#39;get or create&amp;#39; op&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;run_pipeline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;sequential_&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;/tmp/sequential.tar.gz&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                              &lt;span class=&quot;n&quot;&gt;params&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;filename&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;file_uri&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Next, deploy the new GCF function. As before, &lt;strong&gt;it will take a moment or two for the results of the deployment to display in the notebook&lt;/strong&gt;.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;%%bash
&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; functions
gcloud functions deploy hosted_kfp_test --runtime python37 --trigger-resource &lt;span class=&quot;si&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;TRIGGER_BUCKET&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt; --trigger-event google.storage.object.finalize
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Add another file to your &lt;code&gt;TRIGGER_BUCKET&lt;/code&gt;. This time you should see both GCF functions triggered. The &lt;code&gt;hosted_kfp_test&lt;/code&gt; function will deploy the pipeline. You'll be able to see it running at your Pipeline installation's endpoint, &lt;code&gt;https://&amp;lt;your_endpoint&amp;gt;.pipelines.googleusercontent.com/#/pipelines&lt;/code&gt;, under the given Pipelines Experiment (&lt;code&gt;gcstriggered&lt;/code&gt; as default).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;hr /&gt;
&lt;p&gt;Copyright 2020, Google, LLC.
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.apache.org/licenses/LICENSE-2.0&quot;&gt;http://www.apache.org/licenses/LICENSE-2.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an &quot;AS IS&quot; BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html"></summary></entry><entry><title type="html">Creating an AutoML Tables end-to-end workflow on Cloud AI Platform Pipelines</title><link href="https://amygdala.github.io/gcp_blog/ml/kfp/automl/2020/04/22/automltables_kfp_e2e.html" rel="alternate" type="text/html" title="Creating an AutoML Tables end-to-end workflow on Cloud AI Platform Pipelines" /><published>2020-04-22T00:00:00-05:00</published><updated>2020-04-22T00:00:00-05:00</updated><id>https://amygdala.github.io/gcp_blog/ml/kfp/automl/2020/04/22/automltables_kfp_e2e</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/ml/kfp/automl/2020/04/22/automltables_kfp_e2e.html">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://cloud.google.com/automl-tables/docs/&quot;&gt;AutoML Tables&lt;/a&gt; lets you automatically build, analyze, and deploy state-of-the-art machine learning models using your own structured data.&lt;/p&gt;

&lt;p&gt;A number of new AutoML Tables features have been released recently. These include:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;An improved &lt;a href=&quot;https://googleapis.dev/python/automl/latest/gapic/v1beta1/tables.html&quot;&gt;Python client library&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;The ability to obtain &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;explanations&lt;/a&gt; for your online predictions&lt;/li&gt;
  &lt;li&gt;The ability to &lt;a href=&quot;http://amygdala.github.io/automl/ml/2019/12/05/automl_tables_export.html&quot;&gt;export your model and serve it in a container&lt;/a&gt; anywhere&lt;/li&gt;
  &lt;li&gt;The ability to view model search progress and final model hyperparameters &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/logging&quot;&gt;in Cloud Logging&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post gives a tour of some of these new features via a &lt;a href=&quot;%20https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines&quot;&gt;Cloud AI Platform Pipelines&lt;/a&gt; example, that shows end-to-end management of an AutoML Tables workflow.&lt;/p&gt;

&lt;p&gt;The example pipeline &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/import#create&quot;&gt;creates a &lt;em&gt;dataset&lt;/em&gt;&lt;/a&gt;, &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/import#import-data&quot;&gt;imports&lt;/a&gt; data into the dataset from a &lt;a href=&quot;https://cloud.google.com/bigquery&quot;&gt;BigQuery&lt;/a&gt; &lt;em&gt;view&lt;/em&gt;, and &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/train&quot;&gt;trains&lt;/a&gt; a custom model on that data. Then, it fetches &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/evaluate&quot;&gt;evaluation and metrics&lt;/a&gt; information about the trained model, and based on specified criteria about model quality, uses that information to automatically determine whether to &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/predict&quot;&gt;deploy&lt;/a&gt; the model for online prediction.   Once the model is deployed, you can make prediction requests, and optionally obtain prediction &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;explanations&lt;/a&gt; as well as the prediction result.
In addition, the example shows how to scalably &lt;strong&gt;&lt;em&gt;serve&lt;/em&gt;&lt;/strong&gt; your exported trained model from your Cloud AI Platform Pipelines installation for prediction requests.&lt;/p&gt;

&lt;p&gt;You can manage all the parts of this workflow from the &lt;a href=&quot;https://console.cloud.google.com/automl-tables&quot;&gt;Tables UI&lt;/a&gt; as well, or programmatically via a &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/automl/tables/xai/automl_tables_xai.ipynb&quot;&gt;notebook&lt;/a&gt; or script.  But specifying this process as a workflow has some advantages: the workflow becomes reliable and repeatable, and Pipelines makes it easy to monitor the results and schedule recurring runs.
For example, if your dataset is updated regularly—say once a day— you could schedule a workflow to run daily, each day building a model that trains on an updated dataset.
(With a bit more work, you could also set up event-based triggering pipeline runs, for example &lt;a href=&quot;http://amygdala.github.io/kubeflow/ml/2019/08/22/remote-deploy.html#using-cloud-function-triggers&quot;&gt;when new data is added&lt;/a&gt; to a &lt;a href=&quot;https://cloud.google.com/storage&quot;&gt;Google Cloud Storage&lt;/a&gt; bucket.)&lt;/p&gt;

&lt;h3 id=&quot;about-the-example-dataset-and-scenario&quot;&gt;About the example dataset and scenario&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://cloud.google.com/bigquery/public-data/&quot;&gt;Cloud Public Datasets Program&lt;/a&gt; makes available public datasets that are useful for experimenting with machine learning. For our examples, we’ll use data that is essentially a join of two public datasets stored in &lt;a href=&quot;https://cloud.google.com/bigquery/&quot;&gt;BigQuery&lt;/a&gt;: &lt;a href=&quot;https://console.cloud.google.com/bigquery?p=bigquery-public-data&amp;amp;d=london_bicycles&amp;amp;page=dataset&quot;&gt;London Bike rentals&lt;/a&gt; and &lt;a href=&quot;https://console.cloud.google.com/bigquery?p=bigquery-public-data&amp;amp;d=noaa_gsod&amp;amp;page=dataset&quot;&gt;NOAA weather data&lt;/a&gt;, with some additional processing to clean up outliers and derive additional GIS and day-of-week fields.  Using this dataset, we’ll build a regression model to predict the &lt;em&gt;duration&lt;/em&gt; of a bike rental based on information about the start and end rental stations, the day of the week, the weather on that day, and other data. If we were running a bike rental company, we could use these predictions—and their &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;explanations&lt;/a&gt;—to help us anticipate demand and even plan how to stock each location.
While we’re using bike and weather data here, you can use AutoML Tables for tasks as varied as asset valuations, fraud detection, credit risk analysis, customer retention prediction, analyzing item layouts in stores, and many more.&lt;/p&gt;

&lt;h2 id=&quot;using-cloud-ai-platform-pipelines-or-kubeflow-pipelines-to-orchestrate-a-tables-workflow&quot;&gt;Using Cloud AI Platform Pipelines or Kubeflow Pipelines to orchestrate a Tables workflow&lt;/h2&gt;

&lt;p&gt;You can run this example via a &lt;a href=&quot;https://cloud.google.com/ai-platform/pipelines/docs&quot;&gt;Cloud AI Platform Pipelines&lt;/a&gt; installation, or via &lt;a href=&quot;https://kubeflow.org/&quot;&gt;Kubeflow Pipelines&lt;/a&gt; on a &lt;a href=&quot;https://www.kubeflow.org/docs/gke/deploy/&quot;&gt;Kubeflow on GKE&lt;/a&gt; installation.  &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines&quot;&gt;Cloud AI Platform Pipelines&lt;/a&gt; was recently launched in Beta. Slightly different variants of the pipeline specification are required depending upon which you’re using. (It would be possible to run the example on other Kubeflow installations too, but that would require additional credentials setup not covered in this tutorial).&lt;/p&gt;

&lt;h3 id=&quot;install-a-cloud-ai-platform-pipelines-cluster&quot;&gt;Install a Cloud AI Platform Pipelines cluster&lt;/h3&gt;

&lt;p&gt;You can create an AI Platform Pipelines installation with a few clicks.  Access AI Platform Pipelines by visiting the &lt;a href=&quot;https://console.cloud.google.com/ai-platform/pipelines/clusters&quot;&gt;AI Platform Panel&lt;/a&gt; in the &lt;a href=&quot;https://console.cloud.google.com&quot;&gt;Cloud Console&lt;/a&gt;.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/sA17BykJuzF.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/sA17BykJuzF.png&quot; width=&quot;90%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Create a new Pipelines instance.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;See the &lt;a href=&quot;https://cloud.google.com/ai-platform/pipelines/docs&quot;&gt;documentation&lt;/a&gt; for more detail.&lt;/p&gt;

&lt;p&gt;(You can also do this installation &lt;a href=&quot;https://github.com/kubeflow/pipelines/tree/master/manifests/gcp_marketplace&quot;&gt;from the command line&lt;/a&gt; onto an existing GKE cluster if you prefer. If you do, for consistency with the UI installation, create the GKE cluster with &lt;code class=&quot;highlighter-rouge&quot;&gt;--scopes cloud-platform&lt;/code&gt;).&lt;/p&gt;

&lt;h3 id=&quot;or-install-kubeflow-to-use-kubeflow-pipelines&quot;&gt;Or, install Kubeflow to use Kubeflow Pipelines&lt;/h3&gt;

&lt;p&gt;You can also run this example from a &lt;a href=&quot;https://www.kubeflow.org/&quot;&gt;Kubeflow&lt;/a&gt; installation. For the example to work out of the box, you’ll need a Kubeflow on &lt;a href=&quot;https://cloud.google.com/kubernetes-engine&quot;&gt;GKE&lt;/a&gt; installation, set up to use &lt;a href=&quot;https://cloud.google.com/iap&quot;&gt;IAP&lt;/a&gt;.  An easy way to do this is via the Kubeflow &lt;a href=&quot;https://deploy.kubeflow.cloud/#/deploy&quot;&gt;‘click to deploy’ web app&lt;/a&gt;, or you can follow the command-line instructions &lt;a href=&quot;https://www.kubeflow.org/docs/gke/deploy/deploy-cli/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;upload-and-run-the-tables-end-to-end-pipeline&quot;&gt;Upload and run the Tables end-to-end Pipeline&lt;/h3&gt;

&lt;p&gt;Once a Pipelines installation is running, we can upload the example AutoML Tables pipeline.
Click on &lt;strong&gt;Pipelines&lt;/strong&gt; in the left nav bar of the Pipelines Dashboard.  Click on &lt;strong&gt;Upload Pipeline&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;For Cloud AI Platform Pipelines, upload &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/tables_pipeline_caip.py.tar.gz?raw=true&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;tables_pipeline_caip.py.tar.gz&lt;/code&gt;&lt;/a&gt;.  This archive points to the compiled version of &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/tables_pipeline_caip.py&quot;&gt;this pipeline&lt;/a&gt;, specified and compiled using the &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/&quot;&gt;Kubeflow Pipelines SDK&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;For Kubeflow Pipelines on a Kubeflow installation, upload  &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/tables_pipeline_kf.py.tar.gz?raw=true&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;tables_pipeline_kf.py.tar.gz&lt;/code&gt;&lt;/a&gt;.  This archive points to the compiled version of &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/tables_pipeline_kf.py&quot;&gt;this pipeline&lt;/a&gt;.
&lt;strong&gt;To run this example on a KF installation, you will need to give the &lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;lt;deployment-name&amp;gt;-user@&amp;lt;project-id&amp;gt;.iam.gserviceaccount.com&lt;/code&gt; service account &lt;code class=&quot;highlighter-rouge&quot;&gt;AutoML Admin&lt;/code&gt; privileges&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: The difference between the two pipelines relates to how GCP authentication is handled.  For the Kubeflow pipeline, we’ve added &lt;code class=&quot;highlighter-rouge&quot;&gt;.apply(gcp.use_gcp_secret('user-gcp-sa'))&lt;/code&gt; annotations to the pipeline steps. This tells the pipeline to use the mounted &lt;em&gt;secret&lt;/em&gt;—set up during the installation process— that provides GCP account credentials.  With the Cloud AI Platform Pipelines installation, the GKE cluster nodes have been set up to use the &lt;code class=&quot;highlighter-rouge&quot;&gt;cloud-platform&lt;/code&gt; scope. With recent Kubeflow releases, use of the mounted secret is no longer necessary, but we include both versions for compatibility.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The uploaded pipeline graph will look similar to this:&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-17%20at%204.27.41%20PM.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-17%20at%204.27.41%20PM.png&quot; width=&quot;40%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;The uploaded Tables 'end-to-end' pipeline.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Click the &lt;strong&gt;+Create Run&lt;/strong&gt; button to run the pipeline.  You will need to fill in some pipeline parameters.
Specifically, replace &lt;code class=&quot;highlighter-rouge&quot;&gt;YOUR_PROJECT_HERE&lt;/code&gt; with the name of your project; replace &lt;code class=&quot;highlighter-rouge&quot;&gt;YOUR_DATASET_NAME&lt;/code&gt; with the name you want to give your new dataset (make it unique, and use letters, numbers and underscores up to 32 characters); and replace &lt;code class=&quot;highlighter-rouge&quot;&gt;YOUR_BUCKET_NAME&lt;/code&gt; with the name of a &lt;a href=&quot;https://cloud.google.com/storage&quot;&gt;GCS bucket&lt;/a&gt;. Do not include the &lt;code class=&quot;highlighter-rouge&quot;&gt;gs://&lt;/code&gt; prefix— just enter the name. This bucket should be in the &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/locations#buckets&quot;&gt;same &lt;em&gt;region&lt;/em&gt;&lt;/a&gt; as that specified by the &lt;code class=&quot;highlighter-rouge&quot;&gt;gcp_region&lt;/code&gt; parameter. E.g., if you keep the default &lt;code class=&quot;highlighter-rouge&quot;&gt;us-central1&lt;/code&gt; region, your bucket should also be a &lt;em&gt;regional&lt;/em&gt; (not multi-regional) bucket in the &lt;code class=&quot;highlighter-rouge&quot;&gt;us-central1&lt;/code&gt; region. ++double check that this is necessary.++&lt;/p&gt;

&lt;p&gt;If you want to schedule a recurrent set of runs, you can do that instead.  If your data is in &lt;a href=&quot;https://cloud.google.com/bigquery&quot;&gt;BigQuery&lt;/a&gt;— as is the case for this example pipeline— and has a temporal aspect, you could define a &lt;em&gt;view&lt;/em&gt; to reflect that, e.g. to return data from a window over the last &lt;code class=&quot;highlighter-rouge&quot;&gt;N&lt;/code&gt; days or hours.  Then, the AutoML pipeline could specify ingestion of data from that view, grabbing an updated data window each time the pipeline is run, and building a new model based on that updated window.&lt;/p&gt;

&lt;h2 id=&quot;the-steps-executed-by-the-pipeline&quot;&gt;The steps executed by the pipeline&lt;/h2&gt;

&lt;p&gt;The example pipeline &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/import#create&quot;&gt;creates a &lt;em&gt;dataset&lt;/em&gt;&lt;/a&gt;, &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/import#import-data&quot;&gt;imports&lt;/a&gt; data into the dataset from a &lt;a href=&quot;https://cloud.google.com/bigquery&quot;&gt;BigQuery&lt;/a&gt; &lt;em&gt;view&lt;/em&gt;, and &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/train&quot;&gt;trains&lt;/a&gt; a custom model on that data. Then, it fetches &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/evaluate&quot;&gt;evaluation and metrics&lt;/a&gt; information about the trained model, and based on specified criteria about model quality, uses that information to automatically determine whether to &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/predict&quot;&gt;deploy&lt;/a&gt; the model for online prediction. We’ll take a closer look at each of the pipeline steps, and how they’re implemented.&lt;/p&gt;

&lt;h3 id=&quot;create-a-tables-dataset-and-adjust-its-schema&quot;&gt;Create a Tables dataset and adjust its schema&lt;/h3&gt;

&lt;p&gt;This pipeline creates a new Tables &lt;em&gt;dataset&lt;/em&gt;, and ingests data from a &lt;a href=&quot;https://cloud.google.com/bigquery&quot;&gt;BigQuery&lt;/a&gt; table for the “bikes and weather” dataset described above. These actions are implemented by the first two steps in the pipeline  (the &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-create-dataset-for-tables&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-import-data-for-tables&lt;/code&gt; steps).&lt;/p&gt;

&lt;p&gt;While we’re not showing it in this example, AutoML Tables supports ingestion from BigQuery &lt;em&gt;views&lt;/em&gt; as well as tables.  This can be an easy way to do &lt;strong&gt;&lt;em&gt;feature engineering&lt;/em&gt;&lt;/strong&gt;: leverage BigQuery’s rich set of functions and operators to clean and transform your data before you ingest it.&lt;/p&gt;

&lt;p&gt;When the data is ingested, AutoML Tables infers the &lt;em&gt;data type&lt;/em&gt; for each field (column).  In some cases, those inferred types may not be what you want.  For example, for our “bikes and weather” dataset, several ID fields (like the rental station IDs) are set by default to be numeric, but we want them treated as categorical when we train our model.  In addition, we want to treat the &lt;code class=&quot;highlighter-rouge&quot;&gt;loc_cross&lt;/code&gt; strings as categorical rather than text.&lt;/p&gt;

&lt;p&gt;We make these adjustments programmatically, by defining a pipeline parameter that specifies the schema changes we want to make. Then, in the &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-set-dataset-schema&lt;/code&gt; pipeline step, for each indicated schema adjustment, we call &lt;code class=&quot;highlighter-rouge&quot;&gt;update_column_spec&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;update_column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;type_code&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type_code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;nullable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nullable&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Before we can train the model, we also need to specify the &lt;em&gt;target&lt;/em&gt; column— what we want our model to predict.  In this case, we’ll train the model to predict rental &lt;em&gt;duration&lt;/em&gt;.  This is a numeric value, so we’ll be training a &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/problem-types#regression_problems&quot;&gt;regression&lt;/a&gt; model.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_target_column&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;target_column_spec_name&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;train-a-custom-model-on-the-dataset&quot;&gt;Train a custom model on the dataset&lt;/h3&gt;

&lt;p&gt;Once the dataset is defined and its schema set properly, the pipeline will train the model.  This happens in the &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-create-model-for-tables&lt;/code&gt; pipeline step. Via pipeline parameters, we can specify the training budget, the &lt;em&gt;optimization objective&lt;/em&gt; (if not using the default), and can additionally specify which columns to include or exclude from the model inputs.&lt;/p&gt;

&lt;p&gt;You may want to specify a non-default optimization objective depending upon the characteristics of your dataset.  &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/train#opt-obj&quot;&gt;This table&lt;/a&gt; describes the available optimization objectives and when you might want to use them. For example, if you were training a classification model using an imbalanced dataset, you might want to specify use of AUC PR (&lt;code class=&quot;highlighter-rouge&quot;&gt;MAXIMIZE_AU_PRC&lt;/code&gt;), which optimizes results for predictions for the less common class.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;train_budget_milli_node_hours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train_budget_milli_node_hours&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;optimization_objective&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optimization_objective&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;include_column_spec_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;include_column_spec_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;exclude_column_spec_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exclude_column_spec_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;view-model-search-information-via-cloud-logging&quot;&gt;View model search information via Cloud Logging&lt;/h3&gt;

&lt;p&gt;You can view details about an AutoML Tables model &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/logging&quot;&gt;via Cloud Logging&lt;/a&gt;. Using Logging, you can see the final model hyperparameters as well as the hyperparameters and object values used during model training and tuning.&lt;/p&gt;

&lt;p&gt;An easy way to access these logs is to go to the &lt;a href=&quot;https://console.cloud.google.com/automl-tables&quot;&gt;AutoML Tables page&lt;/a&gt; in the Cloud Console.  Select the Models tab in the left navigation pane and click on the model you’re interested in.  Click the “Model” link to see the final hyperparameter logs.  To see the tuning trial hyperparameters, click the “Trials” link.&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%202.20.46%20PM.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%202.20.46%20PM.png&quot; width=&quot;30%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;View a model's search logs from its evaluation information.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;For example, here is a look at the Trials logs a custom model trained on the “bikes and weather” dataset, with one of the entries expanded in the logs:&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%202.23.00%20PM.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%202.23.00%20PM.png&quot; width=&quot;90%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;The 'Trials' logs for a &quot;bikes and weather&quot; model&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;custom-model-evaluation&quot;&gt;Custom model evaluation&lt;/h3&gt;

&lt;p&gt;Once your custom model has finished training, the pipeline moves on to its next step: model evaluation. We can access evaluation metrics via the API.  We’ll use this information to decide whether or not to deploy the model.&lt;/p&gt;

&lt;p&gt;These actions are factored into two steps. The process of fetching the evaluation information can be a general-purpose component (pipeline step) used in many situations; and then we’ll follow that with a more special-purpose step, that analyzes that information and uses it to decide whether or not to deploy the trained model.&lt;/p&gt;

&lt;p&gt;In the first of these pipeline steps— the &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-eval-tables-model&lt;/code&gt; step— we’ll retrieve the evaluation and &lt;em&gt;global feature importance&lt;/em&gt; information.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;feat_list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;column&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tables_model_metadata&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tables_model_column_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;list_model_evaluations&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;AutoML Tables automatically computes global feature importance for a trained model. This shows, across the evaluation set, the average absolute attribution each feature receives. Higher values mean the feature generally has greater influence on the model’s predictions.
This information is useful for debugging and improving your model. If a feature’s contribution is negligible—if it has a low value—you can simplify the model by excluding it from future training.
The pipeline step renders the global feature importance data as part of the pipeline run’s output:&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%201.22.42%20PM.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%201.22.42%20PM.png&quot; width=&quot;50%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Global feature importance for the model inputs, rendered by a Kubeflow Pipeline step.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;For our example, based on the graphic above, we might try training a model without including &lt;code class=&quot;highlighter-rouge&quot;&gt;bike_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the following pipeline step— the &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-eval-metrics&lt;/code&gt; step— the evaluation output from the previous step is grabbed as input, and parsed to extract metrics that we’ll use in conjunction with pipeline parameters to decide whether or not to deploy the model. Note that this component is more special-purpose: unlike the other components in this pipeline, which support generalizable operations, this component— while it is parameterized— is specific in how it analyzes the evaluation info and decides whether or not to do the deployment.&lt;/p&gt;

&lt;p&gt;One of the pipeline input parameters allows specification of metric thresholds. In this example, we’re training a regression model, and we’re specifying a &lt;code class=&quot;highlighter-rouge&quot;&gt;mean_absolute_error&lt;/code&gt; (MAE) value as a threshold in the pipeline input parameters:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;mean_absolute_error&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;450&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The  &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-eval-metrics&lt;/code&gt; pipeline step compares the model evaluation information to the given threshold constraints. In this case, if the MAE is &amp;lt; &lt;code class=&quot;highlighter-rouge&quot;&gt;450&lt;/code&gt;, the model will not be deployed. The pipeline step outputs that decision, and displays the evaluation information it’s using as part of the pipeline run’s output:&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%202.07.21%20PM.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-23%20at%202.07.21%20PM.png&quot; width=&quot;25%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Information about a model's evaluation, rendered by a Kubeflow Pipeline step.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;conditional-model-deployment&quot;&gt;(Conditional) model deployment&lt;/h3&gt;

&lt;p&gt;You can &lt;em&gt;deploy&lt;/em&gt; any of your custom Tables models to make them accessible for online prediction requests.
The pipeline code uses a &lt;em&gt;conditional test&lt;/em&gt; to determine whether or not to run the step that deploys the model, based on the output of the evaluation step described above:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Condition&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eval_metrics&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'deploy'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;deploy_model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deploy_model_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Only if the model meets the given criteria, will the deployment step (called &lt;code class=&quot;highlighter-rouge&quot;&gt;automl-deploy-tables-model&lt;/code&gt;) be run, and the model be deployed automatically as part of the pipeline run:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;deploy_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can always deploy a model later if you like.&lt;/p&gt;

&lt;h3 id=&quot;putting-it-together-the-full-pipeline-execution&quot;&gt;Putting it together: The full pipeline execution&lt;/h3&gt;

&lt;p&gt;The figure below shows the result of a pipeline run.  In this case, the conditional step was executed— based on the model evaluation metrics— and the trained model was deployed.
Via the UI, you can view outputs, logs for each step, run artifacts and lineage information, and more.  See &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines&quot;&gt;this post&lt;/a&gt; for more detail.&lt;/p&gt;

&lt;p&gt;++TODO: replace the following figure with something better++&lt;/p&gt;

&lt;figure&gt;
&lt;a href=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-17%20at%204.28.32%20PM.png&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://storage.googleapis.com/amy-jo/images/automl/tables_e2e/Screen%20Shot%202020-03-17%20at%204.28.32%20PM.png&quot; width=&quot;40%&quot; /&gt;&lt;/a&gt;
&lt;figcaption&gt;&lt;br /&gt;&lt;i&gt;Execution of a pipeline run. You can view outputs, logs for each step, run artifacts and lineage information, and more.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;getting-explanations-about-your-models-predictions&quot;&gt;Getting explanations about your model’s predictions&lt;/h2&gt;

&lt;p&gt;Once a model is deployed, you can request predictions from that model.  You can additionally request &lt;em&gt;explanations for local feature importance&lt;/em&gt;: a score showing how much (and in which direction) each feature influenced the prediction for a single example.  See &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;this blog post&lt;/a&gt; for more information on how those values are calculated.&lt;/p&gt;

&lt;p&gt;Here is a &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/master/ml/automl/tables/xai/automl_tables_xai.ipynb&quot;&gt;notebook example&lt;/a&gt; of how to request a prediction and its explanation using the Python client libraries.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.cloud&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;automl_v1beta1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;automl&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;automl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TablesClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PROJECT_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;REGION&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The prediction response will have a structure like &lt;a href=&quot;https://gist.github.com/amygdala/c96d45bdf694737d77d91597ca3ef1f0&quot;&gt;this&lt;/a&gt;. (The notebook above shows how to visualize the local feature importance results using &lt;code class=&quot;highlighter-rouge&quot;&gt;matplotlib&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;It’s easy to explore local feature importance through the Cloud Console’s &lt;a href=&quot;https://console.cloud.google.com/automl-tables&quot;&gt;AutoML Tables UI &lt;/a&gt;as well. After you deploy a model, go to the &lt;strong&gt;TEST &amp;amp; USE&lt;/strong&gt; tab of the Tables panel, select &lt;strong&gt;ONLINE PREDICTION&lt;/strong&gt;, enter the field values for the prediction, and then check the &lt;strong&gt;Generate feature importance&lt;/strong&gt; box at the bottom of the page. The result will show the feature importance values as well as the prediction.  This &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;blog post&lt;/a&gt; gives some examples of how these explanations can be used to find potential issues with your data or help you better understand your problem domain.&lt;/p&gt;

&lt;h2 id=&quot;the-automl-tables-ui-in-the-cloud-console&quot;&gt;The AutoML Tables UI in the Cloud Console&lt;/h2&gt;

&lt;p&gt;With this example we’ve focused on how you can automate a Tables workflow using Kubeflow pipelines and the Python client libraries.&lt;/p&gt;

&lt;p&gt;All of the pipeline steps can also be accomplished via the &lt;a href=&quot;https://console.cloud.google.com/automl-tables&quot;&gt;AutoML Tables UI&lt;/a&gt; in the Cloud Console, including many useful visualizations, and other functionality not implemented by this example pipeline— such as the ability to export the model’s test set and prediction results to BigQuery for further analysis.&lt;/p&gt;

&lt;h2 id=&quot;export-the-trained-model-and-serve-it-on-a-gke-cluster&quot;&gt;Export the trained model and serve it on a GKE cluster&lt;/h2&gt;

&lt;p&gt;Recently, Tables launched a feature to let you export your full custom model, packaged so that you can serve it via a Docker container. (Under the hood, it is using TensorFlow Serving). This lets you serve your models anywhere that you can run a container, including a GKE cluster.
This means that you can run a model serving service on your AI Platform Pipelines or Kubeflow installation, both of which run on GKE.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://amygdala.github.io/automl/ml/2019/12/05/automl_tables_export.html&quot;&gt;This blog post&lt;/a&gt; walks through the steps to serve the exported model (in this case, using &lt;a href=&quot;https://cloud.google.com/run&quot;&gt;Cloud Run&lt;/a&gt;).  Follow the instructions in the post through the “View information about your exported model in TensorBoard” &lt;a href=&quot;http://amygdala.github.io/automl/ml/2019/12/05/automl_tables_export.html#view-information-about-your-exported-model-in-tensorboard&quot;&gt;section&lt;/a&gt;.
Here, we’ll diverge from the rest of the post and create a GKE service instead.&lt;/p&gt;

&lt;p&gt;Make a copy of &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/deploy_model_for_tables/model_serve_template.yaml&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;deploy_model_for_tables/model_serve_template.yaml&lt;/code&gt;&lt;/a&gt; file and name it &lt;code class=&quot;highlighter-rouge&quot;&gt;model_serve.yaml&lt;/code&gt;.  Edit this new file, &lt;strong&gt;replacing&lt;/strong&gt; &lt;code class=&quot;highlighter-rouge&quot;&gt;MODEL_NAME&lt;/code&gt; with some meaningful name for your model, &lt;code class=&quot;highlighter-rouge&quot;&gt;IMAGE_NAME&lt;/code&gt; with the name of the container image you built (as described in the &lt;a href=&quot;http://amygdala.github.io/automl/ml/2019/12/05/automl_tables_export.html&quot;&gt;blog post&lt;/a&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;NAMESPACE&lt;/code&gt; with the namespace in which you want to run your service (e.g. &lt;code class=&quot;highlighter-rouge&quot;&gt;default&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Then, from the command line, run:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; model_serve.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;to set up your model serving &lt;em&gt;service&lt;/em&gt; and its underlying &lt;em&gt;deployment&lt;/em&gt;. (Before you do that, make sure that kubectl is set to use your GKE cluster’s credentials.  One way to do that is to visit the &lt;a href=&quot;https://console.cloud.google.com/kubernetes/list&quot;&gt;GKE panel in the Cloud Console&lt;/a&gt;, and click &lt;strong&gt;Connect&lt;/strong&gt; for that cluster.)&lt;/p&gt;

&lt;p&gt;You can later take down the service and its deployment by running:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; model_serve.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;send-prediction-requests-to-your-deployed-model-service&quot;&gt;Send prediction requests to your deployed model service&lt;/h3&gt;

&lt;p&gt;Once your model serving service is deployed, you can send prediction requests to it.  Because we didn’t set up an external endpoint for our service in this simple example, we’ll connect to the service via port forwarding.
From the command line, run the following, &lt;strong&gt;replacing&lt;/strong&gt; &lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;lt;your-model-name&amp;gt;&lt;/code&gt; with the value you replaced &lt;code class=&quot;highlighter-rouge&quot;&gt;MODEL_NAME&lt;/code&gt; by, when creating your &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; file, and &lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;lt;service-namespace&amp;gt;&lt;/code&gt; with the namespace in which your service is running— the same namespace value you used in the yaml file.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl &lt;span class=&quot;nt&quot;&gt;-n&lt;/span&gt; &amp;lt;service-namespace&amp;gt; port-forward svc/&amp;lt;your-model-name&amp;gt;  8080:80
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, from the &lt;code class=&quot;highlighter-rouge&quot;&gt;deploy_model_for_tables&lt;/code&gt; directory, send a prediction request to your service like this:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl &lt;span class=&quot;nt&quot;&gt;-X&lt;/span&gt; POST &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; @./instances.json http://localhost:8080/predict
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see a result like this, with a prediction for each instance in the &lt;code class=&quot;highlighter-rouge&quot;&gt;instances.json&lt;/code&gt; file:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;predictions&quot;&lt;/span&gt;: &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;860.79833984375, 460.5323486328125, 1211.7664794921875]&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(If you get an error, make sure you’re in the correct directory and see the &lt;code class=&quot;highlighter-rouge&quot;&gt;instances.json&lt;/code&gt; file listed).&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: it would be possible to add this deployment step to the pipeline too.  (See &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/deploy_model_for_tables/exported_model_deploy.py&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;deploy_model_for_tables/exported_model_deploy.py&lt;/code&gt;&lt;/a&gt;).  However, the &lt;a href=&quot;https://googleapis.dev/python/automl/latest/gapic/v1beta1/tables.html&quot;&gt;Python client library&lt;/a&gt; does not yet support the ‘export’ operation.  Once deployment is supported by the client library, this would be a natural addition to the workflow. While not tested, it should also be possible to do the export programmatically via the &lt;a href=&quot;https://cloud.google.com/automl/docs/reference/rest/v1/projects.locations.models/export&quot;&gt;REST API&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;a-deeper-dive-into-the-pipeline-code&quot;&gt;A deeper dive into the pipeline code&lt;/h2&gt;

&lt;p&gt;The updated &lt;a href=&quot;https://googleapis.dev/python/automl/latest/gapic/v1beta1/tables.html&quot;&gt;Tables Python client library&lt;/a&gt; makes it very straightforward to build the Pipelines components that support each stage of the workflow.
Kubeflow Pipeline steps are container-based, so that any action you can support via a Docker container image can become a pipeline step.
That doesn’t mean that an end-user necessarily needs to have Docker installed.  For many straightforward cases, building your pipeline steps&lt;/p&gt;

&lt;h3 id=&quot;using-the-lightweight-python-components--functionality-to-build-pipeline-steps&quot;&gt;Using the ‘lightweight python components’  functionality to build pipeline steps&lt;/h3&gt;

&lt;p&gt;For most of the components in this example, we’re building them using the &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/sdk/lightweight-python-components/&quot;&gt;“lightweight python components”&lt;/a&gt; functionality as shown in &lt;a href=&quot;https://github.com/kubeflow/pipelines/blob/master/samples/tutorials/mnist/01_Lightweight_Python_Components.ipynb&quot;&gt;this example notebook&lt;/a&gt;, including compilation of the code into a component package.  This feature allows you to create components based on Python functions, building on an appropriate base image, so that you do not need to have docker installed or rebuild a container image each time your code changes.&lt;/p&gt;

&lt;p&gt;Each component’s python file includes a function definition, and then a &lt;code class=&quot;highlighter-rouge&quot;&gt;func_to_container_op&lt;/code&gt; call, passing the function definition, to generate the component’s &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; package file.  As we’ll see below, these component package files make it very straightforward to put these steps together to form a pipeline.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/deploy_model_for_tables/tables_deploy_component.py&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;deploy_model_for_tables/tables_deploy_component.py&lt;/code&gt;&lt;/a&gt; file is representative.   It contains an &lt;code class=&quot;highlighter-rouge&quot;&gt;automl_deploy_tables_model&lt;/code&gt; function definition.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def automl_deploy_tables_model(
  gcp_project_id: str,
  gcp_region: str,
  model_display_name: str,
  api_endpoint: str = None,
  ) -&amp;gt; NamedTuple('Outputs', [('model_display_name', str), ('status', str)]):
...
  return (model_display_name, status)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The function defines the component’s inputs and outputs, and this information will be used to support static checking when we compose these components to build the pipeline.&lt;/p&gt;

&lt;p&gt;To build the component &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; file corresponding to this function, we add the following to the components’ Python script, then can run &lt;code class=&quot;highlighter-rouge&quot;&gt;python &amp;lt;filename&amp;gt;.py&lt;/code&gt; from the command line to generate it (you must have the Kubeflow Pipelines (KFP) sdk &lt;a href=&quot;https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/&quot;&gt;installed&lt;/a&gt;).&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__name__&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'__main__'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
	&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;kfp&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;kfp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;components&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func_to_container_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;automl_deploy_tables_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output_component_file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'tables_deploy_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;base_image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'python:3.7'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Whenever you change the python function definition, just recompile to regenerate the corresponding component file.&lt;/p&gt;

&lt;h3 id=&quot;specifying-the-tables-pipeline&quot;&gt;Specifying the Tables pipeline&lt;/h3&gt;

&lt;p&gt;With the components packaged into &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; files, it becomes very straightforward to specify a pipeline, such as &lt;a href=&quot;https://github.com/amygdala/code-snippets/blob/tables_e2e/ml/automl/tables/kfp_e2e/tables_pipeline_caip.py&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;tables_pipeline_caip.py&lt;/code&gt;&lt;/a&gt;, that uses them.  Here, we’re just using the &lt;code class=&quot;highlighter-rouge&quot;&gt;load_component_from_file()&lt;/code&gt; method, since the &lt;code class=&quot;highlighter-rouge&quot;&gt;yaml&lt;/code&gt; files are all local (in the same repo).  However, there is also a &lt;code class=&quot;highlighter-rouge&quot;&gt;load_component_from_url()&lt;/code&gt; method, which makes it easy to share components. (If your URL points to a file in GitHub, be sure to use raw mode).&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;create_dataset_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'./create_dataset_for_tables/tables_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;import_data_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'./import_data_from_bigquery/tables_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;set_schema_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;'./import_data_from_bigquery/tables_schema_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;train_model_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'./create_model_for_tables/tables_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eval_model_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'./create_model_for_tables/tables_eval_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eval_metrics_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'./create_model_for_tables/tables_eval_metrics_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;deploy_model_op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_component_from_file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'./deploy_model_for_tables/tables_deploy_component.yaml'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once all our pipeline ops (steps) are defined using the component definitions, then we can specify the pipeline by calling the constructors, e.g.:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;n&quot;&gt;create_dataset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_dataset_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcp_project_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gcp_project_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcp_region&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gcp_region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If a pipeline component has been defined to have outputs, other components can access those outputs.  E.g., here, the ‘eval’ step is grabbing an output from the ‘train’ step, specifically information about the model display name:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;n&quot;&gt;eval_model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval_model_op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcp_project_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gcp_project_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcp_region&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gcp_region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bucket_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bucket_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gcs_path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'automl_evals/{}'&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dsl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUN_ID_PLACEHOLDER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;api_endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train_model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'model_display_name'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this manner it is straightforward to put together a pipeline from your component definitions.  Just don’t forget to recompile the pipeline script (to generate its corresponding &lt;code class=&quot;highlighter-rouge&quot;&gt;.tar.gz&lt;/code&gt; archive) if any of its component definitions changed, e.g. &lt;code class=&quot;highlighter-rouge&quot;&gt;python tables_pipeline_caip.py&lt;/code&gt;.&lt;/p&gt;</content><author><name></name></author><summary type="html">Introduction</summary></entry><entry><title type="html">Getting explanations for AutoML Tables predictions</title><link href="https://amygdala.github.io/gcp_blog/ml/xai/automl/jupyter/2020/04/17/automl_tables_xai.html" rel="alternate" type="text/html" title="Getting explanations for AutoML Tables predictions" /><published>2020-04-17T00:00:00-05:00</published><updated>2020-04-17T00:00:00-05:00</updated><id>https://amygdala.github.io/gcp_blog/ml/xai/automl/jupyter/2020/04/17/automl_tables_xai</id><content type="html" xml:base="https://amygdala.github.io/gcp_blog/ml/xai/automl/jupyter/2020/04/17/automl_tables_xai.html">&lt;!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: _notebooks/2020-04-17-automl_tables_xai.ipynb
--&gt;

&lt;div class=&quot;container&quot; id=&quot;notebook-container&quot;&gt;
        
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Introduction&quot;&gt;Introduction&lt;a class=&quot;anchor-link&quot; href=&quot;#Introduction&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Google Cloud’s &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/&quot;&gt;AutoML Tables&lt;/a&gt; lets you automatically build and deploy state-of-the-art machine learning models using your own structured data.&lt;/p&gt;
&lt;p&gt;AutoML Tables now has an easier-to-use &lt;a href=&quot;https://googleapis.dev/python/automl/latest/gapic/v1beta1/tables.html&quot;&gt;Tables-specific Python client library&lt;/a&gt;, 
as well as a new ability to &lt;strong&gt;explain&lt;/strong&gt; online prediction results— called &lt;em&gt;local feature importance&lt;/em&gt;—  which gives visibility into how the features in a specific prediction request informed the resulting prediction.  You can read more about explainable AI for Tables in &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data&quot;&gt;this blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The source for this post is a Jupyter notebook. 
 In this notebook, we'll create a custom Tables model to predict duration of London bike rentals given information about local weather as well as info about the rental trip.
We'll walk through examples of using the Tables client libraries for creating a dataset, training a custom model, deploying the model, and using it to make predictions; and show how you can programmatically request local feature importance information.&lt;/p&gt;
&lt;p&gt;We recommend running this notebook using &lt;a href=&quot;https://cloud.google.com/ai-platform-notebooks/&quot;&gt;AI Platform Notebooks&lt;/a&gt;.
If you want to run the notebook on &lt;a href=&quot;https://colab.research.google.com/&quot;&gt;colab&lt;/a&gt; (or locally), it's possible, but you'll need to do a bit more setup.  See the Appendix section of this notebook for details.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Before-you-begin&quot;&gt;Before you begin&lt;a class=&quot;anchor-link&quot; href=&quot;#Before-you-begin&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Follow the &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/&quot;&gt;AutoML Tables documentation&lt;/a&gt; to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://console.cloud.google.com/cloud-resource-manager&quot;&gt;Select or create a GCP project&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://cloud.google.com/billing/docs/how-to/modify-project&quot;&gt;Make sure that billing is enabled&lt;/a&gt; for your project&lt;/li&gt;
&lt;li&gt;Enable the &lt;a href=&quot;https://console.cloud.google.com/flows/enableapi?apiid=storage-component.googleapis.com,automl.googleapis.com,storage-api.googleapis.com&quot;&gt;Cloud AutoML and Storage APIs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;(Recommended) Create an &lt;a href=&quot;https://cloud.google.com/ai-platform-notebooks/&quot;&gt;AI Platform Notebook&lt;/a&gt; instance and upload this notebook to it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(See also the &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/quickstart&quot;&gt;Quickstart guide&lt;/a&gt; for a getting-started walkthrough on AutoML Tables).&lt;/p&gt;
&lt;p&gt;Then, install the AutoML Python client libraries into your notebook environment:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;pip3 install -U google-cloud-automl
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;You may need to &lt;strong&gt;restart your notebook kernel&lt;/strong&gt; after running the above to pick up the installation.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Enter your GCP project ID in the cell below, then run the cell.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PROJECT_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;&amp;lt;your-project-id&amp;gt;&amp;quot;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h3 id=&quot;Do-some-imports&quot;&gt;Do some imports&lt;a class=&quot;anchor-link&quot; href=&quot;#Do-some-imports&quot;&gt; &lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Next, import some libraries and set some variables.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;argparse&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;os&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.api_core.client_options&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ClientOptions&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.cloud&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;automl_v1beta1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;automl&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.cloud.automl_v1beta1.proto.data_types_pb2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;data_types&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;REGION&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;us-central1&amp;#39;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;bikes-weather&amp;#39;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BIGQUERY_PROJECT_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;aju-dev-demos&amp;#39;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DATASET_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;london_bikes_weather&amp;#39;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TABLE_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;bikes_weather&amp;#39;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;IMPORT_URI&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;bq://&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGQUERY_PROJECT_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TABLE_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IMPORT_URI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;bikes_weather&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Create-a-dataset,-and-import-data&quot;&gt;Create a dataset, and import data&lt;a class=&quot;anchor-link&quot; href=&quot;#Create-a-dataset,-and-import-data&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Next, we'll define some utility functions to create a dataset, and to import data into a dataset.  The &lt;code&gt;client.import_data()&lt;/code&gt; call returns an operation &lt;em&gt;future&lt;/em&gt; that can be used to check for completion synchronously or asynchronously— in this case we wait synchronously.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Create a dataset.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Create a dataset with the given display name&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Display the dataset information.&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Dataset name: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Dataset id: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;/&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Dataset display name: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Dataset metadata:&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tables_dataset_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Dataset example count: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example_count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Dataset create time:&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;seconds: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;nanos: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nanos&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;import_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Import structured data.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
 
    &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startswith&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;bq&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;import_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bigquery_input_uri&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Get the multiple Google Cloud Storage URIs.&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;input_uris&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;import_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;gcs_input_uris&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_uris&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Processing import...&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# synchronous check of operation status.&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Data imported. &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; 
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Next, we'll create the &lt;code&gt;client&lt;/code&gt; object that we'll use for all our operations.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;automl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TablesClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PROJECT_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;REGION&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Create the Tables &lt;em&gt;dataset&lt;/em&gt;:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;... and then import data from the BigQuery table into the dataset. The import command will take a while to run. &lt;strong&gt;Wait until it has returned&lt;/strong&gt; before proceeding.  You can also check import status in the &lt;a href=&quot;https://console.cloud.google.com/automl-tables/datasets&quot;&gt;Cloud Console&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(Note that if you run this notebook multiple times, you will get an error if you try to create multiple datasets with the same name. However, you can train multiple models against the same dataset.)&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;import_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IMPORT_URI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h3 id=&quot;Update-the-dataset-schema&quot;&gt;Update the dataset schema&lt;a class=&quot;anchor-link&quot; href=&quot;#Update-the-dataset-schema&quot;&gt; &lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Now we'll define utility functions to update dataset and column information.  We need these to set the dataset's &lt;em&gt;target column&lt;/em&gt; (the field we'll train our model to predict) and to change the &lt;em&gt;types&lt;/em&gt; of some of the columns. AutoML Tables is pretty good at inferring reasonable column types based on input, but in our case, there are some columns (like bike station IDs) that we want to treat as &lt;em&gt;Categorical&lt;/em&gt; instead of &lt;em&gt;Numeric&lt;/em&gt;.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;update_column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                       &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                       &lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                       &lt;span class=&quot;n&quot;&gt;type_code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                       &lt;span class=&quot;n&quot;&gt;nullable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Update column spec.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;update_column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type_code&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type_code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nullable&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# synchronous check of operation status.&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Table spec updated. &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;update_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;target_column_spec_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;time_column_spec_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;test_train_column_spec_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Update dataset.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_column_spec_name&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_target_column&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;target_column_spec_name&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Target column updated. &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time_column_spec_name&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_time_column&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;column_spec_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time_column_spec_name&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Time column updated. &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;    
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;list_column_specs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                      &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                      &lt;span class=&quot;n&quot;&gt;filter_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;List all column specs.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# List all the table specs in the dataset by applying filter.&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;list_column_specs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filter_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;List of column specs:&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;column_spec&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Display the column_spec information.&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Column spec name: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_spec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Column spec id: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_spec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;/&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Column spec display name: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_spec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Column spec data type: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_spec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Update the dataset to indicate that the target column is &lt;code&gt;duration&lt;/code&gt;.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;update_dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;target_column_spec_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;duration&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;#                 time_column_spec_name=&amp;#39;ts&amp;#39;&lt;/span&gt;
              &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Now we'll update some of the column types.  You can list their default specs first if you like:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;list_column_specs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;... and now we'll update them to the types we want:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;update_column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;s1&quot;&gt;&amp;#39;end_station_id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;s1&quot;&gt;&amp;#39;CATEGORY&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;update_column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;s1&quot;&gt;&amp;#39;start_station_id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;s1&quot;&gt;&amp;#39;CATEGORY&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;update_column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;s1&quot;&gt;&amp;#39;loc_cross&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;s1&quot;&gt;&amp;#39;CATEGORY&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;update_column_spec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;s1&quot;&gt;&amp;#39;bike_id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;s1&quot;&gt;&amp;#39;CATEGORY&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;You can view the results in the &lt;a href=&quot;https://console.cloud.google.com/automl-tables/datasets&quot;&gt;Cloud Console&lt;/a&gt;. Note that useful stats are generated for each column. You can also run the &lt;code&gt;list_column_specs()&lt;/code&gt; function again to see the new config.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;# list_column_specs(client, DATASET_NAME)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Train-a-custom-model-on-the-dataset&quot;&gt;Train a custom model on the dataset&lt;a class=&quot;anchor-link&quot; href=&quot;#Train-a-custom-model-on-the-dataset&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Now we're ready to train a model on the dataset. We'll need to generate a unique name for the model, which we'll do by appending a timestamp, in case you want to run this notebook multiple times. The &lt;code&gt;1000&lt;/code&gt; arg in the &lt;code&gt;create_model()&lt;/code&gt; call specifies to budget 1 hour of training time.&lt;/p&gt;
&lt;p&gt;In the &lt;code&gt;create_model()&lt;/code&gt; utility function below, we may not want to block on the result, since total job time can be multiple hours. If you want the function to block until training is complete, uncomment the last line of the function below.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;bwmodel_&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;MODEL_NAME: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;train_budget_milli_node_hours&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;include_column_spec_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;exclude_column_spec_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Create a model.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
 
    &lt;span class=&quot;c1&quot;&gt;# Create a model with the model metadata in the region.&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;train_budget_milli_node_hours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;train_budget_milli_node_hours&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;include_column_spec_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;include_column_spec_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;exclude_column_spec_names&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exclude_column_spec_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Training model...&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Training operation: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;operation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Training operation name: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;operation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# uncomment the following to block until training is finished.&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# print(&amp;quot;Training completed: {}&amp;quot;.format(response.result()))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DATASET_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h3 id=&quot;Get-the-status-of-your-training-job&quot;&gt;Get the status of your training job&lt;a class=&quot;anchor-link&quot; href=&quot;#Get-the-status-of-your-training-job&quot;&gt; &lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Edit the following call to &lt;strong&gt;set &lt;code&gt;OP_NAME&lt;/code&gt; to the &quot;training operation name&quot;&lt;/strong&gt; listed in the output of &lt;code&gt;create_model()&lt;/code&gt; above.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OP_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;YOUR TRAINING OPERATION NAME&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_operation_status&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;operation_full_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Get operation status.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
 
    &lt;span class=&quot;c1&quot;&gt;# Get the latest state of a long-running operation.&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;auto_ml_client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transport&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_operations_client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_operation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;operation_full_id&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Operation status: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.cloud.automl&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;types&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OperationMetadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ParseFromString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;The training job may take several hours. You can check on its status in the Cloud Console UI. You can also monitor it via the &lt;code&gt;get_operation_status()&lt;/code&gt; call below. (Make sure you've edited the OP_NAME variable value above). You'll see: &lt;code&gt;done: true&lt;/code&gt; in the output when it's finished.&lt;/p&gt;
&lt;p&gt;(Note: if you should lose your notebook kernel context while the training job is running, you can continue the rest of the notebook later with a new kernel: just make note of the &lt;code&gt;MODEL_NAME&lt;/code&gt;. You can find that information in the Cloud Console as well).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_operation_status&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OP_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Get-information-about-your-trained-custom-model&quot;&gt;Get information about your trained custom model&lt;a class=&quot;anchor-link&quot; href=&quot;#Get-information-about-your-trained-custom-model&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Once it has been created, you can get information about a specific model. (While the training job is still running, you'll just get a &lt;code&gt;not found&lt;/code&gt; message.)&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.cloud.automl_v1beta1&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;enums&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;google.api_core&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exceptions&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Get model details.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;except&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exceptions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NotFound&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Model &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt; not found.&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Get complete detail of the model.a&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Retrieve deployment state.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;deployment_state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;enums&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DeploymentState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DEPLOYED&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;deployment_state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;deployed&amp;quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;deployment_state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;undeployed&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# get features of top global importance&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;feat_list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;column&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tables_model_metadata&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tables_model_column_info&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;feat_list&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reverse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feat_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;feat_to_show&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feat_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;feat_to_show&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# Display the model information.&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Model name: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Model id: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;/&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Model display name: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Features of top importance:&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feat&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feat_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feat_to_show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]:&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Model create time:&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;seconds: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;nanos: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nanos&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Model deployment state: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;deployment_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feat_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;&lt;strong&gt;Don't proceed with the rest of the notebook until the model has finished training&lt;/strong&gt; and the following &lt;code&gt;get_model()&lt;/code&gt; call returns model information rather than '&lt;code&gt;not found&lt;/code&gt;'.&lt;/p&gt;
&lt;p&gt;Once the training job has finished, we can get information about the model, including information about which input features proved to be the most &lt;strong&gt;important globally&lt;/strong&gt; (that is, across the full training dataset).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;global_feat_importance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;We can graph the global feature importance values to get a visualization of which inputs were most important in training the model. (The Cloud Console UI also displays such a graph).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;global_feat_importance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;plt&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;global_feat_importance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;y_pos&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;barh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_pos&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alpha&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;yticks&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_pos&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;    
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h3 id=&quot;See-your-model's-evaluation-metrics&quot;&gt;See your model's evaluation metrics&lt;a class=&quot;anchor-link&quot; href=&quot;#See-your-model's-evaluation-metrics&quot;&gt; &lt;/a&gt;&lt;/h3&gt;&lt;p&gt;We can also get model evaluation information once the model is trained.  The available metrics depend upon which optimization objective you used.  In this example, we used the default, &lt;strong&gt;RMSE&lt;/strong&gt;.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;list_model_evaluations&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regression_evaluation_metrics&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Use-your-trained-model-to-make-predictions-and-see-explanations-of-the-results&quot;&gt;Use your trained model to make predictions and see explanations of the results&lt;a class=&quot;anchor-link&quot; href=&quot;#Use-your-trained-model-to-make-predictions-and-see-explanations-of-the-results&quot;&gt; &lt;/a&gt;&lt;/h2&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h3 id=&quot;Deploy-your-model-and-get-predictions-+-explanations&quot;&gt;Deploy your model and get predictions + explanations&lt;a class=&quot;anchor-link&quot; href=&quot;#Deploy-your-model-and-get-predictions-+-explanations&quot;&gt; &lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Once your training job has finished, you can use your model to make predictions.&lt;/p&gt;
&lt;p&gt;With &lt;em&gt;online prediction&lt;/em&gt;, you can now request &lt;strong&gt;explanations&lt;/strong&gt; of the results, in the form of &lt;strong&gt;&lt;a href=&quot;https://cloud.google.com/automl-tables/docs/features#feat-imp&quot;&gt;local feature importance&lt;/a&gt;&lt;/strong&gt; calculations on the inputs. Local feature importance gives you visibility into how the features in a specific prediction request informed the resulting prediction.&lt;/p&gt;
&lt;p&gt;To get online predictions, we first need to &lt;strong&gt;deploy&lt;/strong&gt; the model.&lt;/p&gt;
&lt;p&gt;Note: see the &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/&quot;&gt;documentation&lt;/a&gt; for other prediction options including the ability to &lt;a href=&quot;link_to_blog_post&quot;&gt;export&lt;/a&gt; your custom model and run it in a container anywhere.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;deploy_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Deploy model.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;deploy_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# synchronous check of operation status.&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Model deployed. &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;It will take a while to deploy the model. &lt;strong&gt;Wait for the &lt;code&gt;deploy_model()&lt;/code&gt; call to finish&lt;/strong&gt; before proceeding with the rest of the notebook cells. You can track status in the Console UI as well.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;deploy_model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Once the model is deployed, you can access it via the UI, or the API, to make online prediction requests.  These can include a request for &lt;a href=&quot;https://cloud.google.com/automl-tables/docs/features#feat-imp&quot;&gt;local feature importance&lt;/a&gt; calculations on the inputs, a newly-launched feature. Local feature importance gives you visibility into how the features in a specific prediction request informed the resulting prediction.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sd&quot;&gt;&amp;quot;&amp;quot;&amp;quot;Make a prediction.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;Prediction results:&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;bike_id&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;5373&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;day_of_week&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;3&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;end_latitude&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;51.52059681&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;end_longitude&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.116688468&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;end_station_id&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;68&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;euclidean&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3589.5146210024977&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;loc_cross&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;POINT(-0.07 51.52)POINT(-0.12 51.52)&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;max&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;44.6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;min&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;34.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;prcp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;ts&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;1480407420&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;start_latitude&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;51.52388&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;start_longitude&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.065076&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;start_station_id&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;445&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;temp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;38.2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;s2&quot;&gt;&amp;quot;dewp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;28.6&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Try running the prediction request first without, then with, the local feature importance calculations, to see the difference in the information that is returned. (The actual duration— that we're predicting— is 1200.)&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MODEL_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;We can plot the local feature importance values to get a visualization of which fields were most and least important for this particular prediction.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;plt&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;col_info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;payload&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tables_model_column_info&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_display_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feature_importance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y_pos&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;barh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_pos&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alpha&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;yticks&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_pos&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;    
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;You can see a similar graphic in the &lt;a href=&quot;https://pantheon.corp.google.com/automl-tables/&quot;&gt;Cloud Console Tables UI&lt;/a&gt; when you submit an &lt;code&gt;ONLINE PREDICTION&lt;/code&gt; and tick the &quot;Generate feature importance&quot; checkbox.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;The local feature importance calculations are specific to a given input instance.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Summary&quot;&gt;Summary&lt;a class=&quot;anchor-link&quot; href=&quot;#Summary&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;In this notebook, we showed how you can use the AutoML Tables client library to create datasets, train models, and get predictions from your trained model— and in particular, how you can get explanations of the results along with the predictions.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;h2 id=&quot;Appendix:-running-this-notebook-on-colab-(or-locally)&quot;&gt;Appendix: running this notebook on colab (or locally)&lt;a class=&quot;anchor-link&quot; href=&quot;#Appendix:-running-this-notebook-on-colab-(or-locally)&quot;&gt; &lt;/a&gt;&lt;/h2&gt;&lt;p&gt;It's possible to run this example on &lt;a href=&quot;https://colab.research.google.com/&quot;&gt;colab&lt;/a&gt;, but it takes a bit more setup. Do the following before you create the Tables client object or call the API.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://cloud.google.com/iam/docs/creating-managing-service-accounts&quot;&gt;Create a service account&lt;/a&gt;, give it the necessary &lt;em&gt;roles&lt;/em&gt; (e.g., AutoML Admin) and &lt;a href=&quot;https://cloud.google.com/iam/docs/creating-managing-service-account-keys&quot;&gt;download a json credentials file&lt;/a&gt; for the service account.  &lt;strong&gt;Upload&lt;/strong&gt; the credentials file to the colab file system.&lt;/p&gt;
&lt;p&gt;Then, &lt;strong&gt;edit&lt;/strong&gt; the following to point to that file, and run the cell:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    
    
&lt;div class=&quot;cell border-box-sizing code_cell rendered&quot;&gt;
&lt;div class=&quot;input&quot;&gt;

&lt;div class=&quot;inner_cell&quot;&gt;
    &lt;div class=&quot;input_area&quot;&gt;
&lt;div class=&quot; highlight hl-ipython3&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;env&lt;/span&gt; GOOGLE_APPLICATION_CREDENTIALS /content/your-credentials-file.json
&lt;/pre&gt;&lt;/div&gt;

    &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    

&lt;div class=&quot;cell border-box-sizing text_cell rendered&quot;&gt;&lt;div class=&quot;inner_cell&quot;&gt;
&lt;div class=&quot;text_cell_render border-box-sizing rendered_html&quot;&gt;
&lt;p&gt;Your Tables API calls should now be properly authenticated.  If you lose the colab runtime, you'll need to re-upload the file and re-set the environment variable.&lt;/p&gt;
&lt;p&gt;If you're running the notebook locally, point the &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; environment variable to the service account credentials file before starting the notebook, e.g.:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/path/to/your-credentials-file.json
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html"></summary></entry></feed>