This example shows how you can run a Cloud AI Platform Pipeline from a Google Cloud Function, thus providing a way for Pipeline runs to be triggered by events (in the interim before this is supported by Pipelines itself).

In this example, the function is triggered by the addition of or update to a file in a Google Cloud Storage (GCS) bucket, but Cloud Functions can have other triggers too (including Pub/Sub-based triggers).

The example is Google Cloud Platform (GCP)-specific, and requires a Cloud AI Platform Pipelines installation using Pipelines version >= 0.4. To run this example as a notebook, click on one of the badges at the top of the page or see here.

(If you are instead interested in how to do this with a Kubeflow-based pipelines installation, see this notebook).

Setup

Create a Cloud AI Platform Pipelines installation

Follow the instructions in the documentation to create a Cloud AI Platform Pipelines installation.

Identify (or create) a Cloud Storage bucket to use for the example

Before executing the next cell, edit it to set the TRIGGER_BUCKET environment variable to a Google Cloud Storage bucket (create a bucket first if necessary). Do not include the gs:// prefix in the bucket name.

We'll deploy the GCF function so that it will trigger on new and updated files (blobs) in this bucket.

%env TRIGGER_BUCKET=REPLACE_WITH_YOUR_GCS_BUCKET_NAME

Give Cloud Function's service account the necessary access

First, make sure the Cloud Functions API is enabled.

Cloud Functions uses the project's 'appspot' account as its service account. It has the form PROJECT_ID@appspot.gserviceaccount.com. (This is also the project's App Engine default service account.)

Go to your project's IAM - Service Account page.
Find the PROJECT_ID@appspot.gserviceaccount.com account and copy its email address.
Find the project's Compute Engine (GCE) default service account (this is the default account used for the Pipelines installation). It has the form PROJECT_NUMBER-compute@developer.gserviceaccount.com. Click the checkbox next to the GCE service account, and in the 'INFO PANEL' to the right, click ADD MEMBER. Add the Functions service account (PROJECT_ID@appspot.gserviceaccount.com) as a Project Viewer of the GCE service account.

Next, configure your TRIGGER_BUCKET to allow the Functions service account access to that bucket.

Navigate in the console to your list of buckets in the Storage Browser.
Click the checkbox next to the TRIGGER_BUCKET. In the 'INFO PANEL' to the right, click ADD MEMBER. Add the service account (PROJECT_ID@appspot.gserviceaccount.com) with the Storage Object Admin role. (While not tested, granting both object view and object create permissions should also suffice.)
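If you prefer to script the bucket grant rather than use the console, here is a minimal sketch assuming the google-cloud-storage client library. The helper names (appspot_service_account, grant_bucket_role) are hypothetical; the get_iam_policy/set_iam_policy pattern follows that library's documented IAM usage. The bucket argument is duck-typed, so the logic can be exercised without credentials.

```python
def appspot_service_account(project_id):
    """Return the Cloud Functions (App Engine default) service account email."""
    return '%s@appspot.gserviceaccount.com' % project_id


def grant_bucket_role(bucket, member, role='roles/storage.objectAdmin'):
    """Bind `member` to `role` on the bucket, unless that binding already exists.

    `bucket` is assumed to behave like google.cloud.storage.Bucket, i.e. to
    provide get_iam_policy() and set_iam_policy().
    """
    policy = bucket.get_iam_policy(requested_policy_version=3)
    for binding in policy.bindings:
        if binding['role'] == role and member in binding['members']:
            return policy  # already granted; leave the policy untouched
    # Add a new binding and write the updated policy back to the bucket.
    policy.bindings.append({'role': role, 'members': {member}})
    bucket.set_iam_policy(policy)
    return policy
```

With the library installed, something like `grant_bucket_role(storage.Client().bucket(os.environ['TRIGGER_BUCKET']), 'serviceAccount:' + appspot_service_account('your-project-id'))` should apply the same Storage Object Admin grant as the console steps.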

Create a simple GCF function to test your configuration

First we'll generate and deploy a simple GCF function, to test that the basics are properly configured.

%%bash
mkdir -p functions

We'll first create a requirements.txt file, to indicate what packages the GCF code requires to be installed. (We won't actually need kfp for this first 'sanity check' version of a GCF function, but we'll need it below for the second function we'll create, that deploys a pipeline).

%%writefile functions/requirements.txt
kfp

Next, we'll create a simple GCF function in the functions/main.py file:

%%writefile functions/main.py
import logging

def gcs_test(data, context):
  """Background Cloud Function to be triggered by Cloud Storage.
     This generic function logs relevant data when a file is changed.

  Args:
      data (dict): The Cloud Functions event payload.
      context (google.cloud.functions.Context): Metadata of triggering event.
  Returns:
      None; the output is written to Stackdriver Logging
  """

  logging.info('Event ID: {}'.format(context.event_id))
  logging.info('Event type: {}'.format(context.event_type))
  logging.info('Data: {}'.format(data))
  logging.info('Bucket: {}'.format(data['bucket']))
  logging.info('File: {}'.format(data['name']))
  file_uri = 'gs://%s/%s' % (data['bucket'], data['name'])
  logging.info('Using file uri: %s', file_uri)

  logging.info('Metageneration: {}'.format(data['metageneration']))
  logging.info('Created: {}'.format(data['timeCreated']))
  logging.info('Updated: {}'.format(data['updated']))
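Before deploying, it can help to see what the handler receives. The sketch below builds a hypothetical Cloud Storage event payload containing only the fields gcs_test() reads, plus a stand-in for the context object; all values are made-up placeholders, not a captured event, and gcs_uri_from_event is a hypothetical helper mirroring the URI line in main.py.

```python
from types import SimpleNamespace


def gcs_uri_from_event(data):
    """Build the gs:// URI of the object named in a Cloud Storage event payload."""
    return 'gs://%s/%s' % (data['bucket'], data['name'])


# Hypothetical payload with just the keys gcs_test() reads; values are placeholders.
fake_data = {
    'bucket': 'my-trigger-bucket',
    'name': 'incoming/data.csv',
    'metageneration': '1',
    'timeCreated': '2020-01-01T00:00:00.000Z',
    'updated': '2020-01-01T00:00:00.000Z',
}

# Stand-in for google.cloud.functions.Context: only the attributes gcs_test() uses.
fake_context = SimpleNamespace(event_id='1234567890',
                               event_type='google.storage.object.finalize')

print(gcs_uri_from_event(fake_data))
```

From the notebook, adding `functions` to `sys.path` and then running `from main import gcs_test; gcs_test(fake_data, fake_context)` should exercise the handler locally (call `logging.basicConfig(level=logging.INFO)` first to see its log lines).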

Deploy the GCF function as follows. (You'll need to wait a moment or two for output of the deployment to display in the notebook). You can also run this command from a notebook terminal window in the functions subdirectory.

%%bash
cd functions
gcloud functions deploy gcs_test --runtime python37 --trigger-resource ${TRIGGER_BUCKET} --trigger-event google.storage.object.finalize

After you've deployed, test your deployment by adding a file to the specified TRIGGER_BUCKET. You can do this easily by visiting the Storage panel in the Cloud Console, clicking on the bucket in the list, and then clicking on Upload files in the bucket details view.
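As an alternative to the console upload, a test object can be written from Python. This sketch assumes google-cloud-storage's Bucket.blob() and Blob.upload_from_string(); upload_test_file and its default object name are hypothetical.

```python
def upload_test_file(bucket, blob_name='gcf-trigger-test.txt', contents='hello'):
    """Write a small object to `bucket`; creating it should fire the finalize trigger.

    `bucket` is assumed to behave like google.cloud.storage.Bucket.
    Returns the gs:// URI of the uploaded object.
    """
    blob = bucket.blob(blob_name)
    blob.upload_from_string(contents)
    return 'gs://%s/%s' % (bucket.name, blob_name)
```

For example, `upload_test_file(storage.Client().bucket(os.environ['TRIGGER_BUCKET']))` after `from google.cloud import storage` and `import os`.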

Then, check in the logs viewer panel (https://console.cloud.google.com/logs/viewer) to confirm that the GCF function was triggered and ran correctly. You can select 'Cloud Function' in the first pulldown menu to filter on just those log entries.

Deploy a Pipeline from a GCF function

Next, we'll create a GCF function that deploys an AI Platform Pipeline when triggered. First, preserve your existing main.py in a backup file:

%%bash
cd functions
mv main.py main.py.bak

Then, before executing the next cell, edit the HOST variable in the code below. You'll replace <your_endpoint> with the correct value for your installation.

To find this URL, visit the Pipelines panel in the Cloud Console.
From here, you can find the URL by clicking on the SETTINGS link for the Pipelines installation you want to use, and copying the 'host' string displayed in the client example code (prepend https:// to that string in the code below).
You can alternatively click on OPEN PIPELINES DASHBOARD for the Pipelines installation, and copy that URL, removing the /#/pipelines suffix.
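The two ways of finding the endpoint differ only in prefix and suffix. A small, hypothetical helper can normalize either form into the HOST value passed to kfp.Client; the example host in the docstring is made up.

```python
def pipelines_host(value):
    """Normalize a Pipelines 'host' string or dashboard URL into a HOST value.

    Accepts either the bare host shown under SETTINGS (for example, a made-up
    'abc123-dot-us-central1.pipelines.googleusercontent.com') or the dashboard
    URL ending in '/#/pipelines'.
    """
    value = value.strip()
    suffix = '/#/pipelines'
    if value.endswith(suffix):
        value = value[:-len(suffix)]  # drop the dashboard route
    value = value.rstrip('/')
    if not value.startswith('https://'):
        value = 'https://' + value  # the SETTINGS host string has no scheme
    return value
```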

%%writefile functions/main.py
import datetime
import logging
import time
 
import kfp
import kfp.compiler as compiler
import kfp.dsl as dsl
 
import requests
 
# TODO: replace with your Pipelines endpoint URL
HOST = 'https://<your_endpoint>.pipelines.googleusercontent.com'

@dsl.pipeline(
    name='Sequential',
    description='A pipeline with two sequential steps.'
)
def sequential_pipeline(filename='gs://ml-pipeline-playground/shakespeare1.txt'):
  """A pipeline with two sequential steps."""
  op1 = dsl.ContainerOp(
      name='filechange',
      image='library/bash:4.4.23',
      command=['sh', '-c'],
      arguments=['echo "%s" > /tmp/results.txt' % filename],
      file_outputs={'newfile': '/tmp/results.txt'})
  op2 = dsl.ContainerOp(
      name='echo',
      image='library/bash:4.4.23',
      command=['sh', '-c'],
      arguments=['echo "%s"' % op1.outputs['newfile']]
      )
 
def get_access_token():
  url = 'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token'
  r = requests.get(url, headers={'Metadata-Flavor': 'Google'})
  r.raise_for_status()
  access_token = r.json()['access_token']
  return access_token
 
def hosted_kfp_test(data, context):
  logging.info('Event ID: {}'.format(context.event_id))
  logging.info('Event type: {}'.format(context.event_type))
  logging.info('Data: {}'.format(data))
  logging.info('Bucket: {}'.format(data['bucket']))
  logging.info('File: {}'.format(data['name']))
  file_uri = 'gs://%s/%s' % (data['bucket'], data['name'])
  logging.info('Using file uri: %s', file_uri)
  
  logging.info('Metageneration: {}'.format(data['metageneration']))
  logging.info('Created: {}'.format(data['timeCreated']))
  logging.info('Updated: {}'.format(data['updated']))
  
  token = get_access_token()
  logging.info('attempting to launch pipeline run.')
  # Timezone-aware timestamp; used as a suffix to keep run names unique.
  ts = int(datetime.datetime.now(datetime.timezone.utc).timestamp() * 100000)
  client = kfp.Client(host=HOST, existing_token=token)
  compiler.Compiler().compile(sequential_pipeline, '/tmp/sequential.tar.gz')
  exp = client.create_experiment(name='gcstriggered')  # this is a 'get or create' op
  res = client.run_pipeline(exp.id, 'sequential_' + str(ts), '/tmp/sequential.tar.gz',
                            params={'filename': file_uri})
  logging.info(res)

Next, deploy the new GCF function. As before, it will take a moment or two for the results of the deployment to display in the notebook.

%%bash
cd functions
gcloud functions deploy hosted_kfp_test --runtime python37 --trigger-resource ${TRIGGER_BUCKET} --trigger-event google.storage.object.finalize

Add another file to your TRIGGER_BUCKET. This time you should see both GCF functions triggered. The hosted_kfp_test function will launch a pipeline run. You'll be able to see it in your Pipelines installation's dashboard, https://<your_endpoint>.pipelines.googleusercontent.com/#/pipelines, under the given Pipelines Experiment (gcstriggered by default).

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Introduction

Google Cloud’s AutoML Tables lets you automatically build and deploy state-of-the-art machine learning models using your own structured data.

AutoML Tables now has an easier-to-use Tables-specific Python client library, as well as a new ability to explain online prediction results— called local feature importance— which gives visibility into how the features in a specific prediction request informed the resulting prediction. You can read more about explainable AI for Tables in this blog post.

The source for this post is a Jupyter notebook. In this notebook, we'll create a custom Tables model to predict duration of London bike rentals given information about local weather as well as info about the rental trip. We'll walk through examples of using the Tables client libraries for creating a dataset, training a custom model, deploying the model, and using it to make predictions; and show how you can programmatically request local feature importance information.

We recommend running this notebook using AI Platform Notebooks. If you want to run the notebook on Colab (or locally), it's possible, but you'll need to do a bit more setup. See the Appendix section of this notebook for details.

Before you begin

Follow the AutoML Tables documentation to:

Select or create a GCP project.
Make sure that billing is enabled for your project.
Enable the Cloud AutoML and Storage APIs.
(Recommended) Create an AI Platform Notebook instance and upload this notebook to it.

(See also the Quickstart guide for a getting-started walkthrough on AutoML Tables).

Then, install the AutoML Python client libraries into your notebook environment:

!pip3 install -U 'google-cloud-automl<2.0.0'

You may need to restart your notebook kernel after running the above to pick up the installation.

Enter your GCP project ID in the cell below, then run the cell.

PROJECT_ID = "<your-project-id>"

Do some imports

Next, import some libraries and set some variables.

from google.cloud import automl_v1beta1 as automl

REGION = 'us-central1'
# BigQuery table holding the London bikes + weather data used in this example.
BIGQUERY_PROJECT_ID = 'aju-dev-demos'
DATASET_ID = 'london_bikes_weather'
TABLE_ID = 'bikes_weather'
IMPORT_URI = 'bq://%s.%s.%s' % (BIGQUERY_PROJECT_ID, DATASET_ID, TABLE_ID)
print(IMPORT_URI)

# Display name for the new Tables dataset.
DATASET_NAME = 'bikes_weather'

Create a dataset, and import data

Next, we'll define some utility functions to create a dataset, and to import data into a dataset. The client.import_data() call returns an operation future that can be used to check for completion synchronously or asynchronously— in this case we wait synchronously.

def create_dataset(client, dataset_display_name):
    """Create a dataset."""

    # Create a dataset with the given display name
    dataset = client.create_dataset(dataset_display_name)

    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset metadata:")
    print("\t{}".format(dataset.tables_dataset_metadata))
    print("Dataset example count: {}".format(dataset.example_count))
    print("Dataset create time:")
    print("\tseconds: {}".format(dataset.create_time.seconds))
    print("\tnanos: {}".format(dataset.create_time.nanos))

    return dataset

def import_data(client, dataset_display_name, path):
    """Import structured data."""
 
    response = None
    if path.startswith('bq'):
        response = client.import_data(
            dataset_display_name=dataset_display_name, bigquery_input_uri=path
        )
    else:
        # Get the multiple Google Cloud Storage URIs.
        input_uris = path.split(",")
        response = client.import_data(
            dataset_display_name=dataset_display_name,
            gcs_input_uris=input_uris
        )

    print("Processing import...")
    # synchronous check of operation status.
    print("Data imported. {}".format(response.result()))
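The branching in import_data() amounts to choosing which keyword argument to pass to client.import_data(). The sketch below isolates that choice (import_source_kwargs is a hypothetical name); it checks for the full 'bq://' prefix and trims whitespace around comma-separated GCS URIs, which is slightly stricter than the startswith('bq') test above.

```python
def import_source_kwargs(path):
    """Return the source keyword argument for client.import_data().

    A BigQuery source is a single 'bq://project.dataset.table' URI; anything
    else is treated as a comma-separated list of gs:// URIs.
    """
    if path.startswith('bq://'):
        return {'bigquery_input_uri': path}
    # Split on commas and drop surrounding whitespace and empty entries.
    uris = [uri.strip() for uri in path.split(',') if uri.strip()]
    return {'gcs_input_uris': uris}
```

With this, `client.import_data(dataset_display_name=DATASET_NAME, **import_source_kwargs(IMPORT_URI))` would be equivalent to the call made in import_data().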

Next, we'll create the client object that we'll use for all our operations.

client = automl.TablesClient(project=PROJECT_ID, region=REGION)

Create the Tables dataset:

create_dataset(client, DATASET_NAME)

... and then import data from the BigQuery table into the dataset. The import command will take a while to run. Wait until it has returned before proceeding. You can also check import status in the Cloud Console.

(Note that if you run this notebook multiple times, you will get an error if you try to create multiple datasets with the same name. However, you can train multiple models against the same dataset.)

import_data(client, DATASET_NAME, IMPORT_URI)

Update the dataset schema

Now we'll define utility functions to update dataset and column information. We need these to set the dataset's target column (the field we'll train our model to predict) and to change the types of some of the columns. AutoML Tables is pretty good at inferring reasonable column types based on input, but in our case, there are some columns (like bike station IDs) that we want to treat as Categorical instead of Numeric.

def update_column_spec(client,
                       dataset_display_name,
                       column_spec_display_name,
                       type_code,
                       nullable=None):
    """Update column spec."""

    response = client.update_column_spec(
        dataset_display_name=dataset_display_name,
        column_spec_display_name=column_spec_display_name,
        type_code=type_code, nullable=nullable
    )

    # update_column_spec() returns the updated ColumnSpec directly (no long-running operation).
    print("Column spec updated. {}".format(response))
    
def update_dataset(client,
                   dataset_display_name,
                   target_column_spec_name=None,
                   time_column_spec_name=None,
                   test_train_column_spec_name=None):
    """Update dataset."""

    if target_column_spec_name is not None:
        response = client.set_target_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=target_column_spec_name
        )
        print("Target column updated. {}".format(response))
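    if time_column_spec_name is not None:
        response = client.set_time_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=time_column_spec_name
        )
        print("Time column updated. {}".format(response))
    if test_train_column_spec_name is not None:
        response = client.set_test_train_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=test_train_column_spec_name
        )
        print("Test/train column updated. {}".format(response))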

def list_column_specs(client,
                      dataset_display_name,
                      filter_=None):
    """List all column specs."""
    result = []

    # List all the column specs in the dataset, applying the optional filter.
    response = client.list_column_specs(
        dataset_display_name=dataset_display_name, filter_=filter_)

    print("List of column specs:")
    for column_spec in response:
        # Display the column_spec information.
        print("Column spec name: {}".format(column_spec.name))
        print("Column spec id: {}".format(column_spec.name.split("/")[-1]))
        print("Column spec display name: {}".format(column_spec.display_name))
        print("Column spec data type: {}".format(column_spec.data_type))

        result.append(column_spec)

    return result

Update the dataset to indicate that the target column is duration.

update_dataset(client, DATASET_NAME,
                target_column_spec_name='duration',
#                 time_column_spec_name='ts'
              )

Now we'll update some of the column types. You can list their default specs first if you like:

list_column_specs(client, DATASET_NAME)

... and now we'll update them to the types we want:

update_column_spec(client, DATASET_NAME,
                   'end_station_id',
                    'CATEGORY')
update_column_spec(client, DATASET_NAME,
                   'start_station_id',
                    'CATEGORY')
update_column_spec(client, DATASET_NAME,
                   'loc_cross',
                   'CATEGORY')
update_column_spec(client, DATASET_NAME,
                   'bike_id',
                   'CATEGORY')
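The four update_column_spec() calls above follow one pattern, so they could also be driven from a mapping of column names to type codes. set_column_types and CATEGORICAL_COLUMNS are hypothetical names; the keyword arguments are the ones this notebook already passes to client.update_column_spec().

```python
# Columns this notebook treats as categorical rather than numeric.
CATEGORICAL_COLUMNS = ['end_station_id', 'start_station_id', 'loc_cross', 'bike_id']


def set_column_types(client, dataset_display_name, type_codes):
    """Call client.update_column_spec() once per (column, type_code) pair.

    `client` is assumed to be an automl.TablesClient; returns the list of
    values returned by each update call, in the mapping's order.
    """
    updated = []
    for column, type_code in type_codes.items():
        spec = client.update_column_spec(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=column,
            type_code=type_code,
        )
        updated.append(spec)
    return updated
```

For example, `set_column_types(client, DATASET_NAME, {col: 'CATEGORY' for col in CATEGORICAL_COLUMNS})` should make the same four changes.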

You can view the results in the Cloud Console. Note that useful stats are generated for each column. You can also run the list_column_specs() function again to see the new config.

# list_column_specs(client, DATASET_NAME)

Train a custom model on the dataset

Now we're ready to train a model on the dataset. We'll generate a unique name for the model by appending a timestamp, in case you want to run this notebook multiple times. The 1000 argument in the create_model() call sets the training budget to 1000 milli node hours, i.e., 1 node hour.
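create_model() takes its budget in milli node hours, so 1000 corresponds to 1 node hour. A hypothetical conversion helper makes that unit explicit:

```python
def milli_node_hours(node_hours):
    """Convert a training budget in node hours to milli node hours."""
    milli = round(node_hours * 1000)
    if milli <= 0:
        raise ValueError('training budget must be positive')
    return milli
```

For example, `create_model(client, DATASET_NAME, MODEL_NAME, milli_node_hours(1))` is the same as passing 1000.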

In the create_model() utility function below, we may not want to block on the result, since total job time can be multiple hours. If you want the function to block until training is complete, uncomment the last line of the function below.

import time
MODEL_NAME = 'bwmodel_' + str(int(time.time()))
print('MODEL_NAME: %s' % MODEL_NAME)

def create_model(client,
                 dataset_display_name,
                 model_display_name,
                 train_budget_milli_node_hours,
                 include_column_spec_names=None,
                 exclude_column_spec_names=None):
    """Create a model."""
 
    # Create a model with the model metadata in the region.
    response = client.create_model(
        model_display_name,
        train_budget_milli_node_hours=train_budget_milli_node_hours,
        dataset_display_name=dataset_display_name,
        include_column_spec_names=include_column_spec_names,
        exclude_column_spec_names=exclude_column_spec_names,
    )

    print("Training model...")
    print("Training operation: {}".format(response.operation))
    print("Training operation name: {}".format(response.operation.name))
    # uncomment the following to block until training is finished.
    # print("Training completed: {}".format(response.result()))

create_model(client, DATASET_NAME, MODEL_NAME, 1000)

Get the status of your training job

Edit the following call to set OP_NAME to the "training operation name" listed in the output of create_model() above.

OP_NAME = 'YOUR TRAINING OPERATION NAME'

def get_operation_status(client, operation_full_id):
    """Get operation status."""
 
    # Get the latest state of a long-running operation.
    op = client.auto_ml_client.transport._operations_client.get_operation(
        operation_full_id
    )

    print("Operation status: {}".format(op))
    from google.cloud.automl import types
    # ParseFromString() fills msg in place; print the decoded message itself.
    msg = types.OperationMetadata()
    msg.ParseFromString(op.metadata.value)
    print(msg)
    return op

The training job may take several hours. You can check on its status in the Cloud Console UI. You can also monitor it via the get_operation_status() call below. (Make sure you've edited the OP_NAME variable value above). You'll see: done: true in the output when it's finished.

(Note: if you should lose your notebook kernel context while the training job is running, you can continue the rest of the notebook later with a new kernel: just make note of the MODEL_NAME. You can find that information in the Cloud Console as well).

res = get_operation_status(client, OP_NAME)
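Rather than re-running the status cell by hand, you could poll until the operation reports done. This is a sketch, not part of the Tables client: wait_until_done is hypothetical, and it accepts any zero-argument callable returning an object with a boolean done attribute, as the operation fetched in get_operation_status() has.

```python
import time


def wait_until_done(get_operation, poll_seconds=60, timeout_seconds=6 * 3600):
    """Call get_operation() repeatedly until the result has done == True.

    Sleeps poll_seconds between checks and raises TimeoutError once
    timeout_seconds have elapsed without completion.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        op = get_operation()
        if op.done:
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError('operation not done after %d seconds' % timeout_seconds)
        time.sleep(poll_seconds)
```

For instance, `wait_until_done(lambda: client.auto_ml_client.transport._operations_client.get_operation(OP_NAME))` reuses the same private operations client as get_operation_status(). The six-hour default timeout is an arbitrary choice.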

Get information about your trained custom model

Once it has been created, you can get information about a specific model. (While the training job is still running, you'll just get a not found message.)

from google.cloud.automl_v1beta1 import enums
from google.api_core import exceptions

def get_model(client, model_display_name):
    """Get model details."""

    try:
        model = client.get_model(model_display_name=model_display_name)
    except exceptions.NotFound:
        print("Model %s not found." % model_display_name)
        return (None, None)

    # Retrieve deployment state.
    if model.deployment_state == enums.Model.DeploymentState.DEPLOYED:
        deployment_state = "deployed"
    else:
        deployment_state = "undeployed"

    # get features of top global importance
    feat_list = [
        (column.feature_importance, column.column_display_name)
        for column in model.tables_model_metadata.tables_model_column_info
    ]
    feat_list.sort(reverse=True)
    if len(feat_list) < 10:
        feat_to_show = len(feat_list)
    else:
        feat_to_show = 10

    # Display the model information.
    print("Model name: {}".format(model.name))
    print("Model id: {}".format(model.name.split("/")[-1]))
    print("Model display name: {}".format(model.display_name))
    print("Features of top importance:")
    for feat in feat_list[:feat_to_show]:
        print(feat)
    print("Model create time:")
    print("\tseconds: {}".format(model.create_time.seconds))
    print("\tnanos: {}".format(model.create_time.nanos))
    print("Model deployment state: {}".format(deployment_state))

    return (model, feat_list)
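The sort-and-slice in get_model() can also be expressed with heapq.nlargest, which the Python documentation describes as equivalent to sorted(iterable, key=key, reverse=True)[:n]. top_features is a hypothetical helper name.

```python
import heapq


def top_features(column_info, k=10):
    """Return up to k (importance, column name) pairs, highest importance first.

    `column_info` is an iterable of objects with feature_importance and
    column_display_name attributes, like
    model.tables_model_metadata.tables_model_column_info.
    """
    pairs = ((c.feature_importance, c.column_display_name) for c in column_info)
    return heapq.nlargest(k, pairs)
```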

Don't proceed with the rest of the notebook until the model has finished training and the following get_model() call returns model information rather than 'not found'.

Once the training job has finished, we can get information about the model, including information about which input features proved to be the most important globally (that is, across the full training dataset).

(model, global_feat_importance) = get_model(client, MODEL_NAME)

We can graph the global feature importance values to get a visualization of which inputs were most important in training the model. (The Cloud Console UI also displays such a graph).

print(global_feat_importance)

import matplotlib.pyplot as plt

res = list(zip(*global_feat_importance))
x = list(res[0])
y = list(res[1])

y_pos = list(range(len(y)))
plt.barh(y_pos, x, alpha=0.5)
plt.yticks(y_pos, y)
plt.show()

See your model's evaluation metrics

We can also get model evaluation information once the model is trained. The available metrics depend upon which optimization objective you used. In this example, we used the default, RMSE.

evals = client.list_model_evaluations(model_display_name=MODEL_NAME)
list(evals)[1].regression_evaluation_metrics
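For reference, RMSE is the square root of the mean squared difference between actual and predicted values, so here it is expressed in the same units as duration. A plain-Python sketch (rmse is a hypothetical helper, independent of the AutoML API):

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError('inputs must be non-empty and the same length')
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```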

Use your trained model to make predictions and see explanations of the results

Deploy your model and get predictions + explanations

Once your training job has finished, you can use your model to make predictions.

With online prediction, you can now request explanations of the results, in the form of local feature importance calculations on the inputs. Local feature importance gives you visibility into how the features in a specific prediction request informed the resulting prediction.

To get online predictions, we first need to deploy the model.

Note: see the documentation for other prediction options including the ability to export your custom model and run it in a container anywhere.

def deploy_model(client, model_display_name):
    """Deploy model."""

    response = client.deploy_model(model_display_name=model_display_name)
    # synchronous check of operation status.
    print("Model deployed. {}".format(response.result()))

It will take a while to deploy the model. Wait for the deploy_model() call to finish before proceeding with the rest of the notebook cells. You can track status in the Console UI as well.

deploy_model(client, MODEL_NAME)

Once the model is deployed, you can access it via the UI or the API to make online prediction requests. These requests can also ask for local feature importance calculations on the inputs, a newly launched feature.

def predict(client,
            model_display_name,
            inputs,
            feature_importance=False):
    """Make a prediction."""

    if feature_importance:
        response = client.predict(
            model_display_name=model_display_name,
            inputs=inputs,
            feature_importance=True,
        )
    else:
        response = client.predict(
            model_display_name=model_display_name,
            inputs=inputs)
    print("Prediction results:")
    print(response)
    return response

inputs =  {
      "bike_id": "5373",
      "day_of_week": "3",
      "end_latitude": 51.52059681,
      "end_longitude": -0.116688468,
      "end_station_id": "68",
      "euclidean": 3589.5146210024977,
      "loc_cross": "POINT(-0.07 51.52)POINT(-0.12 51.52)",
      "max": 44.6,
      "min": 34.0,
      "prcp": 0,
      "ts": "1480407420",
      "start_latitude": 51.52388,
      "start_longitude": -0.065076,
      "start_station_id": "445",
      "temp": 38.2,
      "dewp": 28.6
    }

Try running the prediction request first without, then with, the local feature importance calculations, to see the difference in the information that is returned. (The actual duration— that we're predicting— is 1200.)

predict(client, MODEL_NAME, inputs, feature_importance=False)

response = predict(client, MODEL_NAME, inputs, feature_importance=True)

We can plot the local feature importance values to get a visualization of which fields were most and least important for this particular prediction.

import matplotlib.pyplot as plt

col_info = response.payload[0].tables.tables_model_column_info
x = []
y = []
for c in col_info:
  y.append(c.column_display_name)
  x.append(c.feature_importance)
y_pos = list(range(len(y)))
plt.barh(y_pos, x, alpha=0.5)
plt.yticks(y_pos, y)
plt.show()

You can see a similar graphic in the Cloud Console Tables UI when you submit an ONLINE PREDICTION and tick the "Generate feature importance" checkbox.

The local feature importance calculations are specific to a given input instance.
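To read off which fields mattered most for a single prediction without the plot, you can sort the columns by importance. This sketch (rank_local_importance is hypothetical) sorts by absolute value, an assumption that keeps the ranking meaningful if some importance values are negative.

```python
def rank_local_importance(column_info):
    """Return (column name, importance) pairs sorted by |importance|, largest first.

    `column_info` is assumed to look like
    response.payload[0].tables.tables_model_column_info from predict().
    """
    pairs = [(c.column_display_name, c.feature_importance) for c in column_info]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)
```

For example, `rank_local_importance(response.payload[0].tables.tables_model_column_info)[:5]` would list the five most influential fields for the request above.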

Summary

In this notebook, we showed how you can use the AutoML Tables client library to create datasets, train models, and get predictions from your trained model— and in particular, how you can get explanations of the results along with the predictions.

Appendix: running this notebook on Colab (or locally)

It's possible to run this example on Colab, but it takes a bit more setup. Do the following before you create the Tables client object or call the API.

Create a service account, give it the necessary roles (e.g., AutoML Admin), and download a JSON credentials file for the service account. Upload the credentials file to the Colab file system.

Then, edit the following to point to that file, and run the cell:

%env GOOGLE_APPLICATION_CREDENTIALS /content/your-credentials-file.json

Your Tables API calls should now be properly authenticated. If you lose the Colab runtime, you'll need to re-upload the file and set the environment variable again.

If you're running the notebook locally, point the GOOGLE_APPLICATION_CREDENTIALS environment variable to the service account credentials file before starting the notebook, e.g.:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-credentials-file.json
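A quick sanity check before creating the client can save a confusing authentication error later. This hypothetical helper only verifies that GOOGLE_APPLICATION_CREDENTIALS is set and points at an existing file; it does not validate the key itself.

```python
import os


def check_credentials_env(var='GOOGLE_APPLICATION_CREDENTIALS'):
    """Return the credentials path from the environment, or raise if it is unusable."""
    path = os.environ.get(var)
    if not path:
        raise RuntimeError('%s is not set' % var)
    if not os.path.isfile(path):
        raise RuntimeError('%s points to a missing file: %s' % (var, path))
    return path
```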

Amy on GCP
