There’s been a lot happening with Google Cloud Dataflow lately.

We are pleased to announce the recent induction of the Google Cloud Dataflow SDK (and corresponding runners for Apache Flink and Apache Spark) into the new Apache Beam incubator project.

O’Reilly has published a ‘Streaming 102’ article, following ‘Streaming 101’. Together, these articles provide a great overview of design and implementation considerations in streaming data analysis.

We’ve also recently written an article that compares the programming models of Dataflow and Spark as they exist today. It is based on a mobile ‘gaming’ scenario and follows the evolution of a pipeline from a simple batch use case to more sophisticated streaming use cases, with side-by-side code snippets contrasting the two models. The article uses a suite of ‘gaming’ example pipelines that can be found in the Dataflow GitHub repo.