27 March 2014

What you should know about the Google Cloud Platform

GCPLive Event Summary & Review



The Google Cloud Platform team is making a major marketing push to spread the word about GCP. On March 25, 2014, they held GCPLive in San Francisco, the first of a 27-city global roadshow to highlight their cloud platform. As the developer and provider of a modest web application (“Majozi”) that does automatic duty-roster scheduling, I attended GCPLive because I am interested in GCP as a possible home for my app. Bottom line: I will definitely be giving GCP a try for my application.
This review of GCPLive consists of: a description of what is unique and strong about GCP, an assessment of some possible weaknesses (“challenges?”), and a critique of the event itself. It is written from my viewpoint as an engineer and as a potential customer.
Background
I am not a high-end enterprise developer. I have had experience with production apps on EngineYard, Heroku, Amazon, and other PAAS hosts. Agility and ultra-rapid prototyping are key for me. I have a low tolerance for clumsy interfaces, awful documentation, and an inability to support DRYness[1].
Majozi has an interesting backend called the Rostering Engine, which assigns people to duties when creating a duty roster, typically covering a 3-month period and 300+ roster duties. Rostering[2] is an NP-hard problem involving a massive combinatorial space, on the order of 10^20 possibilities. You can understand my interest in tapping a scalable computing space, even though not even Google’s infrastructure could evaluate every possible roster combination within the lifetime of our Solar System!
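To give a feel for why brute force is hopeless, here is a toy back-of-envelope calculation in Ruby. The numbers are purely illustrative (they are not Majozi’s actual constraint model), but they show how quickly the raw space runs away from even generous hardware:

    # Toy numbers, for illustration only -- not Majozi's real constraint model.
    duties     = 20                    # duty slots to fill
    candidates = 10                    # eligible people per slot
    space      = candidates**duties    # 10**20 possible assignments

    evals_per_second = 1_000_000_000   # a very generous single machine
    seconds = space / evals_per_second
    years   = seconds / (60 * 60 * 24 * 365)

    puts "search space: 10^#{Math.log10(space).round}"    # => 10^20
    puts "brute force:  ~#{years} years on one machine"   # => ~3170 years

And real rosters are far bigger than 20 slots, so enumeration is simply off the table.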


What is unique about GCP?

The marketplace already has several IAAS cloud platforms: Amazon’s EC2, Rackspace, IBM’s cloud servers, and Microsoft’s Azure. And it has several PAAS providers: Heroku, EngineYard, etc., many of which run on those IAAS providers.




Is Google’s Cloud Platform (GCP) just another cloud service? 
Hardly. To view it that way is to totally miss the point and the potential. To say that GCP is competing with EC2 is, to me, laughable. Media and tech bloggers who merely compare the two as equals in the same foot race are short-sighted.

Google’s internal platform with the Google toolbox exposed



What’s different? GCP is more than just an infrastructure highway like EC2. Much more. GCP’s overview page clearly states the hands-down winning argument for using GCP: it runs on Google’s infrastructure. This infrastructure “returns billions of search results in milliseconds, serve[s] 6 billion hours of YouTube video per month and provide[s] storage for 425 million Gmail users.” That’s a powerful track record.
But David, you say, doesn’t Amazon’s amazing and powerful e-commerce operation also run on EC2? (Actually, I don’t recall Amazon ever making that claim, so correct me if I’m wrong.) Even if that were the case, it is not in the same computational league as what Google’s services achieve every second across the global interwebs, with far more massive amounts of data.
This GCP infrastructure depends on four key components: Global Network, Storage, Redundancy, and Cutting-edge computer science services.

Why wait for the next Google Whitepaper to be turned into a Hadoop?

It’s this final point that I want to highlight as putting GCP in a league of its own. “Google has led the industry with innovations in software infrastructure such as MapReduce, BigTable and Dremel. Today, Google is pushing the next generation of innovation with products such as Spanner and Flume. When you build on Cloud Platform, you get access to Google’s technology innovations faster.” Meaning, you won’t have to wait for a Google whitepaper to be turned into another Hadoop to take advantage of cutting-edge technology.
This is the real kicker and something that none of the other providers can give. GCPLive gave demonstrations of some of those capabilities. It is widely acknowledged that the immediate future of applications -- mobile and otherwise -- lies in contextual awareness and predictive responses. As Google exposes more of its innovations through GCP, developers will be better able to use the same building blocks to give their applications similar capabilities. That is a strongly compelling reason to choose GCP for any interesting project.
Already GCP offers APIs for Translate (over 1,000 language pairs) and Prediction. I can envision that eventually BigBrain (the deep neural-net capability used for image recognition and speech recognition) will become one of the available GCP services. That’s a powerful incentive to start putting even middling applications on GCP now. Imagine what your engineers could offer your customers with access to GCP’s predictive, contextual, and signal-recognition capabilities. Even if that’s not in the current market requirements for an application, it soon will be.
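As a taste of what tapping these services looks like today, here is a minimal sketch of calling the Translate API v2 over REST from Ruby. I am writing this from memory, so treat the parameter names as assumptions and check the current documentation; YOUR_API_KEY is a placeholder for a key with the Translate API enabled:

    require 'net/http'
    require 'json'
    require 'uri'

    # Sketch only: parameter names are from memory; verify against the docs.
    # YOUR_API_KEY is a placeholder for a key with the Translate API enabled.
    uri = URI('https://www.googleapis.com/language/translate/v2')
    uri.query = URI.encode_www_form(
      key:    'YOUR_API_KEY',
      q:      'Who is on duty tonight?',
      source: 'en',
      target: 'de'
    )

    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      http.get(uri.request_uri)
    end
    puts JSON.parse(res.body)['data']['translations'].first['translatedText']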

Failure is the norm

Google’s infrastructure has been industry-leading for the last ten years: its data centers are the best, and its reliability, global speed and responsiveness, and massive scaling beat everyone else’s. GCP brings that capability to any developer’s doorstep. GCP is the internal Google platform, exposing the Google toolbox for developers.
Google’s infrastructure has been designed around the concept that “Failure is the norm.” Hardware is NOT the path to reliability; software is. They plan for failure instead of reacting to it. For example, at the conference they demonstrated Live Migration: the ability of the infrastructure to switch hardware nodes in an instant at the first sign of failure. The demo involved streaming an HD video, which didn’t drop a frame during the switchover.
My web application runs on Heroku, a PAAS, which in turn runs on Amazon’s EC2 IAAS. Over the last three years, Amazon has had at least three major breakdowns and cloud failures, one of which lasted for over 12 hours, making my application (and many more prominent ones) go dark. EC2 is good; GCP is great. One of EC2’s failures was so clumsy and severe that it forced EngineYard to add its own ability to dynamically shift any of its customers’ applications between EC2 data centers. That’s a strong admission that they cannot trust EC2’s ability to cope with failure. And isn’t that supposed to be part of an IAAS?

Quick overview of the GCP offering

The Google Cloud Platform website contains a better description with fancier graphics, so I’m just going to cover a few points to illustrate scope.

GCP: compute, storage, networking, services

Compute
GCP offers a continuum between flexibility (IAAS) and agility (PAAS), with four gradations in between. For those needing full flexibility to define the environment, one can set up a VM to taste and use that image when scaling.
GCP brings the ability to scale automatically to massive size to meet demand. This has been covered in other press demonstrations showing the ability to scale within a few seconds to dynamically handle changing query loads of up to 1 million qps.
Consistent performance is also a given. Currently, my middling app on Heroku suffers from extreme swings in responsiveness, depending upon what other applications are doing on the shared slice. Google engineers have smoothed out those wrinkles for consistent and reliable responsiveness no matter what the load is.

Storage

Google’s Cloud Storage, Cloud SQL, and NoSQL Cloud Datastore offer a full range of choices. The conference demonstrated the ability to continuously add 100K+ rows per second without affecting real-time analysis of the resulting data with BigQuery.
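For flavor, here is a rough sketch of what a synchronous BigQuery query looks like over REST from Ruby. The payload shape is from memory, so treat the field names as assumptions and verify against the BigQuery v2 docs; it assumes you already have an OAuth2 access token and a project:

    require 'net/http'
    require 'json'
    require 'uri'

    # Sketch only: assumes an existing OAuth2 access token and project ID.
    # Field names are from memory -- verify against the BigQuery v2 docs.
    project_id   = 'my-project'                  # placeholder
    access_token = ENV['GOOGLE_ACCESS_TOKEN']    # placeholder

    uri = URI("https://www.googleapis.com/bigquery/v2/projects/#{project_id}/queries")
    req = Net::HTTP::Post.new(uri, 'Content-Type'  => 'application/json',
                                   'Authorization' => "Bearer #{access_token}")
    req.body = {
      query: 'SELECT word, SUM(word_count) AS total ' \
             'FROM [publicdata:samples.shakespeare] GROUP BY word LIMIT 5'
    }.to_json

    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
    puts JSON.parse(res.body)['rows']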
Networking
This takes advantage of many of Google’s internal innovations: load balancing, Google’s own Cloud DNS, and Google’s own fiber network between data centers (now encrypted!).
Services
This is the especially exciting part: BigQuery, Cloud Endpoints for RESTful application interfaces, the Translate API, and the Prediction API. GCP also offers free and fast connections to all Google services.
Green
Google is carbon-neutral, invests heavily in alternative energy sources, and has the industry-leading PUE for its data centers (power usage effectiveness: total data-center energy, including cooling, divided by the energy that actually reaches the servers; lower is better).

Room for improvement



GCPLive introduced many new features, especially a simplified, sensible pricing structure, price reductions, and intermediate levels between IAAS and PAAS built on accessible virtual machines (VMs).
It’s obvious that I am enthusiastic about GCP in general and intend to begin porting my application over on an experimental basis. In this section, however, I will point out a few areas where I think GCP could improve. I do not expect my opinions to be universal among all developers, nor do I expect GCP to adopt all of the suggestions. I would be pleased if the thrust of these suggestions reveals some blind spots in GCP engineers’ thinking and assumptions, and prompts them to begin correcting those.

App Engine languages



Unsurprisingly, three of App Engine’s four standard languages reflect Google’s own internal bias: Java, Python, and Go, with PHP rounding out the fourth. But little Heroku offers Ruby, Java, Node.js, Python, Clojure, and Scala, many of which are used in modern, rapidly prototyped applications. At the very least (ahem, my bias), Ruby should be part of GCP’s standard mix (but don’t lock in the versions!).


Practical programming's state-of-the-art



Why? Ruby, and especially Ruby on Rails, has led practical programming innovation over the last several years. The Ruby community has made agile and DRY development methodologies the expected norm; it strongly incorporates a full range of testing structures; it has standard tools (RVM, bundler) to partition and specify the Ruby and gem[3] versions required for a given project; it has evolved Rack and Metal, pluggable middleware frameworks for applications; and it has popularized rapid prototyping with Rails, a standard web application framework that has influenced similar frameworks for PHP, Perl, and other languages.
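To make the version-pinning point concrete, this is roughly what bundler's per-project locking looks like; the versions below are illustrative, not a recommendation:

    # Gemfile -- bundler pins library versions per project, so development,
    # CI, and production all resolve to the identical dependency set.
    source 'https://rubygems.org'

    ruby '2.1.1'             # interpreter version for this project (illustrative)

    gem 'rails', '4.0.4'     # exact pin (illustrative)
    gem 'pg',    '~> 0.17'   # pessimistic constraint: >= 0.17, < 1.0
    gem 'sidekiq'            # background workers (see the sketch further below)

    group :test do
      gem 'rspec-rails'
    end

Running bundle install then records the resolved versions in Gemfile.lock, which is exactly the kind of DRY, declarative environment specification I would like GCP to honor out of the box.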
For anyone serious about computer science, Ruby and Ruby-on-Rails have made numerous important contributions to the art and practice of robust software engineering.
Computer science theory needs to be coupled with industry-leading software engineering methodologies to produce great systems. It appeared that Google engineers (at least the ones I encountered on Tuesday) weren’t aware of this prior art in practical engineering. GCP -- both the App Engine (PAAS) and the Compute Engine (IAAS) -- would benefit from this influx of nutrients from the Ruby and Rails worlds.

GCP App Engine vs Heroku capability comparison

In particular, PAAS provider Heroku has pioneered easy-to-use yet extremely flexible cloud platform usage on its Cedar stack. Let’s look at a few of these capabilities (warning: non-comprehensive list approaching):
  • a CLI toolkit (the ‘gcloud’ vs. ‘heroku’ toolkits); kudos to the GCP team, but the GCP primitives could be richer.
  • CLI should be cron-able (Heroku has become weak on this; GCP’s position here is unclear).
  • git-to-deploy automation (which Heroku has had since 2008); GCP has this too, and it richly supports the ability to change production on the fly. Big kudos to the GCP team for this.
  • dashboard (kudos: GCP’s is far better than Heroku’s)
  • add-ons (GCP is far behind Heroku’s rich set of add-on partners) for email, SMS, databases, log monitoring, exception handling, SSL, DNS, etc. Add-ons are part of a DRY mentality for rapidly developing rugged applications.
  • data backups? recovery? importing/exporting? mirroring between production and staging? All of this is unclear in GCP; it might be there and I just haven’t uncovered it yet. In Heroku it’s very clear and an integral part of the offering. I’m not so worried about Google losing my data as I am about recovering from a user or developer error. As a developer, I like being able to grab the relevant 12-hour backup (these stretch back over two weeks) and debug an error locally on my dev machine.
  • Postgres. Google is dedicated to open source, and Postgres is one of the most robust, high-performance, SQL-compliant open source DBMSes around. Having used both MySQL (the basis of GCP’s Cloud SQL) and Postgres in production systems, I’ve encountered far fewer “issues” and quirks with Postgres. Heroku’s cloud implementation of Postgres is so advanced that they have made it a DAAS (database-as-a-service) offering. I realize that many major applications run on MySQL; having both as part of Cloud SQL would be good. I understand that I can provision a VM with Postgres, but then the built-in backup situation is unclear and probably rests on my shoulders. Not very DRY.
  • Background queue processing: Heroku supports several background-queue worker methodologies out of the box. I can seamlessly specify workers for my background queue and know that the queue will be handled correctly. It just works. DRY. (See the sketch after this list.)
  • A full scope of capabilities for PAAS. Here, Heroku excels. In time, I hope to see GCP moving in this direction of usability and completeness. IAAS flexibility is nice, but DRY PAAS agility means faster time-to-market. For startups, this could be a crucial choice. Innovative startups choosing GCP and then bursting into success will drive adoption of GCP more than trying to persuade old-school enterprises to forgo their own data centers. Google’s DNA is that of a start-up.
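To show what that out-of-the-box background-queue wiring looks like in practice, here is a minimal sketch using Sidekiq (delayed_job or Resque would look much the same). RosterBuildWorker and RosterEngine are hypothetical names for illustration; on Heroku, a one-line Procfile entry (worker: bundle exec sidekiq) plus a Redis add-on is all the deploy configuration needed:

    require 'sidekiq'

    # Hypothetical Majozi worker, for illustration only.
    class RosterBuildWorker
      include Sidekiq::Worker

      # Runs asynchronously on a worker dyno, not inside the web request.
      def perform(roster_id)
        RosterEngine.build(roster_id)   # RosterEngine is a made-up class name
      end
    end

    # Enqueued from the web process; returns immediately.
    RosterBuildWorker.perform_async(42)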

Other areas of improvement

  • a bundler-like capability to automatically specify and lock in component versions for a project; GCP’s roll-your-own (reinvent-the-wheel) approach doesn’t make sense when the open source prior art is so advanced.
  • database backup and utilities: inter-cloud, intra-cloud, and cloud-to-local
  • a Rack-like standard middleware framework
  • staged rollouts for new versions (thus obviating a need for maintenance mode)
  • A/B testing
  • better clarity about the suggested production, staging, test, and development environments, and how to switch seamlessly between them.

GCP Live Event



This is my critique of the event itself, so we’re switching gears from engineering to marketing!

Overall

I was thrilled to be able to attend and want to thank Google for the great hospitality, the food, and the event itself. Seeing it live is so much better than on-line streaming. All of the Google people and the event staff were kind, gracious, and enthusiastic. I was particularly pleased whenever a Google engineer showed a genuine interest in the Majozi Rostering Engine and its particular way of dealing with combinatorial complexity.
The high level of all the presentations was amazing. I live in Oakland and keep programmer’s hours (typically finishing at 2am and waking at 10am), so getting up at 6am to attend the event was the middle of my night! My nature is that when I get bored, I immediately fall asleep. Despite being sleep-deprived, this didn’t happen. I had to stay super alert to catch the rapid-fire information, terminology, and meaning during the presentations. Well done GCP team!
Special thanks to Googlers Eric Johnson, Andrew Jessup, Bill (the kernel guy introduced to me by Benjamin, the marketing guy), Brian Dorsey, and X -- another engineer from Seattle (sorry, I forgot your name, but you were very helpful). If you read this, please send me your names or Circle me on Google+.
Jeff Dean and Urs Holzle at Fireside Chat

Presentations

The entire keynote was great: Urs was excellent, the demos were awesome, and the various portions were good. The presentations I attended were: Compute at Google, GCP and open source, New runtime support on App Engine, and the Fireside chat. The Fireside chat with Urs and Jeff Dean was awesome, with good questions and moderation by Fred.
The font size on the presentations was too small. Guy Kawasaki gives some great tips on presentations, and his 30-point font rule should be followed.
I’m a big Google+ fan and user, but why no love for G+ live-streaming posts during the event? Twitter only? I also would have liked more prominence for the GCP Google+ Page. I should have been live-resharing some of the information instead of trying to type it. I do like the sound-bite posts that the Page has. The GCP Google+ Page should almost be a full media kit: photos, info, background, etc. It works for fans too.

Q&A’s

I loved the questions and answers. It was great not having media asking silly, divisive, click-bait questions, as sometimes happens at Google I/O. The developers in attendance were really into the subject matter and were themselves at a high level of skill. That made for a better overall conference.

Schedule & logistics

Overall, good, but allowing only five minutes between sessions is not enough, especially given the venue (see below). I was glad for the lunch and break times.
  • For some reason, my Attendee badge wasn’t prepared and ready, although my name was on the list. Wearing a handwritten badge (written with a pen, not even a Sharpie) was embarrassing throughout the day. Why not have the ability to print a badge label for those cases when a badge isn’t ready? CloudPrint, anyone? Failure is the norm, right?
  • Badges: bigger font sizes please. Since the badge hangs so low, it was very hard to read people’s names and companies.
  • Recharging stations: great! Thanks.
  • WiFi availability and usage: great! Thanks!
  • The sound system (microphones in particular) had too many glitches; it needs more practice with live mics before the event.
  • The GCP marketing staff (kudos to Benjamin and Bryant) were great, friendly, and good at connecting people.

Food

The food was wonderful, and I’m glad it was easily vegetarian-friendly. Family-style seating and serving worked nicely and helped break the ice. Thanks. The brownies were some of the best ever. Thanks also for the Tcho chocolates! The after-event party at Tank18 was good as well.

Venue

Big minus on the venue: Terra Gallery was not appropriate.
  • The venue was too far from the main mass-transit lines.
  • The waiting line to enter at 8:30 was right next to the incredibly powerful stench of rotting garbage in the dumpsters. People were covering their noses and fanning themselves while waiting in line.
  • Only one small restroom for each gender .. really? Maybe there was one upstairs, but it wasn’t obvious. The ladies were happy, however, to finally see the men lined up in a long queue instead of them. That’s another reason five minutes between sessions wasn’t enough; ten would have been better. And more restrooms.
  • Only one stairway to get from Stage 1 to Stage 2; yes, a second route opened up, but it required going outside, through the parking lot and the elements, and then back inside.
  • Coat/bag check would have been nice, especially since it was a cold, rainy day.
  • Pillars in the two event rooms obstructed the view too much.
  • Having a lounge area directly behind the Stage 1 audience area probably didn't work out as intended: the noise level was distracting from the presentations.
  • Main screen was too low; people’s heads obscured the lower ⅓ of presentations.
  • Speakers often stood in front of the screen, blocking significant portions for anyone seated in the center section.
  • Directional signs weren’t obvious enough; they blended too much into the GCP styling and didn’t stand out. They were placed too low and didn’t have enough contrast to convey their information.
  • I didn’t realize that the keynote was going to be upstairs in Stage 1. The stairs to get there were part of the emergency EXIT, so they didn’t seem like part of the venue.

Conclusion

I’m enthusiastic about GCP. I think it’s ready for prime time. I highly recommend it to anyone considering doing interesting applications. I plan on experimenting with porting my application to a GCP VM and seeing what it’s like to run it on GCP.
The GCP team should be more aware of the solved, open source technologies often used in PAAS settings, such as those coming from the Ruby community: bundler, RVM, Rails, Rack, etc.
GCP’s App Engine functionality could still learn much from Heroku and really needs to have Ruby & Rails support out-of-the-box.

And finally, if you’re anywhere near one of the 27 cities on the upcoming GCP roadshow, do attend. You’ll be glad you did!

Footnotes
1. “Don’t Repeat Yourself”: a coding mantra favoring reusable modularity and context-aware code macros.
2. Also called NSP (Nurse Scheduling Problem) in Computer Science literature.
3. RubyGems: community-developed, open source, modular code plug-ins and macros.