The Commoditization of Machine Learning

I recently saw a tweet from Ilya Sukhar that particularly resonated with me:

I've been interested in this space for a while. A broad prediction I have for the coming years is that, as a developer, you won't need to be proficient in machine learning to take advantage of its power. The technology is becoming increasingly democratized and opening up access to millions of new developers. Eventually, you won't even need to know how to program to perform data analysis with ML. In data warehousing, data analysts using old, traditional BI stacks will have access to a powerful new set of machine learning tools. In fact, in the future, using ML may be more about manipulating data, rather than hard mathematics or statistics (h/t Wiley for the comparison). We're moving away from obscure mathematical derivatives to teaching surface area to 4th graders.

A close comparison of this advancement is the proliferation of web development as we know it today. Building a web application looks a whole hell of a lot different than it did in the early days of the internet. Back then, you needed a strong knowledge of TCP/IP, Solaris servers, Oracle databases, etc. to build a web application. Eventually, these pieces were abstracted into languages and frameworks (Perl, Ruby on Rails, Bootstrap) and tools (AWS, Heroku, Parse), making the process of building, deploying, and scaling much easier. Taking it one step further, tools are even being built for non-developers to build apps (Treeline being a good example).

In the machine learning world, we're moving away from the TCP/IP days into the Ruby on Rails days. With limited ML background, it's now much easier to build ML applications than it was even a few years ago. With new open source toolkits arriving quickly, we're seeing a rapid commoditization of the technology:

This helps match the rapid growth of the field: 

Publication dates of almost 15,000 machine learning conference papers scraped from IEEE Xplore [1]

The Ruby on Rails of ML are toolkits like TensorFlow, Caffe, Theano, and ConvNetJS. I recently worked with a friend on setting up a TensorFlow development environment on an AWS EC2 instance, and the process was a breeze. No need to build your own neural net from scratch anymore!

Recently, Makoto Koike, an embedded systems engineer in Japan, noticed that his parents spent a lot of time sorting and categorizing cucumbers on their farm. The process was just as complicated as growing the vegetable itself. He wanted to automate this process to save his parents from the added manual labor. Although he had limited computer vision background, he used TensorFlow, OpenCV tutorials, and a hardware + camera setup to automatically detect the quality and size of cucumbers grown on the farm - to a relatively high degree of accuracy. It's a fascinating case study.

Obviously, for more complex needs you'll need a deep knowledge of the technology and will have to implement most special cases yourself, but such is the case with web applications as well. Still, TensorFlow covers a wide variety of general use cases. Even experts in the field use technology like TensorFlow for prototyping.

Going back to Ilya's original tweet, I think there's an opportunity for a startup that opens up basic ML development to everyone. Parse was a great product because it abstracted away the rough edges of building mobile backends. This precise model can be transferred to AI applications:

A good example of a company doing this is Clarifai. They make a dead simple image/video recognition API - and it works really well. I imagine something like this for a few more use cases - categorizing text, voice recognition, intent creation and fulfillment, etc. It's what Shivon Zilis likes to call "'Alchemists' Promising To Turn Your Data Into Gold". The possibilities are endless. Shoot me a message if you're working on this - I'd love to try it out.



Thanks to Ritwik for looking this over. I can be reached @niraj on twitter or by email.

The Ecosystem Effect

Notice: This post was updated into a longer-form, more complete post here.

Helpful Pre-reading:

  1. Network Effects (a16z) and Data Network Effects (Matt Turck)
  2. Full stack Startups (Chris Dixon), Full Stack Startup Index (Anshu Sharma)
  3. Stack Fallacy: Why Big Companies Keep Failing (Anshu Sharma)
  4. Disruption’s Long, Slow, Complex Journey (Steve Sinofsky)
  5. Disruption is not a strategy (Jerry Neumann)

What is the ecosystem effect?

The ecosystem effect is how to build unstoppable companies. As a company grows beyond its initial product offering, it integrates vertically and horizontally (“moves out”) to keep growing.

It’s how big companies become big.

Companies typically enter the stack at the level of least resistance (the easiest go-to-market strategy). They build that piece well, monetize it, and use the money to build out the company’s long-term vision and strategy from there.

Building out and owning more of the process creates a Scrabble “double points” effect: effectively, the whole is greater than the sum of its parts (1 + 1 = 3). Building out creates defensibility in a product. This is where the ecosystem effect comes in. It’s the “lock-in effect”, because the experience of having all the software on one platform is much better than piecing together multiple pieces of software. A network of products has greater value than the products individually.

When venturing out into new markets, companies usually have solid product-market fit for their original product, and likely have enough recurring revenue to fund "moonshots" (not the best term, but it's understandable). Moonshot teams get resources to build out new, innovative products from scratch. At Google, many popular products today were built during 20% time, effectively the "moonshots" of their day. Originally Google targeted horizontal integration, but now that it's effectively maxed out, the company is targeting diagonal and vertical integration (hence all the products [x] is working on).

There are different types of network effects. We’ve seen a few:
  • Social network effects (FB, Snapchat)
  • Data network effects (Uber, Palantir)
(social usually leads to data network effects, but that’s a different conversation)

A third is being proposed:
  • Ecosystem network effects (Google, Uber, Netflix): This is a higher-level version of the full-stack startup (defined by Chris Dixon), if you will. This is how you create data network effects on a platform itself. Original thought leading to conversation: Companies that rebuild an industry, rethink the experience, collect a bunch of data, then use this in product decision to improve their offerings and beat incumbents.
Vertical integration is understandable, but doesn’t explain the ecosystem effect fully. This is vertical E2E integration.

TODO: List out assumptions (this works best for software startups, think of hardware ecosystems potentially), then explain why they were right assumptions to make.

Examples of Companies Exhibiting “the Ecosystem Effect”

  • Google
    • Initially built the search engine, on top of previous innovations in computing (mainframe, OS, internet browsing).
    • Built on top of the search engine with AdSense, ad tools to help grow and scale their ad business and revenue
    • Expanded horizontally with Gmail, YouTube, Drive, Music. In the process, built OAuth and single-login to all apps as well as slick integrations between products (e.g. easily embed Drive docs in Gmail).
    • Eventually went back and rebuilt layers of stack below them (rebuilt the browser - Chrome, rebuilt the OS - ChromeOS, rebuilt the PC - Chromebook, rebuilt the internet - Fiber).
    • Building below and horizontally usually starts as moonshot projects that, with traction, become full-blown projects (e.g. Gmail, Trends, AdSense)
    • Google is going into other industries and verticalizing their new products
    • "No One Ever Got Fired for Buying Google" is the new "No One Ever Got Fired for Buying IBM" (credit to Zach Hamed for the comparison)
  • Netflix
    • Originally started as a DVD-by-mail rental company. After seeing the potential of streaming, started investing in scalable cloud technology to efficiently deliver video over the web.
    • Now that some previous content licenses are expiring, Netflix is investing in buying shows, some new, some old, and offering them as original content (one level down in stack).
    • Eventually, Netflix will run their own studio (another level down in stack) and produce a good majority of their content.
  • Uber
    • Most recent example (company is getting to the stage where it’s becoming LARGE)
    • They started off with a simple goal - fastest way to take you from A to B. Expanded product offerings vertically with uberPOOL and uberEATS.
    • Now they are integrating up the stack to autonomous vehicles, a step toward eliminating human labor from their system. After the Otto acquisition, it seems they’re extending that horizontally (autonomous Uber, but for commercial trucking).
    • As per Semil Shah,
      it can apply those resources [from Uber China - Didi deal] to technologies “up the stack” for a world in which your Ubers are autonomous — that could be pods or cars, sensors, robotics, mapping technologies, deep learning, and a host of other requirements to make a fully-integrated self-driving network a reality. With 80% of each fare you pay going to your driver, the company has a huge incentive to bite into that for its next big meal. [1]
  • Salesforce
    • Why did they buy Quip? Why did they buy Heroku? These seem unrelated to sales software, but they’re building an ecosystem of products.
  • Apple?
    • Apple seems to be creating an ecosystem of brands. With the recent Beats acquisition, Didi investment, and potential McLaren purchase, they want to create a collection of brands with potential collaboration between them. Apple might power the software in a McLaren, which is available exclusively on Didi and has Beats-branded audio in-car.

Case Study: Dropbox vs. Google Drive

Why are more customers choosing Google Drive over Dropbox? Look at their individual product offerings:
  • Dropbox

  • Google
    • Browser (Chrome)
    • OS (ChromeOS)
    • Computer (Chromebook)
    • Internet (Fiber)

Notice how seamless Google’s product offerings are. Since they control almost every level of what we interact with, there’s more opportunity to offer niceties and a superior UX. Take this example: “I used Google+ to schedule a live stream on Google Calendar, presented on YouTube using my Slides presentation in Chrome. I didn’t once have to leave the Google Ecosystem.” This builds up a large dependency graph from the core product - the more nodes (in this case, products) you add to a graph, the more valuable the network of products becomes (Metcalfe’s Law). Dropbox may have some horizontal integration, but not the strong vertical (Chrome, ChromeOS, Chromebook, etc.) and horizontal integration (Drive, Gmail, Analytics, Calendar, Hangouts) Google has.
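
To put rough numbers on the Metcalfe's Law point, here's a minimal sketch in Python. Metcalfe's Law is usually stated as a network's value being proportional to the square of its number of nodes; treating each product as a node and each pair of products as a possible integration is my own illustrative assumption, and the function names are made up for this example.

```python
def pairwise_links(n_products: int) -> int:
    """Count distinct product-to-product pairs among n products (n choose 2)."""
    return n_products * (n_products - 1) // 2


def metcalfe_value(n_products: int) -> int:
    """Metcalfe-style value score: n squared, with the constant assumed to be 1."""
    return n_products ** 2


if __name__ == "__main__":
    # Each doubling of the product count roughly quadruples both numbers.
    for n in (1, 2, 4, 8):
        print(n, pairwise_links(n), metcalfe_value(n))
```

The takeaway is only the shape of the curve: every new product can integrate with all the existing ones, so the possible connections grow much faster than the product count itself.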

You could take this from a physics perspective and think of the stack as a hill. As you climb toward the top of a hill, potential energy increases and kinetic energy starts to decrease. Once you’ve done the monumental work of building lower in the stack (starting at the bottom of the hill), you’ve built up enough potential energy to build horizontally from there, since the underlying infrastructure already exists. When Uber created UberEATS, a lot of the work was already done, since a network of drivers already existed on the road (rather than having to be built from scratch).

Wiley and I recently recorded a podcast on this phenomenon where we explain the idea in more detail and with more clarity:


Please reach out with any feedback or questions! I can be reached on email or @niraj.

just setting up my posthaven

This was originally published on Tumblr, but Posthaven proved to be a better fit for blogging, so my posts have been moved here.

I’ve been starting to write more about my observations on Medium. But even before I write a piece, it usually starts with an idea in my notebook. My ideas come from anywhere - things I observe on campus, when reading, or in everyday interactions. If I find myself coming back to a topic and writing (obsessing?) about it more and more, I decide it’s time to write a blog post.

The goal of this tumblr is to be the intermediary between my notes and my final posts. Writing has been a great way to solidify my thoughts and challenge my thinking. By posting my notes publicly, I hope to get feedback and really wrangle with these ideas. Not everything will make it to a post, but it’s a good exercise to go through nevertheless.

If you don’t know who I am already, my name is Niraj Pant. I’m a sophomore at UIUC and an external council member at Binary Capital. I like soccer, burritos, and the internet (well everyone does, but not everyone’s willing to admit it 😜). I enjoy conversing on twitter on just about everything. More about my past work here and here.

(+ shoutout to Kanyi Maqubela, whose college advice post was an inspiration for doing this; and Michael Dempsey, whose “notes” blog was an ideal model to follow)