Facebook’s computer vision breakthrough has gone largely unnoticed. That’s not only ironic but remarkable. Here’s why.
Facebook’s AI team this month announced a breakthrough in computer vision that has gone largely unnoticed. That’s not only ironic but also remarkable, given that it marks a significant change in the economics, and consequently the dynamics, of new entry into the world of AI.
Until now, most machine learning models in production have used a technique called supervised learning, which needs labelled training data, and typically lots of it. If you want to predict which incoming customer service emails correspond to which needs so that you can route them automatically to the right team, you need training data comprising a large body of historical emails, each labelled with the actual customer need. If you want to predict car repair costs for claims processing based on images of the damage, you need historical images labelled with the actual repair costs that resulted. As you can imagine, this makes access to training data expensive. It also makes access to training data a critical requirement (often the critical requirement) for startup success: many an AI startup has foundered on the rocks of being unable to access proprietary, labelled datasets.
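The labelled-data requirement can be made concrete with a toy sketch. The dataset and the word-overlap routing logic below are invented for illustration; the point is simply that every training example must carry a human-provided label before anything can be learned.

```python
from collections import Counter, defaultdict

# Labelled training data: each historical email has to be tagged with the
# team that actually handled it -- the expensive, human part.
training_data = [
    ("my card was charged twice", "billing"),
    ("refund has not arrived", "billing"),
    ("app crashes on login", "technical"),
    ("password reset link broken", "technical"),
]

# "Train" a toy word-frequency model per label.
word_counts = defaultdict(Counter)
for text, label in training_data:
    word_counts[label].update(text.split())

def route(email):
    """Route an email to the team whose training vocabulary it overlaps most."""
    words = email.split()
    return max(word_counts, key=lambda lbl: sum(word_counts[lbl][w] for w in words))

print(route("I need a refund for the duplicate charge"))  # → billing
```

Remove the labels from `training_data` and nothing here can be trained at all, which is exactly why proprietary labelled datasets have been such a chokepoint.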
Self-supervised learning, on the other hand, is a technique that doesn’t require data labelling. Over the last six years, self-supervised learning has taken the world of language AI by storm. GPT-3, OpenAI’s gargantuan language model that made media waves last year with its uncanny ability to generate human-quality text, used self-supervised learning. The model was trained on c.500bn tokens of text (a token being roughly a word or punctuation mark) to predict the word that follows given the preceding words. Not having to label the data means that the model can use limitless amounts of training data. The bottlenecks instead become engineering a stable model behemoth and meeting the compute cost of training it. GPT-3 had 175bn tunable parameters and cost millions of dollars to train. In self-supervision the sky’s the limit: ever bigger models appear to be giving ever better results. Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, “thinks it is entirely possible that in five years a company could invest one billion dollars in compute time to train a single language model”. ARK Investment Management’s analysis suggests the same.
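The “no labels needed” point is easy to see in code. Here is a minimal sketch of how raw text supplies its own training targets (whitespace tokenisation is a simplification; GPT-3 actually uses subword tokens):

```python
# Sketch of the self-supervised objective: the raw text itself supplies
# the prediction targets, so no human labelling is needed.
def next_token_pairs(text):
    """Split unlabelled text into (context, next-token) training pairs."""
    tokens = text.split()  # simplification: real models use subword tokens
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Every word in the corpus becomes a free training example.
pairs = next_token_pairs("the cat sat on the mat")
# e.g. (['the', 'cat', 'sat'], 'on') -- predict 'on' from its context
```

Run this over c.500bn tokens instead of six, and the training set is effectively limitless, which is why the binding constraints shift to engineering and compute.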
GPT-3’s commercial API became available in October, and a plethora of applications now sit on top of it. For any given use case you fine-tune GPT-3 on a small labelled training dataset corresponding to your specific requirements, using far less labelled data than if you were training a model from scratch. The figure below shows some of the applications using GPT-3.
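As a hedged sketch of what such fine-tuning data looks like (the prompt/completion JSONL layout below is illustrative, not a guaranteed match for OpenAI’s exact format, and the examples are invented), note how small the labelled set is:

```python
import json

# A small, task-specific labelled set is all fine-tuning needs, far less
# than training a model from scratch would require.
labelled_examples = [
    {"prompt": "Email: my card was charged twice\nTeam:", "completion": " billing"},
    {"prompt": "Email: the app crashes on login\nTeam:", "completion": " technical"},
]

# Serialise one JSON object per line, ready to upload for fine-tuning.
with open("finetune.jsonl", "w") as f:
    for example in labelled_examples:
        f.write(json.dumps(example) + "\n")
```

The heavy lifting (the billions of unlabelled tokens, the millions of dollars of compute) has already been done by the platform; the application builder contributes only this thin, task-specific layer.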
Facebook’s AI team have just demonstrated — six years after it became the norm in language models — that the same principle can also be made to work in vision. They trained their model on 1bn unlabelled Instagram images and demonstrated that fine-tuning it significantly outperforms the standard approach of training a model from scratch on a smaller dataset.
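The fine-tuning recipe that result points to can be sketched in miniature: keep the pretrained backbone’s features frozen and train only a small head on a handful of labelled examples. Everything below is invented for illustration, from the random-projection “backbone” to the synthetic images; it is not Facebook’s actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen self-supervised backbone: a fixed random projection
# plus a ReLU. In reality this would be a large pretrained vision model.
W_backbone = rng.normal(size=(64, 16))
def features(images):
    return np.maximum(images @ W_backbone, 0)  # frozen: never updated

# Tiny labelled set: 20 synthetic "images" from two classes, separated by a
# mean shift. Far fewer labels than training from scratch would need.
X = rng.normal(size=(20, 64)) + np.repeat([[2.0], [-2.0]], 10, axis=0)
y = np.repeat([1, 0], 10)

# Fine-tune only a small linear head on top of the frozen features.
w = np.zeros(16)
for _ in range(200):
    p = 1 / (1 + np.exp(-(features(X) @ w)))      # sigmoid probabilities
    w += 0.1 * features(X).T @ (y - p) / len(y)   # logistic-regression step

train_accuracy = ((features(X) @ w > 0) == y).mean()
```

The economics follow directly: the expensive part (the backbone) is trained once on unlabelled data at platform scale, while each downstream user trains only the cheap head.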
Self-supervision is going to rule the world. It won’t be long before a wide range of AI domains (language, vision, speech recognition, world understanding, etc.) are all founded on gargantuan, high-cost self-supervised models. It won’t be long after that before these separate domains start to unify, with the corresponding models becoming even larger and even more inaccessible to all but the very (very) largest organisations.
Here are three implications:
- No-one can be an AI startup. The days of building new end-to-end AI language businesses de novo are nearly over. The cost of compute plus the cost of talent are becoming prohibitive. As self-supervision gains ground, this will become true for more and more AI domains. The just-released 2021 Stanford University AI Index Report shows that global private investment in AI continued to grow in 2020, but that the number of newly funded AI companies has dropped off a cliff. Self-supervision will be the final nail in the coffin. Think long and hard before starting an AI business based on a sense that your core AI technique is better than anyone else’s.
- Everyone can be an AI startup. The flip side is that the low cost of entry to the application layer that will sit on top of these cloud-based “platform” models (since that is essentially what they’ll become) means that everyone can be an AI startup. In time we won’t think of AI startups as a discrete class. With modest technical abilities, any startup will regard pre-trained AI utilities in language, vision, speech, etc. as a natural part of the toolkit at its disposal for solving the particular customer problem it’s aimed at.
- Think hard about defensibility. Bradford Cross coined the term “vertical AI startups”: AI startups that build competitive defensibility by combining (1) full-stack products, (2) domain expertise, and (3) proprietary data. In this new world of platform models, full-stack is, for many, going to be neither necessary nor feasible. Defensibility will have to come from domain expertise plus access to proprietary data, and bear in mind that the bar for data quantity is going to be lower than before. A startup providing next-generation auto-complete, using a third-party language model to turn generic speech instructions into fully written emails, is going to be more easily replicated than a social care startup that reads in social workers’ spoken case notes and images to summarise and prioritise them. The latter requires (and generates over time) considerably more domain expertise, operating in a high-consequence decision-making environment where small expertise-driven improvements will have huge impact.
So get creative. Educate yourself on what AI can presently do in a range of domains and imagine how you would use it to enhance your business’s value proposition. Within a short space of time you’re going to be able to.
Sign up to the newsletter at https://www.theaigroupie.com for more big-picture views on AI.