
The Real History of AI, Part 2: Backprop, SVM, and the Second Winter (1980–2000)

In 1986 neural networks got a working learning algorithm - and most of the industry didn't notice. While the world watched expert systems collapse, OCR was already reading your mail at the post office, and SVMs were quietly winning every benchmark. The story of 'hidden AI' between the two winters.

Mikhail Savchenko · April 29, 2026 · 7 min read
AI · History · Neural Networks · SVM · OCR

Between 1980 and 2000 AI lived through its second winter and picked up two foundational tools: backpropagation (1986) made multilayer neural networks trainable, and Support Vector Machines (1995) became the industry-standard ML algorithm for the next decade. By the mid-1990s, neural-net OCR was already reading roughly 10% of US mail; by 1998 Google had launched a search engine on top of PageRank - but nobody called any of it AI.

Key facts

  • 1986: Rumelhart, Hinton, and Williams published backpropagation in Nature - the algorithm existed earlier, but this paper made it the standard.
  • 1989: Yann LeCun applied his LeNet convolutional network to handwritten digit recognition for the US Postal Service; by the mid-1990s the system read roughly 10% of US ZIP codes.
  • 1995: Corinna Cortes and Vladimir Vapnik published Support Vector Machines (SVM) - the dominant industrial ML algorithm for the next decade.
  • 1997: IBM's Deep Blue beat Garry Kasparov - but it was not a neural network. Brute force tree search at 200 million positions per second.
  • 1998: Google launched on PageRank - linear algebra over the web graph, but its founders did not call it 'artificial intelligence'.

Between Two Winters

By the early 1980s the first AI winter began to thaw. Not because of neural networks - they were still in disgrace after Minsky and Papert - but because of expert systems, which suddenly started making money. By mid-decade the commercial expert-systems market exceeded a billion dollars a year. AI was a respectable field again.

Then that industry collapsed a second time. And on its wreckage grew exactly what we now call modern AI - quietly, under different names, in post offices, banks, and search-engine server farms. This is part two of the series: the 1980s and 1990s, the era of "hidden AI."

1986: The Algorithm Everyone Had Been Waiting For

In October 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published a short paper in Nature titled "Learning representations by back-propagating errors." It described a practical algorithm for training multilayer neural networks.

The idea itself was not new - its ingredients appeared in Bryson and Ho in 1969, Paul Werbos's 1974 dissertation, and David Parker's 1985 work. But the Rumelhart-Hinton-Williams paper made backpropagation the universal standard. From that moment on, neural networks had a working method to train any number of hidden layers.

Mechanically, backprop computes the gradient of the loss function with respect to each weight via the chain rule. Once you know the gradient, you know which way to push each weight to lower the error. Every modern neural network - GPT-5 included - is trained by some variant of this algorithm.
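
To make the mechanics concrete, here is a minimal NumPy sketch of backprop on a tiny two-layer network. The data, network size, and learning rate are arbitrary illustration choices, not anything from the 1986 paper.

```python
import numpy as np

# Toy data: 4 samples, 3 features, binary targets (made up for illustration).
X = np.array([[0., 1., 0.], [1., 0., 1.], [1., 1., 0.], [0., 0., 1.]])
y = np.array([[1.], [0.], [1.], [0.]])

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(1000):
    # Forward pass.
    h = sigmoid(X @ W1)           # hidden activations
    p = sigmoid(h @ W2)           # predictions
    loss = np.mean((p - y) ** 2)  # mean squared error

    # Backward pass: the chain rule, applied layer by layer.
    dp = 2 * (p - y) / len(X)     # dLoss/dp
    dz2 = dp * p * (1 - p)        # through the output sigmoid
    dW2 = h.T @ dz2               # gradient for W2
    dh = dz2 @ W2.T               # propagate the error to the hidden layer
    dz1 = dh * h * (1 - h)        # through the hidden sigmoid
    dW1 = X.T @ dz1               # gradient for W1

    # Gradient step: push each weight in the direction that lowers the error.
    W2 -= lr * dW2
    W1 -= lr * dW1

print(f"final loss: {loss:.4f}")
```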

In 1986 journalists were writing about expert systems and Lisp machines, and the Nature paper went almost unnoticed. Neural networks remained the domain of a small research community.

1989: Yann LeCun and the US Mail

In 1989 the French scientist Yann LeCun, at Bell Labs, applied a convolutional neural network called LeNet to a practical task: handwritten digit recognition in ZIP codes for the US Postal Service.

By the mid-1990s LeCun's system was reading roughly 10% of all US mail. Slightly later, similar networks started reading dollar amounts on bank checks at major US banks. By the early 2000s CNN-based OCR was processing a sizable share of the country's handwritten checks.

None of those mail recipients thought "artificial intelligence." The technology simply worked: your scrawled "12345" turned into a postal route. One of the first mass cases of a neural network becoming infrastructure - an invisible layer underneath everyday life.

The Late 1980s: The Second Winter

While neural networks were quietly conquering the postal system, the expert-systems market was collapsing. Several reasons:

  • Lisp machines - expensive specialized hardware for running expert systems - lost to cheaper Sun workstations and IBM PCs.
  • Maintaining rule bases turned out to be brutally expensive. Every domain change meant rewriting hundreds of rules.
  • Brittleness: an expert system with 5,000 rules worked beautifully on anticipated cases and fell apart on the first case nobody had planned for.

By 1990 the major expert-system vendors (Symbolics, Lisp Machines Inc, Intellicorp) had either gone bankrupt or retreated into niche markets. DARPA again cut funding for "AI research." This was the second AI winter, lasting roughly 1987 to 1993.

During this winter, machine-learning researchers carefully avoided the word "AI." Grant proposals went out under titles like "pattern recognition," "statistical learning," "data mining." The word had become toxic.

1995: The Quiet SVM Revolution

In 1995 Corinna Cortes and Vladimir Vapnik published the Support Vector Machine paper. The SVM idea is simple: in feature space, find the hyperplane that separates two classes with the maximum margin. If the classes are not linearly separable, apply a kernel (the kernel trick) and solve the problem in an implicitly higher-dimensional space.

Compared to neural networks, SVM had:

  • A solid mathematical theory (Vapnik's structural risk minimization).
  • Guaranteed convergence to the global optimum (neural nets could get stuck in local optima).
  • Strong performance on small datasets.
  • No fragile tuning of architectures and activation functions.

From roughly 1995 to 2010, SVM was the industrial ML standard. Topic classification of text, face recognition before AlexNet, bioinformatics, credit scoring - all of it ran on SVMs. When you saw "AI" in a 2005 product, behind it was almost certainly an SVM with an RBF kernel trained on a few thousand examples.
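
For a feel of what that workhorse looked like in practice, here is a minimal scikit-learn sketch of an RBF-kernel SVM trained on a few thousand synthetic examples. The dataset and hyperparameters are placeholders, not any particular production system.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A few thousand labeled examples, the scale of a typical 2005-era task.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel: the "kernel trick" implicitly maps inputs into a much
# higher-dimensional space where a maximum-margin hyperplane is found.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)

print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```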

1997: Deep Blue - Not a Neural Net, Doesn't Learn, Beats the World Champion

On May 11, 1997, IBM's Deep Blue beat Garry Kasparov in a six-game match. Headlines worldwide: "AI defeats human in chess."

What was inside Deep Blue? Not a neural network. Not machine learning. A specialized chess supercomputer: 30 IBM RS/6000 nodes plus 480 custom VLSI chips for position evaluation. Architecturally it ran minimax with alpha-beta pruning (the same algorithm as the ZX Spectrum chess from Part 1) but with an evaluation function written by grandmasters and the ability to crunch 200 million positions per second.
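
For reference, here is a generic sketch of minimax with alpha-beta pruning - the search scheme described above, not Deep Blue's actual code. The "game" here is just a hand-made tree of leaf scores; a real engine generates moves and evaluates positions with a hand-written function, as Deep Blue did.

```python
def alphabeta(node, alpha, beta, maximizing):
    # Leaves carry a numeric evaluation; inner nodes are lists of children.
    if not isinstance(node, list):
        return node

    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # prune: the minimizing opponent will avoid this branch
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break  # prune
        return best

# A tiny made-up game tree, two plies deep.
tree = [[3, 5, 2], [1, 8], [6, 4, 7]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # prints 4
```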

Deep Blue learned nothing. All of its chess knowledge was hand-coded by humans. A triumph of symbolic AI plus brute force, not machine learning. But the public missed the distinction, and Deep Blue's victory became one of the most powerful PR moments in AI history.

1998: Google and Hidden AI

In September 1998, Larry Page and Sergey Brin registered Google. The core was PageRank: an iterative computation of a page's "importance" as a weighted sum of the importance of pages that link to it. Mathematically, finding the principal eigenvector of a giant sparse matrix of the web graph.
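
Here is a minimal power-iteration sketch of that idea on a tiny made-up link graph. The damping factor 0.85 is the value from the original PageRank paper; everything else is illustrative.

```python
import numpy as np

# Tiny made-up link graph: links[j] = pages that page j links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)

# Column-stochastic transition matrix: column j spreads page j's
# importance evenly over the pages it links to.
M = np.zeros((n, n))
for j, outs in links.items():
    for i in outs:
        M[i, j] = 1.0 / len(outs)

d = 0.85                     # damping factor from the original paper
rank = np.full(n, 1.0 / n)   # start from uniform importance

# Power iteration: keep applying the matrix until the ranks stop changing,
# converging to the principal eigenvector mentioned above.
for _ in range(100):
    new_rank = (1 - d) / n + d * M @ rank
    if np.abs(new_rank - rank).sum() < 1e-9:
        break
    rank = new_rank

print(rank)
```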

In its early years Google never positioned itself as an AI company. But by 2004, Gmail ran a Bayesian spam filter; by 2008, ranking algorithms involved dozens of signals trained on click data; by 2011, Google was buying computer-vision startups. The word "AI" in the early 2000s was simply awkward - after the second winter, investors and journalists shied away from it.

This is a critical point for the whole story: by 2000, AI was already running inside practically every product you used daily. It's just that nobody called it AI.

A Personal Anecdote: Dragon NaturallySpeaking 1997

In the late 1990s I was starting out as a writer for a regional newspaper. My editor suggested installing Dragon NaturallySpeaking - speech recognition software for dictating articles instead of typing them. The 1997 version cost around $700 and required about an hour of voice training.

Workflow: you dictated into a microphone, the program output text. It got things wrong, often comically - my name reliably became something like "mexican sugar black." But on clean speech and a calm topic it could hit roughly 100 words per minute at 90% accuracy, about three times my typing speed.

What was inside? Hidden Markov Models for the acoustic part and n-gram language models for word sequences. No neural networks, no deep learning. Pure statistics and probability theory developed in the 1960s-80s. Speech recognition was actively used for journalism by 1997, and nobody called it AI. It was filed under "office software."
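
To show the flavor of the n-gram half of that pipeline, here is a toy bigram language model that scores which word sequence is more plausible. The corpus is made up, and the acoustic (HMM) side is omitted entirely.

```python
from collections import Counter

corpus = "the editor reads the article the editor writes the article".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word, alpha=0.1):
    """P(word | prev) with add-alpha smoothing so unseen pairs aren't zero."""
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * len(unigrams))

def sequence_score(words):
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= bigram_prob(prev, word)
    return score

# The recognizer prefers the candidate transcription the language model likes.
print(sequence_score("the editor reads the article".split()))   # ~0.07
print(sequence_score("article the reads editor the".split()))   # orders of magnitude lower
```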

When someone today says OpenAI's Whisper (2022) is "the first practical speech recognition program," I think of Dragon. It worked in my office twenty-five years before Whisper. Just slower, more carefully tuned to a single voice, and useless with accents. The line of descent: Dragon 1997 → Google Voice 2008 → Siri 2011 → Alexa 2014 → Whisper 2022. Same class of technology, four generations of improvement.

What to Take From This Era

The main claims of Part 2:

  1. Backpropagation has been the standard training method since 1986. Everything that happened in deep learning in the 2010s was scaling a 1986 algorithm onto hardware that didn't exist then. Not a new principle - new transistors.
  2. From the mid-1990s through the early 2010s, the industry-standard ML algorithm was SVM, not neural nets. That's an important correction to the narrative that equates machine learning with deep learning. Most production models before 2012 ran without a single neuron.
  3. By 2000, AI was already living inside mail systems, banks, search engines, and spam filters - but, scarred by the second winter, nobody called it "AI." When ChatGPT brought the term back into fashion in 2022, what came back was the fashion, not the technology.
  4. AI winters interrupt funding, not development. In both winters (1974-1980 and 1987-1993), the key algorithms kept appearing: backprop, CNN, SVM, PageRank, HMM. Just without the headlines.

In Part 3: the 2000s and early 2010s - the era of recommender systems (Netflix Prize, collaborative filtering), face detection in every camera (Viola-Jones), the first Tesla Autopilot generation, and finally ImageNet 2009, the dataset that would unleash the deep-learning explosion three years later.

Frequently Asked Questions

What is backpropagation and why does it matter?

Backpropagation is the algorithm that lets you train multilayer neural networks by efficiently computing how to adjust each weight to reduce error. The idea had been in the air since the 1960s (Bryson and Ho, then Werbos in 1974), but the 1986 Rumelhart-Hinton-Williams paper in Nature made it the operational standard. Without backprop there is no AlexNet in 2012 and no modern transformer.

Why did the second AI winter happen?

The expert-systems market collapsed in the late 1980s. Lisp machines were expensive, narrow rule bases were costly to maintain, and each new edge case meant rewriting hundreds of rules. When cheap workstations and standard software arrived in the early 1990s, the specialized AI vendors went bankrupt. The second winter ran roughly 1987-1993 and once again cut government funding.

If backpropagation worked from 1986, why did AlexNet only arrive in 2012?

Two pieces were missing: compute (GPUs with CUDA only became viable around 2007-2010) and large labeled datasets (ImageNet was published in 2009). LeCun trained LeNet for weeks on CPUs; AlexNet ran on two consumer GPUs in 2012. The algorithm was the same. The hardware changed.

What is a Support Vector Machine and why did it dominate the 1990s and 2000s?

An SVM finds the maximum-margin boundary between classes in a high-dimensional space. It gave stable results on small datasets, had a solid mathematical foundation, and didn't require the careful hyperparameter tuning of neural networks. From roughly 1995 to 2010, SVM was the industrial ML standard - text classification, face recognition before AlexNet, bioinformatics, credit scoring all leaned on it.

Was Google in 1998 an AI company?

Formally, no - the founders positioned Google as a search company. In substance, yes. PageRank is an iterative algorithm on adjacency matrices - the kind of thing 2025 calls 'graph ML'. Gmail's spam filter (2004) was a Bayesian classifier. The word 'AI' was simply toxic after the second winter, and serious companies avoided it.
