Coming Soon! We are currently in private beta mode — hang in there, we’ll be up in no time.

The SNEWPapers Difference

The Problem

If you’re like us, at some point you’ve looked at the massive collections of old newspapers and wondered:

“What kind of hidden gold is sitting in all these pages?”

Typical newspaper archive search interface

You know the feeling. You land on one of the big newspaper archive sites, push past the flashy ads and “millions of pages!” promises, and finally peek behind the paywall.

What greets you? A clunky search box asking for keywords, names, or dates… and then:

523,214 hits.

Suddenly that exciting treasure hunt turns into an overwhelming haystack. You have no idea what’s actually relevant, what’s worth reading, or whether the real gems are even there. Instead of discovering stories, you’re left drowning in results, wasting hours (or days) trying to find the signal in all the noise.

We did some sleuthing to figure out why...

It mostly boils down to technology that every major archive still relies on: antiquated Optical Character Recognition (OCR). Here’s a real example from Benjamin Franklin’s Pennsylvania Gazette, May 9, 1751. On the left: the messy text extracted by traditional OCR. On the right: the original document.

View on the Library of Congress →

Side-by-side comparison of OCR text versus original newspaper scan

This is exactly why traditional archives force you into rigid keyword, name, or date searches. It's the only arrow in their quiver. They only have imperfect, error-filled text to work with... sometimes accurate, often complete gobbledygook. So when you search, the system just hunts for something “close enough” and dumps hundreds of thousands of results on you, leaving you to do all the hard work of sorting through the noise.
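To make the “close enough” problem concrete, here is a minimal sketch of fuzzy keyword matching over noisy OCR text. It is an illustration only, not how any particular archive's search engine works: the function name `fuzzy_hits`, the 0.7 threshold, and the sample lines are all made up for this example, and Python's standard `difflib` stands in for whatever matching a real system uses.

```python
from difflib import SequenceMatcher


def fuzzy_hits(query, ocr_lines, threshold=0.7):
    """Return (line, score) pairs for lines holding a word 'close enough' to query.

    Hypothetical helper for illustration; the threshold is an arbitrary assumption.
    """
    hits = []
    for line in ocr_lines:
        # Score every word on the line against the query and keep the best one.
        best = max(
            (SequenceMatcher(None, query.lower(), word.lower()).ratio()
             for word in line.split()),
            default=0.0,
        )
        if best >= threshold:
            hits.append((line, best))
    return hits


# Invented lines imitating typical OCR damage (not real archive output).
sample_lines = [
    "Importation of Conviets from Great Britain",  # garbled "Convicts"
    "the Convi&s fent us by our Mother Country",   # garbled "Convicts"
    "he was convinced of his Error",               # unrelated word, similar spelling
    "a Sale of fine Linens",                       # unrelated
]

for line, score in fuzzy_hits("convicts", sample_lines):
    print(f"{score:.2f}  {line}")
```

With these sample lines, the search has to be loose enough to catch both garbled spellings of “Convicts”, and that same looseness lets the unrelated “convinced” line through. Multiply that by millions of pages and you get the haystack.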

Could we do better?

Yes. Absolutely.

For most archives, the holy grail would be perfect OCR -> better keywords -> better keyword search. But this still leaves us, the users, with the same old problems.

We took the road less traveled.

At SNEWPapers, high-quality OCR isn’t the goal; it’s the starting line. Wouldn’t it be nice to search for a concept rather than a word, and get back articles instead of thousands of newspaper pages to read through? Wouldn’t it be nice to chat with an AI Research Assistant that helps you navigate the labyrinth? For us, the real prize is understanding what’s actually written in these papers. So instead of just teaching machines to recognize letters, we taught them how to truly read: to grasp context, meaning, historical nuance, and the stories hidden between the lines.

Snewpapers processing pipeline

The Secret Sauce

By taking a multi-modal approach, combining advanced image processing techniques, finely tuned AI models, and a dash or two of traditional organic neural networks 🧠, we don’t just read the text. We truly understand it.

The Result

Something a little more readable:

To the Printers of the Gazette.

In a Passage in one of your late Papers, I understand that the Government at home will not suffer our mistaken Assemblies to make any Law for preventing or discouraging the Importation of Convicts from Great Britain, for this kind Reason, That such Laws are against the Publick Utility, as they tend to prevent the IMPROVEMENT and WELL PEOPLING of the Colonies.

Such a tender parental Concern in our Mother Country for the Welfare of her Children, calls aloud for the highest Returns of Gratitude and Duty.…

In some of the uninhabited Parts of these Provinces, there are Numbers of these venomous Reptiles we call RATTLE-SNAKES; Felons-convict from the Beginning of the World: These, whenever we meet with them, we put to Death, by Virtue of an old Law, Thou shalt bruise his Head. But as this is a sanguinary Law, and may seem too cruel; and as however mischievous those Creatures are with us, they may possibly change their Natures, if they were to change the Climate; I would humbly propose, that this general Sentence of Death be changed for Transportation.

In the Spring of the Year, when they first creep out of their Holes, they are feeble, heavy, slow, and easily taken; and if a small Bounty were allowed per Head, some Thousands might be collected annually, and transported to Britain. There I would propose to have them carefully distributed in St. James’s Park, in the Spring-Gardens and other Places of Pleasure about London; in the Gardens of all the Nobility and Gentry throughout the Nation; but particularly in the Gardens of the Prime Ministers, the Lords of Trade and Members of Parliament; for to them we are most particularly obliged.

…What then? Is not Example more prevalent than Precept? And may not the honest rough British Gentry, by a Familiarity with these Reptiles, learn to creep, and to insinuate, and to flatter, and to wriggle into Place (and perhaps to poision such as stand in their Way) Qualities of no small Advantage to Courtiers!

I would only add, That this Exporting of Felons to the Colonies, may be consider’d as a Trade, as well as in the Light of a Favour. Now all Commerce implies Returns: Justice requires them: There can be no Trade without them. And Rattle-Snakes seem the most suitable Return; for the Human Serpents sent us by our Mother Country.… For the Rattle-Snake gives Warning before he attempts his Mischief; which the Convict does not.

I am Yours, &c.
AMERICANUS.

Pennsylvania Gazette, May 9, 1751 • signed AMERICANUS • View this article →

Do we get it right every time?

Absolutely not! But we’re getting better every day 💪 and we think you’ll be pleasantly surprised by how often we do.

Clear, easy-to-read newspaper article

Sometimes it’s easy, like reading a book

View this article →

Newspaper article requiring closer inspection

Sometimes you’ve got to tilt your head a little

View this article →

Dense, hard-to-read newspaper page

Sometimes it gets a bit messy when you zoom out!

Newspaper article where the ending is far from the beginning

Sometimes, the end of the story is far, far away from the beginning

View this article →

Extremely difficult newspaper scan

Sometimes… the juice ain’t worth the squeeze

View original PDF • LOC edition

Six million stories are waiting for you

...and many more pouring in every day. We’ve separated the ads from the content, and categorized it all into a multi-layered taxonomy for you to sift, sort and peruse at your pleasure.

Rather have a conversation than dig on your own? Our AI Sleuth research assistant has your back.