All the data you can eat

Unlocking that PDF data with pd3f

Jason Norwood-Young

It’s 2020, and we’re still trying to extract data trapped in shitty PDFs. And weirdly, we’re still using the same tools that started life in 1985. (I see you, Tesseract, you old bastard.) Pd3f isn’t exactly a new set of tools, but it does take the old tools that we love (like Tabula), add some machine learning, and put it all into a pipeline to make the art of PDF extraction a little less painful.

Figure: Pd3f extractor flow diagram, showing the pd3f data flow.
Jason Norwood-Young
  • Journalist, developer, community builder, newsletter creator and international man of mystery.
