Unlocking that PDF data with pd3f
It’s 2020, and we’re still trying to extract data trapped in shitty PDFs. And weirdly, we’re still using the same tools that started life in 1985. (I see you, Tesseract, you old bastard.) Pd3f isn’t exactly a new set of tools, but it does take the old tools that we love (like Tabula), add some machine learning, and put it all into a pipeline that makes the art of PDF extraction a little less painful.