My friend asked if there was a service that could read academic papers aloud - not like NotebookLM which creates podcast-style summaries, but something that would actually read the original text. She wanted to listen to papers like audiobooks when her eyes got tired.
I didn’t know of such a service, but since I’m familiar with Microsoft Azure Language Services, I offered to help: “Send me the paper and I’ll make an MP3 for you.”
I thought this would be simple.
Wrong. Academic PDFs are messy. Extract text and you get dozens of co-author names, chart numbers, table data, footnotes - everything my friend didn’t want to hear.
I tried asking different AIs to extract only the title, abstract, and main content.
Turns out extracting clean content from academic PDFs is harder than expected.
I generated the MP3 using Azure TTS and made a tutorial video. But a week later, my friend was still hunting for paid PDF-to-speech services.
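For anyone curious what the text preparation for the TTS step can look like, here is a minimal sketch, not the exact code I used. It assumes the paper text is already cleaned up, splits it into sentence-aligned chunks so that no single request gets too long, and wraps each chunk in SSML. The helper names (`escapeXml`, `chunkText`, `toSsml`), the 3000-character budget, and the `en-US-JennyNeural` voice are all illustrative choices.

```typescript
// Escape XML special characters so text from the paper cannot break the SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Group sentences into chunks of roughly maxChars characters.
// The sentence split is naive (it breaks on ., ! and ?, so "3.14" gets split),
// and a single sentence longer than maxChars is kept whole rather than cut.
function chunkText(text: string, maxChars: number): string[] {
  const sentences = (text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current === "" || candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Wrap one chunk in a minimal SSML document.
// The default voice name is an assumption; any supported voice would do.
function toSsml(chunk: string, voice: string = "en-US-JennyNeural"): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">` +
    `<voice name="${voice}">${escapeXml(chunk)}</voice>` +
    `</speak>`
  );
}

// Example: a hypothetical 3000-character budget per request.
const ssmlDocs = chunkText("First sentence. Second sentence!", 3000).map((c) => toSsml(c));
console.log(ssmlDocs.length);
```

Each SSML string would then be sent to the speech service one at a time (for example with the Speech SDK's `speakSsmlAsync`), and the resulting audio pieces joined into one MP3.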
That’s when I realized: I might not know the perfect PDF-to-text AI, but I can definitely build a basic TTS web app.
So I coded one with Claude Sonnet, built it in Next.js, deployed on Vercel, and shared it.
This is why I love being able to code!!!