A job search engine where AI extracts 100+ attributes from every listing
AI-powered job data infrastructure. Scrapes ATS systems and enriches every listing with 100+ structured attributes.
Every job board I'd ever used had the same fundamental problem: they're all built for employers, not for the people actually searching. The listings are walls of unstructured text. The filters are basic: location, maybe a salary range if you're lucky. Results are padded with sponsored posts. And you're left doing the real work yourself, reading through dozens of descriptions trying to figure out if a role offers equity, requires a specific tool, allows remote work, or matches any of the hundred other things you actually care about.
I wanted to build something that flipped that entirely. JobRadar.ai scrapes job listings directly from applicant tracking systems — Greenhouse, Ashby, Workable, Lever, Recruitee, Paylocity — then runs every single listing through an AI enrichment pipeline that extracts over a hundred structured attributes. Compensation details, required skills, benefits, workplace type, education requirements, languages, tools and technologies — all pulled out of unstructured descriptions and turned into filterable fields. The stuff you'd normally have to read five paragraphs to find out becomes a checkbox.
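For a sense of what enrichment produces, here is roughly the shape of a single listing after the pipeline has run. This is an illustrative sketch with invented field names, not the actual schema, which has well over a hundred of these attributes:

```python
# Illustrative only: hypothetical field names standing in for the real schema.
enriched_listing = {
    "title": "Senior Backend Engineer",
    "source": "greenhouse",
    "workplace_type": "remote",
    "salary_min": 150_000,
    "salary_max": 190_000,
    "salary_currency": "USD",
    "equity": True,                  # surfaced even when buried mid-description
    "tools": ["PostgreSQL", "Kubernetes", "Terraform"],
    "languages": ["English", "German"],
    "education_required": "none_stated",
    "visa_sponsorship": False,
}
```

Every one of those fields is queryable, which is what turns "read five paragraphs" into "tick a checkbox."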
The scraping architecture was one of the more satisfying parts to design. Each job source — every ATS platform — is its own plugin with a standardized interface. Adding a new source means writing one class. The system handles the rest: scheduling scrapes, normalizing everything into a common format, and running deduplication. That deduplication piece is trickier than it sounds, because the same job gets posted across multiple platforms with slightly different titles and formatting. I built a hybrid approach — exact-match checks for the obvious duplicates, then vector embeddings and AI-powered semantic comparison for the ones that look different but describe the same role.
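A minimal sketch of what that plugin interface could look like, assuming a Python codebase; the class and method names here are my guesses, not the project's actual code:

```python
from abc import ABC, abstractmethod
from typing import Iterator

class JobSource(ABC):
    """One subclass per ATS platform; scheduling, normalization into the
    common format, and deduplication are handled by the shared pipeline."""

    name: str  # used for scheduling, logging, and provenance

    @abstractmethod
    def fetch_listings(self, company_slug: str) -> Iterator[dict]:
        """Yield raw listings in the platform's native shape."""

    @abstractmethod
    def normalize(self, raw: dict) -> dict:
        """Map one raw listing onto the common internal schema."""

class GreenhouseSource(JobSource):
    name = "greenhouse"

    def fetch_listings(self, company_slug: str) -> Iterator[dict]:
        ...  # e.g. page through Greenhouse's public job-board endpoint

    def normalize(self, raw: dict) -> dict:
        ...
```

And a sketch of the hybrid deduplication cascade. The helpers `embed` and `llm_same_role` are hypothetical stand-ins for the embedding model and the LLM comparison, and the threshold is an assumed value:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_duplicate(new: dict, existing: dict, embed, llm_same_role,
                 threshold: float = 0.9) -> bool:
    """Hybrid dedup cascade: exact match, then embeddings, then the LLM."""
    # Stage 1: free. Identical cross-posts share a company and normalized title.
    if (new["company_id"], new["title_norm"]) == \
       (existing["company_id"], existing["title_norm"]):
        return True
    # Stage 2: cheap. Low embedding similarity rules out a duplicate outright.
    if cosine(embed(new["description"]), embed(existing["description"])) < threshold:
        return False
    # Stage 3: expensive. Only near-misses pay for an LLM judgment call.
    return llm_same_role(new, existing)
```

The ordering matters: the exact check is free, embeddings are cheap, and the LLM call is reserved for the small set of near-misses.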
The enrichment pipeline is where it gets really interesting. Each listing passes through an LLM that extracts structured data from the description: not just the obvious fields, but things like whether the role involves on-call rotations, what the interview process looks like, which specific frameworks they use, and whether equity is part of the compensation, even when it's buried in the third paragraph. I designed a declarative hydration schema where a single configuration defines the attribute, generates the database column, teaches the AI what to extract, and adds the filter to the search interface. One change propagates everywhere. Adding a new searchable dimension to the entire platform is a five-minute edit.
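The config-as-single-source-of-truth idea might look something like this. A sketch under the assumption of a Python/Postgres stack, with all names invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    """One declaration; the column, the extraction prompt, and the
    search filter are all derived from it."""
    key: str            # database column name
    sql_type: str       # column type for the generated migration
    prompt: str         # instruction handed to the LLM extractor
    filter_widget: str  # how the search UI exposes the field

ON_CALL = Attribute(
    key="has_on_call",
    sql_type="boolean",
    prompt="Does this role involve an on-call rotation? Answer true or false.",
    filter_widget="checkbox",
)

def to_column_ddl(attr: Attribute) -> str:
    """Generated migration fragment."""
    return f"ALTER TABLE listings ADD COLUMN IF NOT EXISTS {attr.key} {attr.sql_type};"

def to_prompt_line(attr: Attribute) -> str:
    """One line of the extraction prompt the LLM receives."""
    return f'"{attr.key}": {attr.prompt}'
```

The payoff is that the migration, the extraction prompt, and the filter UI can never drift apart, because they are all derived from the same declaration.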
For search, I used PostGIS for real geospatial filtering — actual radius queries that respect geography, not just text matching on city names — and pgvector for semantic similarity alongside the structured filters. The goal was a search experience where you could drill down across dozens of dimensions simultaneously and still get results that actually match. No noise, no sponsored listings, no sign-up wall.
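Combined, that reduces a multi-dimensional search to a single query. A hedged sketch of how PostGIS and pgvector compose in one statement, with assumed table and column names and psycopg-style parameters:

```python
# Table and column names (listings, location, embedding, workplace_type,
# salary_min, expired_at) are guesses for illustration.
SEARCH_SQL = """
SELECT id, title, company
FROM listings
WHERE expired_at IS NULL
  AND workplace_type = ANY(%(workplace)s)
  AND salary_min >= %(min_salary)s
  AND ST_DWithin(                                  -- true radius on the globe
        location::geography,
        ST_SetSRID(ST_MakePoint(%(lng)s, %(lat)s), 4326)::geography,
        %(radius_m)s)
ORDER BY embedding <=> %(query_vec)s::vector       -- pgvector cosine distance
LIMIT 50;
"""
```

`ST_DWithin` on geography types measures real meters on the globe, and pgvector's `<=>` operator orders by cosine distance, so the structured filters, the radius, and semantic relevance all resolve in one round trip.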
The bigger picture was always infrastructure. JobRadar was designed to be the data layer underneath an ecosystem. It could power automated job application tools, offer B2B API access for companies analyzing the job market at scale, or just stand on its own as the most granular job search engine available. The job data problem is genuinely hard — normalizing listings from six different ATS platforms into a single queryable format, keeping everything fresh, handling rate limits, expiring dead postings — and I realized early on that solving it well was its own product.
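As one example of the freshness problem, expiring dead postings can be handled by soft-deleting anything a successful rescrape no longer returns. A sketch with assumed column names, not the project's actual query:

```python
# After a successful rescrape of one company, anything the source no
# longer returns is presumed closed and soft-expired.
EXPIRE_SQL = """
UPDATE listings
SET expired_at = now()
WHERE company_id = %(company_id)s
  AND expired_at IS NULL
  AND external_id <> ALL(%(seen_ids)s);
"""
# Guard upstream against empty or failed scrapes, or this expires everything.
```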
It's on the shelf right now. The scrapers were built, the enrichment pipeline was working, but the database hadn't hit the critical mass I wanted, and running the infrastructure was expensive for something I wasn't actively promoting yet. When another project pulled me in a different direction, I paused it rather than run it at half capacity. The site still loads, but the backend is off.
I still think about this one. There's something satisfying about building a system that genuinely serves the person using it — no ads, no paywalls, no employer incentives warping the results. The architecture is solid, the domain is great, and the problem isn't going anywhere. It's the kind of project that's waiting for the right moment to come back to life.