Just stumbled across Bonnie Xu's breakdown of how they built
Kepler to handle their internal data needs. It is wild that they are running an agentic workflow against over 600 petabytes of data. Instead of just dumping everything into a prompt, they use
MCP and automated code crawling to work around context window limits. The way they implement scoped semantic memory for self-learning is a pretty clever way to maintain accuracy. They also use
AST-based grading to keep their evaluation pipeline from regressing during updates. It makes me wonder if we are moving toward a world where manual SQL writing is
mostly obsolete for standard queries. Does anyone here think the reliance on RAG and code crawling will eventually break down once datasets scale even further? I am curious how this compares to using
BigQuery or other warehouse-native AI tools. It seems like the real challenge isn't the data size, but the
evaluation framework needed to trust the agent's output.
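
For anyone wondering what AST-based grading could look like, here is a toy sketch in Python. To be clear, this is my own guess at the general idea and not how Kepler does it: the `SelectAst` class and the `parse_select` and `grade` functions are made-up names, and the "parser" only understands a simple `SELECT ... FROM ... WHERE a AND b` shape. The point is that two queries are compared by structure instead of by raw text, so formatting, keyword case, and column or predicate order do not cause false failures.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectAst:
    """Toy AST for `SELECT cols FROM table [WHERE a AND b ...]` (hypothetical)."""
    columns: frozenset
    table: str
    predicates: frozenset


# Only this one query shape is supported; anything else is rejected.
_PATTERN = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>[\w.]+)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_select(sql: str) -> SelectAst:
    """Parse a simple SELECT into an order-insensitive structure."""
    match = _PATTERN.match(sql)
    if match is None:
        raise ValueError(f"unsupported query shape: {sql!r}")
    # Column order does not matter for grading, so store a set.
    columns = frozenset(c.strip().lower() for c in match["cols"].split(","))
    where = match["where"] or ""
    # Split on AND and drop whitespace so `age > 30` equals `age>30`.
    # Caveat: this also strips spaces inside string literals (toy only).
    predicates = frozenset(
        re.sub(r"\s+", "", p)
        for p in re.split(r"\s+and\s+", where, flags=re.IGNORECASE)
        if p.strip()
    )
    return SelectAst(columns, match["table"].lower(), predicates)


def grade(generated: str, reference: str) -> bool:
    """Pass when both queries parse to the same structure."""
    return parse_select(generated) == parse_select(reference)


if __name__ == "__main__":
    reference = "SELECT id, name FROM users WHERE age > 30 AND country = 'US'"
    generated = "select name,id\nfrom users\nwhere country='US' and age>30;"
    print(grade(generated, reference))
```

A real grader would presumably use a proper SQL parser and handle joins, aliases, and subqueries, but even this toy version shows why structural comparison is more stable than string matching when the agent's output style changes between releases.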
article:
https://www.infoq.com/presentations/data-aware-ai-agents/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global