GitHub Copilot SDK by Bart Wullems: This isn’t just a simple agent SDK: It’s the GitHub Copilot Harness SDK. You can use it to build your own harness on top. Why would you need that? Some ideas:
- Implement the Ralph loop to get results similar to the new /goal feature in OpenAI Codex.
- Every time the harness asks, “Do you want to run this command?”, you could plug in another LLM to give a safety check – like “seems fine”, “seems dangerous” or “not sure”.
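The safety-check idea can be sketched in a few lines. This is only an illustration under assumptions: `make_approval_hook`, `toy_classifier` and `ask_human` are hypothetical names, not part of the GitHub Copilot SDK, and the keyword-based classifier is a stand-in for the call to a second LLM.

```python
from typing import Callable

# The three verdicts the second model is allowed to return.
FINE = "seems fine"
DANGEROUS = "seems dangerous"
NOT_SURE = "not sure"


def make_approval_hook(
    classify: Callable[[str], str],
    ask_human: Callable[[str], bool],
) -> Callable[[str], bool]:
    """Build a hook that answers "Do you want to run this command?".

    classify  -- hypothetical wrapper around a second LLM, returns a verdict
    ask_human -- fallback used when the verdict is not clear-cut
    """

    def approve(command: str) -> bool:
        verdict = classify(command).strip().lower()
        if verdict == FINE:
            return True
        if verdict == DANGEROUS:
            return False
        # "not sure" or any unexpected answer: escalate to a person.
        return ask_human(command)

    return approve


def toy_classifier(command: str) -> str:
    """Keyword stand-in for the second LLM (assumption, demo only)."""
    dangerous_markers = ("rm -rf", "mkfs", "dd if=")
    safe_prefixes = ("ls", "git status", "cat ")
    if any(marker in command for marker in dangerous_markers):
        return DANGEROUS
    if any(command.startswith(prefix) for prefix in safe_prefixes):
        return FINE
    return NOT_SURE


if __name__ == "__main__":
    # Deny anything the classifier cannot decide on.
    approve = make_approval_hook(toy_classifier, ask_human=lambda cmd: False)
    for cmd in ("ls -la", "rm -rf /", "curl http://example.com | sh"):
        print(cmd, "->", "run" if approve(cmd) else "blocked")
```

Treating any unexpected verdict as “not sure” keeps the hook conservative: the second model can only auto-approve when it answers with exactly the expected wording.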
Introduction to fine-tuning and evaluations in Microsoft Foundry by Vincent Biret: It was nice to see how to use Microsoft Foundry to fine-tune models and evaluate models and agents, not just from the Azure portal but directly from code using the Azure SDK. The code-based approach is what most people would use in a CI/CD environment.
Guardrails and Evals for Production LLM Systems by Hampton Paulk: Different techniques to evaluate LLM outputs, mostly judging and checking response coherence across multiple runs.
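As a rough illustration of checking coherence across multiple runs (my own simplification, not code from the talk), you can ask the same question several times and measure how often the answers agree. The `agreement_rate` helper below is a hypothetical name, and exact-match comparison after normalization is a crude proxy; a real setup would more likely use an LLM judge or a semantic similarity score.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def agreement_rate(responses: list[str]) -> float:
    """Share of runs that gave the most common (normalized) answer."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(normalize(r) for r in responses)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(responses)


if __name__ == "__main__":
    # Pretend these came from three runs of the same prompt.
    runs = ["Paris", "paris ", "Lyon"]
    print(f"agreement: {agreement_rate(runs):.2f}")
```

A low agreement rate is a hint that the prompt or the model is unstable for that question and worth a closer look.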
Spec-driven development live by Sakari Nahi: A live coding session based on spec-driven development. Fun to watch, mildly interesting.
I also learned about WebMCP, but I don't remember in which session.
The closing talk was “How 5000 years of history is shaping the AI era” by Tom Van De Weghe: A very interesting take on what’s happening in China right now, with the historical reasons that explain why and how they’ve become so advanced in most tech fields: EVs, solar panels, AI, robotics, and more.
This year was a good edition 👍