Description
In this informal mini-episode, Josh Stepp delves into two AI-related topics. First, he explores the "Vending Bench" research paper, which tests the long-term coherence of LLM-based agents running a vending machine business, revealing high variance in performance, with top models like Claude 3.5 Sonnet and OpenAI's O3 Mini outperforming humans but occasionally spiraling into chaotic behaviors like spamming the FBI over minor issues. Then, Josh reacts to a Pentest Partners blog post about exploiting SharePoint via Microsoft's CoPilot, highlighting how attackers can bypass access controls and forensic tracking to mine sensitive data
Call to Action:
Subscribe to the podcast for more episodes on high-profile cyber intrusions.
Visit our website at intrusionsindepth.com for additional stories and insights.
Share your thoughts on social media using #IntrusionsInDepth.
Links and Resources:
https://arxiv.org/pdf/2502.15840
https://andonlabs.com/
https://www.pentestpartners.com/security-blog/exploiting-copilot-ai-for-sharepoint/
Host: Josh Stepp
Produced by: Josh Stepp
Thank you for tuning in to IntrusionsinDepth. Stay informed, stay safe, and see you in the next episode!
Share this post