Hey, I’m Brian from Research at Spec27. I’ve been working on some of the adversarial robustness techniques in the backend and am currently working on the multi-turn extension. I’d be happy to talk about what I’ve learned and hear any suggestions!
I really like the judge from here: https://docs.spec27.ai/docs/guides/judges
I didn't see an example of the full flow. Do you have anything I can see or explore?
Thanks for the feedback and question @eloycoto - there's a Loom video here: https://www.loom.com/share/727528de450a48d29a2ac20b279e26fc, and in the system itself, you can grab an example project from the registry.
There are out-of-the-box judges, and you can customize them for each spec if you are testing something specific.
Ohh, crazy good! I'll try it this weekend and keep you posted. Thanks!
Hey! Michal from the engineering team behind Spec27 here. There were some painful experiences along the journey - async in Django, background processing in Python, scaling agent workflows with a growing codebase. Happy to talk!
Also, GitHub CLI budgets exploding :-)
Hi! Jovanca from the Spec27 team here. We started building this because agent safety/validation still feels pretty undercooked in practice. Interested in how people here think about it :D
I get so mad when chat agents hallucinate in their responses. If this can rebuild trust in the results, I will give Spec27 a try.
Assuming you know when they hallucinate?