Prompt / agent tooling for testing various LLMs and evals
Hey, it's been a long time since I posted here 😛
I’d love to be able to:
- Iterate on and validate my assistants' output more systematically, a bit like a QA engineer would
- Compare various models
I came across this demo of an open source project that does exactly that!
While I was building an agent and iterating on the prompt, I would have loved to have 3-4 basic scenarios where I expected a specific answer. It would have let me iterate with peace of mind, knowing I wasn't regressing to a previous state.
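For anyone reading along, here's a tiny sketch of what those golden scenarios could look like in Python. Everything here is hypothetical: `run_agent` is just a stand-in for however your agent is actually invoked, and the substring check is the crudest possible grader.

```python
# A rough sketch of the "golden scenarios" idea: a few fixed inputs with
# expected answers, re-run after every prompt tweak to catch regressions.

SCENARIOS = [
    {"input": "Say hello in French.", "expected": "bonjour"},
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Name the capital of Japan.", "expected": "tokyo"},
]


def run_agent(user_message: str) -> str:
    # Stub so the script runs as-is; replace with your real agent/LLM call.
    canned = {
        "Say hello in French.": "Bonjour!",
        "What is 2 + 2?": "2 + 2 equals 4.",
        "Name the capital of Japan.": "The capital of Japan is Tokyo.",
    }
    return canned.get(user_message, "")


def check_scenarios() -> bool:
    all_passed = True
    for scenario in SCENARIOS:
        answer = run_agent(scenario["input"])
        # Naive case-insensitive substring check; a real eval might use
        # fuzzy matching or an LLM judge instead.
        passed = scenario["expected"].lower() in answer.lower()
        all_passed = all_passed and passed
        print(f"{'PASS' if passed else 'FAIL'}: {scenario['input']}")
    return all_passed


if __name__ == "__main__":
    raise SystemExit(0 if check_scenarios() else 1)
```

Running this after every prompt change gives you a quick pass/fail signal, which is basically the regression safety net described above.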
Thank you both! Thê-Minh TRINH, always happy to have you hang out with us over here ✌️
That's great feedback. The tricky part is supporting this without adding too much complexity to the user experience. I'll definitely share it with the team!