Published in AI Advances
Cutting-edge LLM Evals with humans, AI judges, and GPT token probabilities
In the race to deploy AI, one of the trickiest components is LLM Evaluation (Evals)
Jul 7, 2024
Published in Cubed
Claude Sonnet 3.5 vs GPT-4o
Comparing SOTA LLM performance on a realistic few-shot categorization task
Jul 1, 2024