In short, always :).
QualEval is general: it applies to any task and any language model.
We demonstrate QualEval on a wide variety of generative and classification tasks, including code generation, question answering, and dialogue.
Importantly, we demonstrate how insights from QualEval can be used to improve model performance.
We list some example dashboards for different models and tasks. QualEval generates high-quality attributes and faithfully presents interpretable, actionable insights.