This is Jessica. There have been a few interesting articles in the past couple weeks that point to evaluation blind spots in LLM development. One is this explainer article from OpenAI on why they withdrew their late April update to …