EMNLP 2025, Yu: Long-Context LMs Fail in Basic Retrieval
A synthetic dataset shows that long-context models fail needle-in-a-haystack tasks when locating the needle requires reasoning.