
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
·6 mins
We introduce BeyondSWE, a comprehensive benchmark that broadens existing evaluations along two axes, resolution scope and knowledge scope, using 500 real-world instances across four distinct settings. Alongside it, we develop SearchSWE, a framework that integrates deep search with coding abilities, which we use to analyse the role of deep research in coding.

