Software Engineering

BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?

We introduce BeyondSWE, a comprehensive benchmark that broadens existing evaluations along two axes (resolution scope and knowledge scope), using 500 real-world instances across four distinct settings. Alongside it, we develop SearchSWE, a framework that integrates deep search with coding abilities to study how deep research can support coding.

Immersion in the GitHub Universe: Scaling Coding Agents to Mastery

We propose Scale-SWE, a sandboxed multi-agent system that constructs 100k real SWE data points, the largest open-source high-quality SWE dataset to date. By training Qwen3-30B-A3B-Instruct on the distilled data, we achieve 64% on SWE-bench Verified, surpassing industry models of the same size.