Learning Performance-Improving Code Edits with PIE

1Carnegie Mellon University 2University of Pennsylvania 3Google
4Inspired Cognition 5Google Research, Brain Team
TLDR: A dataset and models for learning code edits that improve runtime. Browse some of the optimizations performed by CODEX/CODEGEN using PIE below; the examples are selectively picked for brevity.
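As an illustration of the kind of slow-to-fast edit PIE targets, here is a hypothetical pair (not taken from the dataset): a quadratic pair-counting loop rewritten with a hash map, a change a compiler's static analysis would not make on its own.

```python
from collections import Counter

# Hypothetical slow/fast pair illustrating a performance-improving edit
# (an invented example, not drawn from the PIE dataset).

def count_pairs_slow(nums, target):
    """O(n^2): check every pair of indices."""
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                count += 1
    return count

def count_pairs_fast(nums, target):
    """O(n): count complements seen so far with a hash map."""
    seen = Counter()
    count = 0
    for x in nums:
        count += seen[target - x]  # pairs completed by x
        seen[x] += 1
    return count
```

Both versions return the same answer; only the asymptotic cost changes.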

About PIE

Q. About PIE

Q. What are we releasing?

Q. But ChatGPT/CODEX can already do everything?

Q. What makes PIE unique?

Q. Why are trajectories important?

Q. Why is it essential to have minimal changes?

Q. Who cares if my python script runs in 0.5 seconds instead of 0.4 seconds?

Q. What if the optimizations add bugs?

Q. How large is the dataset?

Q. How can I help?


The waning of Moore's Law has shifted the focus of the tech industry towards alternative methods to achieve continued performance gains. To optimize program efficiency, programmers must craft and refactor code with better performance characteristics. In this paper, we investigate the ability of large language models (LLMs) to suggest functionally correct, performance-improving code edits. We hypothesize that language models can suggest such edits in ways that would be impractical for static analysis alone. To evaluate and improve the capacity of large language models, we have curated a large-scale dataset of Performance-Improving Edits (PIE). PIE contains trajectories of programs, where a programmer begins with an initial, slower version and iteratively makes changes to improve the program's performance. We use PIE to fine-tune multiple variants of CODEGEN, a billion-scale Transformer-decoder model. Additionally, we use examples from PIE to prompt OpenAI's CODEX using few-shot prompting. Our results show that both CODEX and CODEGEN can generate performance-improving edits, with speedups of more than 2.5x for over 25% of the programs. Moreover, PIE allows CODEGEN, an open-source model 10x smaller than CODEX, to match the performance of CODEX on this challenging task.
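The few-shot setup can be sketched as follows: (slow, fast) demonstration pairs from PIE are concatenated ahead of the query program, and the model completes the optimized version. The template and delimiter comments here are illustrative assumptions, not the exact prompt used in the paper.

```python
# Sketch of few-shot prompt construction for performance-improving edits.
# The "# slower version:" / "# optimized version:" markers are assumed
# delimiters, not the paper's actual prompt format.

def build_prompt(examples, slow_program):
    """Concatenate (slow, fast) demonstration pairs, then the query program."""
    parts = []
    for slow, fast in examples:
        parts.append(f"# slower version:\n{slow}\n# optimized version:\n{fast}\n")
    # The query program ends with an open "optimized version" slot for the model.
    parts.append(f"# slower version:\n{slow_program}\n# optimized version:\n")
    return "\n".join(parts)

demos = [
    ("total = 0\nfor x in xs:\n    total += x", "total = sum(xs)"),
]
prompt = build_prompt(demos, "out = []\nfor x in xs:\n    out.append(x * 2)")
print(prompt)
```

The completion returned by the model is then executed against the problem's test cases to check functional correctness and measure the speedup.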


Slow-Fast Pairs

The dataset of slow-fast pairs (TSV) is located here.

Columns description:
  • user_id: user id
  • problem_id: problem id. Details about the problems can be found in data/problem_list.csv
  • language: programming language
  • submission_id_v0: submission id of the first version of the code
  • submission_id_v1: submission id of the improved version of the code
  • cpu_time_v0: cpu time of the first version of the code
  • cpu_time_v1: cpu time of the second version of the code. cpu_time_v0 > cpu_time_v1 by at least 1% for all pairs in the dataset. For pairs where the first version received a Time Limit Exceeded (TLE) verdict, cpu_time_v0 is set to some high value (e.g. 1000000).
  • memory_v{0,1}: memory used by the two versions of the code. memory_v0 > memory_v1 can likewise be used to filter for pairs with memory improvements.
  • status_v{0,1}: status of the code in the two versions. status_v0 can be "Accepted" or "Time Limit Exceeded", but status_v1 is always "Accepted".
  • improvement_frac: percentage improvement in cpu time of the second version relative to the first. improvement_frac is always > 0.
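Filtering on these columns can be sketched as below; the column names follow the schema above, while the sample rows are made up for illustration.

```python
# Sketch: filter slow-fast pairs from the TSV by improvement threshold.
# Column names match the released schema; the sample rows are invented.
import csv
import io

def filter_pairs(tsv_file, min_improvement=10.0):
    """Keep pairs whose cpu time improved by at least `min_improvement` percent."""
    rows = csv.DictReader(tsv_file, delimiter="\t")
    return [r for r in rows if float(r["improvement_frac"]) >= min_improvement]

# In practice, open the released TSV instead of this in-memory sample.
sample = io.StringIO(
    "user_id\tproblem_id\tlanguage\tcpu_time_v0\tcpu_time_v1\timprovement_frac\n"
    "u1\tp1\tpython\t200\t100\t50.0\n"
    "u2\tp1\tcpp\t105\t104\t0.9\n"
)
kept = filter_pairs(sample)
```

Here the second pair improves by under 10%, so only the first survives the filter.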

Train/Test/Val Splits for Python and C++

We also provide the splits used in the paper, in JSONL format. The train/test/val files contain additional information such as the diff. In each record, the slower program is the input and the faster program is the target.
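Reading a split file can be sketched as below; the field names "input" and "target" are assumptions about the released JSONL schema, and the sample record is invented.

```python
# Sketch: iterate (slow, fast) pairs from a JSONL split file.
# The "input"/"target" field names are assumed, not confirmed from the release.
import io
import json

def read_split(fp):
    """Yield (slow_program, fast_program) pairs from a JSONL file object."""
    for line in fp:
        record = json.loads(line)
        yield record["input"], record["target"]

# In practice, open the released train/test/val .jsonl file instead.
sample = io.StringIO(
    '{"input": "for i in range(n): s += a[i]", "target": "s = sum(a)"}\n'
)
pairs = list(read_split(sample))
```

Each JSONL line is one training example, so the file streams naturally without loading everything into memory.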

Main Results


Citation

@article{madaan2023learning,
    title={Learning Performance-Improving Code Edits},
    author={Madaan, Aman and Shypula, Alexander and Alon, Uri and Hashemi, Milad and Ranganathan, Parthasarathy and Yang, Yiming and Neubig, Graham and Yazdanbakhsh, Amir},
    journal={arXiv preprint arXiv:2302.07867},
    year={2023}
}