Q. About PIE
Q. What are we releasing?
Q. But ChatGPT/Codex can already do everything?
Q. What makes PIE unique?
Q. Why are trajectories important?
Q. Why is it essential to have minimal changes?
Q. Who cares if my python script runs in 0.5 seconds instead of 0.4 seconds?
Q. What if the optimizations add bugs?
Q. How large is the dataset?
The dataset of slow-fast pairs (tsv) is located here.
data/problem_list.csv
- cpu_time_v0 > cpu_time_v1 by at least 1% for all pairs in the dataset. For pairs where the first version hit the time limit (TLE), cpu_time_v0 is set to a high value (e.g., 1000000).
- … memory_v0 > memory_v1 to filter out pairs.
- status_v0 can be "Accepted" or "Time Limit Exceeded", but status_v1 is always "Accepted".
- improvement_frac is always > 0.

The train/test/val splits are in jsonl format.
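The column constraints listed above can be checked when loading the pairs file. This is a minimal sketch, assuming the TSV has a header row that uses the column names mentioned above and that "at least 1%" means (cpu_time_v0 - cpu_time_v1) / cpu_time_v0 >= 0.01. The helper names check_pair and check_tsv, and the toy rows in the example, are hypothetical. The memory columns are not checked because the rule for them is only partially stated here.

```python
import csv
import io


def check_pair(row):
    """Return the list of stated constraints that one TSV row violates (hypothetical helper)."""
    problems = []
    v0 = float(row["cpu_time_v0"])
    v1 = float(row["cpu_time_v1"])
    # Assumption: "faster by at least 1%" means (v0 - v1) / v0 >= 0.01
    if (v0 - v1) < 0.01 * v0:
        problems.append("cpu_time_v1 is not at least 1% below cpu_time_v0")
    # The slower version may be Accepted or TLE; the faster one must be Accepted
    if row["status_v0"] not in ("Accepted", "Time Limit Exceeded"):
        problems.append("unexpected status_v0")
    if row["status_v1"] != "Accepted":
        problems.append("status_v1 is not Accepted")
    if float(row["improvement_frac"]) <= 0:
        problems.append("improvement_frac is not positive")
    return problems


def check_tsv(f):
    """Map 0-based data-row index -> violations, for rows that break any constraint."""
    reader = csv.DictReader(f, delimiter="\t")
    return {i: p for i, row in enumerate(reader) if (p := check_pair(row))}


# Toy rows (made up for illustration); the third row violates two constraints.
sample = (
    "cpu_time_v0\tcpu_time_v1\tstatus_v0\tstatus_v1\timprovement_frac\n"
    "200\t100\tAccepted\tAccepted\t0.5\n"
    "1000000\t300\tTime Limit Exceeded\tAccepted\t0.9\n"
    "100\t100\tAccepted\tAccepted\t0.0\n"
)
print(check_tsv(io.StringIO(sample)))
# {2: ['cpu_time_v1 is not at least 1% below cpu_time_v0', 'improvement_frac is not positive']}
```

Returning the violations per row, rather than raising on the first one, makes it easy to see whether a local copy of the file matches every stated invariant at once.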
The train/test/val files contain additional information, such as the diff. The slower program is the input and the faster program is the target.
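Because the split files are JSON Lines, each line can be parsed independently. The sketch below assumes each line is a JSON object whose "input" field holds the slower program and whose "target" field holds the faster one; the helper name load_pairs is hypothetical, and any other fields (such as the diff) are simply ignored.

```python
import json
from typing import Iterator, Tuple


def load_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (slower_program, faster_program) pairs from one split file.

    Assumption: every non-blank line is a JSON object with "input"
    (the slower program) and "target" (the faster program) fields.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            yield record["input"], record["target"]
```

For example, iterating over load_pairs("train.jsonl") would walk one split, where "train.jsonl" stands in for whatever the local split file is called. Using a generator keeps memory flat even for a large split.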
@article{madaan2023learning,
title={Learning Performance-Improving Code Edits},
author={Madaan, Aman and Shypula, Alexander and Alon, Uri and Hashemi, Milad and Ranganathan, Parthasarathy and Yang, Yiming and Neubig, Graham and Yazdanbakhsh, Amir},
journal={arXiv preprint arXiv:2302.07867},
year={2023}
}