ScaleBugs: Reproducible Scalability Bugs

As part of the ScaleBugs project, my proposal, under the mentorship of Cindy Rubio González and Haryadi S. Gunawi, aims to build a dataset of reproducible scalability bugs by analyzing bug reports from popular distributed systems such as Cassandra, HDFS, Ignite, and Kafka. The focus is on bugs whose symptoms depend on the scale of the run, such as the number of nodes, the size of files, or the number of requests. The resulting dataset will consist of bug artifacts, each containing the buggy and fixed versions of the system, a reproducible runtime environment, and workload shell scripts designed to demonstrate the bug's symptoms at different scales. These resources will support research and development efforts aimed at addressing scalability issues and optimizing system performance.