TY - GEN
T1 - Verifying equivalence of Spark programs
AU - Grossman, Shelly
AU - Cohen, Sara
AU - Itzhaky, Shachar
AU - Rinetzky, Noam
AU - Sagiv, Mooly
N1 - Publisher Copyright: © Springer International Publishing AG 2017
PY - 2017
Y1 - 2017
N2 - Apache Spark is a popular framework for writing large-scale data processing applications. Our long-term goal is to develop automatic tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations, corresponding to (nested) loops, with User Defined Functions (UDFs). In this paper, we present a novel SMT-based technique for verifying the equivalence of Spark programs. We model Spark as a programming language whose semantics imitates Relational Algebra queries (with aggregations) over bags (multisets) and allows for UDFs expressible in Presburger Arithmetic. We prove that the problem of checking equivalence is undecidable even for programs which use a single aggregation operator. Thus, we present sound techniques for verifying the equivalence of interesting classes of Spark programs, and show that they are complete under certain restrictions. We implemented our technique and applied it to a few small but intricate test cases.
UR - http://www.scopus.com/inward/record.url?scp=85026736663&partnerID=8YFLogxK
DO - 10.1007/978-3-319-63390-9_15
M3 - Conference contribution
SN - 9783319633893
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 282
EP - 300
BT - Computer Aided Verification - 29th International Conference, CAV 2017, Proceedings
A2 - Kuncak, Viktor
A2 - Majumdar, Rupak
T2 - 29th International Conference on Computer Aided Verification, CAV 2017
Y2 - 24 July 2017 through 28 July 2017
ER -