TY - JOUR
T1 - Database Repairing with Soft Functional Dependencies
AU - Carmeli, Nofar
AU - Grohe, Martin
AU - Kimelfeld, Benny
AU - Livshits, Ester
AU - Tibi, Muhammad
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/4/10
Y1 - 2024/4/10
AB - A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost), this subset is a "cardinality repair" of an inconsistent database; in soft interpretations, this subset corresponds to a "most probable world" of a probabilistic database, a "most likely intention" of a probabilistic unclean database, and so on. Within the class of functional dependencies, the complexity of finding a cardinality repair is thoroughly understood. Yet, very little is known about the complexity of finding an optimal subset for the more general soft semantics. The work described in this manuscript makes significant progress in that direction. In addition to general insights about the hardness and approximability of the problem, we present algorithms for two special cases (and some generalizations thereof): a single functional dependency, and a bipartite matching. The latter is the problem of finding an optimal "almost matching" of a bipartite graph where a penalty is paid for every lost edge and every violation of monogamy. For these special cases, we also investigate the complexity of additional computational tasks that arise when the soft constraints are used as a means to represent a probabilistic database in the case of a probabilistic unclean database.
KW - Database inconsistency
KW - database repairs
KW - functional dependencies
KW - integrity constraints
KW - soft constraints
UR - http://www.scopus.com/inward/record.url?scp=85193505547&partnerID=8YFLogxK
U2 - 10.1145/3651156
DO - 10.1145/3651156
M3 - Article
AN - SCOPUS:85193505547
SN - 0362-5915
VL - 49
JO - ACM Transactions on Database Systems
JF - ACM Transactions on Database Systems
IS - 2
M1 - 8
ER -