Florian Thaeter, Rüdiger Reischuk:
Scalable k-anonymous Microaggregation: Exploiting the Tradeoff between Computational Complexity and Information Loss.
In Proceedings of the 18th International Conference on Security and Cryptography (SECRYPT 2021), S. 87-98.
SCITEPRESS,
2021.
Zusammenfassung anzeigen
k-anonymous microaggregation is a standard technique to improve privacy of individuals whose personal data is used in microdata databases. Unlike semantic privacy requirements like differential privacy, k-anonymity allows the unrestricted publication of data, suitable for all kinds of analysis since every individual is hidden in a cluster of size at least k. Microaggregation can preserve a high level of utility, that means small information loss caused by the aggregation procedure, compared to other anonymization techniques like generalization or suppression. Minimizing the information loss in k-anonymous microaggregation is an NP-hard clustering problem for k ? 3. Even more, no efficient approximation algorithms with a nontrivial approximation ratio are known. Therefore, a bunch of heuristics have been developed to restrain high utility – all with quadratic time complexity in the size of the database at least. We improve this situation in several respects providing a tradeoff between computational effort and utility. First, a quadratic time algorithm ONA* is presented that achieves significantly better utility for standard benchmarks. Next, an almost linear time algorithm is developed that gives worse, but still acceptable utility. This is achieved by a suitable adaption of the Mondrian clustering algorithm. Finally, combining both techniques a new class MONA of parameterized algorithms is designed that deliver competitive utility for user-specified time constraints between almost linear and quadratic.