Monday, September 17, 2007

The Role Mining Problem: Finding a Minimal Descriptive Set of Roles

@inproceedings{vaidya07rmp,
author = {Jaideep Vaidya and Vijayalakshmi Alturi and Qi Guo},
title = {The Role Mining Problem: Finding a Minimal Descriptive Set of Roles},
booktitle = {SACMAT '07: Proceedings of the twelth ACM symposium on Access control models and technologies},
year = {2007},
address = {Sophia Antipolis, France},
publisher = {ACM Press},
}

This paper formally describes the aim of role engineering with data mining through RMP, the role mining problem as well as two variations of the problem: δ-approx RMP and the minimal noise RMP. All three problems are shown to be NP-complete.

The RMP is defined as the the problem of finding the optimal set of roles from existing user permissions. The paper defines the optimal/good set of roles as the set of minimal roles that represent the initial access control matrix. Like me, they define the discovery of roles as a decomposition of the access control matrix. (A = B ⊗ C)

To measure matrices, difference metrics for matrix normals are employed. That is, a count of differences between binary matrix values. In RMP, the composition of the user role matrix and the role permission matrix should be exactly the same as the the access control matrix. That is, the decomposition of the access control matrix should not give extra or remove existing permissions. In δ-approx RMP, access control matrix resulting from the mining is allowed to differ from the original access control matrix by δ. The mining is bound by δ. In minimal noise RMP, the δ is minimized while ensuring the number or roles does not exceed a certain threshold. The mining is bound by the number of roles.

The RMP and it's variations are then mapped to the set basis problem and shown to be NP-complete. The RMP problem and its variants are then mapped onto existing problems (minimum tiling in databases and discrete basis) and analysed. A solution for the minimum tiling problem is a greedy approximation algorithm that finds largest tiles first. This is synonymous to finding roles with the largest size first. Semantically, this is probably not quite right for RBAC. This approximation is bound to O(logmn) of optimal where m is number of users and n is number of permissions. The minimal noise RMP is mapped to the discrete basis problem and basis usage problem. The discrete basis problem has been shown to have no approximation in constant factor polynomial time. The discrete basis problem is synonymous to the optimal assignment of minimal noise role to users in the minimal noise RMP context.

No comments: