Thursday, March 08, 2007

Synthetic Test Data Generation

This week I have mostly been working on the test suite for synthetic data generation. The purpose of the data is to build RBAC infrastructures, extract only the user-permission assignments from them, and then see whether mining recovers the original roles.
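To make the "extract only the user-permission assignments" step concrete, here is a minimal sketch in Python (chosen for brevity, not the language of the actual test suite). The function name and the dictionary layout are assumptions made for illustration only.

```python
def user_permission_assignments(user_roles, role_perms):
    """Flatten user->roles and role->permissions into (user, permission) pairs.

    The role layer is deliberately thrown away: the result is all a
    role-mining algorithm gets to see.
    """
    upa = set()
    for user, roles in user_roles.items():
        for role in roles:
            for perm in role_perms.get(role, ()):
                upa.add((user, perm))
    return upa


# Hypothetical toy RBAC state: two users, two overlapping roles.
user_roles = {"u1": {"r1", "r2"}, "u2": {"r2"}}
role_perms = {"r1": {"p1", "p2"}, "r2": {"p2", "p3"}}
upa = user_permission_assignments(user_roles, role_perms)
```

Note that p2 reaches u1 through both r1 and r2 but appears only once in the output, which is part of what makes mining the original roles back non-trivial.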

While I'm not entirely convinced of the usefulness of this (do skewed or random roles really represent real enterprises?), the research community seems to think it's useful. So the plan for my algorithm is as follows:

Inputs:
  1. number of users
  2. number of roles
  3. number of permissions
  4. average number of roles per user
  5. standard deviation of roles per user
  6. average number of permissions per role
  7. standard deviation of permissions per role


Assigning roles to users and permissions to roles, then extracting the resulting user-permission relationships, should not be too difficult. The only issue is how realistic our model is. What kind of distribution should it follow? To start with, the number of roles per user and the number of permissions per role are normally (Gaussian) distributed. (Most things are normal, right?) Once those counts have been drawn for each user and each role, the specific roles and permissions are also chosen using a Gaussian number generator, given a mean and std. That is, which role and which permission gets picked is itself based on a Gaussian draw.
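The plan above could be sketched roughly as follows, in Python for brevity. Several details are my assumptions rather than anything fixed by the plan: counts are rounded and clamped to at least 1, the index of the picked role or permission is drawn from a Gaussian centred on the middle of the index range with a std of n/6, out-of-range draws are rejected, and all names (`Params`, `gaussian_count`, `gaussian_pick`, `generate`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Params:
    n_users: int
    n_roles: int
    n_perms: int
    roles_per_user_mean: float
    roles_per_user_std: float
    perms_per_role_mean: float
    perms_per_role_std: float


def gaussian_count(rng, mean, std, upper):
    """Draw a count from N(mean, std), rounded and clamped to [1, upper]."""
    return max(1, min(upper, round(rng.gauss(mean, std))))


def gaussian_pick(rng, k, n):
    """Pick k distinct indices from range(n), favouring the middle.

    Assumption: index ~ N(n/2, n/6); draws outside [0, n) are rejected.
    """
    chosen = set()
    tries = 0
    while len(chosen) < k and tries < 100 * n:
        tries += 1
        i = math.floor(rng.gauss(n / 2, n / 6))
        if 0 <= i < n:
            chosen.add(i)
    # Fallback so the loop cannot stall when k is close to n:
    # fill any shortfall uniformly from the indices not yet chosen.
    if len(chosen) < k:
        rest = [i for i in range(n) if i not in chosen]
        chosen.update(rng.sample(rest, k - len(chosen)))
    return chosen


def generate(p, seed=0):
    """Return (user -> set of roles, role -> set of permissions)."""
    rng = random.Random(seed)
    user_roles = {}
    for u in range(p.n_users):
        k = gaussian_count(rng, p.roles_per_user_mean,
                           p.roles_per_user_std, p.n_roles)
        user_roles[u] = gaussian_pick(rng, k, p.n_roles)
    role_perms = {}
    for r in range(p.n_roles):
        k = gaussian_count(rng, p.perms_per_role_mean,
                           p.perms_per_role_std, p.n_perms)
        role_perms[r] = gaussian_pick(rng, k, p.n_perms)
    return user_roles, role_perms


params = Params(50, 20, 100, 3.0, 1.0, 8.0, 2.0)
user_roles, role_perms = generate(params, seed=1)
```

With a Gaussian pick, roles and permissions near the middle of the index range end up far more popular than those at the edges, so the generated data is skewed by construction. Whether that skew resembles a real enterprise is exactly the open question.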

I initially had some problems with the Gaussian distribution, since transforming uniformly generated random numbers into normally distributed ones requires an inverse error function. But then I was directed to some C code that does magical stuff. I was also made aware that the output of Java's standard Gaussian number generator (mean 0, std 1) can easily be rescaled to whatever mean and std I give it, so that's making the coding process easier.
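Both routes can be illustrated in a few lines. The sketch below uses Python's standard library as a stand-in: `rng.gauss(0.0, 1.0)` plays the part of Java's `Random.nextGaussian()`, and `statistics.NormalDist().inv_cdf` (Python 3.8+) plays the part of the inverse error function, since the inverse normal CDF equals sqrt(2) * erfinv(2u - 1). The function names are hypothetical.

```python
import random
from statistics import NormalDist


def scaled_gaussian(rng, mean, std):
    """Rescale a standard normal draw z to N(mean, std): x = mean + std * z."""
    z = rng.gauss(0.0, 1.0)
    return mean + std * z


def inverse_cdf_gaussian(rng, mean, std):
    """Turn a uniform draw into a normal one via the inverse normal CDF."""
    u = rng.random()
    while u == 0.0:  # inv_cdf is only defined on the open interval (0, 1)
        u = rng.random()
    return mean + std * NormalDist().inv_cdf(u)


rng = random.Random(42)
scaled = [scaled_gaussian(rng, 5.0, 2.0) for _ in range(20000)]
inverted = [inverse_cdf_gaussian(rng, 5.0, 2.0) for _ in range(20000)]
scaled_mean = sum(scaled) / len(scaled)
inverted_mean = sum(inverted) / len(inverted)
```

In Java the first route is the one-liner `mean + std * random.nextGaussian()`, which is why no hand-written inverse error function is needed.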

Alternatively, permissions or roles could be assigned randomly to roles and users respectively, given a max number of permissions per role and roles per user. Either option is feasible since the actual distribution is unknown.
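A minimal sketch of this alternative, assuming "randomly" means uniformly: each owner (a user or a role) gets a uniformly drawn count between 1 and the max, and then that many distinct items chosen uniformly. The lower bound of 1 and the name `uniform_assign` are assumptions made for illustration.

```python
import random


def uniform_assign(rng, n_owners, n_items, max_per_owner):
    """Give each owner between 1 and max_per_owner distinct items, uniformly."""
    cap = min(max_per_owner, n_items)  # cannot hand out more than exist
    return {
        owner: set(rng.sample(range(n_items), rng.randint(1, cap)))
        for owner in range(n_owners)
    }


rng = random.Random(7)
# Roles to users: 50 users, 20 roles, at most 4 roles per user.
user_roles = uniform_assign(rng, 50, 20, 4)
# Permissions to roles: 20 roles, 100 permissions, at most 10 per role.
role_perms = uniform_assign(rng, 20, 100, 10)
```

Unlike the Gaussian version, every role and permission is equally likely to be picked here, so no item is systematically more popular than another.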
