Can anonymous data access train Artificial Intelligence models?

One of the significant challenges in protecting privacy while still allowing access to data stores is debugging. Not all data is organized in a standard format, and when you train an AI model you are generally looking for the signal inside the noise. If you do not have direct access to the data, debugging the model can be difficult or even impossible.

There is a famous example in which an AI was trained on images to recognize the difference between a husky and a wolf. It produced a seemingly accurate model by learning that pictures of one of the animals generally included snow. The model was accurate on its training data, but not useful.
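The failure mode above can be reproduced with a toy experiment. The sketch below (a made-up illustration, not the actual husky/wolf study) builds a synthetic dataset where a background feature, "snow", is correlated with the label, and shows that a classifier that looks only at the background scores well on the biased data yet falls to chance once the correlation disappears:

```python
import random

random.seed(0)

def make_dataset(n, snow_correlation):
    """Each example is (has_snow, is_wolf). In the biased set,
    snow tracks the label with the given probability."""
    data = []
    for _ in range(n):
        is_wolf = random.random() < 0.5
        if random.random() < snow_correlation:
            has_snow = is_wolf                    # snow follows the label
        else:
            has_snow = random.random() < 0.5      # snow is random
        data.append((has_snow, is_wolf))
    return data

def snow_classifier(has_snow):
    """A 'model' that only looks at the background feature."""
    return has_snow  # predict wolf iff snow is present

def accuracy(data):
    return sum(snow_classifier(s) == w for s, w in data) / len(data)

biased = make_dataset(10_000, snow_correlation=0.9)
unbiased = make_dataset(10_000, snow_correlation=0.0)

print(f"accuracy on biased data:   {accuracy(biased):.2f}")
print(f"accuracy on unbiased data: {accuracy(unbiased):.2f}")
```

On the biased set the snow rule scores near 95%; on the unbiased set it scores near chance. Without access to the underlying images, a modeler has no way to notice that this is what the model learned.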

To solve this issue it will be important that the sample sets of data are true and accurate reflections of the actual data. In some instances this may also require communication between the data provider and the modeler trying to train the model. To achieve true anonymity, that kind of direct communication cannot be allowed.

We believe that developing and working with standards for describing the data and for exception handling is of fundamental importance to enabling privacy-preserving access. A large portion of our time will be spent mediating the customer and data provider experience, with the objective of making this a reliable and ultimately automatable process. We recognize this is making the impossible merely very difficult.
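As one possible shape such a standard could take (the names and structure here are hypothetical, not an existing specification), a dataset could publish a machine-readable field descriptor, and validation failures could raise a typed exception so problems surface without anyone inspecting the raw records:

```python
from dataclasses import dataclass

class SchemaMismatchError(Exception):
    """Raised when a record does not match its published description."""

@dataclass(frozen=True)
class FieldSpec:
    """One field of a dataset descriptor: name, type, nullability."""
    name: str
    dtype: type
    nullable: bool = False

def validate_record(record: dict, schema: list) -> None:
    """Check a record against the descriptor. Raise rather than
    silently drop, so failures are reportable without exposing data."""
    for spec in schema:
        if spec.name not in record:
            raise SchemaMismatchError(f"missing field: {spec.name}")
        value = record[spec.name]
        if value is None:
            if not spec.nullable:
                raise SchemaMismatchError(
                    f"null in non-nullable field: {spec.name}")
        elif not isinstance(value, spec.dtype):
            raise SchemaMismatchError(
                f"field {spec.name!r}: expected {spec.dtype.__name__}, "
                f"got {type(value).__name__}")

# Hypothetical descriptor for an anonymized dataset.
schema = [FieldSpec("age", int), FieldSpec("zip_prefix", str, nullable=True)]
validate_record({"age": 42, "zip_prefix": None}, schema)  # passes silently
```

The point of the design is that the exception message names the field and the expected type, never the offending value, so the error report itself preserves anonymity.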