Can anonymous data access be used to train artificial intelligence models?

One of the most significant challenges in protecting privacy while still allowing access to data stores is debugging. Not all data is organized in a standard format, and when you train an AI model on data you are generally looking for the signal inside the noise. Without direct access to the data, debugging the model can be difficult or even impossible.

There is a famous example in which an AI was trained on images to tell the difference between a husky and a wolf. It produced a highly accurate model, but only because it had learned that the pictures of wolves generally included snow. The model was accurate on the data it was given and useless for the actual task: it had learned to detect snow, not wolves.

To address this, the sample data sets made available to modelers must be true and accurate reflections of the actual data. In some instances, it may also require communication between the data provider and the modeler training on the data. Under true anonymity, however, that kind of direct communication cannot happen.
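One way a data provider could check that a released sample reflects the full data set, without sharing the underlying records, is to compare simple summary statistics privately and hand back only the verdict. The sketch below is a minimal illustration under stated assumptions: the attribute is categorical, "representative" means every category's share differs by at most a tolerance, and the function names and the 0.05 threshold are hypothetical, not part of any standard.

```python
from collections import Counter


def proportions(values):
    """Return each category's share of the total (values must be non-empty)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def max_proportion_gap(sample_values, full_values):
    """Largest absolute difference in any category's share between sample and full data."""
    p_sample = proportions(sample_values)
    p_full = proportions(full_values)
    categories = set(p_sample) | set(p_full)
    return max(abs(p_sample.get(c, 0.0) - p_full.get(c, 0.0)) for c in categories)


def sample_is_representative(sample_values, full_values, tolerance=0.05):
    """Run on the provider's side; only this boolean needs to leave it.

    The 0.05 default tolerance is an illustrative assumption.
    """
    return max_proportion_gap(sample_values, full_values) <= tolerance
```

Label balance alone would not have caught the snow shortcut, so in practice the same comparison would need to be run on other attributes as well (for example, the share of snowy backgrounds within each class).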

We believe that developing and adopting standards for describing data and for handling exceptions is fundamental to enabling privacy-preserving access. A large portion of our time will be spent mediating between customers and data providers, with the objective of making this a reliable and, ultimately, automatable process. We recognize that this turns the impossible into something that is merely very difficult.
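As an illustration of what a machine-readable data description and a privacy-conscious exception report might look like, here is a minimal sketch. The descriptor format, the field names, and the rule that reports contain only field names and error counts (never the offending values) are all assumptions for illustration, not an existing standard.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    required: bool = True


# Hypothetical descriptor a data provider might publish alongside a data set.
DESCRIPTOR = [
    FieldSpec("image_id", str),
    FieldSpec("label", str),
    FieldSpec("width_px", int),
    FieldSpec("height_px", int),
    FieldSpec("location", str, required=False),
]


def exception_report(records, descriptor):
    """Validate records against the descriptor and count problems per field.

    Only (field, problem) pairs and their counts are returned, never the
    values themselves, so the report can be shared with a modeler without
    exposing the underlying data.
    """
    problems = Counter()
    for record in records:
        for spec in descriptor:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    problems[(spec.name, "missing")] += 1
            elif not isinstance(value, spec.type_):
                problems[(spec.name, "wrong_type")] += 1
    return dict(problems)
```

A modeler who receives `{("width_px", "wrong_type"): 1}` learns that some widths were stored as text, which is enough to fix a parsing step without ever seeing a record.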

Why use a blockchain for data markets

To understand the value of a blockchain to data markets, it helps to first consider what a blockchain is. Fundamentally, a blockchain is an accounting system: one that uses cryptography to let parties share trust even when they have no other relationship or reason to trust one another.

It is also an effective system for tracking a very large number of very small transactions. In a digital world where we often need to assign small values to small transactions, such as the sale of a single piece of data, a blockchain makes the transaction synonymous with the accounting: there is no separation between the transaction and the accounting record. An additional benefit is that all parties to the transaction work from the same distributed ledger. There are no reconciliation disputes, because everyone is working from the same record, and that record reflects the actual events.
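The core accounting idea, that each transaction is itself the accounting record and every record is chained to the one before it, can be sketched in a few lines. This is a minimal, single-process illustration with hypothetical party names and fields; it deliberately omits what makes a real blockchain distributed, namely consensus among nodes and digital signatures.

```python
import hashlib
import json


def entry_hash(entry):
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Ledger:
    """Append-only, hash-chained ledger: each transaction is its own accounting record."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self.head = self.GENESIS  # hash of the newest entry; parties can compare heads

    def record(self, payer, payee, amount_units, item):
        entry = {
            "payer": payer,
            "payee": payee,
            "amount_units": amount_units,  # integer units avoid float rounding on tiny payments
            "item": item,
            "prev_hash": self.head,
        }
        self.entries.append(entry)
        self.head = entry_hash(entry)
        return entry

    def verify(self):
        """True only if every link in the chain, and the recorded head, still match."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev:
                return False
            prev = entry_hash(entry)
        return prev == self.head

    def balance(self, party):
        """Net balance derived directly from the transactions, not kept separately."""
        total = 0
        for e in self.entries:
            if e["payee"] == party:
                total += e["amount_units"]
            if e["payer"] == party:
                total -= e["amount_units"]
        return total
```

Because balances are computed from the transaction log itself, there is no second set of books to drift out of sync, and editing any past entry breaks the chain for everyone holding the same head hash.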

Now let’s apply this idea to a data market. In a data market, transactions will often take place between parties who have no other business relationship. Moreover, some parties may explicitly not want their identity known to the other party. A blockchain allows accurate accounting and payment for services while keeping one or both parties anonymous (strictly speaking, pseudonymous: each party is known only by an address). A buyer of data may gain a competitive advantage by not advertising where its information comes from, and the data provider may have similar reasons to stay anonymous.
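To make "known only by an address" concrete, here is a minimal sketch of pseudonymous identifiers. Real networks derive an address from the public half of a digital-signature key pair; to stay within the Python standard library, this sketch substitutes a random secret hashed with SHA-256, which yields an unlinkable identifier but cannot produce signatures. The function names and the 40-hex-character address format are assumptions for illustration.

```python
import hashlib
import secrets


def address_for(secret):
    """Derive a 20-byte (40 hex character) address from a secret."""
    return "0x" + hashlib.sha256(secret).hexdigest()[:40]


def new_pseudonym():
    """Return (secret, address). The address alone reveals nothing about its holder."""
    secret = secrets.token_bytes(32)
    return secret, address_for(secret)


def owns(secret, address):
    """Check that a secret corresponds to an address.

    Revealing the secret is a weak proof, since it is then compromised;
    real systems prove control with a signature instead.
    """
    return address_for(secret) == address
```

A payment entry can then name only addresses, for example `{"payer": buyer_addr, "payee": provider_addr, "amount_units": 5}`, so the ledger stays fully accountable while neither party learns who the other is. A party can also generate a fresh pseudonym for each transaction to avoid its purchases being linked together.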

The next useful function of a blockchain lies in the properties that can be tied to the token itself. You can think of the token not only as the mechanism for accounting but also as a carrier of attributes, such as an API key. This means you can tie access control (who has access, to what, and for what period) to the token. That gives you an easily managed component for smart contracts to interact with, and smart contracts can in turn automate a wide variety of functions within the data market.
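A minimal sketch of a token that carries its own access-control attributes: who may use it, for which data set, and until when. Plain Python functions stand in for on-chain logic here; `AccessToken`, `issue_token`, `authorize`, and all their parameters are hypothetical names, not a real smart-contract language or token standard.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    token_id: str    # doubles as the API key presented with each request
    holder: str      # who: the pseudonymous address allowed to use it
    dataset: str     # what: the data set it unlocks
    expires_at: int  # for how long: Unix timestamp at which access ends


def issue_token(payer, dataset, amount_paid, price, now, duration_s):
    """Contract-style rule: issue a time-limited token only if payment covers the price."""
    if amount_paid < price:
        return None
    return AccessToken(
        token_id=secrets.token_hex(16),
        holder=payer,
        dataset=dataset,
        expires_at=now + duration_s,
    )


def authorize(token, api_key, requester, dataset, now):
    """Grant access only if the key, holder, data set, and time window all match."""
    return (
        token is not None
        and secrets.compare_digest(token.token_id, api_key)
        and token.holder == requester
        and token.dataset == dataset
        and now < token.expires_at
    )
```

Because payment, entitlement, and expiry all live on the token, a data provider's server only has to call something like `authorize` on each request; it never needs to know who the buyer is or maintain a separate customer database.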

The net result is that the blockchain, viewed as the accounting and access-control component of a data market, becomes an essential enabler of automation. It dramatically reduces workflow complexity while enabling a number of mission-critical functions that simply could not be achieved without it.