
R&D
Publicly Verifiable & Private Collaborative ML Model Training
MISSION
As part of exploring what is possible with Noir, Aztec Labs launched a Noir Research Grant, soliciting proposals from teams to build infrastructure for real-world applications leveraging Noir and private shared state. Private shared state gives blockchain applications the ability to perform expressive computation over encrypted state. ZK allows a state owner to prove properties about that state (the prover's witness), but if others need to compute over that state, the prover would need to share their witness. Advanced cryptographic primitives such as multiparty computation (MPC), fully homomorphic encryption (FHE), and trusted execution environments (TEEs) provide a way for a data owner to let others compute over their state without giving up its privacy. With Noir, private shared state applications also gain public verifiability: the execution of a computation over encrypted state can be proven valid without re-executing the computation, and this proof can be verified by anyone on-chain. HashCloak set out to implement a privacy-preserving training pipeline for logistic regression leveraging multiparty computation (MPC) and zero-knowledge (ZK) proofs.
CHALLENGE
As part of this research project, we combined MPC and ZK to build a privacy-preserving training pipeline for logistic regression. To achieve this, we used TACEO's co-noir framework, which allowed us to write Noir circuits that compile to a co-SNARK backend. Co-SNARKs, short for Collaborative SNARKs, are a cryptographic protocol that uses multiparty computation (MPC) to generate ZK proofs without leaking the witness to the parties generating the proof. Co-SNARKs were the key part of this project, as they allow us to train a logistic regression model without leaking the weights of the model. We also had to implement fixed-point arithmetic in Noir and use a polynomial approximation of the sigmoid function used in logistic regression models.
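To illustrate the two circuit-level techniques mentioned above, here is a minimal Python sketch (not the actual Noir circuit) of fixed-point arithmetic combined with a low-degree polynomial approximation of the sigmoid. The 16-bit scaling factor, the degree-3 polynomial, and its coefficients are illustrative assumptions, not values taken from the project.

```python
# Sketch only: fixed-point sigmoid approximation of the kind used in
# MPC/ZK-friendly circuits. All constants here are illustrative.

SCALE = 1 << 16  # assumed 16 fractional bits

def to_fixed(x: float) -> int:
    """Encode a real number as a fixed-point integer."""
    return round(x * SCALE)

def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values, rescaling back to 16 fractional bits."""
    return (a * b) // SCALE

def sigmoid_approx(x: int) -> int:
    # Degree-3 polynomial sigmoid(x) ~ 0.5 + 0.197*x - 0.004*x^3,
    # a common circuit-friendly choice, reasonable roughly on [-5, 5].
    c0 = to_fixed(0.5)
    c1 = to_fixed(0.197)
    c3 = to_fixed(0.004)
    x3 = fixed_mul(fixed_mul(x, x), x)
    return c0 + fixed_mul(c1, x) - fixed_mul(c3, x3)
```

Avoiding the transcendental exponential in favor of a few fixed-point multiplications is what keeps the sigmoid cheap to express as arithmetic constraints.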
GOAL
Our initial implementation required 1.3 million gates to train a model on 30 samples over 20 epochs. Using commodity hardware and a simulated MPC network with 3 parties, the training took 1.1 hours to execute. We showed that privately training a logistic regression model is realistic with the current state-of-the-art cryptographic tools for private shared state.
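For context on what "30 samples over 20 epochs" corresponds to, the shape of such a training loop can be sketched in plain Python. This is a float-arithmetic sketch of per-sample gradient descent with a polynomial sigmoid, not the Noir circuit; the learning rate, clamping range, and polynomial coefficients are illustrative assumptions.

```python
def sigmoid_poly(z: float) -> float:
    # Clamp to the approximation's useful range, then apply an assumed
    # degree-3 polynomial stand-in for the sigmoid.
    z = max(-4.0, min(4.0, z))
    return 0.5 + 0.197 * z - 0.004 * z ** 3

def train(X, y, epochs=20, lr=0.1):
    """Per-sample gradient descent for logistic regression.

    In the co-SNARK setting the same fixed-size loop would be unrolled
    into circuit constraints, which is why gate count grows with the
    number of samples and epochs.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid_poly(z) - yi  # prediction error
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b
```

Because a circuit has no data-dependent control flow, every sample and every epoch contributes constraints, which is consistent with gate count scaling with the 30-sample, 20-epoch workload reported above.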
