*Update**: An extended version of this post will appear in the upcoming book Quantum Machine Learning: What Quantum Computing Means to Data Mining.*

*Update 2**: Some clarifications are made in a new post.*

Take a high-level view on machine learning: given a training set where is a finite dimensional data point, and is an associated outcome, the task is to approximate an unknown function with such that for all Then, encountering a previously unseen we calculate

This task translates well to quantum process tomography -- a problem when an unknown quantum dynamical process has to be identified. The dynamical process is a series of unitary transformations, also called a channel. If we denote the unitary by the goal becomes to derive an estimate such that Then, just as in the classical case, we would like to calculate for a new state (Bisio et al., 2010).

In a classical setting, we define an objective function, and we seek an optimum subject to constraints and assumptions. For instance, in linear least squares, we assume that the unknown function has the form We seek to minimize the squared residual The assumption in learning by quantum process tomography is that the channel is unitary and that the unitary transformation is drawn from a group -- that is, it meets basic symmetry conditions (Chiribella et al., 2005). The objective function is replaced by the fidelity of quantum states.

Apart from these similarities, the rest of the learning process does not resemble the classical variant. Unlike in the classical setting, learning a unitary requires a double maximization: we need an optimal measuring strategy that optimally approximates the unitary, *and* we also need an optimal input state that best captures the information of the unitary (Acín et al., 2001). The key steps are as follows.

- Storage and parallel application of the unitary on a suitable input state that achieve optimum storage.
- A superposition of maximally entangled states is the optimal input state.
- A measure-and-prepare strategy on the ancilla with an optimal POVM is best for applying the learned unitary on an arbitrary number of new states.

This blog post interprets Bisio et al., 2010 by following through these steps.

# Parallel application and storage of unitary

The task is simple: we have a black box that implements an unknown unitary and we can make calls to it to identify the unitary. acts on a finite -dimensional Hilbert space, and performs a deterministic transformation belonging to a given representation of a compact Lie group (Chiribella, 2011). The deterministic transformation is also known as a quantum channel.

Two immediate problems arise: (i) How do we store the approximated unitary? (ii) How do we dispose the uses? In parallel or in sequence?

The first question is easier to address: the Choi-Jamiołkowsky duality enables storing the unitary as a state. Denote the stored state as

If we take the fidelity of output quantum states as the figure of merit, the optimal storage is achieved by a parallel application of the unitaries on an input state. Denote by the Hilbert space of all inputs of the examples, and by the Hilbert space of all outputs. With acting on , we have the following lemma (Bisio et al., 2010):

**Lemma (Optimality of parallel storage)** The optimal storage of
can be achieved by applying
on a suitable multipartite input state

The next question is what that suitable input state might be.

# Optimal state for learning

Classical learning takes the training examples, and tries to make the best of them. Some variants may ask for specific extra examples, as in active learning, which is, in turn, a flavour of semi-supervised learning. Other algorithms learn to ignore irrelevant examples, such support vector machines with *C*-regularization. Quantum process tomography, on the other hand, requires and *optimal* input state, which also has to be entangled.

To give an intuition why entanglement is necessary, consider superdense coding: two classical bits are sent over a quantum channel by a single qubit. The two communicating parties, Alice and Bob, have a half of two entangled qubits -- in fact, a Bell state -- each. Alice applies one of four unitary transformations on her part, translating the Bell state into a different one, and sends the qubit over. Bob measures the state, and deduces which one of the four operations was used, thus retrieving two classical bits of information. The use of entanglement with an ancillary system improves the discrimination of unknown transformations; this motivates using such states in generic process tomography (Chiribella, 2011).

A representation of the unitary group is *irreducible* in an invariant subspace, if the subspace does not have a proper subspace that is invariant. Any unitary representation
of a compact Lie group can be decomposed into the direct sum of a discrete number of irreducible representations (Chiribella, 2006, p.22).

The Clebsch-Gordan tensor product structure (or Clebsch-Gordan decomposition) of unitary representation is given by acting on (Chiribella et al., 2005; Chiribella, 2006, p.26). is called the representation space, and is the multiplicity space. is the identity on the -dimensional Hilbert space.

Using the decomposition of , we have the following lemma to identify the optimal input state (Bisio et al., 2010):

**Lemma (Optimal states for storage)** The optimal input state for storage can be taken of the form
where
are probabilities, the index
runs over the set
of all irreducible representations
contained in the decomposition of
is the dimension of the corresponding subspace, and
is a subspace of
carrying the representation
,
being the identity in

The optimal state for the estimation of an unknown unitary is always a superposition of maximally entangled states (Chiribella et al., 2011).

# Applying the learned function

A measure-and-prepare strategy is optimal for applying the learned function an arbitrary number of times. This strategy consists of measuring the state in the quantum memory with an optimal POVM, and performing the unitary on the new input state.

There is no loss of generality in restricting the search space to covariant POVMs (Holevo, 2011) of the form with a positive operator satisfying the normalizing condition (Chiribella et al., 2005). The following theorem provides the optimal measurement to retrieve the approximated unitary (Bisio et al., 2010):

**Theorem (Optimal retrieving strategy)** The optimal retrieving of
from the memory state
is achieved by measuring the ancilla with the optimal POVM
given by
and, conditionally on outcome
by performing the unitary
on the new input system.

The measurement will be optimal in the fidelity of quantum states, and it is also optimal for the maximization of the single-copy fidelity of all locals channels. The fidelity does not degrade with repeated applications.

# Closing remarks

From the perspective of machine learning, restricting the approximated function to unitaries is a serious constraint. It implies that the function is a bijection, ruling classification out. This methodology is relevant in regression problems.

The necessity of an optimal input state is interesting, but it is unclear what it means to learning performance, and especially to generalization power. The Clebsch-Gordan decomposition vaguely resembles a form of biclustering, followed by a transformation. It would be good to see a formal link between these concepts.

# References

Acín, A.; Jané, E. & Vidal, G. Optimal estimation of quantum dynamics. *Physical Review A*, 2001, 64, 050302.

Bisio, A.; Chiribella, G.; D'Ariano, G.; Facchini, S. & Perinotti, P. Optimal quantum learning of a unitary transformation. *Physical Review A*, 2010, 81, 032324.

Chiribella, G.; D'ariano, G. & Sacchi, M. Optimal estimation of group transformations using entanglement. *Physical Review A*, 2005, 72, 042338.

Chiribella, G. Optimal estimation of quantum signals in the presence of symmetry. PhD thesis. *University of Pavia*, 2006.

Chiribella, G. Group theoretic structures in the estimation of an unknown unitary transformation. *Journal of Physics: Conference Series*, 2011, 284, 012001.

Holevo, A. *Probabilistic and statistical aspects of quantum theory*. Springer, 2011.

[…] just concluded, and I presented a poster on transductive and active learning in the quantum learning of unitaries (Wittek, 2014). We had some good discussions on the topic, particularly on the differences between […]