From stochastic matrices to process matrices
Posted on 02 April 20181. Introduction¶
It took me a while to understand process matrices. Part of the problem is a lack of introductory level material that would explain the similarities and differences between the classical and quantum case across the board -- starting from transformations on classical probability distributions to quantum combs and process matrices. Since process matrices are useful constructs, it is worth addressing this problem, along with snippets of Python code to illustrate the concepts as we go. This article will forgo mathematical rigour, favouring intuition instead, and the attention is restricted to finite dimensional systems.
First, we import everything we will use.
import itertools
import numpy as np
from qutip import basis, cnot, dag, expect, hadamard_transform, ket2dm, \
partial_transpose, qeye, Qobj, sigmax, sigmay, sigmaz, swap, tensor, to_choi
np.set_printoptions(suppress=True, precision=3)
2. Classical probability theory and stochastic matrix¶
Let us starting from the measure theoretic approach to probability theory. Given an arbitrary set $\Omega$ (the sample space), we take an $\mathcal{F}$ and a $\sigma$-algebra on it. A measure $P$ defined on $\mathcal{F}$ is a probability measure if it is normalized to one on the entire sample space, that is, $P(\Omega ) = 1$. A random variable $X$ maps from the sample space to some measurable space $E$, that is $X: \Omega \to E$. The probability that X takes values in a measurable set $S\subseteq E$ is $\operatorname {Pr} (X\in S)=P(\{\omega \in \Omega |X(\omega )\in S\})$.
A discrete probability distribution is the collection of values of the measure of pre-images of random variables, inducing a probability mass function $f_{X}(S):=P(X^{-1}(S))$, where $S$ is countable. It can be written as a vector where the elements add up to one, and all elements are non-negative. The normalization is essentially a constraint on the $\mathcal{l}_1$ norm. This is called the probability vector or stochastic vector. A left stochastic matrix will map stochastic vectors to stochastic vectors when multiplied from the left. In other words, it maps probability distributions to probability distributions. For example, starting with a unbiased coin, the map $M$ will transform the distribution to a biased coin:
p = np.array([.5, .5])
M = np.array([[0.7, 0.6], [0.3, 0.4]])
M.dot(p)
Stochastic maps are frequently used in stochastic processes, which is a collection of time-dependent random variables defined over the same probability space.
3. Unitary evolution of quantum systems¶
One, somewhat sloppy way of thinking about quantum states is that they are probability distributions where instead of the real-valued stochastic vectors, we have complex-valued vectors, and the normalization to the unit sphere is in the $\mathcal{l}_2$-norm, instead of the $\mathcal{l}_1$-norm of stochastic vectors. This property naturally leads to the trace-1 requirement of density matrices. Let's consider the following state in the two-qubit state $\mathbb{C}^2\otimes\mathbb{C}^2$:
ψ0 = tensor(basis(2, 0), basis(2, 0))
ψ0.norm()
Just like a stochastic matrix maps probability distributions to distributions, unitary operators map quantum states to quantum states. Take the following unitary operator acting on the two qubit state:
U = cnot(2)*tensor(hadamard_transform(1), qeye(2))
This will create the maximally entangled state if it acts on $\psi_0$. We can easily verify this:
ψ1 = U*ψ0
ϕ = (tensor(basis(2, 0), basis(2, 0)) + tensor(basis(2, 1), basis(2, 1))).unit()
print(ϕ == ψ1)
We can also easily verify unitarity:
U.dag()*ψ1 == ψ0
The same ideas in the density matrix formalism:
ρ1 = U*ket2dm(ψ0)*U.dag()
print(ket2dm(ϕ) == ρ1)
print(U.dag()*ρ1*U == ket2dm(ψ0))
Unitary maps are not the only maps acting on quantum states that yield quantum states. To generalize the formalism, first we have to introduce measurements.
4. Measurements¶
A projection-valued measure on a measurable space (X, M), where M is a $\sigma$-algebra of subsets of X, is a mapping $\Pi$ from M to the set of self-adjoint projections on a Hilbert space H (i.e. the orthogonal projections) such that
$\Pi (X)=\operatorname {id} _{H}\quad $
(where id H $\operatorname {id} _{H}$ is the identity operator of $H$) and for every $\xi, \eta \in H$, the set-function
$ E\mapsto \langle \Pi (E)\xi \mid \eta \rangle$
is a complex measure on M. If π is a projection-valued measure and $E\cap F=\emptyset$, then the images $\Pi(E)$, $\Pi(F)$ are orthogonal to each other.
The measurable space $X$ is the value space for some quantum property of the system (an observable or measurement), and the projection-valued measure $\Pi$ expresses the probability that the observable takes on various values. The probability that the observable takes its value in E given the system in state $\phi$ is
$P=\langle \phi |\Pi (E)|\phi \rangle$
One important point to note is that the quantum state is embedded in the probability measure. The values of the observable are essentially random variables. The terminology is confusing: a measurement (or observable) in quantum mechanics is made up of operator-valued measures, and by defining the probability measure as above, we map an assignment $\Pi(E)$ to the usual real-valued probability.
Let's define a measurement, the projection operators to the computational basis in the first qubit of a two-qubit space.
Π = [tensor(ket2dm(basis(2, 0)), qeye(2)), tensor(ket2dm(basis(2, 1)), qeye(2))]
According to he above, the probability of measuring $|0\rangle$ is
ψ1.dag()*Π[0]*ψ1
Equivalently, we can calculate the values in the density matrix formalism by taking the trace, $\operatorname{tr}[\Pi_i\rho]$:
(Π[0]*ket2dm(ψ1)).tr()
Qutip can calculate the values irrespective of the representation with the function expect
:
print(expect(Π[0], ψ1))
print(expect(Π[0], ket2dm(ψ1)))
Positive operator-valued measures (POVMs) are like projection-valued measurements, but without the orthogonality constraint. A measurement made up of POVMs is sometimes confusingly called an unsharp measurement, in an analogue to mixed states. This is confusing because they are strictly more powerful than projection-valued measurements (for instance, a POVM can extract more randomness from a qubit), and you need to add noise to make them simulable by projection-valued measures. Here we defined the tetrahedron measurement on a qubit:
b = [np.array([ 1, -1, -1])/np.sqrt(3),
np.array([-1, -1, 1])/np.sqrt(3),
np.array([ 1, 1, 1])/np.sqrt(3),
np.array([-1, 1, -1])/np.sqrt(3)]
tetrahedron = [(qeye(2)+sum(bij*sigma for bij, sigma in zip(bj, [sigmax(), sigmay(), sigmaz()])))/4
for bj in b]
print([effect.eigenenergies() for effect in tetrahedron])
print(sum(tetrahedron) == qeye(2))
The measurement will change the quantum state according to the corresponding eigenvector of the measurement. For instance, the first outcome of $\Pi$ will yield the state $|\psi_2\rangle = \frac{\Pi_0|\psi_1\rangle}{\sqrt{\langle \psi_1 |\Pi_0|\psi_1\rangle}}$, or, equivalently, $|\psi_2\rangle\langle\psi_2| = \frac{\Pi_0|\psi_1\rangle\langle \psi_1|\Pi_0}{\operatorname{tr}[\Pi_0|\psi_1\rangle\langle \psi_1|]}$.
ψ2 = Π[0]*ψ1/np.sqrt(expect(Π[0], ρ1))
print(ψ2)
ρ2 = Π[0]*ρ1*Π[0]/expect(Π[0], ρ1)
print(ρ2)
The two projection operators in this measurement project to the eigenspaces in the spectral decomposition of the operator $\sigma_z$, therefore we can also say that the observable is $\sigma_z$.
Π == [tensor(ket2dm(sigmaz().eigenstates()[1][1]), qeye(2)),
tensor(ket2dm(sigmaz().eigenstates()[1][0]), qeye(2))]
5. CPTP maps¶
CPTP maps give a common framework to talk about quantum operations, let those be measurements, unitary evolution, or restrictions on the Hilbert space (i.e. taking the partial trace in a subsystem). Quantum channels, the most generic operations on quantum states, are CPTP maps. Quantum channels are the generalization of stochastic maps to quantum states.
A linear map $\Phi$ is positive if $\Phi(X)$ is positive semidefinite whenever $X$ is positive semidefinite. Complete positivity (CP) is a stronger requirement: they are maps $\Phi(X)$ such that $\mathbb{1}_k\otimes\Phi(X)$ is a positive map for any natural number $k$. To me, this looked like a weird definition until I realized that $k$ can be zero. If we take $\Phi$ to be the transposition, it clearly is positive for $k=0$, but it becomes a partial transpose $k>0$, and may not be be positive. Here is an example applied to the maximally entangled state:
ρ1 = ket2dm(ψ1)
print(ρ1.eigenenergies(), ρ1.trans().eigenenergies(),
(partial_transpose(ρ1, [0, 1])).eigenenergies())
It is, however, positive for all separable states. Example:
ρs = ket2dm(tensor(basis(2, 0), basis(2, 0)))
print(ρs.eigenenergies(), ρs.trans().eigenenergies(),
(partial_transpose(ρs, [0, 1])).eigenenergies())
A map $\Phi$ is CP if and only if its Choi matrix is positive semidefinite (see Choi's theorem). The Choi matrix is the most straightforward way of creating a matrix of the map $\Phi: C^{n×n}\rightarrow C^{m×m}$: we apply to the matrices $E_{ij}\in C^{n×n}$ with 1 in the $ij$-th entry and 0s elsewhere. Formally,
$$ C_{\Phi }=\left(\operatorname {id} _{n}\otimes \Phi \right)\left(\sum _{ij}E_{ij}\otimes E_{ij}\right)=\sum _{ij}E_{ij}\otimes \Phi (E_{ij})\in \mathbb {C} ^{nm\times nm}$$This equivalence is called the Choi-Jamiolkowski isomorphism (Matt Leifer has a post on it, and Nathan Johnston a two-part intro (part 1 and 2). The proof of the theorem gives the decomposition $\Phi (X)=\sum _{i=1}^{nm}K_{i}XK_{i}^{\dagger}$, where we call the matrices $K_i$ Kraus operators. This decomposition is not unique.
The Choi matrix looks like a quantum state. Among the many convenient features of this isomorphism, one is that entanglement-breaking channels map to separable states in the Choi matrix representation.
A measurement effect is CP. For instance, if we take the effect $\Pi_0$, we can calculate its Choi matrix and its eigenvalues:
to_choi(Π[0]).eigenenergies()
A trace-preserving (TP) map will preserve the trace, which a measurement effect does not do. A map $\Phi$ is trace-preserving if and only if ${\rm tr}_2(C_\Phi)=\mathbb{1}_n$.
The normalization constraint on the measurement effects ensures that the sum of them is a CPTP map. A quick check:
C_Π = to_choi(sum(to_super(Πi) for Πi in Π))
print(C_Π.eigenenergies())
C_Π.ptrace([0, 1])
In this sense, CPTP maps are deterministic. The measurement outcome does not play a role in the quantum channel. A quantum instrument generalizes this further by taking the outcome into account. A quantum instrument is a collection of CP maps such that the sum is CPTP.
Now we can write combinations of unitary evolutions and measurements in the same framework:
channel = sum(to_super(Πi*U) for Πi in Π)
C_channel = to_choi(channel)
print(C_channel.eigenenergies())
C_channel.ptrace([0, 1])
The partial trace operator is also CPTP, which means that we can apply a unitary (hence reversible) operation on a composite system, and then trace out part of it, resulting in irreversible dynamics. This is the basic principle of open quantum systems. CPTP maps do not depend on inputs or history, and in this sense, they are memory-less (Markovian). Furthermore, the spatial and temporal order of the operation is fixed a priori. The spatial order is defined by the tensor product structure of the Hilbert space, where the temporal order is fixed by the sequence of joining the CPTP operators together. So what if we have memory effects or we don't want to assume an a priori spatiotemporal structure?
6. Process matrices¶
The process matrix framework was introduced in Oreshkov, Costa, Brukner: Quantum correlations with no causal order, Nature Communications 3, 1092, 2012 and it generalizes quantum combs by not assuming a causal order. Instead of the spatial arrangement introduced by the composite Hilbert space, we define an input and an output Hilbert space associated with quantum events. A quantum even $A$ is performed in a closed lab, and it is a CP map $\mathcal{M}^{A_I\to A_O}$ mapping from the input Hilbert space $H^{A_I}$ to the output Hilbert space $H^{A_O}$. To form a quantum instrument, a collection of such maps must sum to a CPTP map. Let's denote the transposed Choi map of $\mathcal{M}^{A_IA_O}$ by $M^{A_IA_O}$ (the transposition happens for notational convenience). For $n$ parties, we can write a generalized Born rule in the following form:
$\operatorname{tr}\left[W^{A^1_IA^1_O\ldots A^n_IA^n_O}(M^{A^1_IA^1_O}\otimes M^{A^n_IA^n_O})\right]$,
where $W^{A^1_IA^1_O\ldots A^n_IA^n_O}$ is the process matrix mitigating the correlations. The tensor product structure in $M^{A^1_IA^1_O}\otimes M^{A^n_IA^n_O}$ specifies spatiotemporal separation, and not necessarily spatial or temporal. For instance, it could be a series of signaling measurements at the same location. The process matrix plays the role of the density matrix in the traditional Born rule. Just like the density matrix, the process matrix is also positive.
The process matrix encompasses everything that is deterministic about the system, whereas the quantum instruments are stochastic. So similarly to the case of the Born rule, a quantum instrument is a map from some $\sigma$-algebra to CP maps, and a fixed process matrix together with the trace operator defines a probability measure on the CP maps.
Take the following example that includes signaling. Bob starts with a state $\rho^{B_1}$, measures it with some measurement $\mathcal{M}^B$, and sends the output to Alice through a quantum channel $\mathcal{C}$ (its corresponding Choi matrix is $C^{B_2A_1}$). Then Alice measures it with some measurement $\mathcal{M}^A$. The corresponding process matrix is $W^{A_1A_2B_1B_2} = \mathbb{1}^{A_2}\otimes(C^{B_2A_1})^T\otimes\rho^{B_1}$ The channel is chosen to be the Hadamard gate and the initial state $|0\rangle$.
ρ_B1 = ket2dm(basis(2, 0))
C = hadamard_transform(1)
W = tensor(qeye(2), to_choi(C).trans(), ρ_B1)
W.eigenenergies()
This formalism gives a convenient way to characterize memory effects and non-Markovian dynamics, whereas CPTP maps are restricted to memory-less processes. Here is an example taken from the appendix of arXiv:1801.09811: take an uncorrelated two-qubit state, and designate one as the environment and one as the system. Apply a swap, do some measurement on the system, then apply the swap again. No matter what you measure, you will get the original state of the system back at the end. The corresponding process matrix is $W = \mathbb{1}\otimes(C^{\operatorname{swap}})^T\otimes(C^{\operatorname{swap}})^T\otimes\rho^{SE}$
ρ_SE = ket2dm(tensor(basis(2, 0), basis(2,0)))
W = tensor(qeye(4), to_choi(swap()).trans(), to_choi(swap()).trans(), ρ_SE)
Quantum combs define a causal order, but process tensors don't. It could be that a non-Markovian process is the probabilistic mixture of Markovian processes with different causal order, resulting in a non-causally ordered process matrix. If signaling is not allowed from A to B, we write $A\npreceq B$. Causally separable processes are convex probabilistic mixtures of two possible signaling orders $qW^{A\npreceq B}+(1-q)W^{B\npreceq A}$. Whether a process matrix is causally separable is easily witnessed by an SDP. Eventually these sets form a hierarchy, and, apart from the Markovian set, all of them are convex. Here is a cartoonish representation:
All of this formalism is device-dependent: it assumes quantum mechanics and particular forms of processes. If we move to a device-independent setting studying only correlations and violations of inequalities, for instance, causal inequalities bounding causal correlations, then we often end up with convex polytopes.
Acknowledgements¶
I thank Peter Bruza, Fabio Costa, Christina Giarmatzi, Sally Shrapnel, Jansen Zhikuan Zhao, and Alejandro Pozas-Kerstjens for discussions.