Jekyll2018-12-01T23:52:27+01:00https://luongo.pro/returnlambdaThe level of achievement that you have in anything, is a reflection of how well you were able to focus on it (Steve Vai)
Quantum BLAS: Basic Linear Algebra Subprograms with quantum computers2018-12-01T00:00:00+01:002018-12-01T00:00:00+01:00https://luongo.pro/2018/12/01/Quantum-BLAS<p>In this post we are going to see how well quantum computers can perform linear algebraic operations: matrix multiplication, matrix inversion, and projection of a vector onto a subspace. As usual, we suppose that we have access to the matrix and the vectors via a <a href="/qml/qram">QRAM</a>: a classical data structure that allows us to build quantum states proportional to the matrix and vectors we need to operate on.</p>
<p>Historically, the trend of quantum algorithms for linear algebra started with the well-known <a href="/qml/hhl">HHL</a> algorithm, which builds a quantum state proportional to the solution of a sparse and well-conditioned system of linear equations. While HHL wasn’t originally based on QRAM, it relies on similar techniques to extract the singular values of the matrix (namely <a href="/qml/hamsim">Hamiltonian Simulation</a>).</p>
<p>A couple of years ago, our ability to perform quantum linear algebra progressed substantially when we learned how to write (coherently) the singular values of a matrix in a quantum register <a href="#KP17">(Kerenidis & Prakash, 2017)</a>. The idea of the algorithm is the following: thanks to QRAM queries, it is possible to write the singular values of the matrix in the phase of a quantum state, from which they are then recovered (i.e. written in a quantum register) in the usual way, using the Quantum Fourier Transform. Unfortunately, this approach has a significant limitation: recovering the phase with a QFT implies a running time linear in the inverse of the error $\epsilon$ (the running time of the QFT is $O(\log n / \epsilon)$). Only recently, two new works overcame this limitation with algorithms that have a logarithmic dependence on the error.
The papers are based on <em>qubitization</em> <a href="#HaoLow2016">(Hao Low & Chuang, 2016)</a> and <em>block encoding</em> <a href="#CGJ18">(Chakraborty, Gilyén, & Jeffery, 2018)</a>. In this page I want to give you an intuition of how quantum linear algebra works, so we will see the simplest (but non-optimal) algorithm, based on singular value estimation, and then just state the theorems that represent the current state of the art in quantum linear algebra. First, let me introduce the following theorem.</p>
<h5 id="sve-theorem">S.V.E Theorem</h5>
<p><em>Let $M \in \mathbb{R}^{m \times n}$ be a matrix with singular value decomposition $M =\sum_{i \in \left[n\right]} \sigma_i u_i v_i^T$ stored in QRAM. Let $\varepsilon > 0$ be the precision parameter. There is an algorithm with running time $O(polylog(mn)/\varepsilon)$ that performs the mapping $\sum_{i} \alpha_i \ket{v_i} \to \sum_{i}\alpha_i \ket{v_i} \ket{\tilde{\sigma}_i}$, where $\tilde{\sigma}_i \in \sigma_i \pm \varepsilon||M||_F $ for all $i$ with probability at least $1-1/poly(n)$.</em></p>
<p>Let’s see a simple algorithm.</p>
<p>Let $M := \sum_{i}^d \sigma_iu_iv_i^T$.
We also assume that <script type="math/tex">\norm{M}_F \leq 1</script>.
Note that once the matrix is rescaled by its largest singular value, the singular values lie between $1/\kappa$ and $1$, where $\kappa$ is the condition number of $M$. This rescaling is easy to obtain by iteratively estimating the biggest singular value (also known as the Perron-Frobenius root) with <script type="math/tex">x_{t+1}= \frac{Mx_t}{ ||Mx_t ||}</script>. If we want to use a quantum algorithm instead, a polylogarithmic procedure can be found in <a href="#KP17">(Kerenidis & Prakash, 2017)</a>.</p>
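<p>The classical iteration above can be sketched in a few lines. This is only an illustration under my own assumptions: the function names are hypothetical, and the iteration is applied to $M^TM$ rather than to $M$ so that it also works for rectangular matrices (for a symmetric positive semidefinite $M$ it reduces to the update written above).</p>

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def largest_singular_value(M, iterations=200):
    """Estimate the largest singular value of M by power iteration on M^T M.

    Assumption: the uniform starting vector is not orthogonal to the top
    right singular vector (a random start would avoid this corner case).
    """
    Mt = transpose(M)
    d = len(M[0])
    x = [1.0 / math.sqrt(d)] * d
    sigma = 0.0
    for _ in range(iterations):
        y = matvec(Mt, matvec(M, x))            # y = M^T M x, with ||x|| = 1
        norm_y = math.sqrt(sum(v * v for v in y))
        if norm_y == 0.0:
            return 0.0
        x = [v / norm_y for v in y]
        sigma = math.sqrt(norm_y)               # ||M^T M x|| tends to sigma_max^2
    return sigma

def rescale(M):
    """Divide M by its largest singular value, so that sigma_max becomes 1."""
    sigma = largest_singular_value(M)
    return [[v / sigma for v in row] for row in M]
```

<p>After calling <code>rescale</code>, the singular values of the returned matrix lie in $[1/\kappa, 1]$, which is the normalization assumed in the rest of the post.</p>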
<h4 id="quatum-linear-algebra-algorithms">Quantum Linear Algebra Algorithms</h4>
<ul>
<li><strong>Require</strong>:
<ul>
<li>QRAM access to rows of $M \in \mathbb{R}^{n \times d}$ with s.v.d $\sum_{i} \sigma_iu_iv_i^T$ and $\ket{x} \in \mathbb{R}^{d}$.</li>
<li>A threshold $\theta$ and an error parameter $\delta > 0$ for doing matrix projection,</li>
<li>Error $\varepsilon > 0$ for doing matrix multiplication and inversion.</li>
</ul>
</li>
<li><strong>Ensure</strong>:
<ul>
<li>Our quantum register is in state $ \ket{Mx}$, $\ket{M^{-1}x}$, or
<script type="math/tex">\ket{M^{+}_{\theta, \delta}M_{\theta, \delta} x}</script></li>
</ul>
</li>
</ul>
<ol>
<li>
<p>Query the QRAM to obtain the state:
$\ket{x} = \sum_{i} \alpha_i\ket{v_i} $</p>
</li>
<li>
<p>Perform SVE on $M$ with precision $\epsilon$ for matrix
multiplication and inversion and with precision
$\frac{\delta}{2}\frac{\theta}{ \left\lVert M \right\rVert_F}$ for
matrix projection. This step creates the state:
$ \sum_{i} \alpha_i\ket{v_i}\ket{\tilde{\sigma}_i} $</p>
</li>
<li>
<p>Perform a controlled operation on an ancilla qubit:</p>
<ul>
<li>for matrix multiplication:
$ \sum_{i}^d \alpha_i\ket{v_i}\ket{\tilde{\sigma_i}}\Big(\tilde{\sigma}_i\ket{0} + \sqrt{1 - \tilde{\sigma}^2_i}\ket{1}\Big) $</li>
<li>for matrix inversion: <script type="math/tex">\sum_{i}^d \alpha_i\ket{v_i}\ket{\tilde{\sigma}_i} \left(\frac{\tilde{\sigma}_{min}}{\tilde{\sigma}_i} \ket{0} + \sqrt{1-\left(\frac{\tilde{\sigma}_{min}}{\tilde{\sigma}_i }\right)^2}\ket{1} \right)</script></li>
<li>for projecting onto the subspace spanned by the singular vectors whose singular values are smaller than $\theta$, map
$\ket{\tilde{\sigma}_j}\ket{0} \to \ket{\tilde{\sigma}_j}\ket{0}$
if $\tilde{\sigma}_j < \theta + \frac{\delta}{2}\theta$ and to
$\ket{\tilde{\sigma}_j}\ket{1}$ otherwise.</li>
</ul>
</li>
<li>
<p>Then, uncompute the register with the estimate of the singular
values. You are left with the following states:</p>
<ul>
<li>
<p>for matrix multiplication:
$ \sum_{i}^d \alpha_i\ket{v_i}\Big(\tilde{\sigma}_i\ket{0} + \sqrt{1 - \tilde{\sigma}^2_i}\ket{1}\Big) $</p>
</li>
<li>
<p>for matrix inversion:
<script type="math/tex">\sum_{i}^d \alpha_i\ket{v_i} \left( \frac{\tilde{\sigma}_{min}}{ \tilde{\sigma}_i}\ket{0}+ \sqrt{1- \left(\frac{\tilde{\sigma}_{min}}{\tilde{\sigma}_i}\right)^2}\ket{1} \right)</script></p>
</li>
<li>
<p>for projection in a subspace:
$ \sum_{i \in S} \alpha_i \ket{v_i}\ket{0} + \sum_{i \in \bar{S}} \alpha_i\ket{v_i}\ket{1}$
where $S$ is the set of $i$’s such that $\sigma_i \leq \theta$
and some $i$’s such that
$\sigma_i \in (\theta, (1+\delta)\theta ]$, and $\bar{S}$ is
the complement of $S$ in $[d]$</p>
</li>
</ul>
</li>
<li>
<p>Perform amplitude amplification on $\ket{0}$ in the ancilla qubit, with a unitary $U$ implementing steps 1 to 4, to obtain
respectively: $\ket{Mx}$, $\ket{M^{-1}x}$, or $\ket{M^{+}_{\theta, \delta}M_{\theta, \delta} x}$</p>
</li>
</ol>
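<p>To get a feeling for what steps 3 to 5 do, here is a small classical emulation. It is a sketch under my own assumptions, not the quantum algorithm: we pretend the SVE estimates are exact, we work directly with the coefficients $\alpha_i$ of $\ket{x}$ in the basis of right singular vectors, and the function name is hypothetical.</p>

```python
import math

def postselect_on_zero(alphas, sigmas, mode):
    """Emulate the ancilla rotation followed by post-selection on |0>.

    alphas: coefficients of |x> in the basis of right singular vectors.
    sigmas: singular values, assumed to lie in (0, 1].
    Returns (probability of reading |0>, normalized amplitudes after it).
    """
    if mode == "multiply":
        factors = list(sigmas)                       # amplitude sigma_i on |0>
    elif mode == "invert":
        sigma_min = min(sigmas)
        factors = [sigma_min / s for s in sigmas]    # amplitude sigma_min / sigma_i
    else:
        raise ValueError("mode must be 'multiply' or 'invert'")
    branch = [a * f for a, f in zip(alphas, factors)]
    p_success = sum(b * b for b in branch)
    norm = math.sqrt(p_success)
    return p_success, [b / norm for b in branch]
```

<p>The amplitudes returned for <code>"multiply"</code> are proportional to $\alpha_i\sigma_i$, and for <code>"invert"</code> to $\alpha_i/\sigma_i$. The returned probability is the quantity that amplitude amplification has to boost: roughly $1/\sqrt{p}$ rounds are needed, which is where the dependence on the condition number comes from.</p>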
<h4 id="original-matrix-algebra-missing-reference">(Original) Matrix Algebra (missing reference)</h4>
<p><em>Let</em>
$M := \sum_{i} \sigma_i u_iv_i^T$
<em>such that $||M||_F \leq 1$ stored in the <a href="/qml/qram">QRAM</a>. There is an Algorithm that returns</em></p>
<ul>
<li>$\ket{M_{\theta, \delta}^+M_{\theta, \delta}x}$
<em>in expected time</em>
$\tilde{O}\left(\frac{\norm{M}_F \norm{x}^2}{\theta {\lVert M^{+}_{\theta, \delta}M_{\theta, \delta} x \rVert }^2} \right)$</li>
<li>$\ket{Mx}$ or $\ket{M^{-1}x}$
<em>in expected time</em>
$\tilde{O}(\kappa^2(M)\mu(M)/\epsilon)$
<em>with probability at least $1-1/poly(d)$.</em></li>
</ul>
<p>The function $\mu(M)$ is defined as $ \min_{p \in [0,1]} \left( \norm{M}_F, \sqrt{s_{2p}(M) s_{2(1-p)}(M^T)} \right)$,
while $s_p$ is defined as
$ \max_{i \in [m]} \norm{m(i)}_p^p $, where $m(i)$ is the $i$-th row of $M$
(that is, the maximum over the rows of the matrix of the $\ell_p$ norm raised to the $p$-th power).</p>
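<p>Since $\mu(M)$ is a purely classical quantity, it can be evaluated numerically. The sketch below is an illustration only: it scans a uniform grid of values of $p$ (the grid size is my own choice), and it treats $|m_{ij}|^0$ as $1$ only for nonzero entries, which is an assumption about the intended convention for $s_0$.</p>

```python
import math

def s_p(M, p):
    """Maximum over the rows of sum_j |m_ij|^p, over nonzero entries only."""
    return max(sum(abs(v) ** p for v in row if v != 0) for row in M)

def frobenius_norm(M):
    return math.sqrt(sum(v * v for row in M for v in row))

def mu(M, grid=101):
    """Approximate mu(M) by scanning p over a uniform grid of [0, 1]."""
    Mt = [list(col) for col in zip(*M)]
    best = frobenius_norm(M)
    for k in range(grid):
        p = k / (grid - 1)
        candidate = math.sqrt(s_p(M, 2 * p) * s_p(Mt, 2 * (1 - p)))
        best = min(best, candidate)
    return best
```

<p>For example, for the $2 \times 2$ identity matrix the Frobenius norm is $\sqrt{2}$ while every choice of $p$ gives $1$, so <code>mu</code> returns $1$: sparse matrices are a case where the second term wins.</p>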
<p>The optimal value for $p$ depends on the matrix under consideration, but for symmetric matrices we have that $\mu(M)$ is at most the maximum $\ell_1$ norm of the
row vectors. For the projection, given that $\delta \in (0,1)$, the error that we consider lies in the interval $\left[\theta, (1+\delta)\theta \right]$; that is, we project
onto the space spanned by the singular vectors whose corresponding singular values are smaller than $\theta$, plus some subset of singular
vectors whose corresponding singular values are slightly bigger than $\theta$. Note that the running time of the projection algorithm does not depend on the
condition number of the matrix.</p>
<p>As anticipated before, the biggest issue of the s.v.e approach is that it relies on phase estimation in order to write the singular values of the matrix in a register. This implies a polynomial (linear) dependence on the error/precision of the estimate, and therefore of all our algorithms! We are greedy, and we want to improve this dependence as well. This work has been done, and we report here the resulting theorems.</p>
<h5 id="optimized-matrix-algebra">(Optimized) Matrix algebra</h5>
<p><em>Let</em>
$M := \sum_{i} \sigma_iu_iv_i^T \in \mathbb{R}^{d \times d}$
<em>such that</em>
$\norm{M}_2=1$,
<em>and a vector</em>
$x \in \mathbb{R}^d$
<em>stored in <a href="/qml/qram">QRAM</a>. There exist quantum algorithms that with probability at least</em>
$1-1/poly(d)$
<em>return</em></p>
<ul>
<li><em>a state</em>
$\ket{z}$
<em>such that</em>
$| \ket{z} - \ket{Mx}| \leq \epsilon$
<em>in time</em>
$\tilde{O}(\kappa(M)\mu(M)\log(1/\epsilon))$</li>
<li><em>a state</em>
$\ket{z}$
<em>such that</em>
$|\ket{z} - \ket{M^{-1}x}| \leq \epsilon$
<em>in time</em>
$\tilde{O}(\kappa(M)\mu(M)\log(1/\epsilon))$</li>
<li><em>a state</em>
$\ket{M_{\leq \theta, \delta}^+M_{\leq \theta, \delta}x}$
<em>in time</em>
$\tilde{O}(\frac{ \mu(M) \norm{x}}{\delta \theta \norm{M^{+}_{\leq \theta, \delta}M_{\leq \theta, \delta}x}})$</li>
</ul>
<p><em>One can also get estimates of the norms with multiplicative error $\eta$ by increasing the running time by a factor $1/\eta$.</em></p>
<p>Another important advantage of the new methods is that they provide easy ways to manipulate sums or products of matrices. <a href="#GSLW18">(Gilyén, Su, Low, & Wiebe, 2018)</a><a href="#CGJ18">(Chakraborty, Gilyén, & Jeffery, 2018)</a></p>
<h5 id="matrix-algebra-on-products-of-matrices">Matrix algebra on products of matrices</h5>
<p><em>Let $M_1, M_2 \in \mathbb{R}^{d \times d}$ such that $||M_1||_2= ||M_2||_2=1$, $M=M_1M_2$, and a vector $x \in \mathbb{R}^d$ stored in <a href="/qml/QRAM">QRAM</a>. There exist quantum algorithms that with probability at least $1-1/poly(d)$ return</em></p>
<ul>
<li><em>a state</em>
$\ket{z}$
<em>such that</em>
$|\ket{z} - \ket{Mx}| \leq \epsilon$
<em>in time</em>
$\tilde{O}(\kappa(M)(\mu(M_1)+\mu(M_2))\log(1/\epsilon))$</li>
<li><em>a state</em>
$\ket{z}$
<em>such that</em>
$|\ket{z}-\ket{M^{-1}x}| \leq \epsilon$
<em>in time</em>
$\tilde{O}(\kappa(M)(\mu(M_1)+\mu(M_2))\log(1/\epsilon)) $</li>
<li><em>a state</em>
$\ket{M_{\leq \theta, \delta}^+M_{\leq \theta, \delta}x}$
<em>in time</em>
$\tilde{O}(\frac{ (\mu(M_1)+\mu(M_2)) \norm{x}}{\delta \theta \norm{M^{+}_{\leq \theta, \delta}M_{\leq \theta, \delta}x}})$</li>
</ul>
<p><em>One can also get estimates of the norms with multiplicative error $\eta$ by increasing the running time by a factor $1/\eta$.</em> <a href="#GSLW18">(Gilyén, Su, Low, & Wiebe, 2018)</a><a href="#CGJ18">(Chakraborty, Gilyén, & Jeffery, 2018)</a>.</p>
<p>As a note, I don’t know whether qubitization makes it possible to perform all the operations that are possible when the singular values of the matrix are available in a quantum register.</p>
<h3 id="references">References</h3>
<ol class="bibliography"><li><span id="KP17">Kerenidis, I., & Prakash, A. (2017). Quantum Gradient Descent for Linear Systems and Least Squares. <i>ArXiv:1704.04992</i>.</span></li>
<li><span id="HaoLow2016">Hao Low, G., & Chuang, I. L. (2016). Hamiltonian Simulation by Qubitization.</span></li>
<li><span id="CGJ18">Chakraborty, S., Gilyén, A., & Jeffery, S. (2018). The power of block-encoded matrix powers: improved regression techniques via faster Hamiltonian simulation. <i>ArXiv Preprint ArXiv:1804.01973</i>.</span></li>
<li><span id="GSLW18">Gilyén, A., Su, Y., Low, G. H., & Wiebe, N. (2018). Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. <i>ArXiv Preprint ArXiv:1806.01838</i>.</span></li></ol>
scinawa
A Quantum Perceptron model2018-08-15T00:00:00+02:002018-08-15T00:00:00+02:00https://luongo.pro/2018/08/15/Quantum-Perceptron<p>Here I explain the work of <a href="#kapoor2016quantum">(Kapoor, Wiebe, & Svore, 2016)</a>. There, they applied amplitude amplification techniques to two different versions of the perceptron algorithm. With the first approach - that we describe in this post - the authors were able to gain a quadratic speedup w.r.t the number of elements in the training set. In the second approach, the authors leveraged the description of the perceptron in the so-called <em>version space</em> (the dual of the usual feature space description of the perceptron). This allowed them to gain a quadratic improvement w.r.t statistical efficiency: perhaps a more interesting gain than a quadratic speedup in the number of elements in the training set. We will see this second model of quantum perceptron in another post, since the technique used is basically the same.</p>
<h4 id="the-perceptron">The perceptron</h4>
<p>Let’s briefly introduce the perceptron. We are given a training set $\mathbb{T} = \{ \phi_1, \ldots, \phi_N\} $, $\phi_i \in \mathbb{R}^D$, of labeled vectors that belong to two different classes: $y_i \in \{-1,+1\}$. For the sake of simplicity, we assume those vectors to be linearly separable. While in practice this is not always the case, there are statistical guarantees that we will still be able to learn a good-enough separating hyperplane, which approximates the best separating hyperplane $w^*$. A (classical) perceptron finds a hyperplane $w$ that separates the data of the two classes. More formally, we want to find a $w$ such that $y_i \, w^T \phi_i > 0 \quad \forall i \in \left[N\right]$. The intuition for the classical algorithm is the following: start with an initial guess for $w$, then update your guess by adding the misclassified vector (multiplied by its label) to the vector $w$ describing the hyperplane, and normalize to keep the norm of $w$ constant. In this way you rotate your current guess of $w$ until it correctly classifies all the training vectors.</p>
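<p>As a reference point for the quantum version, here is a minimal classical perceptron, assuming labels in $\{-1,+1\}$ and linearly separable data. The names are hypothetical, and the sketch deviates from the description above in one detail that I want to flag: it uses the textbook unnormalized update and normalizes $w$ only at the end, since only the direction of $w$ matters.</p>

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_perceptron(points, labels, max_epochs=1000):
    """Classical perceptron: sweep the training set until nothing is misclassified."""
    w = [0.0] * len(points[0])
    for _ in range(max_epochs):
        mistakes = 0
        for phi, y in zip(points, labels):
            if y * dot(w, phi) <= 0:                        # phi is misclassified
                w = [w_k + y * p_k for w_k, p_k in zip(w, phi)]
                mistakes += 1
        if mistakes == 0:
            break                                           # every point is on the right side
    norm = math.sqrt(dot(w, w))
    return [w_k / norm for w_k in w] if norm > 0 else w
```

<p>Each sweep costs $N$ evaluations of the classification rule, and this is exactly the part that the quantum algorithm speeds up.</p>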
<p>We say that the two classes are separated by a <em>margin</em> of $\gamma$. Recall that the margin is defined (“a priori” w.r.t the dataset) as:</p>
<script type="math/tex; mode=display">\gamma = \min_{i \in [N]} \frac{|\phi_i \cdot w^*|}{||\phi_i||}</script>
<p>We can think of the margin as a measure of the training set which tells us how much to rotate $w$ each time to change the label of a misclassified vector. For the record, it is possible to prove that the perceptron makes at most $\frac{1}{\gamma^2}$ mistakes for points $\phi_i$ that are separated with angular margin $\gamma$.</p>
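<p>The margin and the resulting mistake bound are easy to compute when a separating $w^*$ is known. In the sketch below (hypothetical names) I also divide by $||w^*||$, which is an assumption on my side: it agrees with the formula above whenever $w^*$ is a unit vector.</p>

```python
import math

def margin(points, w_star):
    """Smallest normalized distance between a training vector and the hyperplane w*."""
    norm_w = math.sqrt(sum(c * c for c in w_star))
    values = []
    for phi in points:
        norm_phi = math.sqrt(sum(c * c for c in phi))
        overlap = abs(sum(a * b for a, b in zip(phi, w_star)))
        values.append(overlap / (norm_phi * norm_w))
    return min(values)

def mistake_bound(gamma):
    """Upper bound 1 / gamma^2 on the number of perceptron updates."""
    return 1.0 / gamma ** 2
```

<p>For instance, with the unit vectors $(1, 0)$ and $(0.6, 0.8)$ and $w^* = (1, 0)$ the margin is $0.6$, so the perceptron makes at most $1/0.36 \approx 2.8$ mistakes, i.e. at most two updates.</p>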
<p>In the quantum version of the algorithm, the idea is to use amplitude amplification to <strong>find more quickly</strong> the misclassified vectors in the training set.</p>
<p>A quick recap on Grover-like algorithms. In order to apply amplitude amplification to a problem you need to build two unitary operators that, combined, give you the Grover iterate:</p>
<script type="math/tex; mode=display">U_{Grover}=U_{init}U_{targ}</script>
<p>where $U_{init} = 2\ket{\psi}\bra{\psi} - \mathbb{I}$ is the reflection about the mean, and <script type="math/tex">U_{targ} = \mathbb{I} - 2P</script> is the change of phase of the “good” solutions you are targeting in your problem ($P$ is the projector onto them).</p>
<p>By applying the Grover iterate a certain number of times to the quantum state generated by querying an oracle (in this case our quantum memory), we can tweak the probability of sampling a misclassified vector from our quantum computer.</p>
<p>More formally, this is the statement of the theorem:</p>
<h6 id="theorem-amplitude-amplification-brassard-hoyer-mosca--tapp-2002">Theorem: Amplitude amplification <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a></h6>
<p><em>Let $A$ be any quantum algorithm that uses no measurements, and let $f : \{0,1 \}^n \to \{0, 1\}$ be any Boolean function. There exists a quantum algorithm that given the initial success probability $a > 0$ of $A$, finds a good solution with certainty using a number of applications of $A$ and $A^{-1}$ which is in $\Theta(1/\sqrt{a})$ in the worst case.</em></p>
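<p>The effect of the Grover iterate can be checked with a tiny state-vector simulation. This is a classical toy, assuming real amplitudes and a uniform initial state $\ket{\psi}$ over $N$ items; the function names are mine.</p>

```python
import math

def grover_iterate(state, psi, marked):
    """Apply U_init U_targ: flip the sign of marked items, then reflect about psi."""
    flipped = [-a if i in marked else a for i, a in enumerate(state)]   # U_targ = I - 2P
    overlap = sum(p * a for p, a in zip(psi, flipped))                  # <psi|v>
    return [2 * overlap * p - a for p, a in zip(psi, flipped)]          # 2|psi><psi| - I

def success_probability(n_items, marked, iterations):
    """Probability of measuring a marked item after the given number of iterations."""
    psi = [1.0 / math.sqrt(n_items)] * n_items
    state = list(psi)
    for _ in range(iterations):
        state = grover_iterate(state, psi, marked)
    return sum(state[i] ** 2 for i in marked)
```

<p>With initial success probability $a = \sin^2\theta$, after $k$ iterations the success probability is $\sin^2((2k+1)\theta)$, so about $\pi/(4\theta) \approx 1/\sqrt{a}$ iterations are enough. With $N=4$ and one marked item a single iteration already gives probability $1$.</p>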
<p>In this post we see how to build the quantum circuit for $A$ and for the boolean function $f$ to suit our need of quantizing a perceptron.</p>
<h4 id="a-different-classical-perceptron">A different classical perceptron</h4>
<p>An underlying assumption that we make in the classical analysis of the algorithm is that we have sequential access to the training set (like an array). In the quantum algorithm we will drop this assumption, and instead assume that we have access to random samples of the dataset. As you might have imagined, we will query the elements of the training set in superposition, at the cost of introducing the possibility of extracting the same training element multiple times.</p>
<p>Recall that classically, the cost of training a perceptron in the original array-model is:</p>
<script type="math/tex; mode=display">O\left(\frac{N}{\gamma^2}\right)</script>
<p>However, as is shown in the paper, if we are only allowed to sample the vectors from the training set, the running time of a classical learner stretches by a logarithmic factor (for reasons related to the coupon collector problem) to:</p>
<script type="math/tex; mode=display">O \left(\frac{N}{\gamma^2}\log \left(\frac{1}{\varepsilon\gamma^2}\right) \right)</script>
<p>This is proven in detail in the paper. <a href="#kapoor2016quantum">(Kapoor, Wiebe, & Svore, 2016)</a></p>
<h4 id="quantum-perceptron">Quantum perceptron</h4>
<p>Let’s see how to speed things up with quantum. If you are already accustomed to quantum computation, perhaps I already gave you a hint on the quantum state we are going to create for our algorithm. We assume to have access to the following oracles:</p>
<script type="math/tex; mode=display">U\ket{j}\ket{0}\to\ket{j}\ket{\phi_j}</script>
<p>and its inverse $U^{\dagger}$. With $U$ (by linearity) we are going to build a uniform superposition of the elements of the training set:</p>
<script type="math/tex; mode=display">U\frac{1}{\sqrt{N}}\sum_{j=1}^N\ket{j}\ket{0} \to \frac{1}{\sqrt{N}}\sum_{j=1}^N\ket{j}\ket{\phi_j}</script>
<p>What does $\ket{\phi_j}$
mean in practice? If we have to store a floating point vector $\phi_j \in \mathbb{R}^d$, we can store the m-bit binary representation of its $d$ floating point numbers, and add one qubit to store the label $y_j$ of the vector (mapping $-1$ to $0$ for negative labels). The authors note that you can interpret this qubit-string as an unsigned integer. They also note that in this way we map each element in the training set to a basis state of our Hilbert space.</p>
<p>Now that we have our data loaded in our quantum computer, we craft the unitary operator that allows us to test whether the perceptron correctly classifies a training vector. As in the classical algorithm, we start with a random guess of the weight vector $w_0$. Each time we find a misclassified vector we update our model by adjusting $w_t$, our current guess for $w^*$, as in the classical algorithm.
<br /></p>
<p>Said simply, we need a quantum circuit implementing the perceptron algorithm for a given weight $w$, and “plug it” into the amplitude amplification theorem <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a>. The unitary operator that we want to implement in order to apply amplitude amplification just changes the sign of misclassified vectors. We can therefore write it as such:</p>
<script type="math/tex; mode=display">\mathbb{F}_w \Phi_j = (-1)^{f_w(\phi_j,y_j)}\Phi_j</script>
<p>Let me explain this. We define $f_w(\phi, y)$ to be the boolean function of the perceptron that, given a weight vector $w$ and the class $y$ of $\phi$, returns $0$ if the vector is currently well classified according to the label $y$, and returns $1$ if the vector is misclassified.
This will allow us to change the sign of just the misclassified vectors. The unitary implementation of an oracle like this can be plugged into the circuit for amplitude amplification, and gives us an algorithm for a quantum perceptron. As is well known, we can easily build a quantum circuit from a classical boolean circuit. Therefore we can assume that we have the quantum circuit of the perceptron algorithm (for a given model $w$). We want this quantum circuit to compute the following mapping:</p>
<script type="math/tex; mode=display">F_w[j \otimes \Phi_0] = (-1)^{f_w(\phi_j, y_j)}[j \otimes \Phi_0]</script>
<p>The unitary operator that we need to implement in order to get a quantum version of $F_w$ can be built in the following way:</p>
<script type="math/tex; mode=display">U_{targ} = F_w = U^\dagger (\mathbb{I} \otimes \mathbb{F}_w ) U</script>
<p>This represents the first of the ingredients that we need for amplitude amplification. The second is $U_{init}$, which in this case is $U_\text{init}=2\ket{\psi}\bra{\psi} - I$, with $\ket{\psi}=\frac{1}{\sqrt{N}}\sum_{j=1}^N \ket{j}$.</p>
<p>The Grover iterate is defined as $G=U_{init}U_{targ}$. This is the circuit we need in order to apply amplitude amplification to our problem. If you don’t believe me, you should check the detailed proof in the paper :)</p>
<p>The main result of this section is the following theorem:</p>
<h5 id="theorem-1-kapoor-wiebe--svore-2016">Theorem 1 <a href="#kapoor2016quantum">(Kapoor, Wiebe, & Svore, 2016)</a></h5>
<p><em>Given a training set that consists of unit vectors $\phi_1, … ,\phi_N$ that are separated by a margin of $\gamma$ in feature space, the number of applications of $F_w$ needed to infer a perceptron model $w$, such that $P(\exists j : f_w(\phi_j, y_j) = 1) \leq \epsilon$ using a quantum computer is $N_\text{quant}$ where:</em></p>
<p><script type="math/tex">\Omega \left( \sqrt{N}\right) \ni N_\text{quant} \in O \left( \frac{\sqrt{N}}{\gamma^2} \log \left[ \frac{1}{\epsilon \gamma^2} \right] \right)</script>.</p>
<p><em>The number of queries to $f_w$ needed in the classical setting, $N_\text{class}$, where the training vectors are found by sampling uniformly from the training data is bounded by:</em></p>
<p><script type="math/tex">\Omega \left( N\right) \ni N_\text{class} \in O \left( \frac{N}{\gamma^2} \log \left[ \frac{1}{\epsilon \gamma^2} \right] \right)</script>.</p>
<h4 id="the-algorithm">The algorithm</h4>
<h6 id="require">Require:</h6>
<ol>
<li>Access to an oracle $U$ storing $N$ input strings</li>
<li>Error parameter $\epsilon$.</li>
</ol>
<h6 id="ensure">Ensure:</h6>
<ol>
<li>A hyperplane approximating $w^*$</li>
</ol>
<ul>
<li>Create random vector $w$</li>
<li><em>For</em> $k=1 … \lceil \log_{3/4} (\gamma^2\epsilon) \rceil$
<ul>
<li><em>For</em> $j = 1 … \lceil \log_c \left(1/\sin(2\sin^{-1}(1/\sqrt{N}))\right) \rceil $
<ul>
<li>Sample uniformly an integer $m \in [0…\lceil c^j \rceil ]$</li>
<li>Prepare the query register $\ket{\psi}=\frac{1}{\sqrt{N}}\sum_{i=1}^N\ket{i}\ket{0}$</li>
<li>Perform $G^m\ket{\psi}$</li>
<li>Measure the index register to get $i$.</li>
<li>If $f_{w_t}(\phi_i, y_i) =1$ then update $w_t$ (and with it $F_{w_t}$)</li>
</ul>
</li>
</ul>
</li>
<li>Return $w_t$ to the user.</li>
</ul>
<p>We show how to apply amplitude amplification on the dataset to find all the misclassified vectors with a quantum version of the perceptron circuit. At each iteration, we update our circuit $F_{w_t}$ with the new model of the perceptron, and we continue until no misclassified vectors are left in the training set.
A couple of sentences on the algorithm. The two loops ensure that we find the correct number of times to apply $G$, by means of an exponential search over the space of parameters. It is basically a trick to preserve the quadratic speedup without knowing in advance the number of misclassified vectors for a given perceptron. The trick is described properly in the paper and in <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a>.</p>
<p>Now the user can take the model $w$ and use it in a classical algorithm, on a classical computer. As explained in the paper, the second part of the paper might be of even more interest to a machine learning practitioner. But that’s for another post. :)</p>
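<p>The exponential search of the inner loop can be emulated classically, because the probability that a measurement after $m$ Grover iterations returns a marked item is $\sin^2((2m+1)\theta)$, with $\sin^2\theta$ equal to the fraction of misclassified vectors. The sketch below is my own reading of the pseudocode, not the authors’ implementation: the function name, the value $c = 1.2$, and the use of a seeded random generator in place of a quantum measurement are all assumptions, and it expects $N \geq 3$.</p>

```python
import math
import random

def search_misclassified(misclassified, n_items, rng, c=1.2):
    """Emulate one run of the inner loop.

    misclassified: set of indices that the current model gets wrong.
    Returns one of those indices, or None if every attempt failed.
    """
    if not misclassified:
        return None                      # nothing is marked: every check fails
    # sin^2(theta) is the fraction of marked (misclassified) items.
    theta = math.asin(math.sqrt(len(misclassified) / n_items))
    j_max = math.ceil(
        math.log(1 / math.sin(2 * math.asin(1 / math.sqrt(n_items))), c)
    )
    for j in range(1, j_max + 1):
        m = rng.randint(0, math.ceil(c ** j))        # random number of Grover iterations
        p_hit = math.sin((2 * m + 1) * theta) ** 2   # success probability after G^m
        if rng.random() < p_hit:                     # stands in for measure-and-check
            return rng.choice(sorted(misclassified))
    return None
```

<p>The outer loop of the algorithm simply repeats this search enough times to push the failure probability below the desired threshold; the caller never needs to know how many vectors are misclassified.</p>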
<ol class="bibliography"><li><span id="kapoor2016quantum">Kapoor, A., Wiebe, N., & Svore, K. (2016). Quantum perceptron models. In <i>Advances in Neural Information Processing Systems</i> (pp. 3999–4007).</span></li>
<li><span id="brassard2002quantum">Brassard, G., Hoyer, P., Mosca, M., & Tapp, A. (2002). Quantum amplitude amplification and estimation. <i>Contemporary Mathematics</i>, <i>305</i>, 53–74.</span></li></ol>scinawa
Estimating average and variance of a function2018-08-12T00:00:00+02:002018-08-12T00:00:00+02:00https://luongo.pro/2018/08/12/Estimate_Average_Function<p>I decided to write this post after reading a paper called <a href="https://arxiv.org/abs/1806.06893">Quantum Risk Analysis</a> <a href="#woerner2018quantum">(Woerner & Egger, 2018)</a>, by Stefan Woerner and Daniel J. Egger. Here I want to describe just the main technique employed by their algorithm (namely, how to use amplitude estimation to get useful information out of a function). In another post I will describe in more detail the rest of the paper, which goes into the technical details of how to use these techniques to solve a problem relevant to financial analysts.</p>
<p>Suppose we have a random variable $X$ described by a certain probability distribution over $N$ different outcomes, and a function $f: \{0,\cdots, N-1\} \to [0,1]$ defined over these outcomes. How can we use quantum computers to evaluate some properties of $f$, such as its expected value and variance, faster than classical computers?</p>
<p>Let’s start by translating these two mathematical objects into the quantum realm. The probability distribution is (surprise surprise) represented in our quantum computer by a quantum state over $n=\lceil \log N \rceil$ qubits.
<script type="math/tex">\ket{\psi} = \sum_{i=0}^{N-1} \sqrt{p_i} \ket{i}</script>
where the probability of measuring the state $\ket{i}$ is $p_i,$ for $p_i \in [0, 1]$. Basically, each basis state of the Hilbert space represents an outcome of the random variable.</p>
<p>The quantization of the function $f$ is made by a linear operator $F$ acting on a new ancilla qubit as such:
<script type="math/tex">F: \ket{i}\ket{0} \to \ket{i}\left(\sqrt{1-f(i)}\ket{0} + \sqrt{f(i)}\ket{1}\right)</script></p>
<p>If we apply $F$ with $\ket{\psi}$ as input state we get:</p>
<script type="math/tex; mode=display">\sum_{i=0}^{N-1} \sqrt{1-f(i)}\sqrt{p_i}\ket{i}\ket{0} + \sum_{i=0}^{N-1} \sqrt{f(i)}\sqrt{p_i}\ket{i}\ket{1}</script>
<p>Observe that the probability of measuring $\ket{1}$ in the ancilla qubit is $\sum_{i=0}^{N-1}p_if(i)$, which is (w00t w00t) $E[f(X)]$.
By just sampling the ancilla qubit we won’t get any speedup, but if we now apply <a href="https://arxiv.org/abs/quant-ph/0005055">amplitude estimation</a> <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a> to the ancilla qubit on the right, we can get an estimate of $E[f(X)]$.</p>
<p>Finally, observe that:</p>
<ul>
<li>if we choose $f(i)=\frac{i}{N-1}$ we are able to estimate $E[\frac{X}{N-1}]$ (which, by knowing $N$, gives us an estimate of the expected value of $X$)</li>
<li>if we choose $f(i)=\frac{i^2}{(N-1)^2}$ instead, we can estimate $E[\frac{X^2}{(N-1)^2}]$, and therefore $E[X^2]$; using this along with the previous choice of $f$ we can estimate the variance of $X$: $E[X^2] - E[X]^2$.</li>
</ul>
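<p>The two choices of $f$ above can be checked classically by computing the probability of reading $\ket{1}$ on the ancilla directly. This small sketch (hypothetical function names) computes the exact quantity that amplitude estimation would approximate; it does not simulate the quantum circuit.</p>

```python
def ancilla_one_probability(probs, f):
    """Probability of measuring |1> on the ancilla after applying F to |psi>."""
    return sum(p * f(i) for i, p in enumerate(probs))

def mean_and_variance(probs):
    """Recover E[X] and Var[X] from the two ancilla probabilities."""
    n = len(probs)
    first = ancilla_one_probability(probs, lambda i: i / (n - 1))            # E[X] / (N-1)
    second = ancilla_one_probability(probs, lambda i: i ** 2 / (n - 1) ** 2)  # E[X^2] / (N-1)^2
    mean = (n - 1) * first
    second_moment = (n - 1) ** 2 * second
    return mean, second_moment - mean ** 2
```

<p>For the uniform distribution over $\{0,1,2,3\}$ this returns a mean of $1.5$ and a variance of $1.25$. On a quantum computer the two probabilities would each come with the additive error of amplitude estimation, which is then rescaled by $(N-1)$ and $(N-1)^2$ respectively.</p>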
<p>See ya!</p>
<ol class="bibliography"><li><span id="woerner2018quantum">Woerner, S., & Egger, D. J. (2018). Quantum Risk Analysis. <i>ArXiv Preprint ArXiv:1806.06893</i>.</span></li>
<li><span id="brassard2002quantum">Brassard, G., Hoyer, P., Mosca, M., & Tapp, A. (2002). Quantum amplitude amplification and estimation. <i>Contemporary Mathematics</i>, <i>305</i>, 53–74.</span></li></ol>scinawa
Selected articles on Quantum Machine Learning2018-07-19T00:00:00+02:002018-07-19T00:00:00+02:00https://luongo.pro/2018/07/19/scinawa-review-qml<p>This is a collection of papers I have found useful in the last few years. It is far from complete, and you are welcome to suggest entries that you think I have missed.</p>
<h4 id="2018">2018</h4>
<ul>
<li>
<p><a href="https://arxiv.org/pdf/1807.03341.pdf">Troubling Trends in Machine Learning Scholarship</a> <code class="highlighter-rouge">#opinion-paper</code><br />
A self-critique by the ML community of the way it is doing science now. I think this might be relevant for the QML practitioner as well.</p>
</li>
<li>
<p><a href="https://arxiv.org/pdf/1804.10068.pdf">Quantum machine learning for data scientists</a> <code class="highlighter-rouge">#review</code> <code class="highlighter-rouge">#tutorial</code><br />
This is a very nice review of some of the best-known QML algorithms. I wish I had this when I started studying QML.</p>
</li>
<li>
<p><a href="">Image classification of MNIST dataset using quantum slow feature analysis</a> <code class="highlighter-rouge">#algo</code><br />
This is my first work in quantum machine learning. Here we show 2 new algorithms.
The idea is to give evidence that QRAM-based algorithms can obtain a speedup w.r.t classical algorithms in QML <em>on real data</em>.</p>
</li>
<li>
<p><a href="https://arxiv.org/pdf/1804.03719.pdf">Quantum algorithm implementations for beginners</a> <code class="highlighter-rouge">#review</code> <code class="highlighter-rouge">#tutorial</code></p>
</li>
</ul>
<h4 id="2017">2017</h4>
<ul>
<li>
<p><a href="">Implementing a distance based classifier with a quantum interference circuit</a> <code class="highlighter-rouge">#algo</code></p>
</li>
<li>
<p><a href="">Quantum machine learning for quantum anomaly detection</a> <code class="highlighter-rouge">#algo</code><br />
Here the authors used previous techniques to perform anomaly detection. Basically they project the data on the 1-dimensional subspace of the covariance matrix of the data. In this way anomalies are supposed to lie further away from the rest of the dataset.</p>
</li>
<li>
<p><a href="https://arxiv.org/pdf/1707.08561.pdf">Quantum machine learning: a classical perspective</a> <code class="highlighter-rouge">#review</code> <code class="highlighter-rouge">#quantum learning theory</code></p>
</li>
</ul>
<h4 id="2016">2016</h4>
<ul>
<li>
<p><a href="">Quantum Discriminant Analysis for Dimensionality Reduction and Classification</a> <code class="highlighter-rouge">#algo</code><br />
Here the authors present two different algorithms, one for dimensionality reduction and one for classification, with the same capabilities.</p>
</li>
<li>
<p><a href="">Quantum Recommendation Systems</a> <code class="highlighter-rouge">#algo</code><br />
This is where you can learn about QRAM and quantum singular value estimation.</p>
</li>
</ul>
<h4 id="2015">2015</h4>
<ul>
<li>
<p><a href="https://arxiv.org/pdf/1512.02900.pdf">Advances in quantum machine learning</a> <code class="highlighter-rouge">#implementations</code>, <code class="highlighter-rouge">#review</code> <br />
It covers things up to 2015, so here you can find descriptions of Neural Networks, Bayesian Networks, HHL, PCA, Quantum Nearest Centroid, Quantum k-Nearest Neighbour, and others.</p>
</li>
<li>
<p><a href="">Quantum algorithms for topological and geometric analysis of data</a> <code class="highlighter-rouge">#algo</code></p>
</li>
</ul>
<h4 id="2014">2014</h4>
<ul>
<li>
<p><a href="">Quantum Algorithms for Nearest-Neighbor Methods for Supervised and Unsupervised Learning</a> <code class="highlighter-rouge">#tools</code>, <code class="highlighter-rouge">#algorithms</code><br />
This paper offers two approaches for calculating distances between vectors.
The idea for k-NN is to calculate distances between the test point and the training set in superposition and then use amplitude amplification techniques to find the minimum, thus getting a quadratic speedup.</p>
</li>
<li>
<p><a href="">Quantum support vector machine for big data classification</a> <code class="highlighter-rouge">#algo</code><br />
This was one of the first examples of how to use HHL-like algorithms to get something useful out of them.</p>
</li>
<li>
<p><a href="">Quantum self-testing</a> <code class="highlighter-rouge">#algo</code><br />
The authors discovered that partial applications of the swap test are sufficient to transform a quantum state $\sigma$ into $U\sigma U^\dagger$ where $U=e^{-i\rho}$, given the ability to create multiple copies of $\rho$.
This work uses a particular access model for the data (sample complexity), which can be obtained from a QRAM.</p>
</li>
</ul>
<h4 id="2013">2013</h4>
<ul>
<li><a href="https://arxiv.org/pdf/1307.0411.pdf">Quantum algorithms for supervised and unsupervised machine learning</a> <code class="highlighter-rouge">#algo</code><br />
This explains how to use the swap test to calculate distances. Then it shows how this swap-test-for-distances can be used to do NearestCentroid and k-Means with adiabatic quantum computation.</li>
</ul>
<h4 id="2009">2009</h4>
<ul>
<li><a href="">Quantum algorithms for linear systems of equations</a> <code class="highlighter-rouge">#algo</code><br />
This is the paper that started everything. :) Techniques for sparse Hamiltonian simulation and phase estimation are applied in order to estimate the singular values of a matrix. Then a controlled rotation on an ancilla qubit plus postselection creates a state proportional to the solution of a system of equations. You can learn more about it <a href="HHL">here</a>.</li>
</ul>
<h3 id="code">Code</h3>
<ul>
<li><a href="http://grove-docs.readthedocs.io/en/latest/">Grove</a></li>
<li><a href="">Qiskit-acqua</a></li>
<li><a href="https://projectivesimulation.org">Projective Simulation</a></li>
</ul>scinawaThis is a collection of papers I have found useful in recent years. It is far from complete, and you are welcome to suggest new entries that you think I have missed. I make no claim of completeness.Quantum Frobenius Distance Classifier2018-07-18T00:00:00+02:002018-07-18T00:00:00+02:00https://luongo.pro/2018/07/18/Quantum-Frobenius-Distance-classifier<p>Last night there was the TQC dinner in Sydney, and I had the chance to speak with a very prolific author in QML. We spoke about her work on <a href="https://arxiv.org/abs/1703.10793">distance based classification</a>, which is <a href="https://arxiv.org/abs/1803.00853">further analyzed here</a>. As a magnificent manifestation of the Zeitgeist in QML, she said that one of the purposes of the paper was to show that a Hadamard gate is enough to perform classification, and that you don’t need a very complex circuit to exploit quantum mechanics in machine learning. This was exactly the motivation behind our QFDC classifier as well, so here we are with a short description of QFDC! This text is taken straight outta <a href="https://arxiv.org/abs/1805.08837">my paper</a>.</p>
<p>As usual, I assume the data is stored in a QRAM. We are in the setting of supervised learning, so we have some labeled samples $x(i)$ in $\mathbb{R}^d$ for $K$ different labels. Let $X_k$ be the matrix whose rows are the vectors with label $k$, so that we have $K$ such matrices.
$|T_k|$ is the number of elements in cluster $k$ (that is, the number of rows of $X_k$).</p>
<p>For a test point $x(0)$, define the matrix $ X(0) \in \mathbb{R}^{|T_k| \times d} $
which just repeats the row $x(0)$ $|T_k|$ times.
The number of rows of $X(0)$ thus depends on the cluster it is compared with, but it will hopefully be clear from the context. Then, we define</p>
<script type="math/tex; mode=display">F_k( x(0)) = \frac{ ||X_k - X(0)||_F^2}{2 ( ||X_k||_F^2+ ||X(0)||_F^2) },</script>
<p>which corresponds to the average normalized squared distance between $x(0)$ and the cluster $k$.
Let $h : \mathcal{X} \to [K]$ be our classification function. We assign to $x(0)$ a label according to the following rule:</p>
<script type="math/tex; mode=display">h(x(0)) := \operatorname{argmin}_{k \in [K]} F_k( x(0))</script>
<p>We will estimate $F_k( x(0))$ efficiently using the algorithm below. From our QRAM construction we know that we can create a superposition of all the vectors in a cluster as quantum states, and that we have access to their norms, to the number of points in the cluster, and to the norm of the cluster. We define a normalization factor as:</p>
<script type="math/tex; mode=display">N_k= ||X_k||_F^2 + ||X(0)||_F^2 = ||X_k||_F^2 +|T_k| ||x(0)||^2.</script>
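<p>To make these definitions concrete, here is a small classical sketch (plain Python, no quantum part) that computes $N_k$ and $F_k(x(0))$ directly from the formulas above. The function names and the toy cluster are my own choices for illustration, and I assume a cluster is given as a list of rows.</p>

```python
def frobenius_sq(rows):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(v * v for row in rows for v in row)


def normalization(cluster, x0):
    """N_k = ||X_k||_F^2 + |T_k| * ||x(0)||^2."""
    return frobenius_sq(cluster) + len(cluster) * sum(v * v for v in x0)


def f_k(cluster, x0):
    """F_k(x(0)) = ||X_k - X(0)||_F^2 / (2 N_k), where X(0) repeats x0 on every row."""
    diff_sq = sum((a - b) ** 2 for row in cluster for a, b in zip(row, x0))
    return diff_sq / (2 * normalization(cluster, x0))


# Toy cluster with |T_k| = 2 points in dimension d = 2 (illustrative values).
cluster = [[1.0, 0.0], [0.0, 1.0]]
print(normalization(cluster, [1.0, 0.0]))  # 2 + 2 * 1 = 4.0
print(f_k(cluster, [1.0, 0.0]))            # 2 / 8 = 0.25
print(f_k(cluster, [-1.0, 0.0]))           # 6 / 8 = 0.75
```

<p>Since $||X_k - X(0)||_F^2 \leq 2N_k$, the value always lies in $[0, 1]$, which is what allows us to read it as a probability in the algorithm.</p>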
<h5 id="require">Require</h5>
<ul>
<li>QRAM access to the matrix $X_k$ of cluster $k$ and to a test vector $x(0)$. Error parameter $\eta > 0$.</li>
</ul>
<h5 id="ensure">Ensure</h5>
<ul>
<li>An estimate $\overline{F_k (x(0))}$
such that $| F_k(x(0)) - \overline{F_k( x(0))} | < \eta $.</li>
</ul>
<h5 id="algorithm">Algorithm</h5>
<ul>
<li>Start with three empty quantum registers. The first is an ancilla qubit, the second is for the index, and the third one is for the data.
<script type="math/tex">\ket{0}\ket{0}\ket{0}</script></li>
<li>$s:=0$</li>
<li>Repeat $r=O(1/\eta^2)$ times:
<ul>
<li>Create the state<br />
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \sqrt{|T_k|}||x(0)||\ket{0} +||X_k||_F \ket{1}\Big) \ket{0}\ket{0}</script></li>
<li>Apply to the first two registers the unitary that maps:
<script type="math/tex">\ket{0}\ket{0} \mapsto \ket{0} \frac{1}{\sqrt{|T_k|}} \sum_{i \in T_k} \ket{i}\; \mbox{ and } \; \ket{1}\ket{0} \mapsto \ket{1} \frac{1}{||X_k||_F} \sum_{i \in T_k} ||x(i)|| \ket{i}</script>
This will get you to:
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \ket{0} \sum_{i \in T_k} ||x(0)|| \ket{i} + \ket{1} \sum_{i \in T_k} ||x(i)|| \ket{i} \Big) \ket{0}</script></li>
<li>Now apply the unitary that maps
<script type="math/tex">\ket{0} \ket{i} \ket{0} \mapsto \ket{0} \ket{i} \ket{x(0)} \; \mbox{ and } \; \ket{1} \ket{i} \ket{0} \mapsto \ket{1} \ket{i} \ket{x(i)}</script></li>
</ul>
<p>to get the state
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \ket{0} \sum_{i \in T_k} ||x(0)|| \ket{i} \ket{x(0)}+ \ket{1} \sum_{i \in T_k} ||x(i)|| \ket{i}\ket{x(i)} \Big)</script></p>
<ul>
<li>Apply a Hadamard to the first register to get
<script type="math/tex">\frac{1}{\sqrt{2N_k}}\ket{0} \sum_{i \in T_k} \Big( ||x(0)|| \ket{i} \ket{x(0)} + ||x(i)|| \ket{i}\ket{x(i)} \Big) +
\frac{1}{\sqrt{2N_k}}\ket{1} \sum_{i \in T_k} \Big( ||x(0)|| \ket{i} \ket{x(0)} - ||x(i)|| \ket{i}\ket{x(i)} \Big)</script></li>
<li>Measure the first register. If the outcome is $\ket{1}$ then $s:=s+1$</li>
</ul>
</li>
<li>Output $\frac{s}{r}$.</li>
</ul>
<p>Optionally, if you want a quadratic speedup w.r.t. $\eta$, perform amplitude estimation (with $O(1/\eta)$ iterations) on the outcome $\ket{1}$ of the first register, with the unitary implementing steps 1 to 4 of the loop, to get an estimate of $F_k(x(0))$ within error $\eta$. This makes the circuit more complex, and therefore less suitable for NISQ devices, but if you have enough qubits and fault tolerance you can add it.</p>
<p>For the analysis, just note that the probability of measuring $\ket{1}$ is:</p>
<script type="math/tex; mode=display">\frac{1}{2N_k} \left ( |T_k|||x(0)||^2 + \sum_{i \in T_k} ||x(i)||^2 - 2\sum_{i \in T_k} \braket{x(0), x(i)} \right) = F_k(x(0)).</script>
<p>By the Hoeffding bound, to estimate $F_k(x(0))$ with error $\eta$ we need $O(\frac{1}{\eta^2})$ samples.
For the running time, we assume all unitaries are efficient (i.e. we can implement them in polylogarithmic time), either because the quantum states can be prepared directly by some quantum procedure or because the classical vectors are stored in the QRAM. Hence the algorithm runs in time $\tilde{O}(\frac{1}{\eta^2})$. We can of course use amplitude estimation and save a factor of $1/\eta$. Depending on the application, one may prefer to keep the quantum part of the classifier as simple as possible, or to optimize the running time by performing amplitude estimation.</p>
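<p>The sampling argument can be emulated classically. The sketch below replaces each run of the circuit with a biased coin that returns 1 with probability $F_k(x(0))$, repeats it $r = \lceil c/\eta^2 \rceil$ times and outputs $s/r$. The constant $c=4$ and the function name are my own assumptions: with this choice the Hoeffding bound $2e^{-2r\eta^2}$ on the failure probability is at most $2e^{-8}$.</p>

```python
import math
import random


def estimate_by_sampling(p_one, eta, rng, c=4.0):
    """Emulate r = ceil(c / eta^2) runs, each giving outcome 1 with probability p_one."""
    r = math.ceil(c / eta ** 2)
    s = 0
    for _ in range(r):
        if p_one > rng.random():  # stands in for "measure the first register, get 1"
            s += 1
    return s / r, r


# Here p_one plays the role of F_k(x(0)); 0.25 is an illustrative value.
eta = 0.125
estimate, runs = estimate_by_sampling(0.25, eta, random.Random(0))
failure_bound = 2 * math.exp(-2 * runs * eta ** 2)
print(runs, estimate, failure_bound)
```

<p>With amplitude estimation the number of calls to the circuit would scale as $O(1/\eta)$ instead, at the price of a deeper circuit.</p>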
<p>Given this estimator we can now define the QFD classifier.</p>
<h5 id="require-1">Require</h5>
<ul>
<li>QRAM access to $K$ matrices $X_k$ of elements of different classes.</li>
<li>A test vector $x(0)$.</li>
<li>Error parameter $\eta > 0$.</li>
</ul>
<h5 id="ensure-1">Ensure</h5>
<ul>
<li>A label for $x(0)$.</li>
</ul>
<h5 id="algorithm-1">Algorithm</h5>
<ul>
<li>For $k \in [K]$
<ul>
<li>Use the QFD estimator to find $F_k(x(0))$ on $X_k$ and $x(0)$ with precision $\eta$.</li>
</ul>
</li>
<li>Output $h(x(0))=\operatorname{argmin}_{k \in [K]} F_k( x(0))$.</li>
</ul>
<p>The running time of the classifier can be made $\tilde{O}(\frac{K}{\eta})$ when using amplitude estimation. That was it. QFDC basically exploits the subroutine for finding the average squared distance between a point and a cluster, and assigns the test point to the “closest” cluster.</p>
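<p>As a sanity check of the decision rule, here is a classical sketch of the classifier in which the quantum estimator is replaced by the exact value of $F_k(x(0))$. The dictionary layout, the labels and the data are illustrative assumptions, not something taken from the paper.</p>

```python
def f_k(cluster, x0):
    """Exact F_k(x(0)) = ||X_k - X(0)||_F^2 / (2 N_k) for a cluster given as a list of rows."""
    diff_sq = sum((a - b) ** 2 for row in cluster for a, b in zip(row, x0))
    n_k = sum(v * v for row in cluster for v in row) + len(cluster) * sum(v * v for v in x0)
    return diff_sq / (2 * n_k)


def qfd_classify(clusters, x0):
    """Return the label k minimizing F_k(x(0)), together with all the scores."""
    scores = {label: f_k(rows, x0) for label, rows in clusters.items()}
    return min(scores, key=scores.get), scores


# Two toy clusters (illustrative values).
clusters = {
    "a": [[1.0, 0.1], [0.9, 0.0]],
    "b": [[-1.0, 0.0], [-0.8, 0.2]],
}
label, scores = qfd_classify(clusters, [1.0, 0.0])
print(label, scores)
```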
<p>A drawback of this approach is that it is very sensitive to outliers. This is because we take the square of the distance of the points belonging to a cluster. This can apparently be mitigated by a proper dimensionality reduction algorithm, like <a href="QSFA">QSFA</a>.</p>scinawaLast night there was the TQC dinner in Sydney, and I had the chance to speak with a very prolific author in QML. We spoke about her work on distance based classification, which is further analyzed here. As a magnificent manifestation of the Zeitgeist in QML, she said that one of the purposes of the paper was to show that a Hadamard gate is enough to perform classification, and that you don’t need a very complex circuit to exploit quantum mechanics in machine learning. This was exactly the motivation behind our QFDC classifier as well, so here we are with a short description of QFDC! This text is taken straight outta my paper.Iordanis Kerenidis’ talk on quantum machine learning2018-07-02T00:00:00+02:002018-07-02T00:00:00+02:00https://luongo.pro/2018/07/02/Iordanis-talk<p>This is the link to the video of Iordanis (my supervisor) talking about quantum machine learning. In the second half of the video he describes our <a href="https://arxiv.org/abs/1805.08837">recent results</a> on quantum slow feature analysis and classification of the MNIST dataset.</p>
<p><a href="http://www.youtube.com/watch?v=KTVtMKo3g80" title="Quantum Algorithms for Classification"><img src="http://img.youtube.com/vi/KTVtMKo3g80/0.jpg" alt="Quantum Algorithms for Classification" /></a></p>scinawaThis is the link to the video of Iordanis (my supervisor) talking about quantum machine learning. In the second half of the video he describes our recent results on quantum slow feature analysis and classification of the MNIST dataset.Quantum Slow Feature Analysis, a quantum algorithm for dimensionality reduction2018-06-16T00:00:00+02:002018-06-16T00:00:00+02:00https://luongo.pro/2018/06/16/quantum_slow_feature_analysis_a_quantum_algorithm_for_dimensionality_reduction<p>Slow Feature Analysis (SFA) was originally proposed to
learn slowly varying features from generic input signals that vary
rapidly over time (P. Berkes 2005; Wiskott 1999).
Computational neurologists observed long ago that, while primary sensory
receptors (like the retinal receptors in an animal’s eye) are sensitive
to very small changes in the environment and thus vary on a very fast
time scale, the internal representation of the environment in the brain
varies on a much slower time scale. This observation is called the <em>temporal
slowness principle</em>. SFA, being the state-of-the-art model for how this
temporal slowness principle is implemented, is a hypothesis for the
functional organization of the visual cortex (and possibly other sensory
areas of the brain). Said in a very practical way, we have some
“process” in our brain that behaves very similarly to what SFA dictates
(L. Wiskott et al. 2011).</p>
<p>Very beautifully, it is possible to show two reductions from two other
dimensionality reduction algorithms used in machine learning: Laplacian
Eigenmaps (a dimensionality reduction algorithm mostly suited for video
compression) and Fisher Discriminant Analysis (a standard dimensionality
reduction algorithm). SFA can be applied fruitfully in ML, and there have
been many applications of the algorithm to ML-related tasks. The
key concept for SFA (and LDA) is that it tries to project the data onto
the subspace such that the distance between points with the same label
is minimized, while the distance between points with different labels is
maximized.</p>
<h1 id="classical-sfa-for-classification">Classical SFA for classification</h1>
<p>The high level idea of using SFA for classification is the following:
One can think of the training set as an input series
$x(i) \in \mathbb{R}^d , i \in [n]$. Each $x(i)$ belongs to one of $K$
different classes. The goal is to learn $K-1$ functions
$g_j( x(i)), j \in [K-1]$ such that the output
$ y(i) = [g_1( x(i)), \cdots , g_{K-1}( x(i)) ]$ is very similar for
the training samples of the same class and largely different for samples
of different classes. Once these functions are learned, they are used to
map the training set to a low dimensional vector space. When a new data
point arrives, it is mapped to the same vector space, where
classification can be done with higher accuracy.</p>
<p>Now we introduce the minimization problem in its most general form as it
is commonly stated for classification (P. Berkes 2005). Let
$a=\sum_{k=1}^{K} \binom{|T_k|}{2}.$ For all $j \in [K-1]$, minimize:</p>
<script type="math/tex; mode=display">% <![CDATA[
\Delta(y_j) = \frac{1}{a} \sum_{k=1}^K \sum_{s,t \in T_k \atop s<t} \left( g_j( x(s)) - g_j( x(t)) \right)^2 %]]></script>
<p>with the following constraints:</p>
<ol>
<li>
<p>$\frac{1}{n} \sum_{k=1}^{K}\sum_{i\in T_k} g_j( x(i)) = 0 $</p>
</li>
<li>
<p>$\frac{1}{n} \sum_{k=1}^{K}\sum_{i \in T_k} g_j( x(i))^2 = 1 $</p>
</li>
<li>
<p>$ \frac{1}{n} \sum_{k=1}^{K}\sum_{i \in T_k} g_j( x(i))g_v( x(i)) = 0 \quad \forall v < j $</p>
</li>
</ol>
<p>For some beautiful theoretical reasons, the SFA algorithm is in practice an
algorithm for finding the solution of the <em>generalized eigenvalue
problem</em>:</p>
<script type="math/tex; mode=display">AW= BW\Lambda</script>
<p>Here $W$ is the matrix of the generalized eigenvectors and $\Lambda$ the diagonal matrix of the generalized eigenvalues. For SFA, $A$ and $B$ are defined as $ A := \dot{X}^T \dot{X} $ and $B := X^TX$, where $\dot{X}$ is the matrix of the derivatives of the data: for each pair of elements with the same label, we calculate the pointwise difference between the two vectors (computationally, it suffices to sample $O(n)$ tuples from the uniform distribution over all possible derivatives).</p>
<p>It is possible to see that the slow feature space we are looking for is spanned by the columns of $W$ (the eigenvectors) associated with the $K-1$ smallest eigenvalues in
$\Lambda$.</p>
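<p>A classical way to solve this generalized eigenvalue problem, sketched here with NumPy, is to whiten with $B^{-1/2}$ and then solve an ordinary symmetric eigenproblem. This mirrors the whitening and projection structure of the quantum algorithm, but it is only an illustration: the function name, the toy data and the way I build $\dot{X}$ (differences of consecutive samples of the same class) are my own assumptions, and $B$ is assumed to be full rank.</p>

```python
import numpy as np


def slow_features(X, X_dot, n_features):
    """Solve A W = B W Lambda with A = X_dot^T X_dot and B = X^T X; keep the slowest directions."""
    A = X_dot.T @ X_dot
    B = X.T @ X
    # B^{-1/2} from the eigendecomposition of the symmetric matrix B (assumed full rank).
    b_vals, b_vecs = np.linalg.eigh(B)
    B_inv_sqrt = b_vecs @ np.diag(b_vals ** -0.5) @ b_vecs.T
    # In the whitened space this is an ordinary symmetric eigenproblem;
    # eigh returns the eigenvalues in ascending order.
    lam, V = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)
    W = B_inv_sqrt @ V
    return lam[:n_features], W[:, :n_features]


# Toy data: K = 3 classes, 20 samples each, d = 4 features (illustrative values).
rng = np.random.default_rng(0)
centers = 3.0 * rng.normal(size=(3, 4))
X = np.vstack([c + rng.normal(size=(20, 4)) for c in centers])
X = X - X.mean(axis=0)  # zero mean, as required by the first constraint
# Differences between consecutive samples of the same class.
X_dot = np.vstack([X[20 * k + 1:20 * (k + 1)] - X[20 * k:20 * (k + 1) - 1] for k in range(3)])

lam, W = slow_features(X, X_dot, n_features=2)  # K - 1 = 2 slow features
Y = X @ W  # training set mapped to the slow feature space
print(lam, Y.shape)
```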
<h1 id="quantum-sfa">Quantum SFA</h1>
<p>In (Kerenidis and Luongo 2018) we show how, using a “QuantumBLAS” (i.e.
a set of quantum algorithms that we can use to perform linear algebraic
operations), we can perform the following algorithm. The intuition
behind it is that the derivative matrix of the data can be
pre-computed on non-whitened data. Classically one might avoid this in
order to spare a matrix multiplication; with a quantum computer we don’t
have this problem, since we know how to perform matrix multiplication
efficiently. As in the classical algorithm, we have to do some
preprocessing of our data. For the quantum case, preprocessing consists
of:</p>
<ol>
<li>
<p>Polynomially expand the data with a polynomial of degree 2 or 3</p>
</li>
<li>
<p>Normalize and Scale the rows of the dataset $X$.</p>
</li>
<li>
<p>Create $\dot{X}$ by sampling from the distribution of possible
pairs of rows of $X$ with the same label.</p>
</li>
<li>
<p>Create QRAM for $X$ and $\dot{X}$</p>
</li>
</ol>
<p>Note that all these operations take at most $O(nd\log(nd))$ time in the size of
the training set, which is time that we need to spend anyway, even just to
collect the data classically.</p>
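<p>The classical part of the preprocessing (steps 1 to 3) can be sketched in a few lines of plain Python. This is a minimal illustration under my own assumptions: the expansion keeps the original coordinates plus all degree-2 monomials, rows are scaled to unit Euclidean norm, and same-label pairs are sampled uniformly. All function names are hypothetical, and building the QRAM (step 4) is not covered.</p>

```python
import itertools
import math
import random


def poly_expand_deg2(x):
    """Degree-2 expansion: the original coordinates plus every product of two coordinates."""
    pairs = itertools.combinations_with_replacement(range(len(x)), 2)
    return list(x) + [x[a] * x[b] for a, b in pairs]


def normalize_rows(rows):
    """Scale every (non-zero) row to unit Euclidean norm."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out


def sample_derivatives(rows, labels, n_samples, rng):
    """Build X_dot from differences of uniformly sampled pairs of rows with the same label."""
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)
    groups = [idxs for idxs in by_label.values() if len(idxs) > 1]
    # Weighting each class by its number of pairs makes the pair uniform over all classes.
    weights = [math.comb(len(idxs), 2) for idxs in groups]
    x_dot = []
    for _ in range(n_samples):
        s, t = rng.sample(rng.choices(groups, weights=weights)[0], 2)
        x_dot.append([a - b for a, b in zip(rows[s], rows[t])])
    return x_dot


# Toy dataset with two labels (illustrative values).
raw = [[1.0, 2.0], [2.0, 1.0], [-1.0, 0.5], [-2.0, 1.5]]
labels = [0, 0, 1, 1]
X = normalize_rows([poly_expand_deg2(row) for row in raw])
X_dot = sample_derivatives(X, labels, n_samples=len(X), rng=random.Random(0))
print(len(X[0]), len(X_dot))
```

<p>With this convention an input of dimension 39 is expanded to $39 + \binom{40}{2} = 819$ features.</p>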
<p>To use our algorithm for classification, you use QSFA to bring one
cluster at a time, along with the new test point, into the slow feature
space, and then perform any distance based classification algorithm, like
QFDC or swap tests. The quantum algorithm is the following:</p>
<ul>
<li>
<p><strong>Require</strong> Matrices $X \in \mathbb{R}^{n \times d}$ and
$\dot{X} \in \mathbb{R}^{n \times d}$ in QRAM, parameters
$\epsilon, \theta,\delta,\eta >0$.</p>
</li>
<li>
<p><strong>Ensure</strong> A state $\ket{\bar{Y}}$ such that
$ | \ket{Y} - \ket{\bar{Y}} | \leq \epsilon$, with
<script type="math/tex">Y = A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} Z</script></p>
</li>
</ul>
<ol>
<li>
<p>Create the state
<script type="math/tex">\ket{X} := \frac{1}{ {||X ||}_F} \sum_{i=1}^{n} {||x(i) ||} \ket{i}\ket{x(i)}</script>
using the QRAM that stores the dataset.</p>
</li>
<li>
<p>(Whitening algorithm) Map $\ket{X}$ to $\ket{\bar{Z}}$ with
$| \ket{\bar{Z}} - \ket{Z} | \leq \epsilon $ and $Z=XB^{-1/2}$,
using quantum access to the QRAM.</p>
</li>
<li>
<p>(Projection in slow feature space) Project $\ket{\bar{Z}}$ onto the
slow eigenspace of $A$ using threshold $\theta$ and precision
$\delta$ (i.e.
<script type="math/tex">A^+_{\leq \theta, \delta}A_{\leq \theta, \delta}\bar{Z}</script> )</p>
</li>
<li>
<p>Perform amplitude amplification and estimation on the register
$\ket{0}$ with the unitary $U$ implementing steps 1 to 3, to obtain
$\ket{\bar{Y}}$ with $| \ket{\bar{Y}} - \ket{Y} | \leq \epsilon $
and an estimator $ \bar{ {|| Y ||} } $ with multiplicative error
$\eta$.</p>
</li>
</ol>
<p>Overall, the algorithm is summarized by the following theorem.</p>
<p>Let $X = \sum_i \sigma_i u_iv_i^T \in \mathbb{R}^{n\times d}$ and its
derivative matrix $\dot{X} \in \mathbb{R}^{n \log n \times d}$ stored in
QRAM. Let $\epsilon, \theta, \delta, \eta >0$. There exists a quantum
algorithm that produces as output a state <script type="math/tex">\ket{\bar{Y}}</script> with
<script type="math/tex">| \ket{\bar{Y}} - \ket{A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} Z} | \leq \epsilon</script>
in time
<script type="math/tex">\tilde{O}\left( \left( \kappa(X)\mu(X)\log (1/\epsilon) + \frac{ ( \mu({X})+ \mu(\dot{X}) ) }{\delta\theta} \right)
\frac{||{Z}||}{ ||A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} {Z} ||} \right)</script>
and an estimator $\bar{||Y ||}$ with
$ | \bar{||Y ||} - ||Y || | \leq \eta {||Y ||}$ at the cost of an additional
<script type="math/tex">1/\eta</script> factor in the running time.</p>
<p>A prominent advantage of SFA compared to other algorithms is that <em>it
is almost hyperparameter-free</em>. The only parameters to choose are in the
preprocessing of the data, e.g. the initial PCA dimension and the
nonlinear expansion, which consists of a choice of a polynomial of
(usually low) degree $p$. Another advantage is that it is <em>guaranteed to
find the optimal solution</em> within the considered function space
(Escalante-B and Wiskott 2012). We ran an experiment and, using QSFA with a quantum classifier, we were
able to reach 98.5% accuracy in digit recognition: we correctly
classified 98.5% of 10,000 test images of digits, given a training set of 60,000
digits.</p>
<h3 id="references">References</h3>
<div id="refs" class="references">
<div id="ref-Berkes2005pattern">
Berkes, Pietro. 2005. “Pattern Recognition with Slow Feature Analysis.”
<i>Cognitive Sciences EPrint Archive (CogPrints)</i> 4104.
<a href="http://cogprints.org/4104/">http://cogprints.org/4104/</a>, <a href="http://itb.biologie.hu-berlin.de/~berkes">http://itb.biologie.hu-berlin.de/~berkes</a>.
</div>
<div id="ref-escalante2012slow">
Escalante-B, Alberto N, and Laurenz Wiskott. 2012. “Slow Feature
Analysis: Perspectives for Technical Applications of a Versatile
Learning Algorithm.” <i>KI-Künstliche Intelligenz</i> 26 (4). Springer:
341–48.
</div>
<div id="ref-jkereLuongo2018">
Kerenidis, Iordanis, and Alessandro Luongo. 2018. “Quantum
Classification of the MNIST Dataset via Slow Feature Analysis.” <i>arXiv
Preprint arXiv:1805.08837</i>.
</div>
<div id="ref-scholarpedia2017SFA">
Wiskott, L., P. Berkes, M. Franzius, H. Sprekeler, and N. Wilbert. 2011.
“Slow Feature Analysis.” <i>Scholarpedia</i> 6 (4): 5282.
doi:<a href="https://doi.org/10.4249/scholarpedia.5282">10.4249/scholarpedia.5282</a>.
</div>
<div id="ref-wiskott1999learning">
Wiskott, Laurenz. 1999. “Learning invariance
manifolds.” <i>Neurocomputing</i> 26-27. Elsevier: 925–32.
doi:<a href="https://doi.org/10.1016/S0925-2312(99)00011-9">10.1016/S0925-2312(99)00011-9</a>.
</div>
</div>["scinawa"]Slow Feature Analysis (SFA) was originally proposed to learn slowly varying features from generic input signals that vary rapidly over time (P. Berkes 2005; Wiskott 1999). Computational neurologists observed long ago that, while primary sensory receptors (like the retinal receptors in an animal’s eye) are sensitive to very small changes in the environment and thus vary on a very fast time scale, the internal representation of the environment in the brain varies on a much slower time scale. This observation is called the temporal slowness principle. SFA, being the state-of-the-art model for how this temporal slowness principle is implemented, is a hypothesis for the functional organization of the visual cortex (and possibly other sensory areas of the brain). Said in a very practical way, we have some “process” in our brain that behaves very similarly to what SFA dictates (L. Wiskott et al. 2011).How to evaluate a classifier2018-06-10T00:00:00+02:002018-06-10T00:00:00+02:00https://luongo.pro/2018/06/10/evaluate_classifier<p>Practitioners in quantum machine learning should not only build their
skills in quantum algorithms: having some basic notions of
statistics and data science won’t hurt. In the following we see some
ways to evaluate a classifier. What does this mean in practice? Imagine
you have a medical test that is able to tell if a patient is sick or
not. You might want to consider the behavior of your classifier with
respect to two parameters: the cost of identifying a sick
patient as healthy, and the cost of identifying a healthy
patient as sick. For example, if the patient is a zombie and it
contaminates all the rest of humanity, you want to minimize the
occurrences of the first case, while if the cure for “zombiness” is
lethal for a human patient, you want to minimize the occurrences of the
second case. With P and N we denote the number of patients that are
actually Positive (sick) or Negative (healthy). This is formalized in the following
definitions, which consist of statistics to be calculated on the test
set of a data analysis.</p>
<ul>
<li>
<p><strong>TP True positives (statistical power)</strong> : are those labeled as
sick that are actually sick.</p>
</li>
<li>
<p><strong>FP False positives (type I error)</strong>: are those labeled as sick but
that actually are healthy</p>
</li>
<li>
<p><strong>FN False negatives (type II error)</strong> : are those labeled as
healthy but that are actually sick.</p>
</li>
<li>
<p><strong>TN True negative</strong>: are those labeled as healthy that are healthy.</p>
</li>
</ul>
<p>Given this simple intuition, we can take a binary classifier and imagine
to do an experiment over a data set. Then we can measure:</p>
<ul>
<li>
<p><strong>True Positive Rate (TPR) = Recall = Sensitivity</strong>: is the ratio of
correctly identified elements among all the elements that are actually
sick. It answers the question: “how good are we at detecting sick
people?”.
<script type="math/tex">\frac{ TP }{ TP + FN} = \frac{TP }{P} \simeq p(test=1|sick=1)</script>
This is an estimator of the probability of a positive test given a
sick individual.</p>
</li>
<li>
<p><strong>True Negative Rate (TNR) = Specificity</strong> is a measure that tells
you how many of the healthy patients are correctly labeled as healthy.
<script type="math/tex">\frac{ TN }{ TN + FP} \simeq p(test = 0 | sick =0)</script> How many
healthy patients will test negative? How good are we
at avoiding false alarms?</p>
</li>
<li>
<p><strong>False Positive Rate = Fallout</strong>
<script type="math/tex">FPR = \frac{ FP }{ FP + TN } = 1 - TNR</script></p>
</li>
<li>
<p><strong>False Negative Rate = Miss Rate</strong>
<script type="math/tex">FNR = \frac{ FN }{ FN + TP } = 1 - TPR</script></p>
</li>
<li>
<p><strong>Precision, Positive Predictive Value (PPV)</strong>:
<script type="math/tex">\frac{ TP }{ TP + FP} \simeq p(sick=1 | test=1)</script> How many
of those who test positive are actually sick?</p>
</li>
<li>
<p><strong>$F_1$ score</strong> is a more compressed index of the
performance of a binary classifier. It is simply
the harmonic mean of Precision and Sensitivity:
<script type="math/tex">F_1 = 2\frac{Precision \times Sensitivity }{Precision + Sensitivity }</script></p>
</li>
<li>
<p><strong>Receiver Operating Characteristic (ROC)</strong> Evaluate the TPR and FPR
at all the scores returned by a classifier by changing a parameter.
It is a plot of the true positive rate against the false positive
rate for the different possible values (cutpoints) of a test or
experiment.</p>
</li>
<li>
<p>The <strong>confusion matrix</strong> generalizes these 4 combinations (TP, TN, FP,
FN) to multiple classes: it is an $l \times l$ matrix where at row $i$ and
column $j$ you have the number of elements from class $i$ that
have been classified as elements of class $j$.</p>
</li>
</ul>
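<p>All these quantities follow mechanically from the four counts, so a few lines of Python are enough to compute them. The function names and the example counts are my own, chosen only for illustration, and classes are assumed to be labeled with integers starting from 0.</p>

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the rates defined above from the four counts of a binary classifier."""
    tpr = tp / (tp + fn)        # recall, sensitivity
    tnr = tn / (tn + fp)        # specificity
    precision = tp / (tp + fp)  # positive predictive value
    return {
        "TPR": tpr,
        "TNR": tnr,
        "FPR": 1 - tnr,
        "FNR": 1 - tpr,
        "precision": precision,
        "F1": 2 * precision * tpr / (precision + tpr),
    }


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry [i][j] counts the elements of class i that were classified as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


# Illustrative counts: 60 sick patients (40 detected), 40 healthy ones (30 cleared).
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
cm = confusion_matrix([0, 0, 1, 2], [0, 1, 1, 2], n_classes=3)
print(cm)
```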
<p>In brief: I wrote this post because I always forget these terms, and I wasn’t
able to find them described in a concise way with the same formalism
without spending more time googling than I spent writing this post. Other
links:
<a href="https://uberpython.wordpress.com/2012/01/01/precision-recall-sensitivity-and-specificity/">here</a></p>["Alessandro Luongo"]Practitioners in quantum machine learning should not only build their skills in quantum algorithms: having some basic notions of statistics and data science won’t hurt. In the following we see some ways to evaluate a classifier. What does this mean in practice? Imagine you have a medical test that is able to tell if a patient is sick or not. You might want to consider the behavior of your classifier with respect to two parameters: the cost of identifying a sick patient as healthy, and the cost of identifying a healthy patient as sick. For example, if the patient is a zombie and it contaminates all the rest of humanity, you want to minimize the occurrences of the first case, while if the cure for “zombiness” is lethal for a human patient, you want to minimize the occurrences of the second case. With P and N we denote the number of patients that are actually Positive (sick) or Negative (healthy). This is formalized in the following definitions, which consist of statistics to be calculated on the test set of a data analysis.qramutils: gather statistics for your QRAM2018-04-15T00:00:00+02:002018-04-15T00:00:00+02:00https://luongo.pro/2018/04/15/Gather-statistics-for-your-QRAM<p>Generally, with the term QRAM people refer to an oracle, or
more generically to a unitary, that gets called with the purpose of creating
a state in a quantum circuit. This state represents some (classical)
data that you want to process later in your algorithm. More formally,
QRAM allows you to perform operations like
$\ket{i}\ket{0} \to \ket{i}\ket{x_i}$ for $x_i \in \mathbb{R}$ and
$i \in [n]$. This model can be used to create states proportional to
classical vectors, allowing us to perform queries like
$\ket{i}\ket{0} \to \ket{i}\ket{x(i)}$ for $x(i) \in \mathbb{R}^d$ and
$i \in [n]$.</p>
<p>Querying the QRAM is assumed to be done efficiently. The running time is
expected to be polylogarithmic in the matrix dimensions, but
the time complexity might be polynomial in other parameters. As an example,
in the QRAM described in (Kerenidis and Prakash 2017; Kerenidis and Prakash
2016; Prakash 2014) the authors store a matrix decomposition such that
the running time of a query might depend on the Frobenius norm, or on a
parametrized function, which is specific to their implementation. In
this model, the best parametrization of the decomposition might depend
on the dataset. This means that in practice you might need to estimate
these parameters, and therefore I’ve decided to write a library for
this. Specifically, given a matrix $A$ to store in QRAM, you have to
find the value $p \in \left(0, 1 \right)$ that minimizes the
function: <script type="math/tex">\mu_p(A) = \sqrt{ s_{2p}(A) s_{2(1-p)}(A^T)}</script> where
$s_p(A) := \max_{i \in [m]} ||a_i||_p^p $ is the maximum $\ell_p$ norm to the
power of $p$ of the row vectors $a_i$ of $A$.</p>
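<p>To see what this minimization looks like, here is a self-contained sketch in plain Python that evaluates $\mu_p(A)$ from the definition and searches for the best $p$ on a uniform grid. It is not the code of qramutils: the function names, the grid resolution and the toy matrix are assumptions made for the example.</p>

```python
import math


def s_p(matrix, p):
    """s_p(A): the maximum over the rows a_i of ||a_i||_p^p, i.e. of sum_j |a_ij|^p."""
    return max(sum(abs(v) ** p for v in row) for row in matrix)


def mu_p(matrix, p):
    """mu_p(A) = sqrt(s_{2p}(A) * s_{2(1-p)}(A^T))."""
    transposed = [list(col) for col in zip(*matrix)]
    return math.sqrt(s_p(matrix, 2 * p) * s_p(transposed, 2 * (1 - p)))


def best_p(matrix, steps=99):
    """Grid search over p in (0, 1); the resolution of the grid is an arbitrary choice."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda p: mu_p(matrix, p))


A = [[1.0, 2.0], [3.0, 4.0]]  # toy matrix
print(mu_p(A, 0.5))           # sqrt(7 * 6): max row 1-norm times max column 1-norm
p_star = best_p(A)
print(p_star, mu_p(A, p_star))
```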
<p>The need to estimate parameters of a dataset might also arise with
other models of access to the data. For instance, other algorithms such
as HHL use Hamiltonian simulation, which has an access model that makes
the complexity of the algorithm depend on the sparsity.</p>
<p>So far qramutils analyzes a given numpy matrix for the following
parameters:</p>
<ul>
<li>
<p>The sparsity.</p>
</li>
<li>
<p>The condition number.</p>
</li>
<li>
<p>The Frobenius norm (of the rescaled matrix such that
$0< \sigma_i < 1$).</p>
</li>
<li>
<p>The best parameter $p$ for the matrix decomposition described above.</p>
</li>
<li>
<p>Some boring and common plotting.</p>
</li>
</ul>
<p><a href="https://github.com/Scinawa/qramutils">Here</a> you can find the
repository.</p>
<p>This code might be improved in many directions! For instance, I’d like
to integrate in the library the code for plotting the parameters for
various PCA dimensions and/or degrees of polynomial expansion, integrate
options for dataset normalization and scaling, and maybe expand the types of
accepted input data, and so on.</p>
<p>Ideally, for other kinds of matrices there might hopefully be other kinds of
matrix decompositions available, and therefore there might be the need to
estimate other parameters in the future. This library is where I’ll add the
code for that. :)</p>
<p>This is an example of usage on the MNIST dataset:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$ pipenv run python3 examples/mnist_QRAM.py --help
usage: mnist_QRAM.py [-h] [--db DB] [--generateplot] [--analize]
                     [--pca-dim PCADIM] [--polyexp POLYEXP]
                     [--loglevel {DEBUG,INFO}]

Analyze a dataset and model QRAM parameters

optional arguments:
  -h, --help            show this help message and exit
  --db DB               path of the mnist database
  --generateplot        run experiment with various dimension
  --analize             Run all the analysis of the matrix
  --pca-dim PCADIM      pca dimension
  --polyexp POLYEXP     degree of polynomial expansion
  --loglevel {DEBUG,INFO}
                        set log level
</code></pre>
</div>
<p>This is the output, assuming you have a folder called data that holds
the MNIST dataset.</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv run python3 examples/mnist_QRAM.py --db data --analize --loglevel INFO
04-01 22:23 INFO Calculating parameters for default configuration: PCA dim 39, polyexp 2
04-01 22:24 INFO Matrix dimension (60000, 819)
04-01 22:24 INFO Sparsity (0=dense 1=empty): 0.0
04-01 22:24 INFO The Frobenius norm: 4.6413604982930385
04-01 22:26 INFO best p 0.8501000000000001
04-01 22:26 INFO Best p value: 0.8501000000000001
04-01 22:26 INFO The \mu value is: 4.6413604982930385
04-01 22:26 INFO Qubits needed to index+data register: 26.
</code></pre>
</div>
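<p>One way to read the last line of the output: a $60000 \times 819$ matrix needs $\lceil \log_2 60000 \rceil = 16$ qubits to index the rows and $\lceil \log_2 819 \rceil = 10$ qubits to index the columns, which gives 26. This is an assumption about what the number means, not a description of how qramutils computes it; the helper name below is hypothetical.</p>

```python
import math


def index_qubits(rows, cols):
    """Qubits needed to index every row and every column of a matrix."""
    return math.ceil(math.log2(rows)) + math.ceil(math.log2(cols))


if __name__ == "__main__":
    # Matrix dimension reported in the log above.
    print(index_qubits(60000, 819))  # 16 + 10 = 26
```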
<p>If you want to use the library in your source code:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>import logging

import qramutils

logging.basicConfig(level=logging.INFO)

# X is the numpy matrix to analyze (for instance, the preprocessed dataset).
libq = qramutils.QramUtils(X, logging_handler=logging)
logging.info("Matrix dimension {}".format(X.shape))

sparsity = libq.sparsity()
logging.info("Sparsity (0=dense 1=empty): {}".format(sparsity))

frob_norm = libq.frobenius()
logging.info("The Frobenius norm: {}".format(frob_norm))

# find_p returns the best p and the corresponding value of mu_p.
best_p, min_sqrt_p = libq.find_p()
logging.info("Best p value: {}".format(best_p))
logging.info("The \\mu value is: {}".format(min(frob_norm, min_sqrt_p)))

qubits_used = libq.find_qubits()
logging.info("Qubits needed to index+data register: {} ".format(qubits_used))
</code></pre>
</div>
<p>To install, first build a source distribution:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv run python3 setup.py sdist
</code></pre>
</div>
<p>The package is then ready to be installed with:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv install dist/qramutils-0.1.0.tar.gz
</code></pre>
</div>
<div id="refs" class="references">
<div id="ref-kerenidis2016quantum">
Kerenidis, Iordanis, and Anupam Prakash. 2016. “Quantum Recommendation
Systems.” <em>arXiv preprint arXiv:1603.08675</em>.
</div>
<div id="ref-kerenidis2017quantum">
Kerenidis, Iordanis, and Anupam Prakash. 2017. “Quantum Gradient Descent for Linear Systems and Least
Squares.” <em>arXiv preprint arXiv:1704.04992</em>.
</div>
<div id="ref-prakash2014quantum">
Prakash, Anupam. 2014. <em>Quantum Algorithms for Linear Algebra and
Machine Learning</em>. University of California, Berkeley.
</div>
</div>scinawaGenerally, with the term QRAM people are referring to an oracle, or generically to a unitary, that gets called with the purpose of creating a state in a quantum circuit. This state represents some (classical) data that you want to process later in your algorithm. More formally, QRAM allows you to perform operations like: $\ket{i}\ket{0} \to \ket{i}\ket{x_i}$ for $x_i \in \mathbb{R}$ for some $i \in [n]$. This model can be used to create states proportional to classical vectors, and allowing us to perform queries: $\ket{i}\ket{0} \to \ket{i}\ket{x(i)}$ for $x(i) \in \mathbb{R}^d$ for some $i \in [n]$Failed Attempt To Reverse Swap Test2018-04-15T00:00:00+02:002018-04-15T00:00:00+02:00https://luongo.pro/2018/04/15/Failed-attempt-to-reverse-swap-test<p>This post was born from an attempt to find a reversible circuit for
computing the swap test: a circuit used to compute the inner product of
two quantum states. This circuit was originally proposed for solving the
state distinguishability problem but, as you can imagine, it is widely used in
quantum machine learning too. Before starting, let’s note one thing. A
reversible circuit for the swap test implies that we are able to
recreate the two input states. Conceptually, this should be impossible
because of the no-cloning theorem. With a very neat observation we can
realize that we are not even able to preserve one of the states.</p>
<p>There is no unitary operator $U$ acting on $\ket{x}\ket{y}$ that allows you to
estimate the scalar product $\braket{x|y}$ between two states $\ket{x}, \ket{y}$
using only one copy of $\ket{x}$ while leaving $\ket{x}$ intact.</p>
<p>By contradiction. Assume this unitary exists. Then it would be possible to
estimate the scalar product between $\ket{x}$ and all the basis states
$\ket{i}$ (basically doing tomography of the state). This is a way of
recovering a classical description of $\ket{x}$. Knowing $\ket{x}$, we
could create as many copies of $\ket{x}$ as we want. Therefore, we
could use this procedure to clone a state, which is prevented by the
no-cloning theorem.</p>
<p>Let’s see what happens if we try to reverse it.</p>
<p><img src="/assets/reverse_swap.png" alt="image" /></p>
<p>It is good to know that the circuit in the figure above is
inspired by the proof that $BPP \subseteq BQP$. The idea is the following:
after a swap test, and before doing any measurement on the ancilla
qubit, we do a CNOT on a second ancillary qubit, and then execute the
inverse of the swap test. Since the swap test is a self-inverse operator, this
simply means that we apply the swap test twice. Let’s start the
calculations from the CNOT on the second ancilla qubit.</p>
<script type="math/tex; mode=display">\frac{1}{2} \Big[ \left( \ket{ab} + \ket{ba} \right)\ket{00} + \left( \ket{ab} - \ket{ba} \right)\ket{11} \Big] \xrightarrow{\text{H}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \Big[ \left( \ket{ab} + \ket{ba} \right)\ket{+0} + \left( \ket{ab} - \ket{ba} \right)\ket{-1} \Big] \xrightarrow{\text{SWAP}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\frac{1}{\sqrt{2}} \Big[ \Big( \ket{ab} + \ket{ba} \Big) \ket{0} + \Big( \ket{ab} + \ket{ba} \Big) \ket{1} \Big] \ket{0} + \frac{1}{\sqrt{2}} \Big[ \Big( \ket{ab} - \ket{ba} \Big) \ket{0} - \Big( \ket{ba} - \ket{ab} \Big) \ket{1} \Big] \ket{1}
\right] =</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\left[ \left( \ket{ab} + \ket{ba} \right)\ket{+} \right] \ket{0} +
\left[ \left( \ket{ab} - \ket{ba} \right)\ket{+} \right] \ket{1}
\right] \xrightarrow{\text{H}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\left[ \left( \ket{ab} + \ket{ba} \right)\ket{0} \right] \ket{0} +
\left[ \left( \ket{ab} - \ket{ba} \right)\ket{0} \right] \ket{1}
\right].</script>
<p><script type="math/tex">p(\ket{0}) = \frac{1}{4}\Big( 2 + 2 \braket{ab|ba}\Big) = \frac{ 1+ \braket{ab|ba}}{2} = \frac{ 1+ |\braket{a|b}|^2}{2}</script>
is the probability of measuring $\ket{0}$ on the second ancilla qubit, and therefore $p(\ket{1})$ is $\frac{ 1- |\braket{a|b}|^2}{2}$, as in the
original swap test. So, the result is the same but, as in the original
swap test, the registers are still entangled, therefore we haven’t
reversed our swap.</p>
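<p>The final probabilities can be checked numerically. The sketch below simulates the circuit (swap test, CNOT onto a second ancilla, swap test again) with plain NumPy for single-qubit inputs. The qubit ordering (first ancilla, second ancilla, $a$, $b$) and all the helper names are choices made here for illustration, and the inputs are assumed to be normalized.</p>

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
I4 = np.eye(4)
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
P0 = np.diag([1.0, 0.0])  # projector on |0>
P1 = np.diag([0.0, 1.0])  # projector on |1>


def kron(*ops):
    """Tensor product of several operators, left to right."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out


# Qubit order: first ancilla, second ancilla, register a, register b.
H1 = kron(H, I2, I4)                           # Hadamard on the first ancilla
CSWAP = kron(P0, I2, I4) + kron(P1, I2, SWAP)  # first ancilla controls SWAP(a, b)
CNOT = kron(P0, I2, I4) + kron(P1, X, I4)      # first ancilla controls X on the second
SWAP_TEST = H1 @ CSWAP @ H1


def double_swap_test(a, b):
    """Return (p(first ancilla = 0), p(second ancilla = 0)) after the circuit."""
    zero = np.array([1.0, 0.0])
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    psi = np.kron(np.kron(np.kron(zero, zero), a), b)
    final = SWAP_TEST @ CNOT @ SWAP_TEST @ psi
    probs = (np.abs(final) ** 2).reshape(2, 2, 4)
    return probs[0].sum(), probs[:, 0, :].sum()


if __name__ == "__main__":
    a = np.array([1.0, 0.0])
    b = np.array([1.0, 1.0]) / np.sqrt(2)
    print(double_swap_test(a, b))
```

<p>For $\ket{a} = \ket{0}$ and $\ket{b} = \ket{+}$ the overlap is $|\braket{a|b}|^2 = 1/2$, so the second ancilla should read $\ket{0}$ with probability $3/4$, while the first ancilla returns to $\ket{0}$ with certainty.</p>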
<p>Here I have applied the rules:</p>
<ul>
<li>
<p>$ (A\otimes B)^{\dagger} = A^{\dagger} \otimes B^{\dagger} $</p>
</li>
<li>
<p>$ \left( \bra{\phi} \otimes \bra{\psi} \right) \left( \ket{\phi} \otimes \ket{\psi} \right) = \braket{\phi|\phi} \cdot \braket{\psi|\psi}$</p>
</li>
</ul>
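<p>As a worked instance of these rules, the squared norm of the symmetric branch expands as follows, assuming $\ket{a}$ and $\ket{b}$ are normalized:</p>
<script type="math/tex; mode=display">\left\| \ket{ab} + \ket{ba} \right\|^2 = \braket{ab|ab} + \braket{ba|ba} + \braket{ab|ba} + \braket{ba|ab} = 2 + 2\braket{a|b}\braket{b|a} = 2 + 2|\braket{a|b}|^2</script>
<p>Multiplying by the squared prefactor $\frac{1}{4}$ gives $\frac{1 + |\braket{a|b}|^2}{2}$.</p>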
<p>You may have noted that this circuit is very similar to the circuit that you
obtain if you perform amplitude amplification (Brassard et al., 2000) on
the swap test. The swap circuit is the algorithm $A$ that produces
states with a certain probability distribution, and the CNOT is the
unitary $U_f$ that is able to distinguish the “good” states from the bad
states. By setting the second ancilla qubit to $\ket{-}$ we would be
able to write on the phase of our state some useful information to
recover with a QFT later on. That’s very cool, since amplitude
estimation (which builds on amplitude amplification) allows us to decrease quadratically the computational
complexity of the algorithm with respect to the error in the estimation
of the amplitude of the ancilla qubit.</p>
<div id="refs" class="references">
<div id="ref-brassard2002quantum">
Brassard, Gilles, Peter Høyer, Michele Mosca, and Alain Tapp. 2000.
“Quantum Amplitude Amplification and Estimation.” <em>arXiv preprint
quant-ph/0005055</em>.
</div>
</div>scinawa