Jekyll2018-11-14T15:01:55+01:00https://luongo.pro/returnlambdaThe level of achievement that you have in anything, is a reflection of how well you were able to focus on it (Steve Vai)
A Quantum Perceptron model2018-08-15T00:00:00+02:002018-08-15T00:00:00+02:00https://luongo.pro/2018/08/15/Quantum-Perceptron
<p>Here I explain the work of <a href="#kapoor2016quantum">(Kapoor, Wiebe, & Svore, 2016)</a>. There, the authors applied amplitude amplification techniques to two different versions of the perceptron algorithm. With the first approach, which we describe in this post, they gain a quadratic speedup w.r.t. the number of elements in the training set. In the second approach, they leverage the description of the perceptron in the so-called <em>version space</em> (the dual of the usual feature space description of the perceptron). This allows them to gain a quadratic improvement w.r.t. statistical efficiency: perhaps a more interesting gain than a quadratic speedup in the number of elements in the training set. We will see this second model of quantum perceptron in another post, since the technique used is basically the same.</p>
<h4 id="the-perceptron">The perceptron</h4>
<p>Let’s briefly introduce the perceptron. We are given a training set $\mathbb{T} = \{ \phi_1, \ldots, \phi_N\} $, $\phi_i \in \mathbb{R}^D$, of labeled vectors that belong to two different classes: $y_i \in \{-1,+1\}$. For the sake of simplicity, we assume those vectors to be linearly separable. While in practice this is not always the case, there are statistical guarantees that we will still be able to learn a good-enough separating hyperplane which approximates the best separating hyperplane $w^*$. A (classical) perceptron finds a hyperplane $w$ that separates the data of the two classes. More formally, we want to find a $w$ such that $y_i \, w^T \phi_i > 0 \quad \forall i \in \left[N\right]$. The intuition for the classical algorithm is the following: start with an initial guess for $w$, then update your guess by adding the misclassified vector (multiplied by its label) to the vector $w$ describing the hyperplane, and then normalize to keep the norm of $w$ constant. In this way you rotate your current guess of $w$ until it correctly classifies the training vectors.</p>
<p>We say that the two classes are separated by a <em>margin</em> of $\gamma$. Recall that the margin is defined (“a priori” w.r.t. the dataset) as:</p>
<script type="math/tex; mode=display">\gamma = \min_{i \in [N]} \frac{|\phi_i \cdot w^*|}{\|\phi_i\|}</script>
<p>We can think of the margin as a measure of the training set which tells us how much to rotate $w$ each time to change the label of a misclassified vector. For the record, it is possible to prove that the perceptron makes at most $\frac{1}{\gamma^2}$ mistakes for points $\phi_i$ that are separated with angular margin $\gamma$.</p>
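<p>As a point of reference, here is a minimal sketch of the classical perceptron update described above. It assumes labels in $\{-1,+1\}$ and a linearly separable toy dataset; the function name <code class="highlighter-rouge">train_perceptron</code> and the data are illustrative and not taken from the paper. For simplicity it normalizes $w$ once at the end rather than after every update, which does not change which side of the hyperplane a point falls on.</p>

```python
import numpy as np

def train_perceptron(phi, y, max_epochs=1000):
    """Classical perceptron on the rows of `phi`, with labels y in {-1, +1}."""
    w = np.zeros(phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(phi, y):
            if label * (w @ x) <= 0:      # misclassified (or on the boundary)
                w = w + label * x         # rotate w towards the correct side
                mistakes += 1
        if mistakes == 0:                 # a full pass without errors: done
            break
    return w / np.linalg.norm(w)          # only the direction of w matters

# Toy, linearly separable training set (illustrative values).
phi = np.array([[1.0, 0.2], [0.8, -0.1], [-1.0, 0.1], [-0.7, -0.3]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(phi, y)
```

<p>Each pass scans the whole training set looking for a mistake; this scan is the part that the quantum algorithm tries to speed up.</p>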
<p>In the quantum version of the algorithm, the idea is to use amplitude amplification to <strong>find more quickly</strong> the misclassified vectors in the training set.</p>
<p>A quick recap on Grover-like algorithms. In order to apply amplitude amplification to a problem, you need to build two unitary operators that, combined, give you the Grover iterate:</p>
<script type="math/tex; mode=display">U_{Grover}=U_{init}U_{targ}</script>
<p>where $U_{init} = 2\ket{\psi}\bra{\psi} - \mathbb{I}$ is the reflection about the mean (i.e. about the initial state $\ket{\psi}$), and <script type="math/tex">U_{targ} = \mathbb{I} - 2P</script> is the change of phase of the “good” solutions you are targeting in your problem, with $P$ the projector onto them.</p>
<p>By applying the Grover iterate a certain number of times to the quantum state generated by querying an oracle (in this case our quantum memory), we can tweak the probability of sampling a misclassified vector from our quantum computer.</p>
<p>More formally, this is the statement of the theorem:</p>
<h6 id="theorem-amplitude-amplification-brassard-hoyer-mosca--tapp-2002">Theorem: Amplitude amplification <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a></h6>
<p><em>Let $A$ be any quantum algorithm that uses no measurements, and let $f : \{0,1 \}^n \to \{0, 1\}$ be any Boolean function. There exists a quantum algorithm that given the initial success probability $a > 0$ of $A$, finds a good solution with certainty using a number of applications of $A$ and $A^{-1}$ which is in $\Theta(1/\sqrt{a})$ in the worst case.</em></p>
<p>In this post we see how to build the quantum circuit for $A$ and for the Boolean function $f$ to suit our needs of quantizing a perceptron.</p>
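<p>To see the theorem at work on a small example, the following sketch simulates the Grover iterate with plain matrices. It takes $U_{init}$ to be the reflection $2\ket{\psi}\bra{\psi} - \mathbb{I}$ about the uniform superposition; the size <code class="highlighter-rouge">N</code> and the set of good indices <code class="highlighter-rouge">marked</code> are illustrative assumptions.</p>

```python
import numpy as np

N = 64                          # size of the search space (illustrative)
marked = [3, 17, 42]            # hypothetical "good" indices, where f(j) = 1

psi = np.full(N, 1 / np.sqrt(N))               # uniform superposition
U_init = 2 * np.outer(psi, psi) - np.eye(N)    # reflection about |psi>
U_targ = np.eye(N)
U_targ[marked, marked] = -1                    # phase flip on the good states
G = U_init @ U_targ                            # Grover iterate

a = len(marked) / N                            # initial success probability
theta = np.arcsin(np.sqrt(a))
m = int(np.floor(np.pi / (4 * theta)))         # roughly 1/sqrt(a) iterations

state = np.linalg.matrix_power(G, m) @ psi
p_good = float(np.sum(state[marked] ** 2))     # probability of sampling a good index
print(f"a = {a:.3f}, iterations = {m}, success probability = {p_good:.3f}")
```

<p>After $m$ iterations the success probability is $\sin^2((2m+1)\theta)$ with $\sin^2\theta = a$, which is why a number of iterations of order $1/\sqrt{a}$ is enough.</p>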
<h4 id="a-different-classical-perceptron">A different classical perceptron</h4>
<p>An underlying assumption that we make in the classical analysis of the algorithm is that we have sequential access to the training set (like an array). In the quantum algorithm we will drop this assumption, and instead assume that we have access to random samples of the dataset. As you might have imagined, we will query the elements of the training set in superposition, at the cost of introducing the possibility of extracting the same training element multiple times.</p>
<p>Recall that classically, the cost of training a perceptron in the original array-model is:</p>
<script type="math/tex; mode=display">O\left(\frac{N}{\gamma^2}\right)</script>
<p>However, as shown in the paper, if we are only allowed to sample the vectors from the training set, the running time of a classical learner stretches by a logarithmic factor (for reasons related to the coupon collector problem) to:</p>
<script type="math/tex; mode=display">O \left(\frac{N}{\gamma^2}\log \left(\frac{1}{\varepsilon\gamma^2}\right) \right)</script>
<p>This is proven in detail in the paper <a href="#kapoor2016quantum">(Kapoor, Wiebe, & Svore, 2016)</a>.</p>
<h4 id="quantum-perceptron">Quantum perceptron</h4>
<p>Let’s see how to speed things up with quantum. If you are already accustomed to quantum computation, perhaps I already gave you a hint on the quantum state we are going to create for our algorithm. We assume access to the following oracle:</p>
<script type="math/tex; mode=display">U\ket{j}\ket{0}\to\ket{j}\ket{\phi_j}</script>
<p>and its inverse $U^{\dagger}$. With $U$ (by linearity) we are going to build a uniform superposition of the elements of the training set:</p>
<script type="math/tex; mode=display">U\frac{1}{\sqrt{N}}\sum_{j=1}^N\ket{j}\ket{0} \to \frac{1}{\sqrt{N}}\sum_{j=1}^N\ket{j}\ket{\phi_j}</script>
<p>What does $\ket{\phi_j}$ mean in practice? If we have to store a floating point vector $\phi_j \in \mathbb{R}^D$, we can store the $m$-bit binary representation of its $D$ floating point numbers, and add one qubit to store the label $y_j$ of the vector (mapping $-1$ to $0$ for negative labels). The authors note that you can interpret this qubit string as an unsigned integer. They also note that in this way we map each element in the training set to a basis state of our Hilbert space.</p>
<p>Now that we have our data loaded in our quantum computer, we craft the unitary operator that allows us to test whether the perceptron correctly classifies a training vector. As in the classical algorithm, we start with a random guess of the weight vector $w_0$. Each time we find a misclassified vector we update our model by adjusting $w_t$, our current guess for $w^*$.
<br /></p>
<p>Said simply, we need a quantum circuit implementing the perceptron for a given weight vector $w$, which we then “plug” into the amplitude amplification theorem <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a>. The unitary operator that we want to implement in order to apply amplitude amplification just changes the sign of misclassified vectors. We can therefore write it as such:</p>
<script type="math/tex; mode=display">\mathbb{F}_w \Phi_j = (-1)^{f_w(\phi_j,y_j)}\Phi_j</script>
<p>Let me explain this. We define $f_w(\phi, y)$ to be the Boolean function of the perceptron that, given a weight vector $w$, a vector $\phi$ and its label $y$, returns $0$ if the vector is currently well classified according to the label $y$, and returns $1$ if the vector is misclassified.
This allows us to change the sign of just the misclassified vectors. The unitary implementation of an oracle like this can be plugged into the circuit for amplitude amplification, and gives us an algorithm for a quantum perceptron. As is well known, we can build a quantum circuit from a classical Boolean circuit. Therefore we can assume to have the quantum circuit of the perceptron (for a given model $w$). We want this quantum circuit to compute the following mapping:</p>
<script type="math/tex; mode=display">F_w[j \otimes \Phi_0] = (-1)^{f_w(\phi_j, y_j)}[j \otimes \Phi_0]</script>
<p>The unitary operator that we need to implement in order to get a quantum version of $F_w$ can be built in the following way:</p>
<script type="math/tex; mode=display">U_{targ} = F_w = U^\dagger (\mathbb{I} \otimes \mathbb{F}_w ) U</script>
<p>This represents the first of the ingredients that we need for amplitude amplification. The second consists of $U_{init}$, which in this case is $U_\text{init}=2\ket{\psi}\bra{\psi} - I$, with $\ket{\psi}=\frac{1}{\sqrt{N}}\sum_{j=1}^N \ket{j}$.</p>
<p>The Grover iterate is defined as $G=U_{init}U_{targ}$. This is the circuit we need in order to apply amplitude amplification to our problem. If you don’t believe me, you should check the detailed proof in the paper :)</p>
<p>The main result of this section is the following theorem:</p>
<h5 id="theorem-1-kapoor-wiebe--svore-2016">Theorem 1 <a href="#kapoor2016quantum">(Kapoor, Wiebe, & Svore, 2016)</a></h5>
<p><em>Given a training set that consists of unit vectors $\phi_1, \ldots ,\phi_N$ that are separated by a margin of $\gamma$ in feature space, the number of applications of $F_w$ needed to infer a perceptron model $w$, such that $P(\exists j : f_w(\phi_j, y_j) = 1) \leq \epsilon$ using a quantum computer is $N_\text{quant}$ where:</em></p>
<p><script type="math/tex">\Omega \left( \sqrt{N}\right) \ni N_\text{quant} \in O \left( \frac{\sqrt{N}}{\gamma^2} \log \left[ \frac{1}{\epsilon \gamma^2} \right] \right)</script>.</p>
<p><em>The number of queries to $f_w$ needed in the classical setting, $N_\text{class}$, where the training vectors are found by sampling uniformly from the training data is bounded by:</em></p>
<p><script type="math/tex">\Omega \left( N\right) \ni N_\text{class} \in O \left( \frac{N}{\gamma^2} \log \left[ \frac{1}{\epsilon \gamma^2} \right] \right)</script>.</p>
<h4 id="the-algorithm">The algorithm</h4>
<h6 id="require">Require:</h6>
<ol>
<li>Access to oracle $U$ storing $N$ input strings.</li>
<li>Error parameter $\epsilon$.</li>
</ol>
<h6 id="ensure">Ensure:</h6>
<ol>
<li>A hyperplane approximating $w^*$.</li>
</ol>
<ul>
<li>Create a random initial vector $w_t$</li>
<li><em>For</em> $k=1 … \lceil \log_{3/4} (\gamma^2\epsilon) \rceil$
<ul>
<li><em>For</em> $j = 1 … \lceil \log_c (1/\sin(2\sin^{-1}(1/\sqrt{N}))) \rceil $
<ul>
<li>Sample uniformly an integer $m \in [0…\lceil c^j \rceil ]$</li>
<li>Prepare the query register $\ket{\psi}=\frac{1}{\sqrt{N}}\sum_{i=1}^N\ket{i}\ket{0}$</li>
<li>Perform $G^m\ket{\psi}$</li>
<li>Measure the index register and get an index $i$.</li>
<li>If $f_{w_t}(\phi_i, y_i) =1$ then update $w_t$ (and hence the circuit $F_{w_t}$)</li>
</ul>
</li>
</ul>
</li>
<li>Return $w_t$ to the user.</li>
</ul>
<p>We show how to apply amplitude amplification on the dataset to find all the misclassified vectors with a quantum version of the perceptron circuit. At each iteration, we update our circuit $F_{w_t}$ with the new model of the perceptron, and we continue until no misclassified vectors are left in the training set.
A couple of sentences on the algorithm. The two loops ensure that we find the correct number of times to apply $G$ through an exponential search over the number of iterations. It is basically a trick to preserve the quadratic speedup without knowing in advance the number of misclassified vectors for a given perceptron. This trick is described properly in the paper and in <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a>.</p>
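<p>The inner loop can be sketched with a purely classical simulation. The code below does not run a quantum circuit: it uses the known success probability $\sin^2((2m+1)\theta)$ of $G^m\ket{\psi}$ to simulate the measurement of the index register. The function names, the choice $c = 1.5$ and the Boolean array <code class="highlighter-rouge">is_marked</code> (standing in for $f_{w_t}$ evaluated on every training vector) are assumptions made for illustration.</p>

```python
import numpy as np

def sample_after_grover(is_marked, m, rng):
    """Simulate measuring the index register of G^m |psi> (classical simulation)."""
    N = len(is_marked)
    M = int(np.sum(is_marked))
    if M == 0 or M == N:
        return int(rng.integers(N))            # nothing to amplify: uniform sample
    theta = np.arcsin(np.sqrt(M / N))
    p_good = np.sin((2 * m + 1) * theta) ** 2  # success probability after m iterations
    if rng.random() < p_good:
        pool = np.flatnonzero(is_marked)
    else:
        pool = np.flatnonzero(~is_marked)
    return int(rng.choice(pool))

def find_misclassified(is_marked, c=1.5, rng=None):
    """Exponential search for a marked index without knowing how many there are."""
    rng = np.random.default_rng() if rng is None else rng
    is_marked = np.asarray(is_marked, dtype=bool)
    N = len(is_marked)
    j_max = int(np.ceil(np.log(1 / np.sin(2 * np.arcsin(1 / np.sqrt(N)))) / np.log(c)))
    for j in range(1, j_max + 1):
        m = int(rng.integers(0, int(np.ceil(c ** j)) + 1))  # m uniform in [0, ceil(c^j)]
        i = sample_after_grover(is_marked, m, rng)
        if is_marked[i]:                       # classical check that f_w(phi_i, y_i) = 1
            return i
    return None                                # no misclassified vector was found
```

<p>A single run can fail even when misclassified vectors exist, which is why the algorithm wraps this search in the outer loop over $k$: repeating it drives the failure probability down.</p>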
<p>Now the user can take the model $w$ and use it for classification, possibly on a classical computer. As explained in the paper, its second part might be even more interesting for a machine learning practitioner. But that’s for another post. :)</p>
<ol class="bibliography"><li><span id="kapoor2016quantum">Kapoor, A., Wiebe, N., & Svore, K. (2016). Quantum perceptron models. In <i>Advances in Neural Information Processing Systems</i> (pp. 3999–4007).</span></li>
<li><span id="brassard2002quantum">Brassard, G., Hoyer, P., Mosca, M., & Tapp, A. (2002). Quantum amplitude amplification and estimation. <i>Contemporary Mathematics</i>, <i>305</i>, 53–74.</span></li></ol>
scinawa
Estimating average and variance of a function2018-08-12T00:00:00+02:002018-08-12T00:00:00+02:00https://luongo.pro/2018/08/12/Estimate_Average_Function
<p>I decided to write this post after reading a paper called <a href="https://arxiv.org/abs/1806.06893">Quantum Risk Analysis</a> <a href="#woerner2018quantum">(Woerner & Egger, 2018)</a>, by Stefan Woerner and Daniel J. Egger. Here I want to describe just the main technique employed by their algorithm (namely, how to use amplitude estimation to get useful information out of a function). In another post I will describe in more detail the rest of the paper, which goes into the technical details of how to use these techniques to solve a problem relevant to financial analysts.</p>
<p>Suppose we have a random variable $X$ described by a certain probability distribution over $N$ different outcomes, and a function $f: \{0,\cdots, N-1\} \to [0,1]$ defined over these outcomes. How can we use quantum computers to evaluate some properties of $f(X)$, such as expected value and variance, faster than classical computers?</p>
<p>Let’s start by translating these two mathematical objects into the quantum realm. The probability distribution is (surprise surprise) represented in our quantum computer by a quantum state over $n=\lceil \log N \rceil$ qubits:
<script type="math/tex">\ket{\psi} = \sum_{i=0}^{N-1} \sqrt{p_i} \ket{i}</script>
where the probability of measuring the state $\ket{i}$ is $p_i$, for $p_i \in [0, 1]$. Basically, each basis state of the Hilbert space represents an outcome of the random variable.</p>
<p>The quantization of the function $f$ is made by a linear operator $F$ acting on a new ancilla qubit as such:
<script type="math/tex">F: \ket{i}\ket{0} \to \ket{i}\left(\sqrt{1-f(i)}\ket{0} + \sqrt{f(i)}\ket{1}\right)</script></p>
<p>If we apply $F$ with $\ket{\psi}$ as input state we get:</p>
<script type="math/tex; mode=display">\sum_{i=0}^{N-1} \sqrt{1-f(i)}\sqrt{p_i}\ket{i}\ket{0} + \sum_{i=0}^{N-1} \sqrt{f(i)}\sqrt{p_i}\ket{i}\ket{1}</script>
<p>Observe that the probability of measuring $\ket{1}$ in the ancilla qubit is $\sum_{i=0}^{N-1}p_if(i)$, which is (w00t w00t) $E[f(X)]$.
By simply sampling the ancilla qubit we won’t get any speedup, but if we apply <a href="https://arxiv.org/abs/quant-ph/0005055">amplitude estimation</a> <a href="#brassard2002quantum">(Brassard, Hoyer, Mosca, & Tapp, 2002)</a> to the ancilla qubit, we can get an estimate of $E[f(X)]$ with quadratically fewer repetitions than classical sampling needs for the same precision.</p>
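<p>A quick numerical check of this observation: the sketch below writes down the amplitudes of the state obtained by applying $F$ to $\ket{\psi}\ket{0}$ and compares the probability of reading $1$ on the ancilla with $E[f(X)]$ computed directly. The distribution <code class="highlighter-rouge">p</code> and the values of <code class="highlighter-rouge">f</code> are made up for illustration.</p>

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])       # illustrative distribution over N = 4 outcomes
f = np.array([0.0, 0.25, 0.5, 1.0])      # illustrative values f(i) in [0, 1]

# Amplitudes of F|psi>|0>, indexed as state[i, ancilla].
state = np.stack([np.sqrt(p * (1 - f)), np.sqrt(p * f)], axis=1)

prob_ancilla_one = float(np.sum(state[:, 1] ** 2))   # P(ancilla = 1)
expected_value = float(np.sum(p * f))                # E[f(X)] computed classically
print(prob_ancilla_one, expected_value)
```

<p>The two numbers coincide, and the state is normalized because $p_i(1-f(i)) + p_i f(i) = p_i$.</p>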
<p>Finally, observe that:</p>
<ul>
<li>if we choose $f(i)=\frac{i}{N-1}$ we are able to estimate $E[\frac{X}{N-1}]$ (which, since we know $N$, gives us an estimate of the expected value of $X$)</li>
<li>if we choose $f(i)=\frac{i^2}{(N-1)^2}$ instead, we can estimate $E[\frac{X^2}{(N-1)^2}]$, hence $E[X^2]$, and using this along with the previous choice of $f$ we can estimate the variance of $X$: $E[X^2] - E[X]^2$.</li>
</ul>
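<p>To make the rescaling explicit, here is a short worked derivation, where $a_1$ and $a_2$ are names introduced here for the outputs of amplitude estimation with the two choices of $f$:</p>
<script type="math/tex; mode=display">a_1 \approx E\left[\frac{X}{N-1}\right], \quad a_2 \approx E\left[\frac{X^2}{(N-1)^2}\right] \quad \Longrightarrow \quad E[X] \approx (N-1)\, a_1, \quad \mathrm{Var}(X) \approx (N-1)^2 \left( a_2 - a_1^2 \right)</script>
<p>Note that an additive error $\epsilon$ on $a_1$ becomes an error $(N-1)\epsilon$ on $E[X]$, so the precision of the estimate has to be chosen with this rescaling in mind.</p>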
<p>See ya!</p>
<ol class="bibliography"><li><span id="woerner2018quantum">Woerner, S., & Egger, D. J. (2018). Quantum Risk Analysis. <i>ArXiv Preprint ArXiv:1806.06893</i>.</span></li>
<li><span id="brassard2002quantum">Brassard, G., Hoyer, P., Mosca, M., & Tapp, A. (2002). Quantum amplitude amplification and estimation. <i>Contemporary Mathematics</i>, <i>305</i>, 53–74.</span></li></ol>
scinawa
Selected articles on Quantum Machine Learning2018-07-19T00:00:00+02:002018-07-19T00:00:00+02:00https://luongo.pro/2018/07/19/scinawa-review-qml
<p>This is a collection of papers I have found useful in the last few years. It is far from complete, and you are welcome to suggest entries that you think I have missed.</p>
<h4 id="2018">2018</h4>
<ul>
<li>
<p><a href="https://arxiv.org/pdf/1807.03341.pdf">Troubling Trends in Machine Learning Scholarship</a> <code class="highlighter-rouge">#opinion-paper</code><br />
A self-critique by the ML community of the way it is currently doing science. I think this might be relevant for the QML practitioner as well.</p>
</li>
<li>
<p><a href="https://arxiv.org/pdf/1804.10068.pdf">Quantum machine learning for data scientists</a> <code class="highlighter-rouge">#review</code> <code class="highlighter-rouge">#tutorial</code><br />
This is a very nice review of some of the best-known QML algorithms. I wish I had this when I started studying QML.</p>
</li>
<li>
<p><a href="">Image classification of MNIST dataset using quantum slow feature analysis</a> <code class="highlighter-rouge">#algo</code><br />
This is my first work in quantum machine learning. Here we show two new algorithms.
The idea is to give evidence that QRAM-based algorithms can obtain a speedup w.r.t. classical algorithms in QML <em>on real data</em>.</p>
</li>
<li>
<p><a href="https://arxiv.org/pdf/1804.03719.pdf">Quantum algorithm implementations for beginners</a> <code class="highlighter-rouge">#review</code> <code class="highlighter-rouge">#tutorial</code></p>
</li>
</ul>
<h4 id="2017">2017</h4>
<ul>
<li>
<p><a href="">Implementing a distance based classifier with a quantum interference circuit</a> <code class="highlighter-rouge">#algo</code></p>
</li>
<li>
<p><a href="">Quantum machine learning for quantum anomaly detection</a> <code class="highlighter-rouge">#algo</code><br />
Here the authors use previous techniques to perform anomaly detection. Basically, they project the data on the 1-dimensional subspace of the covariance matrix of the data. In this way anomalies are supposed to lie further away from the rest of the dataset.</p>
</li>
<li>
<p><a href="https://arxiv.org/pdf/1707.08561.pdf">Quantum machine learning: a classical perspective</a> <code class="highlighter-rouge">#review</code> <code class="highlighter-rouge">#quantum learning theory</code></p>
</li>
</ul>
<h4 id="2016">2016</h4>
<ul>
<li>
<p><a href="">Quantum Discriminant Analysis for Dimensionality Reduction and Classification</a> <code class="highlighter-rouge">#algo</code><br />
Here the authors describe two different algorithms, one for dimensionality reduction and one for classification, with the same capabilities.</p>
</li>
<li>
<p><a href="">Quantum Recommendation Systems</a> <code class="highlighter-rouge">#algo</code><br />
This is where you can learn about QRAM and quantum singular value estimation.</p>
</li>
</ul>
<h4 id="2015">2015</h4>
<ul>
<li>
<p><a href="https://arxiv.org/pdf/1512.02900.pdf">Advances in quantum machine learning</a> <code class="highlighter-rouge">#implementations</code>, <code class="highlighter-rouge">#review</code> <br />
It covers things up to 2015, so here you can find descriptions of Neural Networks, Bayesian Networks, HHL, PCA, Quantum Nearest Centroid, Quantum k-Nearest Neighbour, and others.</p>
</li>
<li>
<p><a href="">Quantum algorithms for topological and geometric analysis of data</a> <code class="highlighter-rouge">#algo</code></p>
</li>
</ul>
<h4 id="2014">2014</h4>
<ul>
<li>
<p><a href="">Quantum Algorithms for Nearest-Neighbor Methods for Supervised and Unsupervised Learning</a> <code class="highlighter-rouge">#tools</code>, <code class="highlighter-rouge">#algorithms</code><br />
This paper offers two approaches for calculating distances between vectors.
The idea for k-NN is to calculate distances between the test point and the training set in superposition and then use amplitude amplification techniques to find the minimum, thus getting a quadratic speedup.</p>
</li>
<li>
<p><a href="">Quantum support vector machine for big data classification</a> <code class="highlighter-rouge">#algo</code><br />
This was one of the first examples of how to use HHL-like algorithms in order to get something useful out of them.</p>
</li>
<li>
<p><a href="">Quantum self-testing</a> <code class="highlighter-rouge">#algo</code><br />
The authors discovered that partial applications of the swap test are sufficient to transform a quantum state $\sigma$ into $U\sigma U^\dagger$ where $U=e^{-i\rho}$, given the ability to create multiple copies of $\rho$.
This work uses a particular access model for the data (sample complexity), which can be obtained from a QRAM.</p>
</li>
</ul>
<h4 id="2013">2013</h4>
<ul>
<li><a href="https://arxiv.org/pdf/1307.0411.pdf">Quantum algorithms for supervised and unsupervised machine learning</a> <code class="highlighter-rouge">#algo</code><br />
This explains how to use the swap test in order to calculate distances. Then it shows how this swap-test-for-distances can be used to do NearestCentroid and k-Means with adiabatic quantum computation.</li>
</ul>
<h4 id="2009">2009</h4>
<ul>
<li><a href="">Quantum algorithms for linear systems of equations</a> <code class="highlighter-rouge">#algo</code><br />
This is the paper that started everything. :) Techniques for sparse Hamiltonian simulation and phase estimation were applied in order to estimate the singular values of a matrix. Then a controlled rotation on an ancilla qubit + postselection creates a state proportional to the solution of a system of equations. You can learn more about it <a href="HHL">here</a>.</li>
</ul>
<h3 id="code">Code</h3>
<ul>
<li><a href="http://grove-docs.readthedocs.io/en/latest/">Grove</a></li>
<li><a href="">Qiskit-acqua</a></li>
<li><a href="https://projectivesimulation.org">Projective Simulation</a></li>
</ul>
scinawa
Quantum Frobenius Distance Classifier2018-07-18T00:00:00+02:002018-07-18T00:00:00+02:00https://luongo.pro/2018/07/18/Quantum-Frobenius-Distance-classifier
<p>Yesterday night there was the TQC dinner in Sydney, and I had the chance to speak with a very prolific author in QML. We spoke about her work on <a href="https://arxiv.org/abs/1703.10793">distance based classification</a>, which is <a href="https://arxiv.org/abs/1803.00853">further analyzed here</a>. As a magnificent manifestation of the Zeitgeist in QML, she said that one of the purposes of the paper was to show that a Hadamard gate is enough to perform classification, and that you don’t need very complex circuits to exploit quantum mechanics in machine learning. This was exactly the motivation behind our QFDC classifier as well, so here we are with a little description of QFDC! This text is taken straight outta <a href="https://arxiv.org/abs/1805.08837">my paper</a>.</p>
<p>As usual, I assume data is stored in a QRAM. We are in the setting of supervised learning, so we have some labeled samples $x(i)$ in $\mathbb{R}^d$ for $K$ different labels. Let $X_k$ be the matrix whose rows are the vectors with label $k$; we therefore have $K$ such matrices.
$|T_k|$ is the number of elements in cluster $k$ (so the number of rows of $X_k$).</p>
<p>For a test point $x(0)$, define the matrix $ X(0) \in \mathbb{R}^{|T_k| \times d} $
which just repeats the row $x(0)$ for $|T_k|$ times.
The number of rows of $X(0)$ is context dependent, but it will hopefully be clear. Then, we define</p>
<script type="math/tex; mode=display">F_k( x(0)) = \frac{ ||X_k - X(0)||_F^2}{2 ( ||X_k||_F^2+ ||X(0)||_F^2) },</script>
<p>which corresponds to the average normalized squared distance between $x(0)$ and the cluster $k$.
Let $h : \mathcal{X} \to [K]$ be our classification function. We assign to $x(0)$ a label according to the following rule:</p>
<script type="math/tex; mode=display">h(x(0)) := \arg\min_{k \in [K]} F_k( x(0))</script>
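<p>Before looking at the quantum estimator, it may help to see the classification rule computed classically. This is only a reference sketch with toy data: the function names <code class="highlighter-rouge">frobenius_score</code> and <code class="highlighter-rouge">classify</code> are illustrative, and nothing here is quantum.</p>

```python
import numpy as np

def frobenius_score(X_k, x0):
    """F_k(x0) = ||X_k - X(0)||_F^2 / (2 (||X_k||_F^2 + ||X(0)||_F^2))."""
    X0 = np.tile(x0, (X_k.shape[0], 1))            # repeat x0 once per row of X_k
    num = np.linalg.norm(X_k - X0, "fro") ** 2
    den = 2 * (np.linalg.norm(X_k, "fro") ** 2 + np.linalg.norm(X0, "fro") ** 2)
    return num / den

def classify(clusters, x0):
    """Return the index k that minimises F_k(x0)."""
    return int(np.argmin([frobenius_score(X_k, x0) for X_k in clusters]))

clusters = [
    np.array([[1.0, 0.0], [0.9, 0.1]]),    # toy cluster 0
    np.array([[0.0, 1.0], [0.1, 0.9]]),    # toy cluster 1
]
x0 = np.array([0.8, 0.2])
label = classify(clusters, x0)
print(label)
```

<p>The factor 2 in the denominator guarantees $0 \leq F_k(x(0)) \leq 1$, which is what allows $F_k$ to be read as a measurement probability.</p>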
<p>We will estimate $F_k( x(0))$ efficiently using the algorithm below. From our QRAM construction we know we can create a superposition of all vectors in the cluster as quantum states, have access to their norms and to the total number of points and norm of the clusters. We define a normalization factor as:</p>
<script type="math/tex; mode=display">N_k= ||X_k||_F^2 + ||X(0)||_F^2 = ||X_k||_F^2 +|T_k| ||x(0)||^2.</script>
<h5 id="require">Require</h5>
<ul>
<li>QRAM access to the matrix $X_k$ of cluster $k$ and to a test vector $x(0)$. Error parameter $\eta > 0$.</li>
</ul>
<h5 id="ensure">Ensure</h5>
<ul>
<li>An estimate $\overline{F_k (x(0))}$
such that $| F_k(x(0)) - \overline{F_k( x(0))} | < \eta $.</li>
</ul>
<h5 id="algorithm">Algorithm</h5>
<ul>
<li>Start with three empty quantum registers. The first is an ancilla qubit, the second is for the index, and the third one is for the data.
<script type="math/tex">\ket{0}\ket{0}\ket{0}</script></li>
<li>$s:=0$</li>
<li>Repeat $r=O(1/\eta^2)$ times:
<ul>
<li>Create the state<br />
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \sqrt{|T_k|}||x(0)||\ket{0} +||X_k||_F \ket{1}\Big) \ket{0}\ket{0}</script></li>
<li>Apply to the first two registers the unitary that maps:
<script type="math/tex">\ket{0}\ket{0} \mapsto \ket{0} \frac{1}{\sqrt{|T_k|}} \sum_{i \in T_k} \ket{i}\; \mbox{ and } \; \ket{1}\ket{0} \mapsto \ket{1} \frac{1}{||X_k||_F} \sum_{i \in T_k} ||x(i)|| \ket{i}</script>
This will get you to:
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \ket{0} \sum_{i \in T_k} ||x(0)|| \ket{i} + \ket{1} \sum_{i \in T_k} ||x(i)|| \ket{i} \Big) \ket{0}</script></li>
<li>Now apply the unitary that maps
<script type="math/tex">\ket{0} \ket{i} \ket{0} \mapsto \ket{0} \ket{i} \ket{x(0)} \; \mbox{ and } \; \ket{1} \ket{i} \ket{0} \mapsto \ket{1} \ket{i} \ket{x(i)}</script></li>
</ul>
<p>to get the state
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \ket{0} \sum_{i \in T_k} ||x(0)|| \ket{i} \ket{x(0)}+ \ket{1} \sum_{i \in T_k} ||x(i)|| \ket{i}\ket{x(i)} \Big)</script></p>
<ul>
<li>Apply a Hadamard to the first register to get
<script type="math/tex">\frac{1}{\sqrt{2N_k}}\ket{0} \sum_{i \in T_k} \Big( ||x(0)|| \ket{i} \ket{x(0)} + ||x(i)|| \ket{i}\ket{x(i)} \Big) +
\frac{1}{\sqrt{2N_k}}\ket{1} \sum_{i \in T_k} \Big( ||x(0)|| \ket{i} \ket{x(0)} - ||x(i)|| \ket{i}\ket{x(i)} \Big)</script></li>
<li>Measure the first register. If the outcome is $\ket{1}$ then $s:=s+1$</li>
</ul>
</li>
<li>Output $\frac{s}{r}$.</li>
</ul>
<p>Optionally, if you want to get a quadratic speedup w.r.t. $\eta$, perform amplitude estimation (with $O(1/\eta)$ iterations) on the outcome $\ket{1}$ of the ancilla register, with the unitary implementing steps 1 to 4, to get an estimate of $F_k(x(0))$ within error $\eta$. This would make the circuit more complex, and therefore less suitable for NISQ devices, but if you have enough qubits/fault tolerance, you can add it.</p>
<p>For the analysis, just note that the probability of measuring $\ket{1}$ is:</p>
<script type="math/tex; mode=display">\frac{1}{2N_k} \left ( |T_k|||x(0)||^2 + \sum_{i \in T_k} ||x(i)||^2 - 2\sum_{i \in T_k} \braket{x(0), x(i)} \right) = F_k(x(0)).</script>
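<p>This identity can be checked numerically with a small state-vector simulation. The sketch below assumes real-valued data and represents $||x||\ket{x}$ by the unnormalized vector $x$ itself (amplitude encoding); the cluster <code class="highlighter-rouge">X_k</code>, the test point <code class="highlighter-rouge">x0</code> and the number of repetitions <code class="highlighter-rouge">r</code> are toy values.</p>

```python
import numpy as np

X_k = np.array([[1.0, 0.0], [0.9, 0.1], [0.7, 0.3]])   # toy cluster, |T_k| = 3
x0 = np.array([0.2, 0.8])                               # toy test point

T, d = X_k.shape
N_k = np.sum(X_k ** 2) + T * np.sum(x0 ** 2)            # normalization factor

# state[a, i, :] holds the amplitudes for ancilla a, index i and the data register.
state = np.empty((2, T, d))
state[0] = np.tile(x0, (T, 1)) / np.sqrt(N_k)           # |0>|i> ||x(0)|| |x(0)>
state[1] = X_k / np.sqrt(N_k)                           # |1>|i> ||x(i)|| |x(i)>

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
state = np.tensordot(H, state, axes=(1, 0))             # Hadamard on the ancilla

p_one = float(np.sum(state[1] ** 2))                    # P(ancilla = 1)
F_k = float(np.sum((X_k - x0) ** 2) / (2 * N_k))        # direct formula

rng = np.random.default_rng(0)
r = 10_000
estimate = rng.binomial(r, p_one) / r                   # s / r after r repetitions
print(p_one, F_k, estimate)
```

<p>The last two lines mimic the counting loop of the algorithm: each repetition yields outcome $1$ with probability $F_k(x(0))$, so $s/r$ concentrates around it.</p>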
<p>By Hoeffding bounds, to estimate $F_k(x(0))$ with error $\eta$ we would need $O(\frac{1}{\eta^2})$ samples.
For the running time, we assume all unitaries are efficient (i.e. we are capable of doing them in polylogarithmic time) either because the quantum states can be prepared directly by some quantum procedure or given that the classical vectors are stored in the QRAM, hence the algorithm runs in time $\tilde{O}(\frac{1}{\eta^2})$. We can of course use amplitude estimation and save a factor of $\eta$. Depending on the application one may prefer to keep the quantum part of the classifier as simple as possible or optimize the running time by performing amplitude estimation.</p>
<p>Given this estimator we can now define the QFD classifier.</p>
<h5 id="require-1">Require</h5>
<ul>
<li>QRAM access to $K$ matrices $X_k$ of elements of different classes.</li>
<li>A test vector $x(0)$.</li>
<li>Error parameter $\eta > 0$.</li>
</ul>
<h5 id="ensure-1">Ensure</h5>
<ul>
<li>A label for $x(0)$.</li>
</ul>
<h5 id="algorithm-1">Algorithm</h5>
<ul>
<li>For $k \in [K]$
<ul>
<li>Use the QFD estimator to find $F_k(x(0))$ on $X_k$ and $x(0)$ with precision $\eta$.</li>
</ul>
</li>
<li>Output $h(x(0))=\arg\min_{k \in [K]} F_k( x(0))$.</li>
</ul>
<p>The running time of the classifier can be made $\tilde{O}(\frac{K}{\eta})$ when using amplitude estimation. That was it. QFDC basically exploits the subroutine for finding the average squared distance between a point and a cluster, and assigns the test point to the “closest” cluster.</p>
<p>A drawback of this approach is that it is very sensitive to outliers. This is because we take the square of the distance of the points belonging to a cluster. This can apparently be mitigated by a proper dimensionality reduction algorithm, like <a href="QSFA">QSFA</a>.</p>
scinawa
Iordanis Kerenidis’ talk on quantum machine learning2018-07-02T00:00:00+02:002018-07-02T00:00:00+02:00https://luongo.pro/2018/07/02/Iordanis-talk
<p>This is the link to the video of Iordanis (my supervisor) talking about quantum machine learning. In the second half of the video he describes our <a href="https://arxiv.org/abs/1805.08837">recent results</a> on quantum slow feature analysis and classification of the MNIST dataset.</p>
<p><a href="http://www.youtube.com/watch?v=KTVtMKo3g80" title="Quantum Algorithms for Classification"><img src="http://img.youtube.com/vi/KTVtMKo3g80/0.jpg" alt="Quantum Algorithms for Classification" /></a></p>
scinawa
Quantum Slow Feature Analysis, a quantum algorithm for dimensionality reduction2018-06-16T00:00:00+02:002018-06-16T00:00:00+02:00https://luongo.pro/2018/06/16/quantum_slow_feature_analysis_a_quantum_algorithm_for_dimensionality_reduction
<p>Slow Feature Analysis (SFA) was originally proposed to
learn slowly varying features from generic input signals that vary
rapidly over time (P. Berkes 2005; Wiskott Laurenz and Wiskott 1999).
Computational neurologists observed a long time ago that, while primary sensory
receptors (like the retinal receptors in an animal’s eye) are sensitive
to very small changes in the environment and thus vary on a very fast
time scale, the internal representation of the environment in the brain
varies on a much slower time scale. This observation is called the <em>temporal
slowness principle</em>. SFA, being the state-of-the-art model for how this
temporal slowness principle is implemented, is a hypothesis for the
functional organization of the visual cortex (and possibly other sensory
areas of the brain). Said in a very practical way, we have some
“process” in our brain that behaves very similarly to what SFA dictates
(L. Wiskott et al. 2011).</p>
<p>Very beautifully, it is possible to show two reductions from two other
dimensionality reduction algorithms used in machine learning: Laplacian
Eigenmaps (a dimensionality reduction algorithm mostly suited for video
compression) and Fisher Discriminant Analysis (a standard dimensionality
reduction algorithm). SFA can be applied fruitfully in ML, as there have
been many applications of the algorithm to ML-related tasks. The
key concept for SFA (and LDA) is that it tries to project the data onto
a subspace such that the distance between points with the same label
is minimized, while the distance between points with different labels is
maximized.</p>
<h1 id="classical-sfa-for-classification">Classical SFA for classification</h1>
<p>The high level idea of using SFA for classification is the following:
One can think of the training set as an input series
$x(i) \in \mathbb{R}^d , i \in [n]$. Each $x(i)$ belongs to one of $K$
different classes. The goal is to learn $K-1$ functions
$g_j( x(i)), j \in [K-1]$ such that the output
$ y(i) = [g_1( x(i)), \cdots , g_{K-1}( x(i)) ]$ is very similar for
the training samples of the same class and largely different for samples
of different classes. Once these functions are learned, they are used to
map the training set into a low dimensional vector space. When a new data
point arrives, it is mapped to the same vector space, where
classification can be done with higher accuracy.</p>
<p>Now we introduce the minimization problem in its most general form, as it
is commonly stated for classification (Berkes 2005). Let $T_k$ be the set of
indices of the training elements of class $k$, and let
$a=\sum_{k=1}^{K} \binom{|T_k|}{2}.$ For all $j \in [K-1]$, minimize:</p>
<script type="math/tex; mode=display">% <![CDATA[
\Delta(y_j) = \frac{1}{a} \sum_{k=1}^K \sum_{s,t \in T_k \atop s<t} \left( g_j( x(s)) - g_j( x(t)) \right)^2 %]]></script>
<p>with the following constraints:</p>
<ol>
<li>
<p>$\frac{1}{n} \sum_{k=1}^{K}\sum_{i\in T_k} g_j( x(i)) = 0 $</p>
</li>
<li>
<p>$\frac{1}{n} \sum_{k=1}^{K}\sum_{i \in T_k} g_j( x(i))^2 = 1 $</p>
</li>
<li>
<p>$ \frac{1}{n} \sum_{k=1}^{K}\sum_{i \in T_k} g_j( x(i))g_v( x(i)) = 0 \quad \forall v < j $</p>
</li>
</ol>
<p>For some beautiful theoretical reasons, the QSFA algorithm is in practice an
algorithm for finding the solution of the <em>generalized eigenvalue
problem</em>:</p>
<script type="math/tex; mode=display">AW= BW\Lambda</script>
<p>Here $W$ is the matrix whose columns are the generalized eigenvectors and $\Lambda$ is the diagonal matrix of the generalized eigenvalues. For SFA, $A$ and $B$ are defined as $A := \dot{X}^T \dot{X}$ and $B := X^TX$, where $\dot{X}$ is the matrix of the derivatives of the data: i.e. for each possible pair of elements with the same label we calculate the pointwise difference between the two vectors (computationally, it suffices to sample $O(n)$ tuples from the uniform distribution over all possible derivatives).</p>
<p>It is possible to see that the slow feature space we are looking for is spanned by the columns of $W$ associated with the $K-1$ smallest eigenvalues in
$\Lambda$.</p>
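<p>A minimal classical sketch of the computation above, assuming NumPy. The function name <code>classical_sfa</code> is illustrative and not taken from the paper. It builds $\dot{X}$ from all same-label pairs (instead of sampling $O(n)$ of them), whitens with $B^{-1/2}$, and keeps the eigenvectors with the smallest eigenvalues. It assumes $B$ has full rank and ignores the $1/n$ and $1/a$ normalization factors.</p>

```python
import numpy as np


def classical_sfa(X, labels, n_features):
    """Return (W, eigenvalues) for the n_features slowest directions."""
    X = X - X.mean(axis=0)  # enforce the zero-mean constraint

    # Derivative matrix: differences between all pairs with the same label.
    rows = []
    for k in np.unique(labels):
        Xk = X[labels == k]
        for s in range(len(Xk)):
            for t in range(s + 1, len(Xk)):
                rows.append(Xk[s] - Xk[t])
    X_dot = np.array(rows)

    A = X_dot.T @ X_dot
    B = X.T @ X

    # Whitening matrix B^{-1/2}, from the eigendecomposition of B
    # (assumption: B is full rank, so all its eigenvalues are positive).
    evals_B, evecs_B = np.linalg.eigh(B)
    B_inv_sqrt = evecs_B @ np.diag(evals_B ** -0.5) @ evecs_B.T

    # The generalized problem for (A, B) becomes an ordinary symmetric
    # eigenproblem for the whitened matrix B^{-1/2} A B^{-1/2}.
    evals, evecs = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)

    # eigh returns eigenvalues in ascending order: keep the smallest ones.
    W = B_inv_sqrt @ evecs[:, :n_features]
    return W, evals[:n_features]
```

<p>For $K$ classes one would call it with <code>n_features = K - 1</code> and project new points with <code>x @ W</code> after subtracting the training mean.</p>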
<h1 id="quantum-sfa">Quantum SFA</h1>
<p>In (Kerenidis and Luongo 2018) we show how, using a “QuantumBLAS” (i.e.
a set of quantum algorithms that we can use to perform linear algebraic
operations), we can perform the following algorithm. The intuition
behind this algorithm is that the derivative matrix of the data can be
pre-computed on non-whitened data. Classically one might instead compute
it on the whitened data, to spare a matrix multiplication; with a quantum
computer we don’t have this problem, since we know how to perform matrix
multiplication efficiently. As in the classical algorithm, we have to do some
preprocessing of our data. For the quantum case, preprocessing consists
of:</p>
<ol>
<li>
<p>Polynomially expand the data with a polynomial of degree 2 or 3</p>
</li>
<li>
<p>Normalize and scale the rows of the dataset $X$.</p>
</li>
<li>
<p>Create $\dot{X}$ by sampling from the distribution of possible
pairs of rows of $X$ with the same label.</p>
</li>
<li>
<p>Create QRAM for $X$ and $\dot{X}$</p>
</li>
</ol>
<p>Note that all these operations take at most $O(nd\log(nd))$ time in the size of
the training set, which is time that we need to spend anyhow, even just to
collect the data classically.</p>
<p>To use our algorithm for classification, you use QSFA to bring one
cluster at a time, along with the new test point, into the slow feature
space, and then perform any distance based classification algorithm, like
QFDC or swap tests. The quantum algorithm is the following:</p>
<ul>
<li>
<p><strong>Require</strong> Matrices $X \in \mathbb{R}^{n \times d}$ and
$\dot{X} \in \mathbb{R}^{n \times d}$ in QRAM, parameters
$\epsilon, \theta,\delta,\eta >0$.</p>
</li>
<li>
<p><strong>Ensure</strong> A state $\ket{\bar{Y}}$ such that
$ | \ket{Y} - \ket{\bar{Y}} | \leq \epsilon$, with
<script type="math/tex">Y = A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} Z</script></p>
</li>
</ul>
<ol>
<li>
<p>Create the state
<script type="math/tex">\ket{X} := \frac{1}{ {||X ||}_F} \sum_{i=1}^{n} {||x(i) ||} \ket{i}\ket{x(i)}</script>
using the QRAM that stores the dataset.</p>
</li>
<li>
<p>(Whitening algorithm) Map $\ket{X}$ to $\ket{\bar{Z}}$ with
$| \ket{\bar{Z}} - \ket{Z} | \leq \epsilon $ and $Z=XB^{-1/2}$,
using quantum access to the QRAM.</p>
</li>
<li>
<p>(Projection in slow feature space) Project $\ket{\bar{Z}}$ onto the
slow eigenspace of $A$ using threshold $\theta$ and precision
$\delta$ (i.e.
<script type="math/tex">A^+_{\leq \theta, \delta}A_{\leq \theta, \delta}\bar{Z}</script> )</p>
</li>
<li>
<p>Perform amplitude amplification and estimation on the register
$\ket{0}$ with the unitary $U$ implementing steps 1 to 3, to obtain
$\ket{\bar{Y}}$ with $| \ket{\bar{Y}} - \ket{Y} | \leq \epsilon $
and an estimator $ \bar{ {|| Y ||} } $ with multiplicative error
$\eta$.</p>
</li>
</ol>
<p>Overall, the algorithm is summarized in the following Theorem.</p>
<p>Let $X = \sum_i \sigma_i u_iv_i^T \in \mathbb{R}^{n\times d}$ and its
derivative matrix $\dot{X} \in \mathbb{R}^{n \log n \times d}$ be stored in
QRAM. Let $\epsilon, \theta, \delta, \eta >0$. There exists a quantum
algorithm that produces as output a state <script type="math/tex">\ket{\bar{Y}}</script> with
<script type="math/tex">| \ket{\bar{Y}} - \ket{A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} Z} | \leq \epsilon</script>
in time
<script type="math/tex">\tilde{O}\left( \left( \kappa(X)\mu(X)\log (1/\epsilon) + \frac{ ( \mu({X})+ \mu(\dot{X}) ) }{\delta\theta} \right)
\frac{||{Z}||}{ ||A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} {Z} ||} \right)</script>
and an estimator $\bar{||Y ||}$ with
$ | \bar{||Y ||} - ||Y || | \leq \eta {||Y ||}$ with an additional
<script type="math/tex">1/\eta</script> factor in the running time.</p>
<p>A prominent advantage of SFA compared to other algorithms is that <em>it
is almost hyperparameter-free</em>. The only parameters to choose are in the
preprocessing of the data, e.g. the initial PCA dimension and the
nonlinear expansion that consists of a choice of a polynomial of
(usually low) degree $p$. Another advantage is that it is <em>guaranteed to
find the optimal solution</em> within the considered function space
(Escalante-B and Wiskott 2012). We ran an experiment: using QSFA with a quantum classifier, we were
able to reach 98.5% accuracy in digit recognition, i.e. we correctly
classified 98.5% of 10,000 test images of digits, given a training set of 60,000
digits.</p>
<h3 id="references">References</h3>
<div id="refs" class="references">
<div id="ref-Berkes2005pattern">
Berkes, Pietro. 2005. “Pattern Recognition with Slow Feature Analysis.”
<em>Cognitive Sciences EPrint Archive (CogPrints)</em> 4104.
<a href="http://cogprints.org/4104/">http://cogprints.org/4104/</a>, <a href="http://itb.biologie.hu-berlin.de/~berkes">http://itb.biologie.hu-berlin.de/~berkes</a>.
</div>
<div id="ref-escalante2012slow">
Escalante-B, Alberto N, and Laurenz Wiskott. 2012. “Slow Feature
Analysis: Perspectives for Technical Applications of a Versatile
Learning Algorithm.” <em>KI-Künstliche Intelligenz</em> 26 (4). Springer:
341–48.
</div>
<div id="ref-jkereLuongo2018">
Kerenidis, Iordanis, and Alessandro Luongo. 2018. “Quantum
Classification of the MNIST Dataset via Slow Feature Analysis.” <em>arXiv
Preprint arXiv:1805.08837</em>.
</div>
<div id="ref-scholarpedia2017SFA">
Wiskott, L., P. Berkes, M. Franzius, H. Sprekeler, and N. Wilbert. 2011.
“Slow Feature Analysis.” <em>Scholarpedia</em> 6 (4): 5282.
doi:<a href="https://doi.org/10.4249/scholarpedia.5282">10.4249/scholarpedia.5282</a>.
</div>
<div id="ref-wiskott1999learning">
Wiskott, Laurenz. 1999. “Learning Invariance
Manifolds.” <em>Neurocomputing</em> 26-27. Elsevier: 925–32.
doi:<a href="https://doi.org/10.1016/S0925-2312(99)00011-9">10.1016/S0925-2312(99)00011-9</a>.
</div>
</div>["scinawa"]How to evaluate a classifier2018-06-10T00:00:00+02:002018-06-10T00:00:00+02:00https://luongo.pro/2018/06/10/evaluate_classifier<p>Practitioners in quantum machine learning should not only build their
skills in quantum algorithms: having some basic notions of
statistics and data science won’t hurt. In the following we see some
ways to evaluate a classifier. What does it mean in practice? Imagine
you have a medical test that is able to tell if a patient is sick or
not. You might want to consider the behavior of your classifier with
respect to the following parameters: the cost of identifying a sick
patient as healthy, and the cost of identifying a healthy
patient as sick. For example, if the patient is a zombie and it
contaminates all the rest of humanity, you want to minimize the
occurrences of the first case, while if the cure for “zombiness” is
lethal for a human patient, you want to minimize the occurrences of the
second case. With P and N we count the number of patients that are
actually positive (sick) or negative (healthy). This is formalized in the following
definitions, which consist of statistics to be calculated on the test
set of a data analysis.</p>
<ul>
<li>
<p><strong>TP True positives (statistical power)</strong> : are those labeled as
sick that are actually sick.</p>
</li>
<li>
<p><strong>FP False positives (type I error)</strong>: are those labeled as sick but
that actually are healthy</p>
</li>
<li>
<p><strong>FN False negatives (type II error)</strong> : are those labeled as
healthy but that are actually sick.</p>
</li>
<li>
<p><strong>TN True negative</strong>: are those labeled as healthy that are healthy.</p>
</li>
</ul>
<p>Given this simple intuition, we can take a binary classifier and imagine
running an experiment over a data set. Then we can measure:</p>
<ul>
<li>
<p><strong>True Positive Rate (TPR) = Recall = Sensitivity</strong>: is the ratio of
correctly identified sick elements among all the elements that are actually
sick. It answers the question: “how good are we at detecting sick
people?”.
<script type="math/tex">\frac{ TP }{ TP + FN} = \frac{TP }{P} \simeq P(test=1|sick=1)</script>
This is an estimator of the probability of a positive test given a
sick individual.</p>
</li>
<li>
<p><strong>True Negative Rate (TNR) = Specificity</strong> is a measure that tells
you how many of the healthy patients are correctly labeled as healthy.
<script type="math/tex">\frac{ TN }{ TN + FP} \simeq p(test = 0 | sick =0)</script> How many
healthy patients will test negative? How good are we
at avoiding false alarms?</p>
</li>
<li>
<p><strong>False Positive Rate = Fallout</strong>
<script type="math/tex">FPR = \frac{ FP }{ FP + TN } = 1 - TNR</script></p>
</li>
<li>
<p><strong>False Negative Rate = Miss Rate</strong>
<script type="math/tex">FNR = \frac{ FN }{ FN + TP } = 1 - TPR</script></p>
</li>
<li>
<p><strong>Precision, Positive Predictive Value (PPV)</strong>:
<script type="math/tex">\frac{ TP }{ TP + FP} \simeq p(sick=1 | test=1)</script> How many
of those who test positive are actually sick?</p>
</li>
<li>
<p><strong>$F_1$ score</strong> is a more compact index of the performance
of a binary classifier. It is simply
the harmonic mean of Precision and Sensitivity:
<script type="math/tex">F_1 = 2\frac{Precision \times Sensitivity }{Precision + Sensitivity }</script></p>
</li>
<li>
<p><strong>Receiver Operating Characteristic (ROC)</strong>: evaluates the TPR and FPR
at all the scores returned by a classifier by changing a parameter.
It is a plot of the true positive rate against the false positive
rate for the different possible values (cutpoints) of a test or
experiment.</p>
</li>
<li>
<p>The <strong>confusion matrix</strong> generalizes these 4 combinations (TP, TN, FP,
FN) to multiple classes: it is an $l \times l$ matrix where at row $i$ and
column $j$ you have the number of elements from class $i$ that
have been classified as elements of class $j$.</p>
</li>
</ul>
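<p>The definitions above can be sketched in a few lines of plain Python. The function names <code>binary_metrics</code> and <code>confusion_matrix</code> are illustrative, and the sketch assumes the test set contains at least one sick and one healthy patient and at least one positive prediction (otherwise some denominators are zero).</p>

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the rates defined above from the four counts."""
    tpr = tp / (tp + fn)        # sensitivity / recall
    tnr = tn / (tn + fp)        # specificity
    precision = tp / (tp + fp)  # positive predictive value
    return {
        "TPR": tpr,
        "TNR": tnr,
        "FPR": 1 - tnr,
        "FNR": 1 - tpr,
        "precision": precision,
        "F1": 2 * precision * tpr / (precision + tpr),
    }


def confusion_matrix(y_true, y_pred, n_classes):
    """Entry [i][j] counts elements of class i classified as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix
```

<p>For instance, <code>binary_metrics(tp=8, fp=2, fn=2, tn=88)</code> describes a test that finds 8 of the 10 sick patients while raising 2 false alarms among 90 healthy ones.</p>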
<p>In brief: I wrote this post because I always forget these terms and I wasn’t
able to find them described in a concise way with the same formalism
without spending more time googling than I spent writing this post. Other
links:
<a href="https://uberpython.wordpress.com/2012/01/01/precision-recall-sensitivity-and-specificity/">here</a></p>["Alessandro Luongo"]qramutils: gather statistics for your QRAM2018-04-15T00:00:00+02:002018-04-15T00:00:00+02:00https://luongo.pro/2018/04/15/Gather-statistics-for-your-QRAM<p>Generally, with the term QRAM people refer to an oracle, or
generically to a unitary, that gets called with the purpose of creating
a state in a quantum circuit. This state represents some (classical)
data that you want to process later in your algorithm. More formally,
QRAM allows you to perform operations like
$\ket{i}\ket{0} \to \ket{i}\ket{x_i}$ for $x_i \in \mathbb{R}$ and
$i \in [n]$. This model can be used to create states proportional to
classical vectors, allowing us to perform queries like
$\ket{i}\ket{0} \to \ket{i}\ket{x(i)}$ for $x(i) \in \mathbb{R}^d$ and
$i \in [n]$.</p>
<p>Querying the QRAM is assumed to be done efficiently. The running time is
expected to be polylogarithmic in the matrix dimensions, but
the time complexity might be polynomial in other parameters. As an example,
in the QRAM described in (Kerenidis and Prakash 2017; Kerenidis and Prakash
2016; Prakash 2014) the authors store a matrix decomposition such that
the running time of a query might depend on the Frobenius norm, or on a
parametrized function, which is specific to their implementation. In
this model, the best parametrization of the decomposition might depend
on the dataset. This means that in practice you might need to estimate
these parameters, and therefore I’ve decided to write a library for
this. Specifically, given a matrix $A$ with rows $a_i$, $i \in [m]$, to store in QRAM, you have to
find the value $p \in \left(0, 1 \right)$ that minimizes the
function: <script type="math/tex">\mu_p(A) = \sqrt{ s_{2p}(A) s_{2(1-p)}(A^T)}</script> where
$s_p(A) := \max_{i \in [m]} \|a_i\|_p^p $ is the maximum $\ell_p$ norm to the
power of $p$ of the row vectors.</p>
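<p>To make the definition concrete, here is a small NumPy sketch of $s_p$, $\mu_p$ and a brute-force search for the best $p$ on a grid. This is not the code of qramutils: the function names and the grid are assumptions made for illustration.</p>

```python
import numpy as np


def s(A, p):
    """s_p(A): the maximum over the rows of the l_p norm raised to the power p."""
    return np.max(np.sum(np.abs(A) ** p, axis=1))


def mu(A, p):
    """mu_p(A) = sqrt(s_{2p}(A) * s_{2(1-p)}(A^T))."""
    return np.sqrt(s(A, 2 * p) * s(A.T, 2 * (1 - p)))


def best_p(A, grid=None):
    """Return (p, mu_p(A)) minimizing mu_p over a grid of p in (0, 1)."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)  # hypothetical default grid
    values = [mu(A, p) for p in grid]
    i = int(np.argmin(values))
    return float(grid[i]), float(values[i])
```

<p>A finer grid or a scalar minimizer would give a more precise $p$; the grid search is just the simplest thing that shows the dependence on the dataset.</p>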
<p>The need to estimate parameters of a dataset might also arise with
other models of access to the data. For instance, other algorithms such as
HHL use Hamiltonian simulation, which has an access model that makes
the complexity of the algorithm depend on the sparsity.</p>
<p>So far qramutils analyzes a given numpy matrix for the following
parameters:</p>
<ul>
<li>
<p>The sparsity.</p>
</li>
<li>
<p>The condition number.</p>
</li>
<li>
<p>The Frobenius norm (of the rescaled matrix such that
$0< \sigma_i < 1$).</p>
</li>
<li>
<p>The best parameter $p$ for the matrix decomposition described above.</p>
</li>
<li>
<p>Some boring and common plotting.</p>
</li>
</ul>
<p><a href="https://github.com/Scinawa/qramutils">Here</a> you can find the
repository.</p>
<p>This code might be improved in many directions! For instance, I’d like
to integrate in the library the code for plotting the parameters for
various PCA dimensions and/or degrees of polynomial expansion, integrate
options for dataset normalization and scaling, and maybe expand the types of
accepted input data, and so on.</p>
<p>Hopefully, for other kinds of matrices there might be other kinds of
matrix decompositions available, and therefore there might be the need to
estimate other parameters in the future. This library is where I’ll add the
code for that. :)</p>
<p>This is an example of usage on the MNIST dataset:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$ pipenv run python3 examples/mnist_QRAM.py --help
usage: mnist_QRAM.py [-h] [--db DB] [--generateplot] [--analize]
[--pca-dim PCADIM] [--polyexp POLYEXP]
[--loglevel {DEBUG,INFO}]
Analyze a dataset and model QRAM parameters
optional arguments:
-h, --help show this help message and exit
--db DB path of the mnist database
--generateplot run experiment with various dimension
--analize Run all the analysis of the matrix
--pca-dim PCADIM pca dimension
--polyexp POLYEXP degree of polynomial expansion
--loglevel {DEBUG,INFO}
set log level
</code></pre>
</div>
<p>This is the output, assuming you have a folder called data that holds
the MNIST dataset.</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv run python3 examples/mnist_QRAM.py --db data --analize --loglevel INFO
04-01 22:23 INFO Calculating parameters for default configuration: PCA dim 39, polyexp 2
04-01 22:24 INFO Matrix dimension (60000, 819)
04-01 22:24 INFO Sparsity (0=dense 1=empty): 0.0
04-01 22:24 INFO The Frobenius norm: 4.6413604982930385
04-01 22:26 INFO best p 0.8501000000000001
04-01 22:26 INFO Best p value: 0.8501000000000001
04-01 22:26 INFO The \mu value is: 4.6413604982930385
04-01 22:26 INFO Qubits needed to index+data register: 26.
</code></pre>
</div>
<p>If you want to use the library in your source code:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>import logging

import qramutils

logging.basicConfig(level=logging.INFO)

# X is assumed to be your dataset, already loaded as a numpy matrix.
libq = qramutils.QramUtils(X, logging_handler=logging)
logging.info("Matrix dimension {}".format(X.shape))
sparsity = libq.sparsity()
logging.info("Sparsity (0=dense 1=empty): {}".format(sparsity))
frob_norm = libq.frobenius()
logging.info("The Frobenius norm: {}".format(frob_norm))
# find_p returns the best p together with the minimum it found.
best_p, min_sqrt_p = libq.find_p()
logging.info("Best p value: {}".format(best_p))
logging.info("The \\mu value is: {}".format(min(frob_norm, min_sqrt_p)))
qubits_used = libq.find_qubits()
logging.info("Qubits needed to index+data register: {} ".format(qubits_used))
</code></pre>
</div>
<p>To install, you just need to do the following:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv run python3 setup.py sdist
</code></pre>
</div>
<p>And then, your package will be ready to be installed as:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv install dist/qramutils-0.1.0.tar.gz
</code></pre>
</div>
<div id="refs" class="references">
<div id="ref-kerenidis2016quantum">
Kerenidis, Iordanis, and Anupam Prakash. 2016. “Quantum Recommendation
Systems.” <em>arXiv Preprint arXiv:1603.08675</em>.
</div>
<div id="ref-kerenidis2017quantum">
Kerenidis, Iordanis, and Anupam Prakash. 2017. “Quantum Gradient Descent for Linear Systems and Least
Squares.” <em>arXiv Preprint arXiv:1704.04992</em>.
</div>
<div id="ref-prakash2014quantum">
Prakash, Anupam. 2014. <em>Quantum Algorithms for Linear Algebra and
Machine Learning</em>. University of California, Berkeley.
</div>
</div>scinawaFailed Attempt To Reverse Swap Test2018-04-15T00:00:00+02:002018-04-15T00:00:00+02:00https://luongo.pro/2018/04/15/Failed-attempt-to-reverse-swap-test<p>This post was born from an attempt to find a reversible circuit for
computing the swap test: a circuit used to estimate (the squared modulus of) the inner product of
two quantum states. This circuit was originally proposed for solving the
state distinguishability problem, but as you can imagine it is widely used in
quantum machine learning too. Before starting, let’s note one thing. A
reversible circuit for the swap test implies that we are able to
recreate the two input states. Conceptually, this should be impossible,
because of the no-cloning theorem. With a very neat observation we can
realize that we are not even able to preserve one of the states.</p>
<p>There is no unitary operator $U$ acting on $\ket{x}\ket{y}$ that allows you to
estimate the scalar product $\braket{x|y}$ between two states $\ket{x}, \ket{y}$
using only one copy of $\ket{x}$ and leaving that copy intact.</p>
<p>Proof by contradiction. Assume this unitary exists. Then it would be possible to
estimate the scalar product between $\ket{x}$ and all the basis states
$\ket{i}$ (basically doing tomography of the state). This is a way of
recovering classically the state $\ket{x}$. By knowing $\ket{x}$, we
could recreate as many copies of $\ket{x}$ as we want. Therefore, we
could use this procedure to clone a state. This is prevented by the
no-cloning theorem.</p>
<p>Let’s see what happens if we try to reverse it.</p>
<p><img src="/assets/reverse_swap.png" alt="image" /></p>
<p>It is good to know that the circuit in the figure above is
inspired by the proof of $BPP \subseteq BQP$. The idea is the following:
after a swap test, and before doing any measurement on the ancilla
qubit, we do a CNOT onto a second ancillary qubit, and then execute the
inverse of the swap test. The swap test being a self-inverse operator, this
simply means that we apply the swap test twice. Let’s start the
calculations from the state right after the CNOT on the second ancilla qubit.</p>
<script type="math/tex; mode=display">\frac{1}{2} \Big[ \left( \ket{ab} + \ket{ba} \right)\ket{00} + \left( \ket{ab} - \ket{ba} \right)\ket{11} \Big] \xrightarrow{\text{H}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \Big[ \left( \ket{ab} + \ket{ba} \right)\ket{+0} + \left( \ket{ab} - \ket{ba} \right)\ket{-1} \Big] \xrightarrow{\text{SWAP}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\frac{1}{\sqrt{2}} \Big[ \Big( \ket{ab} + \ket{ba} \Big) \ket{0} + \Big( \ket{ab} + \ket{ba} \Big) \ket{1} \Big] \ket{0} + \frac{1}{\sqrt{2}} \Big[ \Big( \ket{ab} - \ket{ba} \Big) \ket{0} - \Big( \ket{ba} - \ket{ab} \Big) \ket{1} \Big] \ket{1}
\right] =</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\left[ \left( \ket{ab} + \ket{ba} \right)\ket{+} \right] \ket{0} +
\left[ \left( \ket{ab} - \ket{ba} \right)\ket{+} \right] \ket{1}
\right] \xrightarrow{\text{H}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\left[ \left( \ket{ab} + \ket{ba} \right)\ket{0} \right] \ket{0} +
\left[ \left( \ket{ab} - \ket{ba} \right)\ket{0} \right] \ket{1}
\right].</script>
<p><script type="math/tex">p(\ket{0}) = \frac{1}{4}\Big( 2 + 2 \braket{ab|ba}\Big) = \frac{ 1+ \braket{ab|ba}}{2} = \frac{ 1+ |\braket{a|b}|^2}{2}</script>
Here $p(\ket{0})$ is the probability of reading $\ket{0}$ on the second ancilla qubit.
Therefore $p(\ket{1})$ is $\frac{ 1- |\braket{a|b}|^2}{2}$ as in the
original swap test. So the result is the same but, as in the original
swap test, the registers are still entangled; therefore we haven’t
reversed our swap.</p>
<p>Here I have applied the rules:</p>
<ul>
<li>
<p>$ (A\otimes B)^{\dagger} = A^{\dagger} \otimes B^{\dagger} $</p>
</li>
<li>
<p>$ \left( \bra{\phi} \otimes \bra{\psi} \right) \left( \ket{\phi} \otimes \ket{\psi} \right) = \braket{\phi|\phi} \braket{\psi|\psi}$</p>
</li>
</ul>
<p>You may have noted that this circuit is very similar to the circuit that you
obtain if you perform amplitude amplification (Brassard et al. 2000) on
the swap test. The swap circuit is the algorithm $A$ that produces
states with a certain probability distribution, and the CNOT is the
unitary $U_f$ that is able to distinguish the “good” states from the bad
states. By setting the second ancilla qubit to $\ket{-}$ we would be
able to write on the phase of our state some useful information to
recover with a QFT later on. That’s very cool, since amplitude
amplification and estimation allow us to decrease quadratically the computational
complexity of the algorithm with respect to the error in the estimation
of the amplitude of the ancilla qubit.</p>
<div id="refs" class="references">
<div id="ref-brassard2002quantum">
Brassard, Gilles, Peter Høyer, Michele Mosca, and Alain Tapp. 2000.
“Quantum Amplitude Amplification and Estimation.” <em>arXiv Preprint
quant-ph/0005055</em>.
</div>
</div>scinawaHamiltonian Simulation2018-02-18T00:00:00+01:002018-02-18T00:00:00+01:00https://luongo.pro/2018/02/18/Hamiltonian-simulation<p>These are my notes on Childs (n.d.).</p>
<h1 id="introduction">Introduction</h1>
<p>The only possible way to start a chapter on Hamiltonian simulation would
be to start from the work of Feynman, who had the first intuition on the
power of quantum mechanics for simulating physics with computers. We
know that the Hamiltonian dynamics of a closed quantum system, whether
its Hamiltonian changes with time or not, is given by the
Schrödinger equation:</p>
<script type="math/tex; mode=display">i\hbar \frac{d}{dt}\ket{\psi(t)} = H(t)\ket{\psi(t)}</script>
<p>Given the initial conditions of the system (i.e. $\ket{\psi(0)}$), it is
possible to know the state of the system at time $t$: for a time-independent
Hamiltonian $H$, and in units where $\hbar = 1$,
$\ket{\psi(t)} = e^{-iHt}\ket{\psi(0)}$.</p>
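<p>For a handful of qubits this evolution can be computed classically by brute force. The following NumPy sketch (the function name <code>evolve</code> is illustrative) diagonalizes a time-independent Hermitian $H$ and applies $e^{-iHt}$ with $\hbar = 1$. The matrix has size $2^n \times 2^n$ for $n$ qubits, which is exactly why this approach does not scale.</p>

```python
import numpy as np


def evolve(H, psi0, t):
    """Return exp(-i H t) psi0 for a Hermitian matrix H (hbar = 1)."""
    evals, evecs = np.linalg.eigh(H)     # H = V diag(evals) V^dagger
    phases = np.exp(-1j * evals * t)     # each eigenvector picks up a phase
    return evecs @ (phases * (evecs.conj().T @ psi0))
```

<p>As a usage example, evolving $\ket{+}$ under the Pauli $Z$ Hamiltonian for time $\pi/2$ gives $\ket{-}$ up to a global phase.</p>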
<p>As you can imagine, classical computers are expected to struggle when
simulating the system to get $ \ket{\psi(t)}$, since this equation
describes the dynamics of any quantum system, and we don’t think (hope
:D ) classical computers can simulate that efficiently. But we know that
quantum computers can help by “copying” the dynamics of another quantum
system. Why should you care?</p>
<p>Imagine you are a quantum machine learning scientist, and you have just
found a new mapping between an optimization problem and a Hamiltonian
dynamics, and you want to use a quantum computer to perform the
optimization (Otterbach et al. 2017). You expect a quantum computer to
run the Hamiltonian simulation for you, and then sample useful
information from the resulting quantum state. This result might be fed
again into your classical algorithm to perform an ML-related task, in a
virtuous cycle of hybrid quantum-classical computation.</p>
<p>Or imagine that you are a chemist, and you have developed a
hypothesis for the Hamiltonian dynamics of a chemical compound. Now you
want to run some simulations to see if the formula behaves according to
the experiments. Or maybe you are testing properties of complex
compounds you don’t want to synthesize. We can formulate the problem of
Hamiltonian simulation (HS) in this way:</p>
<p><span>Hamiltonian simulation problem</span>: Given a state
$\ket{\psi(0)}$ and a Hamiltonian $H$, obtain a state $\ket{\tilde{\psi}(t)}$
such that $\| \ket{\psi(t)} - \ket{\tilde{\psi}(t)} \| < \varepsilon$ for some norm
(usually the trace norm), where $\ket{\psi(t)}:=e^{-iHt}\ket{\psi(0)}$.</p>
<p>Which leads us to the definition of an efficiently simulable Hamiltonian:</p>
<p><span>Efficient Hamiltonian simulation</span>: Given a state
$\ket{\psi(0)}$ and a Hamiltonian $H$ acting on $n$ qubits, we say $H$
can be efficiently simulated if,
$\forall t \geq 0, \forall \varepsilon > 0$, there is a quantum
circuit $U$ such that $||U - e^{-iHt} || < \varepsilon$ using a number
of gates that is polynomial in $n,t, 1/\varepsilon$.</p>
<p>In the following, we suppose we have a quantum computer and quantum
access to the Hamiltonian $H$. The importance of this problem might not
be immediately clear to a computer scientist. But if we think that every
quantum circuit is described by a Hamiltonian dynamics, being able to
simulate a Hamiltonian is like being able to have virtual machines in
our computer. (This example actually came from a talk by Toby
Cubitt at IHP!) Remember that there’s a theorem saying that, for a general
Hamiltonian simulation problem, the number of gates is $\Omega(t)$;
this theorem goes under the name of no fast-forwarding theorem. <br>
But concretely? What does it mean to simulate the Hamiltonian of a
physical system? Let’s take the Hamiltonian of a particle in a
potential: <script type="math/tex">H = \frac{p^2}{2m} + V(x)</script> We want to know the position of
the particle at time $t$ and therefore we have to compute
$e^{-iHt}\ket{\psi(0)}$.</p>
<h2 id="some-hamiltonians-we-know-to-simulate-efficiently">Some Hamiltonians we know how to simulate efficiently</h2>
<ul>
<li>
<p>Hamiltonians that represent the dynamics of a quantum circuit (more
formally, where you only admit local interactions between a constant
number of qubits). This result is due to the famous
Solovay-Kitaev Theorem, which says that there exists an efficient
compiler from an architecture that uses a universal set of gates $\mathbb{S_1}$
to another quantum computer that uses a universal set of gates
$\mathbb{S_2}$.</p>
</li>
<li>
<p>If $H$ can be efficiently simulated and $U$ is a unitary change of
basis that can be efficiently implemented, then also $UHU^\dagger$ can
be efficiently simulated. Proof:
$e^{-iUHU^\dagger t} = Ue^{-iH t}U^\dagger $.</p>
</li>
<li>
<p>If $H$ is diagonal in the computational basis and we can compute
efficiently $d(a) := \braket{a|H|a}$ for a basis element $a$, then $H$
can be simulated efficiently. For a basis state (and, by linearity, for any state):
<script type="math/tex">\ket{a,0} \to \ket{a, d(a)} \to e^{-itd(a)}\ket{a,d(a)} \to e^{-itd(a)}\ket{a,0} = e^{-itH}\ket{a}\ket{0}</script></p>
<p>(In general: if we know how to move to the eigenbasis and calculate
the eigenvalues efficiently, we can apply a Hamiltonian efficiently.)</p>
</li>
<li>
<p>The sum of two efficiently simulable Hamiltonians is efficiently
simulable using the Lie product formula
<script type="math/tex">e^{-i (H_1 + H_2) t} = \lim_{m \to \infty} \left( e^{-i H_1 t/m} e^{-i H_2 t/m} \right)^m</script>
We choose $m$ such that
<script type="math/tex">\left\| e^{-i (H_1 + H_2) t} - \left( e^{-i H_1 t/m} e^{-i H_2 t/m} \right)^m \right\| \leq \varepsilon</script>
and this gives $m=O((vt)^2/\varepsilon)$ with
$v=\max\{ ||H_1||, ||H_2||\}$. Using higher order approximations it is
possible to reduce the dependency on $t$ to $O(t^{1+\delta})$ for any
chosen $\delta > 0$. (wtf!)</p>
</li>
<li>
<p>These facts can be used to show that the sum of polynomially many
efficiently simulable Hamiltonians is efficiently simulable.</p>
</li>
<li>
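<p>The commutator $[H_1, H_2]$ of two efficiently simulable Hamiltonians
can be simulated efficiently (more precisely, the Hermitian operator $-i[H_1, H_2]$) because:
<script type="math/tex">e^{-[H_1, H_2]t} = \lim_{m\to \infty} \left(e^{-iH_1\sqrt{t/m}}e^{-iH_2\sqrt{t/m}}e^{iH_1\sqrt{t/m}}e^{iH_2\sqrt{t/m}}\right)^m</script>
One way to check it: for small $A, B$ the Baker-Campbell-Hausdorff formula gives
$e^{A}e^{B}e^{-A}e^{-B} = e^{[A,B] + \ldots}$, and here $[A,B] = -[H_1,H_2]\,t/m$.</p>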
</li>
<li>
<p>If the Hamiltonian is sparse, it can be efficiently simulated. The
idea is to pre-compute an edge-coloring of the graph whose
adjacency matrix has the sparsity pattern of the Hamiltonian. (For each $H$ you
can consider a graph $G=(V, E)$ whose adjacency matrix $A$
has $a_{ij}=1$ if $H_{ij} \neq 0$ and $a_{ij}=0$ otherwise.)</p>
</li>
</ul>
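<p>The Lie product formula and the commutator formula from the list above can be checked numerically on small matrices. What follows is only a sketch under our own assumptions: the choice $H_1 = X$, $H_2 = Z$ (Pauli matrices), $t = 0.5$, and the helper names <code>expm_herm</code>, <code>trotter</code> and <code>commutator_product</code> are all made up for illustration, and everything is done with plain classical linear algebra.</p>

```python
import numpy as np

def expm_herm(H, t):
    """e^{-iHt} for a Hermitian matrix H, via eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T

def trotter(H1, H2, t, m):
    """Lie product approximation (e^{-i H1 t/m} e^{-i H2 t/m})^m."""
    step = expm_herm(H1, t / m) @ expm_herm(H2, t / m)
    return np.linalg.matrix_power(step, m)

def commutator_product(H1, H2, t, m):
    """(e^{-i H1 s} e^{-i H2 s} e^{+i H1 s} e^{+i H2 s})^m with s = sqrt(t/m)."""
    s = np.sqrt(t / m)
    step = (expm_herm(H1, s) @ expm_herm(H2, s)
            @ expm_herm(H1, -s) @ expm_herm(H2, -s))
    return np.linalg.matrix_power(step, m)

# Toy Hamiltonians (assumed for illustration): Pauli X and Pauli Z.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
t = 0.5
ms = (100, 10000)

# Exact targets: e^{-i(X+Z)t}, and e^{-[X,Z]t} = e^{-iKt} with K = -i[X,Z] Hermitian.
exact_sum = expm_herm(X + Z, t)
K = -1j * (X @ Z - Z @ X)
exact_comm = expm_herm(K, t)

# Spectral-norm errors for increasing m.
trotter_errors = [np.linalg.norm(trotter(X, Z, t, m) - exact_sum, 2) for m in ms]
comm_errors = [np.linalg.norm(commutator_product(X, Z, t, m) - exact_comm, 2) for m in ms]

print("Lie product errors:", trotter_errors)
print("commutator errors: ", comm_errors)
```

<p>Both errors shrink as $m$ grows, the commutator one more slowly (roughly like $1/\sqrt{m}$ instead of $1/m$), which matches the $\sqrt{t/m}$ appearing in that formula.</p>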
<p>Recalling the example of a particle in a potential: its kinetic term
<script type="math/tex">\frac{p^2}{2m}</script> is diagonal in the Fourier basis (and we know how to
do a QFT), and the potential $V(x)$ is diagonal in the computational
basis, thus this Hamiltonian is easy to simulate.</p>
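<p>The same idea has a well-known classical counterpart, the split-step Fourier method: alternate a phase that is diagonal in position with a phase that is diagonal in momentum, switching basis with an FFT in place of the QFT. The sketch below makes several assumptions of ours for illustration: units with $\hbar = 1$, a periodic grid, a harmonic potential $V(x) = x^2/2$, a displaced Gaussian as initial state, and the helper name <code>split_step</code>.</p>

```python
import numpy as np

def split_step(psi, V, dx, dt, mass=1.0, steps=1):
    """Apply e^{-i V dt} (position basis) then e^{-i p^2/(2m) dt} (momentum basis), `steps` times."""
    p = 2 * np.pi * np.fft.fftfreq(psi.size, d=dx)    # momentum grid matching the FFT ordering
    phase_V = np.exp(-1j * V * dt)                    # potential: diagonal in position
    phase_T = np.exp(-1j * p**2 / (2 * mass) * dt)    # kinetic term: diagonal in momentum
    for _ in range(steps):
        psi = phase_V * psi
        psi = np.fft.ifft(phase_T * np.fft.fft(psi))  # to momentum basis, apply phase, and back
    return psi

# Grid and toy problem (all assumed for illustration).
x = np.linspace(-10, 10, 256, endpoint=False)
dx = x[1] - x[0]
V = 0.5 * x**2                                        # harmonic potential
psi0 = np.exp(-(x - 1.0)**2 / 2).astype(complex)      # Gaussian displaced to x = 1
psi0 = psi0 / np.sqrt(np.sum(np.abs(psi0)**2) * dx)   # normalize on the grid

psi_t = split_step(psi0, V, dx, dt=0.01, steps=100)   # total time t = 1
mean_x = np.sum(x * np.abs(psi_t)**2) * dx
print("mean position at t = 1:", mean_x)
```

<p>Each step is a product of unitaries, so the norm of the state is preserved; for this harmonic example the mean position should follow the classical $\cos(t)$ trajectory up to the error of the first-order product formula.</p>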
<p>Exercise/open problem: do we know any algorithm that might benefit from the
efficient simulation of $[H_1, H_2]$? Childs (n.d.) claims he
is not aware of any algorithm that uses it.</p>
<div id="refs" class="references">
<div id="ref-childs">
Childs, Andrew. n.d. “Lecture Notes in Quantum Algorithmics.”
</div>
<div id="ref-otterbach2017unsupervised">
Otterbach, JS, R Manenti, N Alidoust, A Bestwick, M Block, B Bloom, S
Caldwell, et al. 2017. “Unsupervised Machine Learning on a Hybrid
Quantum Computer.” <em>arXiv preprint arXiv:1712.05771</em>.
</div>
</div>scinawaThese are my notes on Childs (n.d.).