Jekyll feed (2018-07-18) — https://luongo.pro/ — returnlambda — “The level of achievement that you have in anything is a reflection of how well you were able to focus on it” (Steve Vai)
<h1>Quantum Frobenius Distance Classifier (2018-07-18)</h1>
<p>Last night there was the TQC dinner in Sydney, and I had the chance to speak with a very prolific author in QML about her work on <a href="https://arxiv.org/abs/1703.10793">distance based classification</a>, which is <a href="https://arxiv.org/abs/1803.00853">further analyzed here</a>. As a magnificent manifestation of the Zeitgeist in QML, she said that one of the purposes of the paper was to show that a Hadamard gate is enough to perform classification: you don’t need very complex circuits to exploit quantum mechanics in machine learning. This was exactly the motivation behind our QFDC classifier as well, so here we are with a little description of QFDC! This text is taken straight outta <a href="https://arxiv.org/abs/1805.08837">my paper</a>.</p>
<p>As usual, I assume the data is stored in a QRAM. We are in the setting of supervised learning, so we have labeled samples $x(i) \in \mathbb{R}^d$ for $K$ different labels. Let $X_k$ be the matrix whose rows are the vectors with label $k$; we therefore have $K$ such matrices.
$|T_k|$ is the number of elements in cluster $k$ (i.e. the number of rows of $X_k$).</p>
<p>For a test point $x(0)$, define the matrix $ X(0) \in \mathbb{R}^{|T_k| \times d} $,
which just repeats the row $x(0)$ for $|T_k|$ times.
The number of rows of $X(0)$ is context dependent, but it will hopefully be clear. Then, we define</p>
<script type="math/tex; mode=display">F_k( x(0)) = \frac{ ||X_k - X(0)||_F^2}{2 ( ||X_k||_F^2+ ||X(0)||_F^2) },</script>
<p>which corresponds to the average normalized squared distance between $x(0)$ and the cluster $k$.
Let $h : \mathcal{X} \to [K]$ be our classification function. We assign to $x(0)$ a label according to the following rule:</p>
<script type="math/tex; mode=display">h(x(0)) := \mathrm{argmin}_{k \in [K]} F_k( x(0))</script>
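<p>To make the definitions concrete, here is a classical numpy sketch of $F_k$ and of the labeling rule (the function names are mine, not from the paper; the quantum algorithm below estimates the same quantity without materializing the matrices):</p>

```python
import numpy as np

def frobenius_distance(Xk, x0):
    """Average normalized squared distance F_k(x(0)) between a test
    point x0 and the cluster whose points are the rows of Xk."""
    X0 = np.tile(x0, (Xk.shape[0], 1))            # repeat x(0) for |T_k| rows
    Nk = np.linalg.norm(Xk, 'fro')**2 + np.linalg.norm(X0, 'fro')**2
    return np.linalg.norm(Xk - X0, 'fro')**2 / (2 * Nk)

def classify(clusters, x0):
    """h(x(0)) = argmin_k F_k(x(0)) over a list of cluster matrices."""
    return int(np.argmin([frobenius_distance(Xk, x0) for Xk in clusters]))
```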
<p>We will estimate $F_k( x(0))$ efficiently using the algorithm below. From our QRAM construction we know we can create a superposition of all vectors in a cluster as quantum states, and that we have access to their norms, to the total number of points, and to the norms of the clusters. We define a normalization factor:</p>
<script type="math/tex; mode=display">N_k= ||X_k||_F^2 + ||X(0)||_F^2 = ||X_k||_F^2 +|T_k| ||x(0)||^2.</script>
<h5 id="require">Require</h5>
<ul>
<li>QRAM access to the matrix $X_k$ of cluster $k$ and to a test vector $x(0)$. Error parameter $\eta > 0$.</li>
</ul>
<h5 id="ensure">Ensure</h5>
<ul>
<li>An estimate $\overline{F_k (x(0))}$
such that $| F_k(x(0)) - \overline{F_k( x(0))} | < \eta $.</li>
</ul>
<h5 id="algorithm">Algorithm</h5>
<ul>
<li>Start with three empty quantum registers. The first is an ancilla qubit, the second is for the index, and the third one is for the data.
<script type="math/tex">\ket{0}\ket{0}\ket{0}</script></li>
<li>$s:=0$</li>
<li>For $r=O(1/\eta^2)$
<ul>
<li>Create the state<br />
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \sqrt{|T_k|}||x(0)||\ket{0} +||X_k||_F \ket{1}\Big) \ket{0}\ket{0}</script></li>
<li>Apply to the first two register the unitary that maps:
<script type="math/tex">\ket{0}\ket{0} \mapsto \ket{0} \frac{1}{\sqrt{|T_k|}} \sum_{i \in T_k} \ket{i}\; \mbox{ and } \; \ket{1}\ket{0} \mapsto \ket{1} \frac{1}{||X_k||_F} \sum_{i \in T_k} ||x(i)|| \ket{i}</script>
This will get you to:
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \ket{0} \sum_{i \in T_k} ||x(0)|| \ket{i} + \ket{1} \sum_{i \in T_k} ||x(i)|| \ket{i} \Big) \ket{0}</script></li>
<li>Now apply the unitary that maps
<script type="math/tex">\ket{0} \ket{i} \ket{0} \mapsto \ket{0} \ket{i} \ket{x(0)} \; \mbox{ and } \; \ket{1} \ket{i} \ket{0} \mapsto \ket{1} \ket{i} \ket{x(i)}</script></li>
</ul>
<p>to get the state
<script type="math/tex">\frac{1}{\sqrt{N_k}} \Big( \ket{0} \sum_{i \in T_k} ||x(0)|| \ket{i} \ket{x(0)}+ \ket{1} \sum_{i \in T_k} ||x(i)|| \ket{i}\ket{x(i)} \Big)</script></p>
<ul>
<li>Apply a Hadamard to the first register to get
<script type="math/tex">\frac{1}{\sqrt{2N_k}}\ket{0} \sum_{i \in T_k} \Big( ||x(0)|| \ket{i} \ket{x(0)} + ||x(i)|| \ket{i}\ket{x(i)} \Big) +
\frac{1}{\sqrt{2N_k}}\ket{1} \sum_{i \in T_k} \Big( ||x(0)|| \ket{i} \ket{x(0)} - ||x(i)|| \ket{i}\ket{x(i)} \Big)</script></li>
<li>Measure the first register. If the outcome is $\ket{1}$ then $s:=s+1$</li>
</ul>
</li>
<li>Output $\frac{s}{r}$.</li>
</ul>
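<p>The whole estimator can be simulated classically: by the analysis below, each run of the circuit returns $\ket{1}$ with probability exactly $F_k(x(0))$, so the loop amounts to Bernoulli sampling. A minimal sketch, assuming numpy (function name is mine):</p>

```python
import numpy as np

def estimate_Fk(Xk, x0, eta, rng=np.random.default_rng(0)):
    """Classically simulate the measurement statistics of the estimator:
    each circuit run yields |1> with probability F_k(x(0)), so s/r over
    r = O(1/eta^2) repetitions estimates F_k within eta."""
    X0 = np.tile(x0, (Xk.shape[0], 1))
    Nk = np.linalg.norm(Xk, 'fro')**2 + np.linalg.norm(X0, 'fro')**2
    Fk = np.linalg.norm(Xk - X0, 'fro')**2 / (2 * Nk)   # true value
    r = int(np.ceil(1 / eta**2))
    s = rng.binomial(r, Fk)                              # r one-shot measurements
    return s / r
```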
<p>Eventually, if you want to get a quadratic speedup w.r.t. $\eta$, perform amplitude estimation (with $O(1/\eta)$ iterations) on register $\ket{1}$ with the unitary implementing steps 1 to 4 to get an estimate $D$ within error $\eta$. This would make the circuit more complex, and therefore less suitable for NISQ devices, but if you have enough qubits/fault tolerance, you can add it.</p>
<p>For the analysis, just note that the probability of measuring $\ket{1}$ is:</p>
<script type="math/tex; mode=display">\frac{1}{2N_k} \left ( |T_k|||x(0)||^2 + \sum_{i \in T_k} ||x(i)||^2 - 2\sum_{i \in T_k} \braket{x(0), x(i)} \right) = F_k(x(0)).</script>
<p>By Hoeffding bounds, to estimate $F_k(x(0))$ with error $\eta$ we would need $O(\frac{1}{\eta^2})$ samples.
For the running time, we assume all unitaries are efficient (i.e. we are capable of doing them in polylogarithmic time) either because the quantum states can be prepared directly by some quantum procedure or given that the classical vectors are stored in the QRAM, hence the algorithm runs in time $\tilde{O}(\frac{1}{\eta^2})$. We can of course use amplitude estimation and save a factor of $\eta$. Depending on the application one may prefer to keep the quantum part of the classifier as simple as possible or optimize the running time by performing amplitude estimation.</p>
<p>Given this estimator we can now define the QFD classifier.</p>
<h5 id="require-1">Require</h5>
<ul>
<li>QRAM access to $K$ matrices $X_k$ of elements of different classes.</li>
<li>A test vector $x(0)$.</li>
<li>Error parameter $\eta > 0$.</li>
</ul>
<h5 id="ensure-1">Ensure</h5>
<ul>
<li>A label for $x(0)$.</li>
</ul>
<h5 id="algorithm-1">Algorithm</h5>
<ul>
<li>For $k \in [K]$
<ul>
<li>Use the QFD estimator to find $F_k(x(0))$ on $X_k$ and $x(0)$ with precision $\eta$.</li>
</ul>
</li>
<li>Output $h(x(0))=\mathrm{argmin}_{k \in [K]} F_k( x(0))$.</li>
</ul>
<p>The running time of the classifier can be made $\tilde{O}(\frac{K}{\eta})$ when using amplitude amplification. That was it! QFDC basically exploits the subroutine for estimating the average squared distance between a point and a cluster, and assigns the test point to the “closest” cluster.</p>
<p>A drawback of this approach is that it is very sensitive to outliers, because we take the square of the distance between the test point and the points belonging to a cluster. This apparently can be mitigated by a proper dimensionality reduction algorithm, like <a href="QSFA">QSFA</a>.</p>
<h1>Iordanis Kerenidis’ talk on quantum machine learning (2018-07-02)</h1>
<p>This is the link to the video of Iordanis (my supervisor) talking about quantum machine learning. In the second half of the video he describes our <a href="https://arxiv.org/abs/1805.08837">recent results</a> on quantum slow feature analysis and classification of the MNIST dataset.</p>
<p><a href="http://www.youtube.com/watch?v=KTVtMKo3g80" title="Quantum Algorithms for Classification"><img src="http://img.youtube.com/vi/KTVtMKo3g80/0.jpg" alt="Quantum Algorithms for Classification" /></a></p>
<h1>Quantum Slow Feature Analysis, a quantum algorithm for dimensionality reduction (2018-06-16)</h1>
<p>Slow Feature Analysis (SFA) was originally proposed to
learn slowly varying features from generic input signals that vary
rapidly over time (Berkes 2005; Wiskott 1999).
Computational neuroscientists observed long ago that while primary sensory
receptors, like the retinal receptors in an animal’s eye, are sensitive
to very small changes in the environment and thus vary on a very fast
time scale, the internal representation of the environment in the brain
varies on a much slower time scale. This observation is called the <em>temporal
slowness principle</em>. SFA, being the state-of-the-art model for how this
temporal slowness principle is implemented, is a hypothesis for the
functional organization of the visual cortex (and possibly other sensory
areas of the brain). Said in a very practical way, we have some
“process” in our brain that behaves very much as SFA dictates
(L. Wiskott et al. 2011).</p>
<p>Very beautifully, it is possible to show reductions from two other
dimensionality reduction algorithms used in machine learning: Laplacian
Eigenmaps (a dimensionality reduction algorithm mostly suited for video
compression) and Fisher Discriminant Analysis (a standard dimensionality
reduction algorithm). SFA can be applied fruitfully in ML, as there have
been many applications of the algorithm to ML-related tasks. The
key concept of SFA (and LDA) is that it tries to project the data onto
the subspace where the distance between points with the same label
is minimized, while the distance between points with different labels is
maximized.</p>
<h1 id="classical-sfa-for-classification">Classical SFA for classification</h1>
<p>The high level idea of using SFA for classification is the following:
One can think of the training set as an input series
$x(i) \in \mathbb{R}^d , i \in [n]$. Each $x(i)$ belongs to one of $K$
different classes. The goal is to learn $K-1$ functions
$g_j( x(i)), j \in [K-1]$ such that the output
$ y(i) = [g_1( x(i)), \cdots , g_{K-1}( x(i)) ]$ is very similar for
the training samples of the same class and largely different for samples
of different classes. Once these functions are learned, they are used to
map the training set into a low dimensional vector space. When a new data
point arrives, it is mapped to the same vector space, where
classification can be done with higher accuracy.</p>
<p>Now we introduce the minimization problem in its most general form as it
is commonly stated for classification (P. Berkes 2005). Let
$a=\sum_{k=1}^{K} \binom{|T_k|}{2}.$ For all $j \in [K-1]$, minimize:</p>
<script type="math/tex; mode=display">% <![CDATA[
\Delta(y_j) = \frac{1}{a} \sum_{k=1}^K \sum_{s,t \in T_k \atop s<t} \left( g_j( x(s)) - g_j( x(t)) \right)^2 %]]></script>
<p>with the following constraints:</p>
<ol>
<li>
<p>$\frac{1}{n} \sum_{k=1}^{K}\sum_{i\in T_k} g_j( x(i)) = 0 $</p>
</li>
<li>
<p>$\frac{1}{n} \sum_{k=1}^{K}\sum_{i \in T_k} g_j( x(i))^2 = 1 $</p>
</li>
<li>
<p>$ \frac{1}{n} \sum_{k=1}^{K}\sum_{i \in T_k} g_j( x(i))g_v( x(i)) = 0 \quad \forall v < j $</p>
</li>
</ol>
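<p>The objective $\Delta(y_j)$ can be written down directly. A small classical sketch (the helper name <code>delta</code> is mine), where <code>g</code> is any candidate feature function:</p>

```python
import numpy as np
from itertools import combinations
from math import comb

def delta(g, X, labels):
    """Slowness objective: average squared difference of g over all
    pairs of samples sharing a label (smaller = slower = better)."""
    classes = [X[labels == k] for k in np.unique(labels)]
    a = sum(comb(len(T), 2) for T in classes)   # a = sum_k C(|T_k|, 2)
    total = sum((g(T[s]) - g(T[t]))**2
                for T in classes for s, t in combinations(range(len(T)), 2))
    return total / a
```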
<p>For some beautiful theoretical reasons, the SFA algorithm is in practice an
algorithm for finding the solution of the <em>generalized eigenvalue
problem</em>:</p>
<script type="math/tex; mode=display">AW= \Lambda BW</script>
<p>Here $W$ is the matrix of the generalized eigenvectors and $\Lambda$ the diagonal matrix of eigenvalues. For SFA, $A$ and $B$ are defined as $ A := \dot{X}^T \dot{X} $ and $B := X^TX$, where $\dot{X}$ is the matrix of the derivatives of the data: for each pair of elements with the same label we calculate the pointwise difference between the vectors. (Computationally, it suffices to sample $O(n)$ tuples from the uniform distribution of all possible derivatives.)</p>
<p>It is possible to see that the slow feature space we are looking for is spanned by the columns of $W$ associated to the $K-1$ smallest eigenvalues in
$\Lambda$.</p>
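<p>Classically, this generalized eigenvalue problem can be solved with off-the-shelf linear algebra; a sketch with scipy (the function name is mine):</p>

```python
import numpy as np
from scipy.linalg import eigh

def sfa_directions(Xdot, X, n_slow):
    """Solve A w = lambda B w with A = Xdot^T Xdot, B = X^T X and
    return the eigenvectors for the n_slow smallest eigenvalues."""
    A = Xdot.T @ Xdot
    B = X.T @ X
    evals, W = eigh(A, B)          # eigh returns eigenvalues in ascending order
    return W[:, :n_slow]
```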
<h1 id="quantum-sfa">Quantum SFA</h1>
<p>In (Kerenidis and Luongo 2018) we show how, using a “QuantumBLAS” (i.e.
a set of quantum algorithms that we can use to perform linear algebraic
operations), we can perform the following algorithm. The intuition
behind this algorithm is that the derivative matrix of the data can be
pre-computed on non-whitened data, unlike what one might do classically
(where whitening first spares a matrix multiplication): with a quantum
computer we don’t have this problem, since we know how to perform matrix
multiplication efficiently. As in the classical algorithm, we have to do
some preprocessing of our data. In the quantum case, the preprocessing
consists of:</p>
<ol>
<li>
<p>Polynomially expand the data with a polynomial of degree 2 or 3</p>
</li>
<li>
<p>Normalize and Scale the rows of the dataset $X$.</p>
</li>
<li>
<p>Create $\dot{X}$ by sampling from the distribution of possible
couples of rows of $X$ with the same label.</p>
</li>
<li>
<p>Create QRAM for $X$ and $\dot{X}$</p>
</li>
</ol>
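<p>Steps 1–3 can be sketched classically with numpy and scikit-learn (step 4, loading into the QRAM, is the quantum part; the helper name and the fixed degree-2 expansion are my choices):</p>

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, normalize

def preprocess(X, labels, n_pairs, rng=np.random.default_rng(0)):
    """Polynomially expand the data, normalize the rows, and build the
    derivative matrix Xdot by sampling pairs of same-label rows."""
    X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    X = normalize(X)                      # unit-norm rows
    rows = []
    for _ in range(n_pairs):
        k = rng.choice(labels)            # pick a class, then two of its rows
        T = np.flatnonzero(labels == k)
        i, j = rng.choice(T, size=2, replace=False)
        rows.append(X[i] - X[j])
    return X, np.array(rows)
```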
<p>Note that all these operations cost at most $O(nd\log(nd))$ in the size of
the training set, which is time we need to spend anyhow, even just to
collect the data classically.</p>
<p>To use our algorithm for classification, you use QSFA to bring one
cluster at a time, along with the new test point, into the slow feature
space, and then perform any distance based classification algorithm, like
QFDC, swap tests, and so on. The quantum algorithm is the following:</p>
<ul>
<li>
<p><strong>Require</strong> Matrices $X \in \mathbb{R}^{n \times d}$ and
$\dot{X} \in \mathbb{R}^{n \times d}$ in QRAM, parameters
$\epsilon, \theta,\delta,\eta >0$.</p>
</li>
<li>
<p><strong>Ensure</strong> A state $\ket{\bar{Y}}$ such that
$ | \ket{Y} - \ket{\bar{Y}} | \leq \epsilon$, with
<script type="math/tex">Y = A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} Z</script></p>
</li>
</ul>
<ol>
<li>
<p>Create the state
<script type="math/tex">\ket{X} := \frac{1}{ {||X ||}_F} \sum_{i=1}^{n} {||x(i) ||} \ket{i}\ket{x(i)}</script>
using the QRAM that stores the dataset.</p>
</li>
<li>
<p>(Whitening algorithm) Map $\ket{X}$ to $\ket{\bar{Z}}$ with
$| \ket{\bar{Z}} - \ket{Z} | \leq \epsilon $ and $Z=XB^{-1/2}.$
using quantum access to the QRAM.</p>
</li>
<li>
<p>(Projection in slow feature space) Project $\ket{\bar{Z}}$ onto the
slow eigenspace of $A$ using threshold $\theta$ and precision
$\delta$ (i.e.
<script type="math/tex">A^+_{\leq \theta, \delta}A_{\leq \theta, \delta}\bar{Z}</script> )</p>
</li>
<li>
<p>Perform amplitude amplification and estimation on the register
$\ket{0}$ with the unitary $U$ implementing steps 1 to 3, to obtain
$\ket{\bar{Y}}$ with $| \ket{\bar{Y}} - \ket{Y} | \leq \epsilon $
and an estimator $ \bar{ {|| Y ||} } $ with multiplicative error
$\eta$.</p>
</li>
</ol>
<p>Overall, the algorithm is summarized in the following theorem.</p>
<p>Let $X = \sum_i \sigma_i u_iv_i^T \in \mathbb{R}^{n\times d}$ and its
derivative matrix $\dot{X} \in \mathbb{R}^{n \log n \times d}$ stored in
QRAM. Let $\epsilon, \theta, \delta, \eta >0$. There exists a quantum
algorithm that produces as output a state <script type="math/tex">\ket{\bar{Y}}</script> with
<script type="math/tex">| \ket{\bar{Y}} - \ket{A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} Z} | \leq \epsilon</script>
in time
<script type="math/tex">\tilde{O}\left( \left( \kappa(X)\mu(X)\log (1/\varepsilon) + \frac{ ( \mu({X})+ \mu(\dot{X}) ) }{\delta\theta} \right)
\frac{||{Z}||}{ ||A^+_{\leq \theta, \delta}A_{\leq \theta, \delta} {Z} ||} \right)</script>
and an estimator $\bar{||Y ||}$ with
$ | \bar{||Y ||} - ||Y || | \leq \eta {||Y ||}$, at the cost of an additional
<script type="math/tex">1/\eta</script> factor.</p>
<p>A prominent advantage of SFA compared to other algorithms is that <em>it
is almost hyperparameter-free</em>. The only parameters to choose are in the
preprocessing of the data, e.g. the initial PCA dimension and the
nonlinear expansion, which consists of the choice of a polynomial of
(usually low) degree $p$. Another advantage is that it is <em>guaranteed to
find the optimal solution</em> within the considered function space
(Escalante-B and Wiskott 2012). In our experiments, using QSFA together
with a quantum classifier we were able to reach 98.5% accuracy in digit
recognition: we correctly classified 98.5% of 10,000 test images of
digits, given a training set of 60,000 digits.</p>
<h3 id="references">References</h3>
<div id="refs" class="references">
<div id="ref-Berkes2005pattern">
Berkes, Pietro. 2005. “Pattern Recognition with Slow Feature Analysis.”
*Cognitive Sciences EPrint Archive (CogPrints)* 4104.
[http://cogprints.org/4104/](http://cogprints.org/4104/).
</div>
<div id="ref-escalante2012slow">
Escalante-B, Alberto N, and Laurenz Wiskott. 2012. “Slow Feature
Analysis: Perspectives for Technical Applications of a Versatile
Learning Algorithm.” *KI-Künstliche Intelligenz* 26 (4). Springer:
341–48.
</div>
<div id="ref-jkereLuongo2018">
Kerenidis, Iordanis, and Alessandro Luongo. 2018. “Quantum
Classification of the Mnist Dataset via Slow Feature Analysis.” *arXiv
Preprint arXiv:1805.08837*.
</div>
<div id="ref-scholarpedia2017SFA">
Wiskott, L., P. Berkes, M. Franzius, H. Sprekeler, and N. Wilbert. 2011.
“Slow Feature Analysis.” *Scholarpedia* 6 (4): 5282.
doi:[10.4249/scholarpedia.5282](https://doi.org/10.4249/scholarpedia.5282).
</div>
<div id="ref-wiskott1999learning">
Wiskott, Laurenz. 1999. “Learning invariance
manifolds.” *Neurocomputing* 26-27. Elsevier: 925–32.
doi:[10.1016/S0925-2312(99)00011-9](https://doi.org/10.1016/S0925-2312(99)00011-9).
</div>
</div>
<h1>How to evaluate a classifier (2018-06-10)</h1>
<p>Practitioners in quantum machine learning should not only build their
skills in quantum algorithms; having some basic notions of
statistics and data science won’t hurt either. In the following we see some
ways to evaluate a classifier. What does this mean in practice? Imagine
you have a medical test that is able to tell if a patient is sick or
not. You might want to consider the behavior of your classifier with
respect to the following costs: the cost of identifying a sick
patient as healthy, and the cost of identifying a healthy
patient as sick. For example, if the patient is a zombie and it
contaminates the rest of humanity, you want to minimize the
occurrences of the first case, while if the cure for “zombiness” is
lethal for a human patient, you want to minimize the occurrences of the
second case. With P and N we count the number of patients tested
positively or negatively. This is formalized in the following
definitions, which consist of statistics to be calculated on the test
set of a data analysis.</p>
<ul>
<li>
<p><strong>TP True positives (statistical power)</strong> : are those labeled as
sick that are actually sick.</p>
</li>
<li>
<p><strong>FP False positives (type I error)</strong>: are those labeled as sick
but that are actually healthy.</p>
</li>
<li>
<p><strong>FN False negatives (type II error)</strong> : are those labeled as
healthy but that are actually sick.</p>
</li>
<li>
<p><strong>TN True negative</strong>: are those labeled as healthy that are healthy.</p>
</li>
</ul>
<p>Given this simple intuition, we can take a binary classifier and imagine
to do an experiment over a data set. Then we can measure:</p>
<ul>
<li>
<p><strong>True Positive Rate (TPR) = Recall = Sensitivity</strong>: is the fraction of
correctly identified sick patients among all the patients that are
actually sick. It answers the question: “how good are we at detecting
sick people?”.
<script type="math/tex">\frac{ TP }{ TP + FN} = \frac{TP }{P} \simeq P(test=1|sick=1)</script>
This is an estimator of the probability of a positive test given a
sick individual.</p>
</li>
<li>
<p><strong>True Negative Rate (TNR) = Specificity</strong>: is the fraction of
healthy patients that are correctly labeled as healthy.
<script type="math/tex">\frac{ TN }{ TN + FP} = p(test = 0 | sick =0)</script> How many
healthy patients will test negatively? How good are we
at avoiding false alarms?</p>
</li>
<li>
<p><strong>False Positive Rate = Fallout</strong>
<script type="math/tex">FPR = \frac{ FP }{ FP + TN } = 1 - TNR</script></p>
</li>
<li>
<p><strong>False Negative Rate = Miss Rate</strong>
<script type="math/tex">FNR = \frac{ FN }{ FN + TP } = 1 - TPR</script></p>
</li>
<li>
<p><strong>Precision, Positive Predictive Value (PPV)</strong>:
<script type="math/tex">\frac{ TP }{ TP + FP} \simeq p(sick=1 | positive=1)</script> How many
positive to the test are actually sick?</p>
</li>
<li>
<p><strong>$F_1$ score</strong> is a more compressed index of the performance of a
binary classifier. It is simply
the harmonic mean of Precision and Sensitivity:
<script type="math/tex">F_1 = 2\frac{Precision \times Sensitivity }{Precision + Sensitivity }</script></p>
</li>
<li>
<p><strong>Receiver Operating Characteristic (ROC)</strong>: evaluate the TPR and FPR
at all the scores returned by a classifier by changing a threshold.
It is a plot of the true positive rate against the false positive
rate for the different possible values (cutpoints) of a test or
experiment.</p>
</li>
<li>
<p>The <strong>confusion matrix</strong> generalizes these 4 combinations (TP, TN, FP,
FN) to multiple classes: it is an $l \times l$ matrix where at row $i$ and
column $j$ you have the number of elements from class $i$ that
have been classified as elements of class $j$.</p>
</li>
</ul>
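<p>All of these statistics follow directly from the four counts. A small sketch (the function name is mine):</p>

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the statistics above from the four confusion counts."""
    tpr = tp / (tp + fn)               # sensitivity / recall
    tnr = tn / (tn + fp)               # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "TNR": tnr, "FPR": 1 - tnr,
            "FNR": 1 - tpr, "precision": precision, "F1": f1}
```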
<p>In brief: I wrote this post because I always forget these terms and I wasn’t
able to find them described in a concise way with the same formalism
without googling for more time than I spent writing this post. Other
links:
<a href="https://uberpython.wordpress.com/2012/01/01/precision-recall-sensitivity-and-specificity/">here</a>.</p>
<h1>Gather statistics for your QRAM (2018-04-15)</h1>
<p>Generally, with the term QRAM people refer to an oracle, or
generically to a unitary, that gets called with the purpose of creating
a state in a quantum circuit. This state represents some (classical)
data that you want to process later in your algorithm. More formally,
QRAM allows you to perform operations like:
$\ket{i}\ket{0} \to \ket{i}\ket{x_i}$ for $x_i \in \mathbb{R}$ for some
$i \in [n]$. This model can be used to create states proportional to
classical vectors, allowing us to perform queries
$\ket{i}\ket{0} \to \ket{i}\ket{x(i)}$ for $x(i) \in \mathbb{R}^d$ for
some $i \in [n]$.</p>
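<p>The state $\ket{x(i)}$ produced by such a query is just the amplitude encoding of the classical vector. A classical stand-in, assuming numpy (the function name is mine):</p>

```python
import numpy as np

def amplitude_encode(x):
    """Classical stand-in for a QRAM query: the state |x> has the
    entries of x (padded to a power of two) as normalized amplitudes."""
    dim = 1 << int(np.ceil(np.log2(len(x))))
    amps = np.zeros(dim)
    amps[:len(x)] = x
    return amps / np.linalg.norm(amps)
```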
<p>Querying the QRAM is assumed to be efficient. The running time is
expected to be polylogarithmic in the matrix dimensions, but the
time complexity might be polynomial in other parameters. As an example,
in the QRAM described in (Kerenidis and Prakash 2017; Kerenidis and
Prakash 2016; Prakash 2014) the authors store a matrix decomposition
such that the running time of a query might depend on the Frobenius
norm, or on a parametrized function, which is specific to their
implementation. In this model, the best parametrization of the
decomposition might depend on the dataset. This means that in practice,
you might need to estimate these parameters, and therefore I’ve decided
to write a library for this. Specifically, given a matrix $A$ to store
in the QRAM, you have to find the value $p \in \left(0, 1 \right)$ that
minimizes the function
<script type="math/tex">\mu_p(A) = \sqrt{ s_{2p}(A) s_{2(1-p)}(A^T)}</script> where
$s_p(A) := \max_{i \in [m]} ||a_i||_p^p $ is the maximum $l_p$ norm to the
power of $p$ of the row vectors.</p>
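<p>The quantities $s_p$ and $\mu_p$ are cheap to compute classically; a grid-scan sketch with numpy (this is not the qramutils API, just an illustration of the definition):</p>

```python
import numpy as np

def s_p(A, p):
    """Maximum over rows of the l_p norm raised to the power p."""
    return np.max(np.sum(np.abs(A)**p, axis=1))

def mu(A, p):
    """mu_p(A) = sqrt( s_{2p}(A) * s_{2(1-p)}(A^T) )."""
    return np.sqrt(s_p(A, 2 * p) * s_p(A.T, 2 * (1 - p)))

def best_p(A, grid=np.linspace(0.01, 0.99, 99)):
    """Scan a grid of p values and return the one minimizing mu_p(A)."""
    return min(grid, key=lambda p: mu(A, p))
```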
<p>Being able to estimate parameters of a dataset can matter also with
other models of access to the data. For instance, other algorithms such
as HHL use Hamiltonian simulation, which has an access model that makes
the complexity of the algorithm depend on the sparsity.</p>
<p>So far qramutils analyzes a given numpy matrix for the following
parameters:</p>
<ul>
<li>
<p>The sparsity.</p>
</li>
<li>
<p>The condition number.</p>
</li>
<li>
<p>The Frobenius norm (of the rescaled matrix such that
$0< \sigma_i < 1$).</p>
</li>
<li>
<p>The best parameter $p$ for the matrix decomposition described above.</p>
</li>
<li>
<p>Some boring and common plotting.</p>
</li>
</ul>
<p><a href="https://github.com/Scinawa/qramutils">Here</a> you can find the
repository.</p>
<p>This code might be improved in many directions! For instance, I’d like
to integrate in the library the code for plotting the parameters for
various PCA dimensions and/or degrees of polynomial expansion, to add
options for dataset normalization and scaling, and maybe to expand the
types of accepted input data, and so on.</p>
<p>Ideally, for other kinds of matrices there might be other kinds of
matrix decompositions available, and therefore there might be the need to
estimate other parameters in the future. This is where I’ll add the
code for that. :)</p>
<p>This is an example of usage on the MNIST dataset:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$ pipenv run python3 examples/mnist_QRAM.py --help
usage: mnist_QRAM.py [-h] [--db DB] [--generateplot] [--analize]
[--pca-dim PCADIM] [--polyexp POLYEXP]
[--loglevel {DEBUG,INFO}]
Analyze a dataset and model QRAM parameters
optional arguments:
-h, --help show this help message and exit
--db DB path of the mnist database
--generateplot run experiment with various dimension
--analize Run all the analysis of the matrix
--pca-dim PCADIM pca dimension
--polyexp POLYEXP degree of polynomial expansion
--loglevel {DEBUG,INFO}
set log level
</code></pre>
</div>
<p>This is the output, assuming you have a folder called data that holds
the MNIST dataset.</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv run python3 examples/mnist_QRAM.py --db data --analize --loglevel INFO
04-01 22:23 INFO Calculating parameters for default configuration: PCA dim 39, polyexp 2
04-01 22:24 INFO Matrix dimension (60000, 819)
04-01 22:24 INFO Sparsity (0=dense 1=empty): 0.0
04-01 22:24 INFO The Frobenius norm: 4.6413604982930385
04-01 22:26 INFO best p 0.8501000000000001
04-01 22:26 INFO Best p value: 0.8501000000000001
04-01 22:26 INFO The \mu value is: 4.6413604982930385
04-01 22:26 INFO Qubits needed to index+data register: 26.
</code></pre>
</div>
<p>If you want to use the library in your source code:</p>
<div class="highlighter-rouge"><pre class="highlight"><code> libq = qramutils.QramUtils(X, logging_handler=logging)
logging.info("Matrix dimension {}".format(X.shape))
sparsity = libq.sparsity()
logging.info("Sparsity (0=dense 1=empty): {}".format(sparsity))
frob_norm = libq.frobenius()
logging.info("The Frobenius norm: {}".format(frob_norm))
best_p, min_sqrt_p = libq.find_p()
logging.info("Best p value: {}".format(best_p))
logging.info("The \\mu value is: {}".format(min(frob_norm, min_sqrt_p)))
qubits_used = libq.find_qubits()
logging.info("Qubits needed to index+data register: {} ".format(qubits_used))
</code></pre>
</div>
<p>To install, you just need to do the following:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv run python3 setup.py sdist
</code></pre>
</div>
<p>And then, your package will be ready to be installed as:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>pipenv install dist/qramutils-0.1.0.tar.gz
</code></pre>
</div>
<div id="refs" class="references">
<div id="ref-kerenidis2016quantum">
Kerenidis, Iordanis, and Anupam Prakash. 2016. “Quantum Recommendation
Systems.” *ArXiv Preprint ArXiv:1603.08675*.
</div>
<div id="ref-kerenidis2017quantum">
———. 2017. “Quantum Gradient Descent for Linear Systems and Least
Squares.” *ArXiv Preprint ArXiv:1704.04992*.
</div>
<div id="ref-prakash2014quantum">
Prakash, Anupam. 2014. *Quantum Algorithms for Linear Algebra and
Machine Learning*. University of California, Berkeley.
</div>
</div>
<h1>Failed attempt to reverse the swap test (2018-04-15)</h1>
<p>This post was born from an attempt to find a reversible circuit for
computing the swap test: a circuit used to compute the inner product of
two quantum states. This circuit was originally proposed for solving the
state distinguishability problem, but as you can imagine it is widely used in
quantum machine learning too. Before starting, let’s note one thing: a
reversible circuit for the swap test implies that we are able to
recreate the two input states. Conceptually, this should be impossible
because of the no-cloning theorem. With a very neat observation we can
realize that we are not even able to preserve one of the states.</p>
<p>There is no unitary operator $U$ acting on $\ket{x}\ket{y}$ that allows you to
estimate the scalar product $\braket{x|y}$ between two states $x,y$
using only one copy of $\ket{x}$.</p>
<p>Proof, by contradiction. Assume such a unitary exists. Then it would be
possible to estimate the scalar product between $\ket{x}$ and all the basis
states $\ket{i}$ (basically doing tomography on the state). This is a way of
recovering the state $\ket{x}$ classically. By knowing $\ket{x}$, we
could recreate as many copies of $\ket{x}$ as we want. Therefore, we
could use this procedure to clone a state, which is prevented by the
no-cloning theorem.</p>
<p>Let’s see what happens if we try to reverse it.</p>
<p><img src="/assets/reverse_swap.png" alt="image" /></p>
<p>It is good to know that the circuit in Figure [conservative] is
inspired by the proof that $BPP \subseteq BQP$. The idea is the following:
after a swap test, and before doing any measurement on the ancilla
qubit, we apply a CNOT onto a second ancilla qubit, and then execute the
inverse of the swap test. Since the swap test is a self-inverse operator, this
simply means that we apply the swap test twice. Let’s start the
calculations from the CNOT on the second ancilla qubit.</p>
<script type="math/tex; mode=display">\frac{1}{2} \Big[ \left( \ket{ab} + \ket{ba} \right)\ket{00} + \left( \ket{ab} - \ket{ba} \right)\ket{11} \Big] \xrightarrow{\text{H}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \Big[ \left( \ket{ab} + \ket{ba} \right)\ket{+0} + \left( \ket{ab} - \ket{ba} \right)\ket{-1} \Big] \xrightarrow{\text{SWAP}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\frac{1}{\sqrt{2}} \Big[ \Big( \ket{ab} + \ket{ba} \Big) \ket{0} + \Big( \ket{ab} + \ket{ba} \Big) \ket{1} \Big] \ket{0} + \frac{1}{\sqrt{2}} \Big[ \Big( \ket{ab} - \ket{ba} \Big) \ket{0} - \Big( \ket{ba} - \ket{ab} \Big) \ket{1} \Big] \ket{1}
\right] =</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\left( \ket{ab} + \ket{ba} \right)\ket{+} \ket{0} +
\left( \ket{ab} - \ket{ba} \right)\ket{+} \ket{1}
\right] \xrightarrow{\text{H}}</script>
<script type="math/tex; mode=display">\frac{1}{2} \left[
\left( \ket{ab} + \ket{ba} \right)\ket{0} \ket{0} +
\left( \ket{ab} - \ket{ba} \right)\ket{0} \ket{1}
\right].</script>
<p><script type="math/tex">p(\ket{0}) = \frac{1}{4}\Big( 2 + 2 |\braket{ab|ba}|\Big) = \frac{ 1+ \braket{ab|ba}}{2} = \frac{ 1+ |\braket{a|b}|^2}{2}</script>
And therefore $p(\ket{1})$ is $\frac{ 1- |\braket{a|b}|^2}{2}$, as in the
original swap test. So the result is the same, but as in the original
swap test the registers remain entangled, therefore we haven’t
reversed our swap.</p>
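<p>As a sanity check of the derivation above, here is a small numpy simulation (my own illustration, not from the original post): it runs the doubled swap test on two random single-qubit states and verifies that the second ancilla carries the swap-test statistics while the first ancilla returns to $\ket{0}$.</p>

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
SWAP = np.eye(4)[[0, 2, 1, 3]]  # swap of two qubits

def op(*mats):
    """Kronecker product of single-register operators."""
    m = np.array([[1.0]])
    for M in mats:
        m = np.kron(m, M)
    return m

def kron_vecs(*vs):
    v = np.array([1.0 + 0j])
    for x in vs:
        v = np.kron(v, x)
    return v

# qubit ordering: [anc1, a, b, anc2]
swap_test = op(H, np.eye(4), I2) @ (op(P0, np.eye(4), I2) + op(P1, SWAP, I2)) @ op(H, np.eye(4), I2)
cnot = op(P0, np.eye(4), I2) + op(P1, np.eye(4), X)  # anc1 controls anc2

rng = np.random.default_rng(0)
def rand_qubit():
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

a, b = rand_qubit(), rand_qubit()
psi0 = kron_vecs(np.array([1.0, 0.0]), a, b, np.array([1.0, 0.0]))

# swap test, CNOT the outcome onto anc2, swap test again (it is self-inverse)
psi = swap_test @ cnot @ swap_test @ psi0

p_anc2_one = np.linalg.norm(psi.reshape(8, 2)[:, 1]) ** 2  # swap-test statistics
p_anc1_one = np.linalg.norm(psi.reshape(2, 8)[1, :]) ** 2  # anc1 deterministically back to 0
```

Running this for random states reproduces the probabilities derived above, confirming that the information ends up on the second ancilla while the first one is disentangled.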
<p>Here I have applied the rules:</p>
<ul>
<li>
<p>$ (A\otimes B)^{\dagger} = A^{\dagger} \otimes B^{\dagger} $</p>
</li>
<li>
<p>$ \left( \bra{\phi} \otimes \bra{\psi} \right) \left( \ket{\phi} \otimes \ket{\psi} \right) = \braket{\phi|\phi} \braket{\psi|\psi}$</p>
</li>
</ul>
<p>You may have noted that this circuit is very similar to the circuit you
obtain if you perform amplitude amplification Brassard et al. (2000) on
the swap test. The swap test circuit is the algorithm $A$ that produces
states with a certain probability distribution, and the CNOT is the
unitary $U_f$ that is able to distinguish the “good” states from the bad
states. By setting the second ancilla qubit to $\ket{+}$ we would be
able to write some useful information onto the phase of our state and
recover it later with a QFT. That’s very cool, since amplitude
amplification allows us to decrease quadratically the computational
complexity of the algorithm with respect to the error in the estimation
of the amplitude of the ancilla qubit.</p>
<div id="refs" class="references">
<div id="ref-brassard2002quantum">
Brassard, Gilles, Peter Høyer, Michele Mosca, and Alain Tapp. 2000.
“Quantum Amplitude Amplification and Estimation.” *ArXiv Preprint
Quant-Ph/0005055*.
</div>
</div>Hamiltonian Simulation2018-02-18T00:00:00+01:002018-02-18T00:00:00+01:00https://luongo.pro/2018/02/18/Hamiltonian-simulation<p>These are my notes on Childs (n.d.).</p>
<h1 id="introduction">Introduction</h1>
<p>The only possible way to start a chapter on Hamiltonian simulation is
to start from the work of Feynman, who had the first intuition on the
power of quantum mechanics for simulating physics with computers. We
know that the Hamiltonian dynamics of a closed quantum system, whether
its evolution changes with time or not, is given by the
Schr<span>ö</span>dinger equation:</p>
<script type="math/tex; mode=display">i\hbar \frac{d}{dt}\ket{\psi(t)} = H(t)\ket{\psi(t)}</script>
<p>Given the initial conditions of the system (i.e. $\ket{\psi(0)} $ ), it
is possible to know the state of the system at time $t$: for a
time-independent Hamiltonian, $\ket{\psi(t)} = e^{-iHt}\ket{\psi(0)}$.</p>
<p>As you can imagine, classical computers are supposed to struggle
simulating the system to get $ \ket{\psi(t)}$, since this equation
describes the dynamics of any quantum system, and we don’t think (hope
:D ) classical computers can simulate that efficiently. But we know that
quantum computers can help “copying” the dynamics of another quantum
system. Why should you be bothered?</p>
<p>Imagine you are a quantum machine learning scientist, and you have just
found a new mapping between an optimization problem and a Hamiltonian
dynamics, and you want to use a quantum computer to perform the
optimization Otterbach et al. (2017). You expect a quantum computer to
run the Hamiltonian simulation for you, and then sample useful
information from the resulting quantum state. This result might be fed
again into your classical algorithm to perform an ML-related task, in a
virtuous cycle of hybrid quantum-classical computation.</p>
<p>Or imagine that you are a chemist, and you have developed a
hypothesis for the Hamiltonian dynamics of a chemical compound. Now you
want to run some experiments to see if the formula behaves according to
the experiments. Or maybe you are testing properties of complex
compounds you don’t want to synthesize. We can formulate the problem of
Hamiltonian simulation in this way:</p>
<p><span>Hamiltonian simulation problem</span>: Given a state
$\ket{\psi(0)}$ and a Hamiltonian $H$, obtain a state $\ket{\tilde{\psi}(t)}$
such that, with $\ket{\psi(t)}:=e^{-iHt}\ket{\psi(0)}$,
$|\ket{\psi(t)} - \ket{\tilde{\psi}(t)}| < \varepsilon$ for some norm
(usually trace norm).</p>
<p>Which leads us to the definition of efficiently simulable Hamiltonian:</p>
<p><span>Efficient Hamiltonian simulation</span> Given a state
$\ket{\psi(0)}$ and a Hamiltonian $H$ acting on $n$ qubits, we say $H$
can be efficiently simulated if,
$\forall t \geq 0, \forall \varepsilon \geq 0$, there is a quantum
circuit $U$ such that $||U - e^{-iHt} || < \varepsilon$ using a number
of gates that is polynomial in $n,t, 1/\varepsilon$.</p>
<p>In the following, we suppose to have a quantum computer and quantum
access to the Hamiltonian $H$. The importance of this problem might not
be immediately clear to a computer scientist. But if we think that every
quantum circuit is described by a Hamiltonian dynamics, being able to
simulate a Hamiltonian is like being able to have virtual machines in
our computer. (This example actually came from a talk at IHP by Toby
Cubitt!) Remember that there’s a theorem saying that, for the
Hamiltonian simulation problem, the number of gates is $\Omega(t)$; this
theorem goes under the name of No fast-forwarding Theorem. <br>
But concretely? What does it mean to simulate a Hamiltonian of a
physical system? Let’s take the Hamiltonian of a particle in a
potential: <script type="math/tex">H = \frac{p^2}{2m} + V(x)</script> We want to know the state of
the particle at time $t$ and therefore we have to compute
$e^{-iHt}\ket{\psi(0)}$</p>
<h2 id="some-hamiltonians-we-know-to-simulate-efficiently">Some Hamiltonians we know to simulate efficiently</h2>
<ul>
<li>
<p>Hamiltonians that represent the dynamics of a quantum circuit (more
formally, where you only admit local interactions between a constant
number of qubits). This result is due to the famous
Solovay-Kitaev Theorem, which says that there exists an efficient
compiler from an architecture that uses a set of universal gates $\mathbb{S_1}$
to another quantum computer that uses a set of universal gates
$\mathbb{S_2}$.</p>
</li>
<li>
<p>If the Hamiltonian $H$ can be efficiently simulated, then so can
$UHU^\dagger$ for an efficiently implementable unitary $U$. Proof:
$e^{-iUHU^\dagger t} = Ue^{-iH t}U^\dagger $.</p>
</li>
<li>
<p>If $H$ is diagonal in the computational basis and we can compute
efficiently $d(a) = \braket{a|H|a}$ for a basis element $a$. By linearity:
<script type="math/tex">\ket{a,0} \to \ket{a, d(a)} \to e^{-itd(a)} \ket{a,d(a)} \to e^{-itd(a)}\ket{a,0} = e^{-itH}\ket{a,0}</script></p>
<p>(In general: if we know how to calculate the eigenvalues, we can
apply a Hamiltonian efficiently.)</p>
</li>
<li>
<p>The sum of two efficiently simulable Hamiltonians is efficiently
simulable using the Lie product formula
<script type="math/tex">e^{-i (H_1 + H_2) t} = \lim_{m \to \infty} \left( e^{-i H_1t/m} e^{-i H_2t/m} \right)^m</script>
We choose $m$ such that
<script type="math/tex">|| e^{-i (H_1 + H_2) t} - \left( e^{-i H_1t/m} e^{-i H_2t/m} \right)^m || \leq \varepsilon,</script>
and this gives $m=O((\nu t)^2/\varepsilon)$ with
$\nu=\max\{ ||H_1||, ||H_2||\}$. Using higher order approximations it is
possible to reduce the dependency on $t$ to $O(t^{1+\delta})$ for a
chosen $\delta$. (wtf!)</p>
</li>
<li>
<p>These facts can be used to show that the sum of polynomially many
efficiently simulable Hamiltonians is efficiently simulable.</p>
</li>
<li>
<p>The commutator $[H_1, H_2]$ of two efficiently simulable Hamiltonians
can be simulated efficiently because:
<script type="math/tex">e^{-i[H_1, H_2]t} = \lim_{m\to \infty} \left(e^{-iH_1\sqrt{t/m}}e^{-iH_2\sqrt{t/m}}e^{iH_1\sqrt{t/m}}e^{iH_2\sqrt{t/m}}\right)^m</script>
which we believe, without having any idea of how to check it. :/</p>
</li>
<li>
<p>If the Hamiltonian is sparse, it can be efficiently simulated. The
idea is to pre-compute an edge-coloring of the graph represented by
the adjacency matrix of the sparse Hamiltonian. (For each $H$ you
can consider a graph $G=(V, E)$ whose adjacency matrix $A$
has $a_{ij}=1$ if $H_{ij} \neq 0$.)</p>
</li>
</ul>
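<p>The Lie product formula from the list above is easy to check numerically. The following sketch (my own, with illustrative names) compares $e^{-i(H_1+H_2)t}$ against the first-order product formula on random Hermitian matrices and shows the error shrinking as $m$ grows, consistently with the $O((\nu t)^2/m)$ bound:</p>

```python
import numpy as np

def evolve(H, t):
    """e^{-iHt} for a Hermitian H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

rng = np.random.default_rng(42)
def rand_herm(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (A + A.conj().T) / 2

d, t = 4, 1.0                      # a random Hamiltonian on two qubits
H1, H2 = rand_herm(d), rand_herm(d)
exact = evolve(H1 + H2, t)

def trotter(m):
    """The first-order product formula with m steps."""
    step = evolve(H1, t / m) @ evolve(H2, t / m)
    return np.linalg.matrix_power(step, m)

# spectral-norm error for increasing m: it decays roughly like 1/m
errors = [np.linalg.norm(exact - trotter(m), 2) for m in (1, 10, 100)]
```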
<p>Recalling the example of a particle in a potential: its kinetic term
<script type="math/tex">\frac{p^2}{2m}</script> is diagonal in the Fourier basis (and we know how to
do a QFT), and the potential $V(x)$ is diagonal in the computational
basis, thus this Hamiltonian is easy to simulate.</p>
<p>Exercise/open problem: do we know any algorithm that might benefit from the
efficient simulation of $[H_1, H_2]$? Childs in Childs (n.d.) claims he
is not aware of any algorithm that uses it.</p>
<div id="refs" class="references">
<div id="ref-childs">
Childs, Andrew. n.d. “Lecture Notes in Quantum Algorithmics.”
</div>
<div id="ref-otterbach2017unsupervised">
Otterbach, JS, R Manenti, N Alidoust, A Bestwick, M Block, B Bloom, S
Caldwell, et al. 2017. “Unsupervised Machine Learning on a Hybrid
Quantum Computer.” *ArXiv Preprint ArXiv:1712.05771*.
</div>
</div>Storing Data In A Quantum Computer2018-02-03T00:00:00+01:002018-02-03T00:00:00+01:00https://luongo.pro/2018/02/03/Storing-data-in-a-quantum-computer<p>We are going to see what it means to store/represent data on a
quantum computer. It is very important to know how, since knowing the
most common ways of encoding data in a quantum computer might pave
the way for intuition in solving new problems. Let me quote an
article of 2015, Schuld, Sinayskiy, and Petruccione (2015): <em>In order to
use the strengths of quantum mechanics without being confined by
classical ideas of data encoding, finding “genuinely quantum” ways of
representing and extracting information could become vital for the
future of quantum machine learning</em>.</p>
<p>Usually we store information in a classical data structure, and then assume to have quantum access to it.
In general, this quantum access consists of a query: an operation
$U\ket{i}\ket{0}\to \ket{i}\ket{\psi_i}$, where the first register is called the
index register, and the second is a target register that holds the
information that you requested. To get an intuition of what the previous
sentence means, I borrow an intuitive example from a
YouTube video of Seth Lloyd. Imagine that you have a source of photons -
which represents your query register - and you send one towards a CD. Due
to the wave-particle duality, you are actually hitting your CD with a
“thing” that is no longer located deterministically as a single
particle in space, but behaves as a wave. When the wave hits the
surface of the CD, it picks up the information stored in the little
holes of $0$s and $1$s, and gets reflected carrying this information.
This wave represents the output of your query. (Sure, we assume the
interaction between the wave and the CD does not make the wave-function
collapse.)
Let’s start. As good computer scientists, let’s organize what we know how
to do by data types.</p>
<h1 id="scalars">Scalars</h1>
<h2 id="integer-mathbbz">Integer: $\mathbb{Z}$</h2>
<p>Let’s start with the most simple “type” of data: the integers. Let
$m \in \mathbb{N}$. We take the binary expansion of $m$, and set the
qubits of our computer as the binary digits of the number. As an example,
if your number’s binary expansion is $0100\cdots0111$ we can create the
state:
$\ket{x} = \ket{0}\otimes \ket{1} \ket{0} \ket{0} \cdots \ket{0} \ket{1} \ket{1} \ket{1}$.
Formally, given $m$ on $n$ bits:</p>
<script type="math/tex; mode=display">\ket{m} = \bigotimes_{i=0}^{n-1} \ket{m_i}</script>
<p>Using superpositions of states like these we might create things like
$\frac{1}{\sqrt{2}} (\ket{5}+\ket{9})$ or more involved convex
combinations of states.
The time needed to create this state is linear in the number of
bits/qubits. It might be used to get a speedup in the number of queries to
an oracle, like in (<span class="citeproc-not-found" data-reference-id="Wiebe0QuantumModels"><strong>???</strong></span>), or in general
where you aim at getting a speedup in oracle complexity using amplitude
amplification and similar techniques. For negative integers, we might just use one
qubit more for the sign. (Don’t be tempted into saying that
$\ket{3}+\ket{3}=\ket{6}$. It’s not!)</p>
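<p>A minimal numpy sketch of this encoding (my own illustration): an integer becomes a one-hot amplitude vector, superpositions are sums of those vectors, and adding $\ket{3}$ to itself just rescales $\ket{3}$, it never produces $\ket{6}$.</p>

```python
import numpy as np

def basis_ket(m, n_qubits):
    """The m-th computational basis state on n_qubits, as a one-hot vector."""
    v = np.zeros(2 ** n_qubits)
    v[m] = 1.0
    return v

n = 4
# the superposition (ket5 + ket9)/sqrt(2)
superpos = (basis_ket(5, n) + basis_ket(9, n)) / np.sqrt(2)

# ket3 + ket3 renormalizes back to ket3: the amplitude on 6 stays zero
three = basis_ket(3, n)
s = three + three
s = s / np.linalg.norm(s)
```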
<h2 id="rational-mathbbq">Rational: $\mathbb{Q}$</h2>
<p>As far as I know, in quantum computation / quantum machine learning,
some registers hold rational numbers, usually as an $n$-bit
approximation of a real between $0$ and $1$. In that case, just take
the binary expansion and use the previous encoding.</p>
<h2 id="reals-mathbbr">Reals: $\mathbb{R}$</h2>
<p>As before, if the number is between $0$ and $1$, use the previous
encoding. It’s pretty rare to store just a single number in
$\mathbb{R}$, and usually real numbers are encoded into amplitudes and
used when dealing with vectors in $\mathbb{R}^n$.</p>
<h1 id="vectors">Vectors</h1>
<h2 id="binary-vectors-01n">Binary vectors: ${0,1}^n$</h2>
<p>Let $\vec{b} \in {0,1}^n$. As for the encoding used for the integers:</p>
<script type="math/tex; mode=display">\ket{b} = \bigotimes_{i=0}^{n-1} \ket{b_i}</script>
<p>As an example, suppose you want to encode the vector
$[1,0,1,0,1,0] \in \{0,1\}^6$, which is $42$ in decimal. This will
correspond to the $42$nd basis vector of the Hilbert space where our qubits
will evolve. In some sense, we are not fully using the $\mathbb{C}^{2^{n}}$
Hilbert space: we are only mapping a binary vector to a (canonical)
basis vector. As a consequence, distances between points in the new space are
different.
We can imagine some other encodings. For instance we can map a $0$ into
an amplitude $1$ and a $1$ into an amplitude $-1$ (even if I don’t know how it might be used nor how
to build it):
<script type="math/tex">\ket{v} = \frac{1}{\sqrt{2^n}} \sum_{i=0}^{2^n-1} (-1)^{b_i} \ket{i}</script></p>
<h2 id="real-vectors-mathbbrn">Real vectors: $\mathbb{R}^n$</h2>
<p>Maybe you are used to seeing Greek letters inside a ket to represent
generic quantum states, and Latin letters to represent quantum states that
use the binary expansion to hold classical data. The following is a very
common encoding in quantum machine learning. For a vector
$\vec{x} \in \mathbb{R}^{2^n}$, we can build:</p>
<script type="math/tex; mode=display">\ket{x} = \frac{1}{\lVert\vec{x}\rVert}\sum_{i=0}^{N-1}\vec{x}_i\ket{i} = \lVert\vec{x}\rVert^{-1}\vec{x}</script>
<p>Note that to span a space of dimension $N=2^n$, you just need $n = \log_2(N)$
qubits: we encode each component of the classical vector in the
amplitudes of a state vector. Ideally, we know from Grover and Rudolph
(2002) how to create quantum states that correspond to vectors of data
(i.e. “efficiently integrable probability distributions”). We miss an
important ingredient. This encoding might not be enough if you have to
manipulate “many” vectors, as in some sense what you are creating is a
vector with unit norm. What if we want to build a superposition of
two vectors? Well, one might expect to be able to create a state
$\frac{1}{\sqrt{N}} \sum_{i} \ket{x_i}$, but there’s a problem. Imagine
doing it with just two vectors: $x_1 = [-1, -1, -1]$ and
$x_2 = [1,1,1]$. Well, their (uniform) linear combination is the vector
$[0,0,0]$. What does this mean? That to make a unit vector out of
it, we need an exceptionally small normalizing factor. Usually this kind
of superposition is obtained as the result of a measurement on an
ancilla qubit, where the measurement outcome has a probability proportional to
the norm of the vectors. Therefore, to be able to build this state we’re
gonna need an intolerable number of trials and errors. This problem can be amended by adjoining an ancilla register, as
we see now.</p>
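<p>The following numpy sketch (illustrative, names are mine) shows the amplitude encoding and the normalization problem just described: the global scale of a vector is lost, and the sum of two opposite vectors cannot be rescaled into a unit-norm state.</p>

```python
import numpy as np

def amplitude_encode(x):
    """Amplitudes of the state encoding x: x divided by its Euclidean norm."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

x = np.array([3.0, -1.0, 2.0, 0.5])      # 4 components, so 2 qubits
ket_x = amplitude_encode(x)

# the global scale is lost: x and 10*x map to the same quantum state
scale_lost = np.allclose(amplitude_encode(10 * x), ket_x)

# opposite vectors: their sum is the zero vector, which has no unit-norm rescaling
x1, x2 = np.array([-1.0, -1.0, -1.0]), np.array([1.0, 1.0, 1.0])
norm_of_sum = np.linalg.norm(x1 + x2)    # exactly 0
```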
<h1 id="matrices">Matrices</h1>
<p>Imagine storing your vectors in the rows of a matrix. Let
$X \in \mathbb{R}^{n \times d}$, a matrix of $n$ vectors of $d$
components. We will encode them using $\log(n)+\log(d)$ qubits as the
state:</p>
<script type="math/tex; mode=display">\frac{1}{\sqrt{\sum_{i=0}^n {\left \lVert x(i) \right \rVert}^2 }} \sum_{i=0}^n {\left \lVert x(i) \right \rVert}\ket{i}\ket{x(i)}</script>
<p>Or, put another way:</p>
<script type="math/tex; mode=display">\frac{1}{\sqrt{\sum_{i=0}^n {\left \lVert x(i) \right \rVert}^2}} \sum_{i,j} X_{ij}\ket{i}\ket{j}</script>
<p>The problem is how to build this state. We are going to need a very
specific oracle (which we call QRAM, even if there is ambiguity in the
literature on that). A QRAM gives us access to two things: the norms of
the rows of a matrix and the rows themselves. Calling the two oracles
combined, we can do the following mapping:</p>
<script type="math/tex; mode=display">\sum_{i=0}^{n} \ket{i} \ket{0} \to \sum_{i=0}^n {\left \lVert x(i) \right \rVert}\ket{i}\ket{x(i)}</script>
<p>Basically, we use the superposition in the first register to select the
rows of the matrix that we want, and after the query we have them in the
second register. A QRAM is a tree-like classical data structure that
offers quantum access in an oracular way to data stored this way.
You can think of a QRAM as a circuit that encodes your matrix. Note that
using this nice encoding, the ratios between the distances between
vectors are the same as in the original space. Also note that once the
state is created, the only way to recover $x$ from $\ket{x}$ is to do
quantum tomography (i.e. destroying the state with a measurement). The
cost (in terms of time and space) of creating this data structure is a
little bit more than linear, $O(nd \log (nd))$, but it pays off by giving an
access time for a query that is $O(\log(nd))$. (An example of QRAM can be
found in Kerenidis and Prakash (2017), and will obviously be covered in
this blog in the next posts.) Yes, I know, the physical implementation of
QRAM might be difficult, but I have faith in the experimental
physicists. :)</p>
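<p>A small numpy illustration of this matrix encoding (my own sketch): the amplitudes are the matrix entries divided by the Frobenius norm, and post-selecting the index register on some $i$ leaves the normalized $i$-th row in the second register.</p>

```python
import numpy as np

def matrix_state(X):
    """Amplitudes X_ij / ||X||_F on registers i (rows) and j (columns), row-major."""
    X = np.asarray(X, dtype=float)
    return X.flatten() / np.linalg.norm(X)  # np.linalg.norm on a matrix is Frobenius

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
psi = matrix_state(X)                 # 4 amplitudes: 1 qubit for i, 1 for j

# post-selecting the index register on i=0 leaves the normalized row x(0)
row0 = psi[:2] / np.linalg.norm(psi[:2])
```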
<h1 id="graphs">Graphs</h1>
<p>For specific problems we can even change the computational model (i.e.
no more gates on wires used to describe computation). For instance,
given a graph $G=(V,E)$ we can encode it as a state $\ket{G}$ such that:
<script type="math/tex">K_G^v\ket{G} = \ket{G} \quad \forall v \in V</script> where
$K_G^v = X_v\prod_{u \in N(v)}Z_u $, and $X_u$ and $Z_u$ are the Pauli
operators on $u$. The way to picture this encoding is the following. Take as many
qubits in state $\ket{+}$ as nodes in the graph, and apply a controlled-$Z$
gate between qubits representing adjacent nodes. There are some
algorithms that use this state as input, for instance in Zhao,
Pérez-Delgado, and Fitzsimons (2016), where they even extended this
definition.</p>
<h1 id="conclusions">Conclusions</h1>
<p>The precision that we can use for specifying the amplitudes of a quantum
state might be limited in practice by the precision of our quantum
computer in manipulating quantum states (i.e. developments in techniques
in quantum metrology and sensing). Techniques that rely on a certain
precision in the amplitudes of a state might suffer from initial technical
limitations of the hardware. As a parallel, think of what happened
with CPUs, where we had 16, 32 and now 64 bits of precision.</p>
<div id="refs" class="references">
<div id="ref-Grover2002">
Grover, Lov, and Terry Rudolph. 2002. “Creating superpositions that
correspond to efficiently integrable probability distributions.”
</div>
<div id="ref-kerenidis2017quantum">
Kerenidis, Iordanis, and Anupam Prakash. 2017. “Quantum Gradient Descent
for Linear Systems and Least Squares.” *ArXiv Preprint
ArXiv:1704.04992*.
</div>
<div id="ref-schuld2015introduction">
Schuld, Maria, Ilya Sinayskiy, and Francesco Petruccione. 2015. “An
Introduction to Quantum Machine Learning.” *Contemporary Physics* 56
(2). Taylor & Francis: 172–85.
</div>
<div id="ref-zhao2016fast">
Zhao, Liming, Carlos A Pérez-Delgado, and Joseph F Fitzsimons. 2016.
“Fast Graph Operations in Quantum Computation.” *Physical Review A* 93
(3). APS: 032314.
</div>
</div>Swap Test For Distances2018-01-29T00:00:00+01:002018-01-29T00:00:00+01:00https://luongo.pro/2018/01/29/Swap-test-for-distances<h1 id="intro-to-swap-test">Intro to swap test</h1>
<p>What is known as the <em>swap test</em> is a simple but powerful circuit used to
measure the “proximity” of two quantum states (the cosine distance in
machine learning). It consists of a controlled swap operation surrounded
by two Hadamard gates on the controlling qubit. Repeated measurements of
the ancilla qubit allow us to estimate the probability of reading $0$
or $1$, which in turn allows us to estimate $\braket{\psi|\phi}$.
Let’s see the circuit:</p>
<p><img src="/assets/swap_distances/swap_test.png" alt="image" /></p>
<p>It is simple to check that the state at the end of the execution of the
circuit is the following:</p>
<script type="math/tex; mode=display">\frac{1}{2}\Big[\ket{\psi} \ket{\phi} + \ket{\phi} \ket{\psi} \Big]\ket{0} + \frac{1}{2}\Big[\ket{\psi} \ket{\phi} - \ket{\phi} \ket{\psi} \Big] \ket{1}</script>
<p>Thus, the probability of reading a $0$ in the ancilla qubit is:
<script type="math/tex">P (\ket{0}) = \left( \frac{1+|\braket{\psi|\phi}|^2}{2} \right)</script> And
the probability of reading a $1$ in the ancilla qubit is:
<script type="math/tex">P (\ket{1}) = \left( \frac{1-|\braket{\psi|\phi}|^2}{2} \right)</script></p>
<p>This means that if the two states are completely orthogonal, we will
measure an equal number of zeros and ones. On the other hand, if
$\ket{\psi} = \ket{\phi}$, then the probability of reading
$\ket{1}$ in the ancilla qubit is $0$. Repeating this operation a
certain number of times allows us to estimate the inner product between
$\ket{\psi}$ and $\ket{\phi}$. Unfortunately, at each measurement we
irrevocably destroy the states, and we need to recreate them in order to
perform the swap test again. This is not much of a problem if we have
an efficient way of creating $\ket{\psi}$ and $\ket{\phi}$. We can
informally state what the swap test achieves with the following theorem.</p>
<p>[Swap test for inner products] Suppose you have access to unitaries
$U_\psi$ and $U_\phi$ that allow you to create $\ket{\psi}$ and
$\ket{\phi}$, each of them requiring time $T(U_\psi)$ and $T(U_\phi)$.
Then, there is a circuit that allows us to estimate the inner product
between the two states $\ket{\psi},\ket{\phi}$ to precision $\varepsilon$
in $O((T(U_\psi)+T(U_\phi))\varepsilon^{-2})$ operations.</p>
<p>The correctness of the circuit was shown before. This is the analysis of
the running time. We recognize in the measurement of the ancilla qubit a
random variable $X$ with Bernoulli distribution with
$p=(1+|\braket{\psi|\phi}|^2)/2$ and variance $p(1-p)$. The number of
repetitions that are necessary to estimate the expected value $\bar{p}$
of $X$ with relative error $\epsilon$ is bounded by the Chernoff bound.</p>
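<p>A quick numerical illustration of this estimation procedure (a sketch, not a circuit simulation): each swap-test shot is a Bernoulli trial whose probability of reading $1$ is $(1-|\braket{\psi|\phi}|^2)/2$, so sampling the ancilla many times and inverting this relation recovers the squared overlap.</p>

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_state(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

psi, phi = rand_state(4), rand_state(4)
overlap_sq = abs(np.vdot(psi, phi)) ** 2

# each shot reads the ancilla: p(1) = (1 - overlap_sq) / 2
p1 = (1 - overlap_sq) / 2
shots = 100_000
ones = rng.binomial(shots, p1)            # simulated measurement record

estimate = 1 - 2 * ones / shots           # invert p(1) to recover overlap_sq
```

With $10^5$ shots the statistical error is around $10^{-3}$, matching the $O(\varepsilon^{-2})$ scaling of the theorem.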
<h1 id="swap-test-for-distance-between-vector-and-center-of-a-cluster">Swap test for distance between vector and center of a cluster</h1>
<p>Now we are going to see how to use the swap test to calculate the
distance between two vectors. This section is entirely based on the work
of Lloyd, Mohseni, and Rebentrost (2013). There, they explain how to use
this subroutine to do cluster assignment and many other interesting
things in quantum machine learning. This was one of the first papers I
read in quantum machine learning, and I really wanted to understand
everything, so I tried to do the calculations myself. I think I have
found some typos in the original paper, so here you will find what I
think is the correct version. At the bottom of this post you will find
the calculations. In the following section we will assume that we are
given access to two unitaries $U : \ket{i}\ket{0} \to \ket{i}\ket{v_i}$
and $V : \ket{i}\ket{0} \to \ket{i}\ket{|v_i|} $.
Let’s recall the relation between inner product and distance of
$\vec{u}, \vec{v} \in \mathbb{R}^n$. The inner product between two
vectors is $\braket{ v, u } = \sum_{i} v_i u_i $, and the norm of a
vector is $ |v|= \sqrt{\langle v, v \rangle} $. Therefore, the distance
can be rewritten as:</p>
<script type="math/tex; mode=display">|u-v| = \sqrt{ \langle u-v, u-v \rangle } = \sqrt{\sum_{i} (u_i-v_i)^2 } = \sqrt{ |u|^2 + |v|^2 - 2 \langle u, v \rangle }</script>
<p>By setting $ Z = |u|^2 + |v|^2 $ it follows that:</p>
<script type="math/tex; mode=display">|u-v|^2 = Z \left( 1 - \frac{ 2 \langle u, v \rangle }{Z} \right).</script>
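<p>The identity relating $Z$, the inner product, and the squared distance is purely classical and easy to verify numerically (a minimal check, my own):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
u, v = rng.normal(size=5), rng.normal(size=5)

Z = np.dot(u, u) + np.dot(v, v)            # norm(u)^2 + norm(v)^2
dist_sq = Z * (1 - 2 * np.dot(u, v) / Z)   # Z (1 - 2(u,v)/Z) equals the squared distance
```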
<p>As you may have guessed, to find the distance $|v-u|$ we will repeat the
swap circuit the necessary number of times. The problem now is to find
the right states.
We first start by creating
$|\psi \rangle = \frac{1}{\sqrt{2}} \Big( \ket{0}\ket{u} + \ket{1}\ket{v} \Big)$
querying the QRAM in $O(\log(N))$ time, where $N$ is the dimension of the
Hilbert space (the length of the data vector).
Then we proceed by creating
$|\phi\rangle = \frac{1}{\sqrt{Z}} \Big( |\vec{u}||0\rangle + |\vec{v}||1\rangle \Big) $
and estimating $Z=|\vec{u}|^2 + |\vec{v}|^2$. Remember that for two
vectors $Z$ is easy to calculate, while in the case of a distance
between a vector and the center of a cluster
$Z=|\vec{u}|^2+\sum_{i \in V} |\vec{v_i}|^2$. In this case, calculating
$Z$ scales linearly with the number of elements in the cluster, and we
don’t want that.</p>
<p>To create $\ket{\phi}$ and estimate $Z$, we have to start with another,
simpler-to-build state $\ket{\phi^-}$ and make it evolve to $\ket{\phi}$. To
do so, we apply the following Hamiltonian for a certain
amount of time $t$ such that $t|\vec{v}|, t|\vec{u}| \ll 1 $:
<script type="math/tex">H = \Big( |\vec{u}|\ket{0}\bra{0}+|\vec{v}|\ket{1}\bra{1} \Big) \otimes \sigma_x</script>
<script type="math/tex">\ket{\phi^-} = \ket{-}\ket{0}</script></p>
<p>The evolution $e^{-iHt} \ket{\phi^-}$ for small $t$ will give us the
following state:
<script type="math/tex">\Big( \frac{\cos(|\vec{u}|t)}{\sqrt{2}}\ket{0} - \frac{\cos(|\vec{v}|t)}{\sqrt{2}}\ket{1} \Big) \ket{0} - \Big( \frac{i \sin(|\vec{u}|t)}{\sqrt{2}}\ket{0} - \frac{i \sin(|\vec{v}|t)}{\sqrt{2}}\ket{1} \Big) \ket{1}</script></p>
<p>Reading the ancilla qubit in the second register, we should read $1$
with the following probability, given by the small angle approximation of
the $\sin$ function:</p>
<script type="math/tex; mode=display">P(1) = \lvert - \frac{i \sin(|\vec{u}|t)}{\sqrt{2}} \rvert^2 + \lvert \frac{i \sin(|\vec{v}|t)}{\sqrt{2}} \rvert^2 \approx \lvert\frac{|\vec{u}|t}{\sqrt{2}}\rvert^2 + \lvert \frac{|\vec{v}|t}{\sqrt{2}} \rvert^2 = \frac{1}{2} \Big( |\vec{u}|^2t^2 + |\vec{v}|^2t^2 \Big) = Zt^2/2</script>
<p>Now we are almost ready to use the swap circuit. Note that our two
quantum registers have different dimensions, so we cannot swap them.
What we can do instead is to swap the index register of $\ket{\phi}$
with the whole state $\ket{\psi}$. The probability of reading $1$ is:</p>
<script type="math/tex; mode=display">\begin{split}
p(1) = \frac{2|\vec{u}|^2 + 2|\vec{v}|^2 - 4\langle u, v \rangle}{8Z}
\end{split}</script>
<h1 id="conclusion">Conclusion</h1>
<p>We saw how to use a simple circuit to estimate things like the inner product
and the distance between two quantum states. We have assumed that we have
an efficient way of creating the states we are using, and we didn’t go
deep into explaining how. Given an $\epsilon > 0$, you can repeat the
previous circuit $O(\epsilon^{-2})$ times to reach the desired precision.
Note the following thing: while calculating the value of $Z$ for two
vectors is easy, estimating it for calculating the distance between a
vector and the center of a cluster takes time linear in the number of
elements in the superposition. Note that we can use amplitude estimation
in order to reduce the dependency on the error to $O(\epsilon^{-1})$.</p>
<p>For the records:</p>
<p><span>Chernoff Bounds</span> Let $X = \sum_{i=0}^n X_i$, where $X_i = 1$
with probability $p_i$ and $X_i=0$ with probability $1-p_i$. All $X_i$
are independent. Let $\mu = E[X] = \sum_{i=0}^n p_i$. Then:</p>
<ul>
<li>
<p>$P(X \geq (1+\delta)\mu) \leq e^{-\frac{\delta^2}{2+\delta}\mu} $
for all $\delta > 0$</p>
</li>
<li>
<p>$P(X \leq (1-\delta)\mu) \leq e^{-\frac{\delta^2}{2}\mu}$ for all $0 < \delta < 1$</p>
</li>
</ul>
<p><span>Chebyshev</span> Let $X$ be a random variable with $E[X] = \mu$ and
$Var[X]=\sigma^2$. For all $t > 0$:</p>
<script type="math/tex; mode=display">P(|X - \mu| > t\sigma) \leq 1/t^2</script>
<p>If we substitute $k/\sigma$ for $t$, we get the equivalent version that
we use to bound the error:
<script type="math/tex">P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}</script></p>
<h2 id="calculations">Calculations</h2>
<p>It’s now time to prove that our claim is true and to show the
calculation. After all the previous passages, this is the initial state:</p>
<script type="math/tex; mode=display">\ket{0}\Big( \frac{1}{\sqrt{Z}} \left( |\vec{u}|\ket{0} + |\vec{v}|\ket{1} \right) \otimes \frac{1}{\sqrt{2}} (\ket{0}\ket{u} + \ket{1}\ket{v} ) \Big)</script>
<p>We apply a Hadamard gate on the leftmost ancilla register:</p>
<script type="math/tex; mode=display">\frac{1}{2\sqrt{Z}} \left[ \ket{0} \Big( \left( |\vec{u}|\ket{0} + |\vec{v}|\ket{1} \right) \otimes (\ket{0}\ket{u} + \ket{1}\ket{v} ) \Big) +
\ket{1} \Big( \left( |\vec{u}|\ket{0} + |\vec{v}|\ket{1} \right) \otimes (\ket{0}\ket{u} + \ket{1}\ket{v} ) \Big) \right] =</script>
<script type="math/tex; mode=display">\begin{split}
= \frac{1}{2\sqrt{Z}} \Big[ \ket{0} \Big( |u|\ket{00u} + |u|\ket{01v} + |v|\ket{10u} + |v|\ket{11v} \Big) \\
+ \ket{1} \Big( |u|\ket{00u} + |u|\ket{01v} + |v|\ket{10u} + |v| \ket{11v}
\Big) \Big]
\end{split}</script>
<p>Controlled on the ancilla being $1$, we swap the second and the third
register:</p>
<script type="math/tex; mode=display">\begin{split}
= \frac{1}{2\sqrt{Z}} \Big[ \ket{0} \Big( |u|\ket{00u} + |u|\ket{01v} + |v|\ket{10u} + |v|\ket{11v} \Big) \\
+ \ket{1} \Big( |u|\ket{00u} + |u|\ket{10v} + |v|\ket{01u} + |v| \ket{11v} \Big) \Big]
\end{split}</script>
<p>Now we apply the Hadamard on the ancilla qubit again:</p>
<script type="math/tex; mode=display">\begin{split}
= \frac{1}{2^{3/2}\sqrt{Z}} \Big[
|u|\ket{000u} + |u|\ket{001v} + |v|\ket{010u} + |v|\ket{011v} \\
+|u|\ket{100u} + |u|\ket{101v} + |v|\ket{110u} + |v|\ket{111v} \\
+|u|\ket{000u} + |u|\ket{010v} + |v|\ket{001u} + |v|\ket{011v} \\
-|u|\ket{100u} - |u|\ket{110v} - |v|\ket{101u} - |v|\ket{111v}
\Big]
\end{split}</script>
<p>And now we check the probability of reading $\ket{1}$.</p>
<script type="math/tex; mode=display">\begin{split}
p(1) = \frac{1}{2^{3}Z} (u\bra{01v} + v\bra{10u} - u\bra{10v} - v\bra{01u})\\
(u\ket{01v} + v\ket{10u} - u\ket{10v} - v\ket{01u})
\end{split}</script>
<script type="math/tex; mode=display">\begin{split}
p(1) = \frac{2|u|^2 + 2|v|^2 - 4(u,v)}{8Z}
\end{split}</script>
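Since $2|u|^2 + 2|v|^2 - 4(u,v) = 2\lVert u - v \rVert^2$, the result reads $p(1) = \lVert u - v \rVert^2 / 4Z$, so the Euclidean distance is recoverable from an estimate of $p(1)$. A small numerical check of the closed form (evaluating the formula directly, not simulating the circuit):

```python
import numpy as np

def swap_test_p1(u, v):
    """p(1) of the distance-estimation circuit, from the closed form
    derived above, with Z = |u|^2 + |v|^2."""
    Z = np.dot(u, u) + np.dot(v, v)
    return (2 * np.dot(u, u) + 2 * np.dot(v, v) - 4 * np.dot(u, v)) / (8 * Z)

u = np.array([1.0, 2.0, 2.0])   # |u| = 3
v = np.array([2.0, 2.0, 1.0])   # |v| = 3
p1 = swap_test_p1(u, v)

# Recover the Euclidean distance: ||u - v||^2 = 4 * Z * p1
Z = np.dot(u, u) + np.dot(v, v)
dist = np.sqrt(4 * Z * p1)      # should equal np.linalg.norm(u - v)
```
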
<p>Thanks to IK and AG who checked :)</p>
<div id="refs" class="references">
<div id="ref-Lloyd2013QuantumLearning">
Lloyd, Seth, Masoud Mohseni, and Patrick Rebentrost. 2013. “Quantum
algorithms for supervised and unsupervised machine learning.” *ArXiv*
1307.0411 (July): 1–11. <http://arxiv.org/abs/1307.0411>.
</div>
</div>scinawaIntro to swap testSpace Estimation Of Hhl2017-12-27T00:00:00+01:002017-12-27T00:00:00+01:00https://luongo.pro/2017/12/27/Space-estimation-of-HHL<p>Let’s imagine that we are given a quantum computer with 100 logical
qubits, and let’s also assume that we have high gate fidelity (i.e.
applying a gate won’t introduce major errors in our computation). This
means that we can run all the algorithms we want. An idyllic
situation like this probably won’t happen in the near future (let’s say
5 years). Even if we now have the first prototypes of quantum computers
with the first dozens of qubits, those qubits are not stable, and
therefore the computation we can do is pretty limited: these
prototypes aren’t able to perform error-free computations (there’s no
error correction yet), and the computation won’t be as “long” as we
want: we will be able to apply a limited number of gates before the
system decoheres.</p>
<p>The question is the following: can we compete with a classical
computer in solving linear systems of equations? Can we use it to run
the HHL algorithm? Let’s recall that for HHL we need a one-qubit
register for the ancilla, a register for the output of phase estimation
in the Hamiltonian simulation (which will store the superposition of the
eigenvalues), and the rest of the qubits can store the input register.
We assume we have logical qubits in our comparison.</p>
<p>We’ll see what happens when we change the precision of floating-point
operations: 32 and 64 bits. The sparsity of the matrix is assumed to be
small. Since we want to be as close as possible to real cases, let’s
take a famous example of a matrix considered sparse: the product-user
matrix that websites like Amazon or Netflix use to run recommendation
algorithms. Rows represent the users of the service, while columns are
the products. The rows are empty except where a user purchased a
specific product or watched a particular movie. Let’s say that an
educated guess for the sparsity of the matrix is $100$.</p>
<p><img src="/assets/HHL_resource_estimation/space_resource_estimation.png" alt="image" /></p>
<p>The upper horizontal line is an estimate of the space in TB of the
hard disks of the whole of Google (Cirrusinsight, n.d.) (13 EB), while the
lower one is an estimate of the storage needed to store the images of
Google Maps (Mesarina, n.d.) (43 PB).</p>
<p>Let’s do an example to show what the software is plotting for $100$
qubits. In HHL we need an ancilla qubit, so we have 99 qubits left. To
get $64$-bit precision, we need to allocate 64 qubits: this is the phase
estimation of the Hamiltonian simulation step. Now we are left with
just $35$ qubits. With $35$ qubits we can span a Hilbert space of
dimension $2^{35}$: this allows us to encode a vector with the
same number of components. Suppose our vector of known values has $64$-bit
floating-point entries: classically, storing this amount of
data requires $2^{35} \times 64$ bits, which is roughly $0.27$ TB (terabytes).
Adding the cost of storing a $2^{35} \times 2^{35}$ matrix with
sparsity $100$, we get about $27$ TB.</p>
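The back-of-the-envelope count above can be reproduced in a few lines (counting only the 64-bit nonzero values of the sparse matrix, ignoring index overhead):

```python
# Reproduce the qubit budget and storage count for 100 logical qubits.
n_qubits = 100
ancilla = 1
precision = 64            # qubits for phase estimation at 64-bit precision
input_qubits = n_qubits - ancilla - precision    # 35 qubits remain
dim = 2 ** input_qubits                          # vector of 2^35 components

bits_per_entry = 64
vector_tb = dim * bits_per_entry / 8 / 1e12      # ~0.27 TB for the vector
sparsity = 100                                   # nonzeros per row
matrix_tb = dim * sparsity * bits_per_entry / 8 / 1e12   # ~27 TB for the matrix
```
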
<p>Remember that each component of the vector will be encoded as a
probability amplitude in our quantum register. This implies that our
precision in manipulating a qubit needs to grow along with the number of
qubits and the fidelity of the gates. Here we just focused on the
computational capabilities of a small quantum computer with respect to
the HHL algorithm. Don’t forget that for HHL we will need to store the
matrix to invert just as in the classical case (in the form of a QRAM or
another oracle). The code for generating the plot is
<a href="https://github.com/Scinawa/space_estimation_hhl">here</a>.</p>
<div id="refs" class="references">
<div id="ref-exagoogle">
Cirrusinsight. n.d. “How Much Data Does Google Store?”
</div>
<div id="ref-gmaps">
Mesarina, Malena. n.d. “How Much Storage Space Do You Need for Google
Maps?”
</div>
</div>scinawa