arun.chagantys.org

Arun Chaganty’s Personal Website http://arun.chagantys.org

Murmurs

<div class="poem">

<p>It’s a Friday evening, when you hear, the careless whispers<br />     Looking around, bewildered, you see but a cat’s whiskers<br />Turning ’round the bend, to find a voice you swear you’d heard,<br />   The street is empty, save for a fence and its lonely lonesome bird<br />Walking solemn, walking slight, a crackling cackle gives you a fright<br />   Yet, behind you there is naught a sight?<br />Turn around, whence you came,<br />    sprint a sprint, for walking’s lame,<br />And as you stop to catch your breath,<br />    there’s only silence upon your wreath.<br />Shut the door and door the shutters,<br />     no more will you be troubled by its stuttering stutters.<br />And when the night is nigh,<br />      beneath your covers, sigh a sigh.<br />In your dreams you’ll see the truth,<br />    and make believe what was, was for sooth!<br />What made the murmurs murmur,<br />    was just the zoo’s giant le(r)mur<br />And the mumbling mumbles,<br />     oh, a baby with the grumbles<br />Only one thing could’ve made those cantankerous peals,<br />     the laughter, it must’ve come from the circus seals!<br />Now that that’s been put to rest,<br />     it’s time this mattress was put to test!</p> </div>

Thu, 05 Feb 2015 00:00:00 -0500 http://arun.chagantys.org/literature/2015/02/05/murmurs.html http://arun.chagantys.org/literature/2015/02/05/murmurs.html

And so it begins…

<div class="left"> <img src="/images/blog/2014-12-02-stuff.jpg" alt="Stuff." /> </div>

<div class="right poem">

<p>With neither a whimper nor a bang,<br />       but the steady accumulation of “stuff”<br />Encroaching on all things. With a pang,<br />  surrounded on four sides, we cannot rebuff<br />and must accept at long last:<br />A marriage is happening and is already almost past.</p> </div>

<div style="clear:both;"></div>

<p><em>For the joyous occasion of my brother’s marriage.</em></p> Tue, 02 Dec 2014 00:00:00 -0500 http://arun.chagantys.org/literature/2014/12/02/it-begins.html http://arun.chagantys.org/literature/2014/12/02/it-begins.html

Fall.

<div class="left"> <img src="/images/blog/2014-11-12-fall.jpg" alt="Leaves from trees" /> </div>

<div class="right vcenter" style="height:200px;">

<p>Hark ye the fall! And minutes drop from high<br />  They twirl and twang as they gently settle<br />and quietly amongst their friends rustle<br />  and remember till winds blow them aside.</p> </div>

Wed, 12 Nov 2014 00:00:00 -0500 http://arun.chagantys.org/literature/2014/11/12/fall.html http://arun.chagantys.org/literature/2014/11/12/fall.html

Bright Shoes

On Atomic Norms <p>How many <em>linear measurements</em> do you need to (efficiently) recover a low-rank matrix? What about a sparse vector or an orthogonal matrix? Given that we know our object of interest has some ‘structure’, can we answer this question in a general manner?</p> <p>In this article, I will show you one approach to doing so: regression using atomic norms. Most of the material I will cover was presented in the paper <a href="http://arxiv.org/abs/1012.0621">“The Convex Geometry of Linear Inverse Problems” by Venkat Chandrasekaran et al.</a></p> <p>I will begin by describing the problem of structured recovery with a few motivating examples. I will then review some facts about convex geometry and what atomic norms are in this context. Next, I will show how these properties translate to sharp recovery guarantees, instantiating the framework with two examples: the recovery of low-rank matrices and of orthogonal matrices.</p> <p>There are several more examples described in the paper that I will not talk about. Notably, I will <em>not</em> cover the details of theta bodies, approximations the authors present that trade off sample complexity for computational efficiency.</p> <p><strong>Note.</strong> Unfortunately, I have not been able to get Pandoc and MathJax to do equation and theorem referencing. 
A PDF version of this article can be found <a href="http://arun.chagantys.org/files/blog/atomic-norms.pdf">here</a>.</p> <p><strong>Edit (27th May, 2013)</strong>: I had used <span class="math">()</span> in several places where I had really meant <span class="math">()</span>.</p> <h2 id="motivation-recovering-under-determined-objects-using-structure">Motivation: Recovering under-determined objects using structure</h2> <p>The problem of recovering sparse but high-dimensional vectors from data (“compressed sensing”) has seen recent success in applications to <a href="http://nuit-blanche.blogspot.com/2009/09/cs.html">genotyping</a> and <a href="http://www.theengineer.co.uk/news/news-analysis/picture-of-health/1001485.article">medical imaging</a>, and comes up commonly when dealing with sensor data. Similarly, the recovery of low-rank matrices and tensors has been used to <a href="http://www.ncbi.nlm.nih.gov/pubmed/15734364">analyze fMRI data</a>. It also comes up in the recovery of a signal from magnitude-only measurements (i.e. without phase), using the <a href="http://arxiv.org/abs/1109.4499">PhaseLift algorithm</a>.</p> <h3 id="the-mixture-of-linear-regressions">The Mixture of Linear Regressions</h3> <p><img src="http://arun.chagantys.org/images/blog/atomic-norms/mlr-cartoon.png" /></p> <p>Personally, I came to be interested in this formulation from my work on applying the method of moments to the mixture of linear regressions, with Percy Liang. In that algorithm, we provided a consistent estimator for the discriminative model where we observe data <span class="math">\((y_i, x_i)\)</span>, where <span class="math">\(y_i = \beta_h^T x_i + \epsilon_i\)</span> and <span class="math">\(\beta_h\)</span> is one of <span class="math">\(\beta_1, \beta_2, \ldots, \beta_k\)</span>; we do not observe which, and do not know what the <span class="math">\(\beta_h\)</span> are. 
Of course, <span class="math">\(x_i\)</span> could actually be features, so the data could describe curves instead of lines, as depicted in the picture above.</p> <p>The problem is to learn the <span class="math">\(\{\beta_h\}_{h=1}^k\)</span>, and we do so by observing that the <span class="math">\(y_i^2\)</span> are linear measurements of the moments <span class="math">\(\sum_h \pi_h \beta_h^{\otimes 2}\)</span>, etc. After recovering these moments, we apply <a href="http://arxiv.org/abs/1203.0683">standard techniques</a> to recover the <span class="math">\(\{\beta_h\}\)</span>. In these regression problems, we exploited the low-rank structure to recover the moments with just <span class="math">\(O(kd)\)</span> samples instead of potentially <span class="math">\(O(d^3)\)</span> samples!</p> <p>The results of this paper allow us, conceptually at least, to extend this approach to efficiently recover the moments from some linear measurements.</p> <h2 id="the-structured-recovery-problem">The Structured Recovery Problem</h2> <p>I will now talk about the exact problem we are looking at. This needs some definitions.</p> <p>Suppose that the object (matrix, tensor, etc.) we wish to recover is a convex combination of a few “atoms” in the <strong>atomic set</strong> <span class="math">\(\mathcal{A}\)</span>, <span class="math">\[ x = \sum_i c_i a_i, \quad a_i \in \mathcal{A},\; c_i \ge 0. \]</span></p> <p>We can define the <strong>atomic norm</strong> of this set to be, <span class="math">\[ \|x\|_{\mathcal{A}} = \inf\{ t > 0 : x \in t \, \mathrm{conv}(\mathcal{A}) \}. \]</span> Intuitively, this function tells us how much we need to scale <span class="math">\(\mathcal{A}\)</span> so that its convex hull contains <span class="math">\(x\)</span>. In principle, <span class="math">\(\|\cdot\|_{\mathcal{A}}\)</span> is only a norm when <span class="math">\(\mathcal{A}\)</span> is “centrally symmetric” (<span class="math">\(a \in \mathcal{A} \Leftrightarrow -a \in \mathcal{A}\)</span>). The results in the paper extend to non-centrally-symmetric <span class="math">\(\mathcal{A}\)</span> as well, so we will just assume that <span class="math">\(\|\cdot\|_{\mathcal{A}}\)</span> is a valid norm here.</p> <!-- Also, we will assume that $$\mA$$ is convex, since we will only be interested in the properties of the convex hull of $$\mA$$, $$\conv(\mA)$$. -->

<p><strong>Examples.</strong> Two standard choices for <span class="math">\(\mathcal{A}\)</span> are the set of 1-sparse vectors (for sparse recovery) and the set of rank-one matrices (for low-rank recovery). The corresponding norms are, as expected, the <span class="math">\(L_1\)</span> norm, <span class="math">\(\|\cdot\|_1\)</span>, and the nuclear norm, <span class="math">\(\|\cdot\|_*\)</span>.</p> <p>Let <span class="math">\(\Phi\)</span> be a linear measurement operator. We wish to ask: how many linear measurements <span class="math">\(y = \Phi x^*\)</span> are required to exactly recover <span class="math">\(x^*\)</span> from the convex program, <span class="math">\[ \min_x \|x\|_{\mathcal{A}} \quad \text{s.t.} \quad \Phi x = y. \]</span></p> <p>The approach also handles the case where the observations are noisy, <span class="math">\(y = \Phi x^* + \omega\)</span>, where <span class="math">\(\|\omega\|_2 \le \delta\)</span>; <span class="math">\[ \min_x \|x\|_{\mathcal{A}} \quad \text{s.t.} \quad \|y - \Phi x\|_2 \le \delta. \]</span> In this case, we will ask instead for the number of samples required so that <span class="math">\(\|\hat{x} - x^*\| \le \epsilon\)</span> for a given <span class="math">\(\epsilon\)</span>.</p> <h3 id="convex-geometry-prelimnaries">Convex Geometry Preliminaries</h3> <p>Before we proceed, I will need to review a few simple but crucial concepts from convex geometry.</p> <p><img src="http://arun.chagantys.org/images/blog/atomic-norms/cones.png" /></p> <p>The <strong>tangent cone</strong>, <span class="math">\(T_{\mathcal{A}}(x)\)</span>, gives the directions that <em>decrease</em> <span class="math">\(\|\cdot\|_{\mathcal{A}}\)</span>, <span class="math">\[ T_{\mathcal{A}}(x) = \mathrm{cone}\{ z - x : \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} \}. \]</span> Similarly, we can define the <strong>normal cone</strong>, <span class="math">\[ N_{\mathcal{A}}(x) = \{ s : \langle s, z - x \rangle \le 0 \;\; \forall z \text{ s.t. } \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} \}. \]</span> Finally, the <strong>polar</strong> <span class="math">\(C^*\)</span> of a cone <span class="math">\(C\)</span> is the set of vectors that are obtuse with respect to all vectors in <span class="math">\(C\)</span>, <span class="math">\[ C^* = \{ s : \langle s, z \rangle \le 0 \;\; \forall z \in C \}. \]</span></p> <p>Note that the polar cone of the tangent cone is the normal cone, and vice versa.</p> <p><strong>Examples.</strong> In the <span class="math">\(L_2\)</span> case, i.e. 
the elements <span class="math">\(x\)</span> have no structural constraints, the tangent cone is a half-space, and the normal cone is the ray normal to this half-space. In the <span class="math">\(L_1\)</span> case, the tangent cone is the cone extending from the edges of the <span class="math">\(l_1\)</span> ball; the normal cone is the cone bounded by the normals of these two edges (shown in the figure).</p> <h2 id="recovering-x">Recovering <span class="math">\(x^*\)</span></h2> <p>We have finally come to the meat of this article; in this section, we will study how the properties of the atomic set <span class="math">\(\mathcal{A}\)</span> relate to the difficulty of recovering <span class="math">\(x^*\)</span>.</p> <p>The first result should illuminate why this approach will work.</p> <div class="lemma"> (1. Exact Recovery) <span class="math">\(x^*\)</span> is the unique solution to the exact optimization problem iff the tangent cone of the atomic norm at the minimizer and the null-space of the observation operator are transverse, <span class="math">\[ T_{\mathcal{A}}(x^*) \cap \mathrm{null}(\Phi) = \{0\}. \]</span> </div>

<p>Recall that the tangent cone <span class="math">\(T_{\mathcal{A}}(x^*)\)</span> gives the descent directions of the objective function (<span class="math">\(\|\cdot\|_{\mathcal{A}}\)</span>), and if <span class="math">\(z \in T_{\mathcal{A}}(x^*) \cap \mathrm{null}(\Phi)\)</span>, then <span class="math">\(\Phi z = 0\)</span>. Thus, if the intersection were non-trivial, we could move <span class="math">\(x^*\)</span> along these directions and reduce the objective <em>without violating the constraint <span class="math">\(y = \Phi x\)</span></em>.</p> <p>Now, when we have noisy measurements, we need a stronger condition: that the measurement operator have a “noticeable” projection on the descent directions in <span class="math">\(T_{\mathcal{A}}(x^*)\)</span>.</p> <div class="lemma"> (2. Noisy Recovery) Let <span class="math">\(y = \Phi x^* + \omega, \|\omega\|_2 \le \delta\)</span> be <span class="math">\(n\)</span> noisy measurements and <span class="math">\(\hat{x}\)</span> be the optimal solution to the noisy optimization problem. If, for all <span class="math">\(z \in T_{\mathcal{A}}(x^*)\)</span>, the measurements are bounded below, <span class="math">\[ \|\Phi z\|_2 \ge \epsilon \|z\|_2, \]</span> then <span class="math">\(\|\hat{x} - x^*\|_2 \le 2\delta/\epsilon\)</span>. </div> <div class="proof">

<p>As <span class="math">\(\hat{x}\)</span> minimizes <span class="math">\(\|\cdot\|_{\mathcal{A}}\)</span> in the program, we have that <span class="math">\(\|\hat{x}\|_{\mathcal{A}} \le \|x^*\|_{\mathcal{A}}\)</span> and thus that <span class="math">\(\hat{x} - x^* \in T_{\mathcal{A}}(x^*)\)</span>. Using the linearity of <span class="math">\(\Phi\)</span> and the triangle inequality, we derive that, <span class="math">\[ \|\Phi(\hat{x} - x^*)\|_2 \le \|y - \Phi \hat{x}\|_2 + \|y - \Phi x^*\|_2 \le 2\delta, \]</span> where the first term is upper bounded by <span class="math">\(\delta\)</span> by the optimization program, and the second term is upper bounded by <span class="math">\(\delta\)</span> from the assumptions on <span class="math">\(\omega\)</span>.</p> Finally, using the assumption on <span class="math">\(\Phi\)</span>, <span class="math">\(\|\Phi(\hat{x} - x^*)\|_2 \ge \epsilon \|\hat{x} - x^*\|_2\)</span>, giving us the final result. </div>
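To make these recovery programs concrete, here is a small numerical sketch of my own (it is not from the paper) for the sparse-vector case: taking the atoms to be the signed basis vectors, the atomic norm is the <span class="math">\(L_1\)</span> norm and the exact recovery program becomes a linear program.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(Phi, y):
    """Solve  min ||x||_1  s.t.  Phi x = y  as a linear program.

    Variables are [x; t] with -t <= x <= t, minimizing sum(t)."""
    n, p = Phi.shape
    c = np.concatenate([np.zeros(p), np.ones(p)])   # minimize sum(t)
    # x - t <= 0 and -x - t <= 0 together encode |x_i| <= t_i.
    A_ub = np.block([[np.eye(p), -np.eye(p)],
                     [-np.eye(p), -np.eye(p)]])
    b_ub = np.zeros(2 * p)
    A_eq = np.hstack([Phi, np.zeros((n, p))])        # Phi x = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * p + [(0, None)] * p,
                  method="highs")
    return res.x[:p]

rng = np.random.default_rng(0)
p, s, n = 20, 2, 18                  # ambient dim, sparsity, measurements
x_true = np.zeros(p)
x_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
Phi = rng.standard_normal((n, p))    # random Gaussian measurement map
x_hat = l1_recover(Phi, Phi @ x_true)
print(np.max(np.abs(x_hat - x_true)))
```

With this many Gaussian measurements, the recovery should be exact up to solver precision; shrinking `n` well below the sample-complexity threshold should make it fail.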

<p><span class="math">\(\Phi\)</span> is a random quantity, so we will have to show that with sufficient samples, the transversality conditions hold with high probability. Note that the assumptions for Lemma 2 subsume those for Lemma 1; it is sufficient for us to estimate a lower bound on <span class="math">\[ \inf_{z \in T_{\mathcal{A}}(x^*),\, \|z\|_2 = 1} \|\Phi z\|_2. \]</span> This quantity will be referred to as the <strong>minimum gain</strong> of the measurement operator restricted to the cone <span class="math">\(T_{\mathcal{A}}(x^*)\)</span>.</p> <p><strong>Aside: Atomic norms are the “best” convex heuristic.</strong> One intuition for using the atomic norm is as follows. A basic condition we’d like for a heuristic penalty is that it be constant on the atoms of the set <span class="math">\(\mathcal{A}\)</span>. Consequently, <span class="math">\(a - a'\)</span> should be a descent direction for any <span class="math">\(a, a' \in \mathcal{A}\)</span>. Requiring that the penalty be convex implies that the set of descent directions <span class="math">\(\{a - a' : a, a' \in \mathcal{A}\}\)</span> should be a cone. This is precisely the tangent cone at <span class="math">\(a \in \mathcal{A}\)</span> with respect to <span class="math">\(\mathrm{conv}(\mathcal{A})\)</span>.</p> <h3 id="gaussian-widths-and-bounding-minimum-gain">Gaussian widths and bounding minimum gain</h3> As we noted above, the number of samples required for recovery is determined by the minimum gain of <span class="math">\(\Phi\)</span>. We will now characterize this quantity in terms of the <em>Gaussian width</em> of the tangent cone, defined below. <div class="definition"> (Gaussian Width) The Gaussian width of a set <span class="math">\(S \subseteq \mathbb{R}^p\)</span> is, <span class="math">\(w(S) = \mathbb{E}_{g \sim \mathcal{N}(0, I_p)} [ \sup_{z \in S} g^T z ]\)</span>. </div>
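The definition is easy to probe by Monte Carlo whenever the supremum is computable in closed form. A small sketch of my own (the atomic set here is my illustrative choice, not one from the paper): for the 1-sparse atoms <span class="math">\(S = \{\pm e_i\} \subset \mathbb{R}^p\)</span>, the supremum is <span class="math">\(\max_i |g_i|\)</span>, whose expectation sits a little below the classical <span class="math">\(\sqrt{2 \log p}\)</span> bound.

```python
import numpy as np

def gaussian_width_mc(sup_fn, p, trials=2000, seed=0):
    """Monte Carlo estimate of w(S) = E_g[ sup_{z in S} g^T z ]."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((trials, p))
    return np.mean([sup_fn(g) for g in G])

p = 1000
# For the atoms of the L1 ball, S = {+-e_i}, the sup is max_i |g_i|.
w_est = gaussian_width_mc(lambda g: np.max(np.abs(g)), p)
print(w_est, np.sqrt(2 * np.log(p)))   # estimate vs. the sqrt(2 log p) bound
```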

<p>Of course, because of the linearity of <span class="math">\(\Phi\)</span>, the minimum gain is independent of the length of <span class="math">\(z\)</span>, so bounding <span class="math">\(\|\Phi z\|_2\)</span> on the unit sphere, <span class="math">\(\Omega = T_{\mathcal{A}}(x^*) \cap S^{p-1}\)</span>, is sufficient to bound it on all of <span class="math">\(T_{\mathcal{A}}(x^*)\)</span>.</p> <a href="http://link.springer.com/chapter/10.1007%2FBFb0081737?LI=true">Yehoram Gordon</a> (<a href="http://www.math.uiuc.edu/~mjunge/Comps/Gordonm.pdf">pdf</a>) proved the following key result on when a random subspace escapes a “mesh” in <span class="math">\(\mathbb{R}^p\)</span>, <div class="theorem"> (1. Gaussian Width) Let <span class="math">\(\Omega \subset S^{p-1}\)</span> and <span class="math">\(\Phi : \mathbb{R}^p \to \mathbb{R}^n\)</span> be a <em>random map</em> with i.i.d. standard Gaussian entries. Then, <span class="math">\[ \mathbb{E}\left[ \min_{z \in \Omega} \|\Phi z\|_2 \right] \ge \lambda_n - w(\Omega), \]</span> where <span class="math">\(\lambda_n\)</span> is the expected length of an <span class="math">\(n\)</span>-dimensional Gaussian vector, <span class="math">\(\lambda_n = \mathbb{E}_{g \sim \mathcal{N}(0, I_n)}[ \|g\|_2 ]\)</span>. </div>

<p>The proof follows from a generalization of the Sudakov-Fernique inequality. It is also useful to note that <span class="math">\(\lambda_k = \sqrt{2}\, \Gamma(\tfrac{k+1}{2}) / \Gamma(\tfrac{k}{2})\)</span>, which is tightly bounded, <span class="math">\(k/\sqrt{k+1} \le \lambda_k \le \sqrt{k}\)</span>.</p> <h3 id="main-theorem-exponential-convergence-to-x">Main Theorem: Exponential Convergence to <span class="math">\(x^*\)</span></h3> <p>We are finally ready to state and prove the main theorem, which shows that we need about <span class="math">\(w(\Omega)^2\)</span> samples to converge exponentially fast to <span class="math">\(x^*\)</span>. Theorem 1 gives us a bound on the minimum gain that holds in expectation. To extend it to the finite-sample regime, we’ll exploit the property that <span class="math">\(\Phi \mapsto \min_{z \in \Omega} \|\Phi z\|_2\)</span> is Lipschitz (with constant <span class="math">\(1\)</span>) and use the concentration of Gaussian measure to show convergence.</p> <div class="theorem"> <p>(2. Recovery) Let <span class="math">\(\Phi\)</span> be a random map with rows drawn i.i.d. from <span class="math">\(\mathcal{N}(0, I_p)\)</span> and <span class="math">\(\Omega = T_{\mathcal{A}}(x^*) \cap S^{p-1}\)</span>. Then</p> <ol type="1"> <li><p>Given measurements <span class="math">\(y = \Phi x^*\)</span>, <span class="math">\(x^*\)</span> is the unique solution of the exact convex program with probability at least <span class="math">\(1 - \exp( -\tfrac{1}{2} (\lambda_n - w(\Omega))^2 )\)</span>, if <span class="math">\(n \ge w(\Omega)^2 + 1\)</span>.</p></li> <li><p>Given measurements <span class="math">\(y = \Phi x^* + \omega, \|\omega\|_2 \le \delta\)</span>, the solution <span class="math">\(\hat{x}\)</span> of the noisy convex program satisfies <span class="math">\(\|x^* - \hat{x}\|_2 \le 2\delta/\epsilon\)</span> with probability at least <span class="math">\(1 - \exp( -\tfrac{1}{2} (\lambda_n - w(\Omega) - \sqrt{n}\epsilon)^2 )\)</span>, if <span class="math">\(n\)</span> is sufficiently large (on the order of <span class="math">\(w(\Omega)^2\)</span>).</p></li> </ol> </div> Note that in both cases, the convergence rates are exponential, <span class="math">\(O(\exp(-n))\)</span>. <div class="proof">

<p>This follows directly by using the property that for any Lipschitz function <span class="math">\(f\)</span> with constant <span class="math">\(L\)</span>, <span class="math">\[ \Pr[ f(g) \le \mathbb{E}[f(g)] - t ] \le \exp( -t^2 / 2L^2 ). \]</span></p> <p>To see that the map <span class="math">\(\Phi \mapsto \min_{z \in \Omega} \|\Phi z\|_2\)</span> is Lipschitz, observe, <span class="math">\[ \left| \min_{z \in \Omega} \|\Phi_1 z\|_2 - \min_{z \in \Omega} \|\Phi_2 z\|_2 \right| \le \max_{z \in \Omega} \|(\Phi_1 - \Phi_2) z\|_2 \le \|\Phi_1 - \Phi_2\|_F. \]</span></p> Thus, we have, <span class="math">\[ \Pr\left[ \min_{z \in \Omega} \|\Phi z\|_2 \ge \lambda_n - w(\Omega) - t \right] \ge 1 - \exp( -t^2/2 ), \]</span> when <span class="math">\(t \le \lambda_n - w(\Omega)\)</span>. Plugging in <span class="math">\(\delta = 0\)</span>, we get the first condition as well. </div>
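The quantity <span class="math">\(\lambda_n\)</span> has the closed form <span class="math">\(\sqrt{2}\, \Gamma(\tfrac{n+1}{2}) / \Gamma(\tfrac{n}{2})\)</span>, and the bounds <span class="math">\(n/\sqrt{n+1} \le \lambda_n \le \sqrt{n}\)</span> are quick to check numerically; a small sanity sketch of my own:

```python
import numpy as np
from scipy.special import gammaln

def lambda_n(n):
    """Expected length of an n-dimensional standard Gaussian vector:
    lambda_n = sqrt(2) * Gamma((n+1)/2) / Gamma(n/2), via log-gammas."""
    return np.sqrt(2) * np.exp(gammaln((n + 1) / 2) - gammaln(n / 2))

# Check n/sqrt(n+1) <= lambda_n <= sqrt(n) across several scales.
for n in [1, 2, 10, 100, 10_000]:
    lam = lambda_n(n)
    assert n / np.sqrt(n + 1) <= lam <= np.sqrt(n)

print(lambda_n(3))   # E||g||_2 for g ~ N(0, I_3) = 2*sqrt(2/pi) ~ 1.5958
```

Since <span class="math">\(\lambda_n \approx \sqrt{n}\)</span>, the condition in Theorem 2 is essentially <span class="math">\(\sqrt{n} \gtrsim w(\Omega)\)</span>, i.e. about <span class="math">\(w(\Omega)^2\)</span> samples.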

<p><strong>Relationship to Rademacher complexity.</strong> In the empirical risk minimization framework, the sample complexity is almost entirely determined by the <a href="http://en.wikipedia.org/wiki/Rademacher_complexity">Rademacher complexity</a>, which is analogous to the Gaussian width, but with expectations taken over binary random variables instead of a Gaussian.</p> <h3 id="properties-of-gaussian-widths">Properties of Gaussian Widths</h3> <p>The main theorem shows that the sample complexity is pretty sharply bounded by the Gaussian width <span class="math">\(w(\Omega)\)</span>. The Gaussian width of a set is a very well-behaved object with many useful properties. We will use these properties in the applications section, but you should feel free to skip the details here for now.</p> <h4 id="relationship-to-intrinsic-volume">Relationship to Intrinsic Volume</h4> <p> A large number of properties follow from the fact that <span class="math">\(w\)</span> is a <a href="http://en.wikipedia.org/wiki/Valuation_(measure_theory)">valuation</a>, an analogue of a measure. To show this, we’ll use the clever fact that an isotropic Gaussian vector can be separated into two independent parts, length and direction. The direction component integrates over the unit sphere, <span class="math">\[ w(S) = \lambda_p \, \mathbb{E}_{u \sim \mathrm{Unif}(S^{p-1})}\left[ \sup_{z \in S} u^T z \right] = \lambda_p \, b(S), \]</span> where <span class="math">\(b(S)\)</span> is the <strong>intrinsic volume</strong> of a set (a deterministic property). 
It turns out (and this is intuitive) that <span class="math">\(\mathrm{d}u\)</span> is a Haar measure, giving us the following properties for free.</p> <ul> <li><span class="math">\(w(S)\)</span> is invariant to translations and unitary transformations.</li> <li>Scaling: <span class="math">\(w(tK) = t\, w(K)\)</span> for <span class="math">\(t &gt; 0\)</span>.</li> <li>Monotonicity: if <span class="math">\(S_1 \subseteq S_2 \subseteq \mathbb{R}^p\)</span>, then <span class="math">\(w(S_1) \le w(S_2)\)</span>.</li> <li>Valuation: <span class="math">\(w( S_1 \cup S_2 ) + w( S_1 \cap S_2 ) = w(S_1) + w(S_2)\)</span>.</li> <li><span class="math">\(w( S ) = w(\mathrm{conv}(S))\)</span> (because of the <span class="math">\(\sup\)</span>).</li> <li>If <span class="math">\(S = S_1 \oplus S_2\)</span>, <span class="math">\(w( S \cap S^{p-1} ) \le w( S_1 \cap S^{p-1} ) + w( S_2 \cap S^{p-1} )\)</span>.</li> </ul> <h4 id="other-properties">Other Properties</h4> <p>There are a couple of other cool properties that I won’t prove here.</p> <ul> <li>If <span class="math">\(V\)</span> is a vector space, <span class="math">\(w( V \cap S^{p-1} ) \le \sqrt{\dim V}\)</span>.</li> <li><span class="math">\(w(C \cap S^{p-1}) \le \mathbb{E}_{g \sim \mathcal{N}(0, I_p)}[ \mathrm{dist}(g, C^*) ]\)</span> if <span class="math">\(C\)</span> is a non-empty convex cone in <span class="math">\(\mathbb{R}^p\)</span>. The proof follows from convex duality between <span class="math">\(\sup_{z \in C \cap S^{p-1}} g^T z\)</span> and <span class="math">\(\mathrm{dist}(g, C^*)\)</span>.</li> <li>A simple corollary: <span class="math">\(w(C \cap S^{p-1})^2 + w(C^* \cap S^{p-1})^2 \le p\)</span>.</li> <li>Let <span class="math">\(N(S, \epsilon)\)</span> be the <span class="math">\(\epsilon\)</span>-covering number of <span class="math">\(S\)</span>. Then, by Dudley’s inequality, <span class="math">\[ w(S) \le 24 \int_0^{\infty} \sqrt{ \log N(S, \epsilon) } \, \mathrm{d}\epsilon. \]</span></li> </ul> <h2 id="applications">Applications</h2> <p>We have covered some fairly complex results in this article. To summarize, we showed that the convex optimization problem converges exponentially after getting <span class="math">\(O(w(\Omega)^2)\)</span> samples, where <span class="math">\(w(\Omega)\)</span> is the Gaussian width of the tangent cone <span class="math">\(T_{\mathcal{A}}(x^*)\)</span> intersected with the unit sphere. The Gaussian width has several useful compositional properties (just like the Rademacher complexity). To instantiate the framework, we just need to bound the Gaussian width of the atomic set. In this section, we will do so for two examples: orthogonal matrices and low-rank matrices.</p> <h3 id="orthogonal-matrices">Orthogonal Matrices</h3> <div class="lemma"> (3. Orthogonal Recovery) Let <span class="math">\(x^*\)</span> be an <span class="math">\(m \times m\)</span> orthogonal matrix. Then, letting <span class="math">\(\mathcal{A}\)</span> be the set of all orthogonal matrices, <span class="math">\(w(T_{\mathcal{A}}(x^*) \cap S^{m^2 - 1})^2 \le \tfrac{3m^2 - m}{4}\)</span>. </div>
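Before the proof, the key claim below, that convex combinations of orthogonal matrices have spectral norm at most 1, is easy to poke at numerically. A sketch of my own, drawing random orthogonal factors as Q factors of QR decompositions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 10                         # matrix size, number of atoms

# Draw k random orthogonal matrices as Q factors of Gaussian matrices.
atoms = [np.linalg.qr(rng.standard_normal((m, m)))[0] for _ in range(k)]
for Q in atoms:
    assert np.allclose(Q.T @ Q, np.eye(m))          # each atom is orthogonal
    assert np.isclose(np.linalg.norm(Q, 2), 1.0)    # spectral norm exactly 1

# A random convex combination of the atoms...
weights = rng.random(k)
weights /= weights.sum()
M = sum(w * Q for w, Q in zip(weights, atoms))

# ...has spectral norm at most 1 (and typically strictly less).
print(np.linalg.norm(M, 2))
assert np.linalg.norm(M, 2) <= 1 + 1e-9
```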

<p>Let’s start by defining the atomic norm, which should be <span class="math">\(1\)</span> for all elements of <span class="math">\(\mathcal{A}\)</span> and at most <span class="math">\(1\)</span> for everything in the convex hull <span class="math">\(\mathrm{conv}(\mathcal{A})\)</span>. Note that the set of orthogonal matrices is <em>not convex</em>. However, it is not hard to see that every matrix in the convex hull of all <span class="math">\(m \times m\)</span> orthogonal matrices has singular values at most <span class="math">\(1\)</span>; let <span class="math">\(U\)</span> and <span class="math">\(V\)</span> be two orthogonal matrices; <span class="math">\(V = A U\)</span>, where <span class="math">\(A\)</span> is also orthogonal. So, for <span class="math">\(\theta \in [0, 1]\)</span>, <span class="math">\[ \|\theta U + (1 - \theta) V\|_2 = \|(\theta I + (1 - \theta) A) U\|_2 \le \theta + (1 - \theta) \|A\|_2 = 1, \]</span> where <span class="math">\(\|U\|_2 = \sigma_1(U)\)</span> is the spectral norm, the largest singular value. We have just shown that the spectral norm is an atomic norm!</p> <p>Using the property that <span class="math">\(w\)</span> is rotationally invariant, we just need to consider the tangent cone at <span class="math">\(x^* = I\)</span>, <span class="math">\[ T_{\mathcal{A}}(I) = \mathrm{cone}\{ M - I : \|M\|_2 \le 1 \}. \]</span></p> <p>Every matrix <span class="math">\(M\)</span> can be represented as the sum of a symmetric (<span class="math">\(A = A^T\)</span>) and a skew-symmetric (<span class="math">\(A = -A^T\)</span>) matrix, so we can partition <span class="math">\(T_{\mathcal{A}}(I)\)</span> into these two spaces. The Lie algebra of skew-symmetric matrices is the tangent space of the orthogonal group <span class="math">\(O(m)\)</span>, so we’ll have to consider the full subspace. Using the symmetry property, this subset is isomorphic to the <span class="math">\(\tfrac{m(m-1)}{2}\)</span>-dimensional vector space that defines the entries above (or below) the diagonal.</p> <p>The remaining component lies in the subspace of symmetric matrices, <span class="math">\[ \{ A : A = A^T,\; A \preceq 0 \} = -\Pi_m, \]</span> where <span class="math">\(\Pi_m\)</span> is the set of positive semi-definite <span class="math">\(m \times m\)</span> matrices. The positive semidefinite cone is self-dual (i.e. the polar cone <span class="math">\((-\Pi_m)^* = \Pi_m\)</span>) and hence contributes at most <span class="math">\(\tfrac{1}{2} \cdot \tfrac{m(m+1)}{2} = \tfrac{m(m+1)}{4}\)</span>.</p> <p>Putting this together, we have that <span class="math">\(w(\Omega)^2 \le \tfrac{m(m-1)}{2} + \tfrac{m(m+1)}{4} = \tfrac{3m^2 - m}{4}\)</span>.</p> <h3 id="low-rank-matrices">Low-Rank Matrices</h3> <div class="lemma"> (4. Low-rank Recovery) Let <span class="math">\(x^*\)</span> be an <span class="math">\(m_1 \times m_2\)</span> rank-<span class="math">\(r\)</span> matrix (<span class="math">\(m_1 \le m_2\)</span>). Then, letting <span class="math">\(\mathcal{A}\)</span> be the set of unit-norm rank-one matrices, <span class="math">\(w(T_{\mathcal{A}}(x^*) \cap S^{m_1 m_2 - 1})^2 \le 3 r (m_1 + m_2 - r)\)</span>. </div>
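The analogous sanity check for the low-rank case, again my own sketch rather than anything from the paper: the nuclear norm is 1 on unit-norm rank-one atoms and at most 1 on any convex combination of them.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

rng = np.random.default_rng(0)
m1, m2, k = 4, 6, 8

def atom():
    """A unit-norm rank-one atom u v^T with ||u||_2 = ||v||_2 = 1."""
    u = rng.standard_normal(m1); u /= np.linalg.norm(u)
    v = rng.standard_normal(m2); v /= np.linalg.norm(v)
    return np.outer(u, v)

atoms = [atom() for _ in range(k)]
for A in atoms:
    assert np.isclose(nuclear_norm(A), 1.0)   # each atom has nuclear norm 1

# A convex combination has nuclear norm at most 1 (triangle inequality)...
weights = rng.random(k); weights /= weights.sum()
M = sum(w * A for w, A in zip(weights, atoms))
print(nuclear_norm(M))
assert nuclear_norm(M) <= 1 + 1e-9
# ...even though its rank is generically min(m1, m2), not 1.
```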

<p>Once again, the set of rank-1 matrices is not convex; the sum of two rank-1 matrices is rarely going to stay rank-1! The nuclear norm, <span class="math">\(\|M\|_* = \sum_{i=1}^{m_1} \sigma_i(M)\)</span> (i.e. the sum of singular values), is a suitable atomic norm for this set. Firstly, <span class="math">\(\|\cdot\|_*\)</span> is 1 for every element of <span class="math">\(\mathcal{A}\)</span> and is also at most <span class="math">\(1\)</span> for every element of <span class="math">\(\mathrm{conv}(\mathcal{A})\)</span>. While there may be other choices for <span class="math">\(\|\cdot\|_{\mathcal{A}}\)</span>, the nuclear norm is a natural choice to minimize the size of the tangent cone, as we’ll shortly see.</p> <p>Let <span class="math">\(x^* = U \Sigma V^T\)</span>, where <span class="math">\(U \in \mathbb{R}^{m_1 \times r}\)</span>, <span class="math">\(\Sigma \in \mathbb{R}^{r \times r}\)</span> and <span class="math">\(V \in \mathbb{R}^{m_2 \times r}\)</span>. The tangent cone at <span class="math">\(x^*\)</span> is, <span class="math">\[ T_{\mathcal{A}}(x^*) = \mathrm{cone}\{ z - x^* : \|z\|_* \le \|x^*\|_* \}. \]</span></p> <p>To bound the width of this set, we will proceed by bounding the distance to the normal cone. The normal cone is described by the sub-differentials at <span class="math">\(x^*\)</span>, <span class="math">\[ N_{\mathcal{A}}(x^*) = \{ t (U V^T + W) : t \ge 0,\; U^T W = 0,\; W V = 0,\; \|W\|_{\mathcal{A}}^* \le 1 \}, \]</span> where <span class="math">\(\|\cdot\|_{\mathcal{A}}^*\)</span> is the <em>dual norm</em>; in our case it’s just the operator norm <span class="math">\(\|\cdot\|_2\)</span>. Let <span class="math">\(\Delta\)</span> be the directions in the subspaces of <span class="math">\(U\)</span> and <span class="math">\(V\)</span> and <span class="math">\(\Delta^{\perp}\)</span> be the directions orthogonal to <span class="math">\(U\)</span> and <span class="math">\(V\)</span>.</p> <p>For any matrix <span class="math">\(G\)</span>, a point in the normal cone is the projection, <span class="math">\(\|P_{\Delta^{\perp}}(G)\|_2 \, U V^T + P_{\Delta^{\perp}}(G)\)</span>. Finally, <span class="math">\[ \mathbb{E}[ \mathrm{dist}(G, N_{\mathcal{A}}(x^*))^2 ] \le \mathbb{E}[ \|P_{\Delta}(G)\|_F^2 ] + r \, \mathbb{E}[ \|P_{\Delta^{\perp}}(G)\|_2^2 ]. \]</span> The first term contributes <span class="math">\(r(m_1 + m_2 - r)\)</span> because the dimension of <span class="math">\(\Delta\)</span> is <span class="math">\(r(m_1 + m_2 - r)\)</span>. 
The second term is calculated by observing that <span class="math">\(P_{\Delta^{\perp}}(G)\)</span> is an isotropic Gaussian <span class="math">\((m_1 - r) \times (m_2 - r)\)</span> matrix. We have good concentration results on the singular values of such random matrices, <span class="math">\[ \Pr[ \sigma_1(P_{\Delta^{\perp}}(G)) \ge \sqrt{m_1 - r} + \sqrt{m_2 - r} + s ] \le \exp( -s^2/2 ). \]</span> It’s not hard to show from here that <span class="math">\(\mathbb{E}[ \|P_{\Delta^{\perp}}(G)\|_2^2 ] \le (\sqrt{m_1 - r} + \sqrt{m_2 - r})^2 + 2\)</span>.</p> <p>Putting this all together, we get our desired result; <span class="math">\[ w(\Omega)^2 \le r(m_1 + m_2 - r) + r \left[ (\sqrt{m_1 - r} + \sqrt{m_2 - r})^2 + 2 \right] \le 3 r (m_1 + m_2 - r). \]</span></p> <h2 id="conclusions">Conclusions</h2> <p>I hope that through this article I’ve made a case for atomic norms. The framework allows us to reduce the complexity of a recovery problem for arbitrary atomic sets to a Gaussian width computation. Computing Gaussian widths can still be a hard problem (just like computing the Rademacher complexity), but they come with several useful properties that make these calculations easier. In the few examples I have seen, the Gaussian width calculation requires you to fairly mechanically decompose the tangent cone into spaces that are easy to describe.</p> Sat, 25 May 2013 00:00:00 -0400 http://arun.chagantys.org/technical/2013/05/25/on-atomic-norms.html http://arun.chagantys.org/technical/2013/05/25/on-atomic-norms.html

On Regression

<p>In this article, I will try to demonstrate why regression works at all (particularly with different regularizations) and how regularization should be controlled to obtain convergent estimates. I will operate in the fixed design setting, i.e. when bounds are assumed on the inputs, as opposed to assuming a random distribution from which they came (the random design setting).</p> Mon, 13 May 2013 00:00:00 -0400 http://arun.chagantys.org/technical/2013/05/13/on-regression.html http://arun.chagantys.org/technical/2013/05/13/on-regression.html

A Day in Monterey

<h1 id="ive-come-here-to-find-some-answers-but-find-questions-instead.">I’ve come here to find some answers, but find questions instead.</h1> <p>I sit upon a bench, my feet, they itch to leave the world behind. Life these days; so much choice, yet all kitsch And today, when to answers I’ve devoted my mind, staring into the distance but asks me questions to twitch. Oh, what I’ll prove, my fingers eagerly twitch I’ll find the answers, apply my mind But not today, I’m lost in a X, thoughts are kitsch</p> <p>My jacket flutters in the cool breeze. In the calm serenity, meditation beckons me. The theorems I set out to prove will yet find their answers. Here, now, I lose myself to questions.</p> <h1 id="theme-slice-into-these-questions-from-looking-at-the-world">(Theme: slice into these questions from looking at the world)</h1> <h1 id="birds">Birds</h1> <h1 id="squirrels">Squirrels</h1> <h1 id="waves-and-plants">Waves and plants</h1> <p>A fleet of birds swoop down, kissing the waves as they fly by A white albatross emerges from the horizon long wings spread out, a single flap carries it across the shore whilst its caws roll in before receding.</p> <p>A squirrel hops around in circles, Its tail flits from side to side. Here it nibbles on grass, there it sniffs questioningly, searching for something, I haven’t the voice to ask “What?”</p> <p>The bright blue bed of ice plants reminds me of an alien world.</p> <p>All in the quiet backdrop of the sea Gentle waves breaking the silence in the gentlest of ways A whisper to keep you calm A whisper that keeps you alarmed.</p> <p>I forget the theorems I had set out to prove But am reminded the same. Static equilibrium. All the down at long last. Dynamic equilibrium, and of this universal truth. To learn, to understand, there is always a question. 
Be it in how the waves work, or why the algorithms settle And always the quest for answers.</p> <ul> <li>Theme, questions – nature – questions.</li> </ul> Sat, 27 Apr 2013 00:00:00 -0400 http://arun.chagantys.org/literature/2013/04/27/a-day-in-monterey.html http://arun.chagantys.org/literature/2013/04/27/a-day-in-monterey.html

The Subgradient of a Matrix Norm

Memory and Forgetting

<p><em>(Dedicated to a Monsieur Le Coq)</em></p> <div class="poem">

<p>You stand here, solemn, silent, by my side,<br />  Whispering softly of a time gone by;<br />The waves of could’ves flow in with the tide,<br />  The wind murmuring should’ves with a sigh</p> <p>Somewhere in time, unknowing, unheeding, uncaring,<br />  it was I who jumped ship, making instead,<br />a choice of warmth temporary, more comforting then<br />  than this, the narrow river of reason you tread.</p> <p>Your lines were sharper, your principles ever<br />  more principled in a singular face.<br />Mistaken, was I, that they our thread would sever,<br />  while forgetting all those decisions misplaced.</p> <p>Dear Sir, upon this road I can’t return,<br />  Were that I had taken yours, old friend, and that<br />In your company I could partake, than this<br />  ghost’s. Sir, could it be that I you regret?</p> </div>

Mon, 05 Nov 2012 00:00:00 -0500 http://arun.chagantys.org/literature/2012/11/05/memory-forgetting.html http://arun.chagantys.org/literature/2012/11/05/memory-forgetting.html

Left Unsaid

<div class="poem">

<p>There is, bubbling inside me,<br />  Something I must tell you.<br />But you seem busy;<br />  I’ll wait, no worries,<br />It’s just a matter of timing, you see,<br />  And patient waiting is no stranger to me.</p> <p>What needed doing is finally done,<br />  The ships have set sail. In the twilight,<br />You seem tired, justifiably so.<br />  I’ll wait, no worries,<br />The bubbles can bubble over -<br />  Nothing I haven’t handled before</p> <p>Waves come of the morning tide,<br />  Wash ashore dead wood, left at sea;<br />In your eyes, distress I see,<br />  Tell me of the distant shores<br />As I push you adrift knowing,<br />  One day, you’ll come back for me</p> <p>The water has finally rocked itself asleep<br />  And the calm that follows, bliss.<br />For once I see in you peace, solitary;<br />  However fraught this moment may be,<br />A lake amidst a vast ocean.<br />  In my silence, this silence I’ll share.</p> <p>Time will take its own course<br />  One day this ship will run aground<br />O’er bubbled waters<br />  There was once a thing bubbling inside of me<br />In a time since gone by,<br />  It was a matter of timing, you’ll see.</p> </div>

Fri, 24 Aug 2012 00:00:00 -0400 http://arun.chagantys.org/literature/2012/08/24/left-unsaid.html http://arun.chagantys.org/literature/2012/08/24/left-unsaid.html