<p>It’s a Friday evening, when you hear, the careless whispers<br /> Looking around, bewildered, you see but a cat’s whiskers<br />Turning ’round the bend, to find a voice you swear you’d heard,<br /> The street is empty, save for a fence and its lonely lonesome bird<br />Walking solemn, walking slight, a crackling cackle gives you a fright<br /> Yet, behind you there is naught a sight?<br />Turn around, whence you came,<br /> sprint a sprint, for walking’s lame,<br />And as you stop to catch your breath,<br /> there’s only silence upon your wreath.<br />Shut the door and door the shutters,<br /> no more will you be troubled by its stuttering stutters.<br />And when the night is nigh,<br /> beneath your covers, sigh a sigh.<br />In your dreams you’ll see the truth,<br /> and make believe what was, was for sooth!<br />What made the murmurs murmur,<br /> was just the zoo’s giant le(r)mur<br />And the mumbling mumbles,<br /> oh, a baby with the grumbles<br />Only one thing could’ve made those cantankerous peals,<br /> the laughter, it must’ve come from the circus seals!<br />Now that that’s been put to rest,<br /> it’s time this mattress was put to test!</p> </div>

<div class="right poem">

<p>With neither a whimper nor a bang,<br /> but the steady accumulation of “stuff”<br />Encroaching on all things. With a pang,<br /> surrounded on four sides, we cannot rebuff<br />and must accept at long last:<br />A marriage is happening and is already almost past.</p> </div>

<div style="clear:both;"></div>

<p><em>For the joyous occasion of my brother’s marriage.</em></p>

<div class="right vcenter" style="height:200px;">

<p>Hark ye the fall! And minutes drop from high<br /> They twirl and twang as they gently settle<br />and quietly amongst their friends rustle<br /> and remember till winds blow them aside.</p> </div>

]</span> Intuitively, this function tells us how much we need to scale <span class="math">\(\mA\)</span> so that its convex hull contains <span class="math">\(x\)</span>. In principle, <span class="math">\(\|\cdot\|_{\mA}\)</span> is only a norm when <span class="math">\(\mA\)</span> is “centrally symmetric” (<span class="math">\(a \in \mA \iff -a \in \mA\)</span>). The results in the paper extend to non-centrally-symmetric <span class="math">\(\mA\)</span> as well, so we will just assume that <span class="math">\(\|\cdot\|_{\mA}\)</span> is a valid norm here.</p> <!-- Also, we will assume that \(\mA\) is convex, since we will only be interested in the properties of the convex hull of \(\mA\), \(\conv(\mA)\). -->
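<p>To make the scaling picture concrete, here is a small numerical sketch (my own, not from the paper): when the atoms are the signed standard basis vectors, their convex hull is the cross-polytope, so the atomic norm should reduce to the l1 norm. The smallest scaling can be found with a linear program:</p>

```python
import numpy as np
from scipy.optimize import linprog

p = 5
# Atoms: the signed standard basis vectors, stored as columns of M.
M = np.hstack([np.eye(p), -np.eye(p)])

x = np.array([1.0, -2.0, 0.0, 0.5, 3.0])

# ||x||_A = inf { t : x in t * conv(A) }
#         = min { sum_a c_a : x = sum_a c_a * a, c_a >= 0 }.
res = linprog(c=np.ones(2 * p), A_eq=M, b_eq=x, bounds=(0, None))
atomic_norm = res.fun

print(atomic_norm, np.abs(x).sum())  # both 6.5
```

<p>Swapping in a different atomic set (e.g. rank-one matrices) only changes the columns of the dictionary, though the resulting program is no longer a finite LP in general.</p>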

]</span> </div>

<p>Recall that the tangent cone <span class="math">\(T_{\mA}(x^*)\)</span> is the set of descent directions of <span class="math">\(\|\cdot\|_{\mA}\)</span> at <span class="math">\(x^*\)</span>. If <span class="math">\(\|\Phi z\|_2 \ge \epsilon \|z\|_2\)</span> for all <span class="math">\(z \in T_{\mA}(x^*)\)</span>, then <span class="math">\(\| \hat{x} - x^* \|_2 \le \frac{2\delta}{\epsilon}\)</span>.</p> </div> <div class="proof">

<p>As <span class="math">\(\hat{x}\)</span> minimizes <span class="math">\(\|\cdot\|_{\mA}\)</span>, the difference <span class="math">\(\hat{x} - x^*\)</span> lies in the tangent cone <span class="math">\(T_{\mA}(x^*)\)</span>. By the triangle inequality, <span class="math">\(\|\Phi(\hat{x} - x^*)\|_2 \le \|\Phi \hat{x} - y\|_2 + \|y - \Phi x^*\|_2 \le 2\delta\)</span>, where the first term is upper bounded by <span class="math">\(\delta\)</span> by the optimization program, and the second term is upper bounded by <span class="math">\(\delta\)</span> from assumptions on <span class="math">\(\omega\)</span>.</p> Finally, using the assumption on <span class="math">\(\Phi\)</span>, <span class="math">\(\|\Phi(\hat{x} - x^*)\|_2 \ge \epsilon \|\hat{x} - x^*\|_2\)</span>, giving us the final result. </div>

]</span> This quantity will be referred to as the <strong>minimum gain</strong> of the measurement operator restricted to the cone <span class="math">\(T_{\mA}(x^*)\)</span>.</p> <p><strong>Aside: Atomic norms are the “best” convex heuristic.</strong> One intuition for using the atomic norm is as follows. A basic condition we’d like for a heuristic penalty is that it be constant on each atom in the set <span class="math">\(\mA\)</span>. Consequently, <span class="math">\(a - a'\)</span> should be a descent direction for any <span class="math">\(a, a' \in \mA\)</span>. Requiring that the penalty be convex implies that the set of descent directions <span class="math">\(\{a - a' \mid a, a' \in \mA\}\)</span> should be a cone. This is precisely the tangent cone at <span class="math">\(a \in \mA\)</span> with respect to <span class="math">\(\conv(\mA)\)</span>.</p> <h3 id="gaussian-widths-and-bounding-minimum-gain">Gaussian widths and bounding minimum gain</h3> As we noted above, the number of samples required for recovery is determined by the minimum gain of <span class="math">\(\Phi\)</span>. We will now characterize this quantity in terms of the <em>Gaussian width</em> of the tangent cone, defined below. <div class="definition"> (Gaussian Width) The Gaussian width of a set <span class="math">\(S \subseteq \mathbb{R}^p\)</span> is <span class="math">\(w(S) = \mathbb{E}_{g \sim \mathcal{N}(0, I_p)} \left[ \sup_{z \in S} g^T z \right]\)</span>. </div>
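<p>Since the Gaussian width is just an expectation, it is easy to estimate by Monte Carlo. A small sketch (my own; the set sizes are arbitrary) checking two cases where the width is known: the full unit sphere, where the supremum is attained at <span class="math">\(z = g/\|g\|_2\)</span>, and a coordinate subspace intersected with the sphere, where the supremum is the norm of the projection:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, trials = 50, 10, 4000

g = rng.standard_normal((trials, p))

# For the sphere S^{p-1}: sup_{||z||=1} g^T z = ||g||_2, so the width
# is the expected Gaussian length, roughly sqrt(p).
width_sphere = np.linalg.norm(g, axis=1).mean()

# For V = span(e_1, ..., e_d) intersected with the sphere, the sup is
# the norm of the projection of g onto V, so the width is roughly sqrt(d).
width_subspace = np.linalg.norm(g[:, :d], axis=1).mean()

print(width_sphere, np.sqrt(p))    # close
print(width_subspace, np.sqrt(d))  # close
```

<p>Estimating the width of a general tangent cone this way requires solving the inner supremum, which is itself an optimization problem; these two sets are chosen precisely because the supremum has a closed form.</p>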

]</span> where <span class="math">\(\lambda_n\)</span> is the expected length of an <span class="math">\(n\)</span>-dimensional Gaussian vector, <span class="math">\(\mathbb{E}_{g \sim \mathcal{N}(0, I_n)}[ \|g\|_2 ]\)</span>. </div>
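<p>The quantity <span class="math">\(\lambda_n\)</span> has a closed form, <span class="math">\(\sqrt{2}\,\Gamma(\frac{n+1}{2})/\Gamma(\frac{n}{2})\)</span>, and is tightly sandwiched between <span class="math">\(n/\sqrt{n+1}\)</span> and <span class="math">\(\sqrt{n}\)</span>. A quick check (plain Python; log-gamma avoids overflow for large <span class="math">\(n\)</span>):</p>

```python
import math

def lam(n):
    """lambda_n = sqrt(2) * Gamma((n+1)/2) / Gamma(n/2), the expected
    length of an n-dimensional standard Gaussian vector."""
    return math.sqrt(2) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))

for n in [1, 2, 10, 100, 10000]:
    # Sandwich bound: n / sqrt(n+1) <= lambda_n <= sqrt(n).
    assert n / math.sqrt(n + 1) <= lam(n) <= math.sqrt(n)
    print(n, lam(n))
```

<p>For <span class="math">\(n = 1\)</span> this recovers the familiar <span class="math">\(\mathbb{E}|g| = \sqrt{2/\pi}\)</span>.</p>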

<p>The proof follows from a generalization of the Sudakov-Fernique inequality. It is also useful to note that <span class="math">\(\lambda_k = \sqrt{2}\, \Gamma(\frac{k+1}{2}) / \Gamma(\frac{k}{2})\)</span>, which is tightly bounded: <span class="math">\(\frac{k}{\sqrt{k+1}} \le \lambda_k \le \sqrt{k}\)</span>.</p> <h3 id="main-theorem-exponential-convergence-to-x">Main Theorem: Exponential Convergence to <span class="math">\(x^*\)</span></h3> <p>We are finally ready to state and prove the main theorem, which shows that we need about <span class="math">\(w(\Omega)^2\)</span> samples to converge exponentially to <span class="math">\(x^*\)</span>. Theorem 1 gives us a bound on the minimum gain that holds in expectation. To extend it to the finite sample regime, we’ll exploit the property that <span class="math">\(\min_{z \in \Omega} \|\Phi z\|_2\)</span> is Lipschitz in <span class="math">\(\Phi\)</span> (with constant <span class="math">\(1\)</span>) and use the concentration of Gaussian measure to show convergence.</p> <div class="theorem"> <p>(2. Recovery) Let <span class="math">\(\Phi\)</span> be a random map with rows drawn i.i.d. from <span class="math">\(\mathcal{N}(0, I_p)\)</span> and <span class="math">\(\Omega = T_{\mA}(x^*) \cap \mathbb{S}^{p-1}\)</span>. Then</p> <ol type="1"> <li><p>Given measurements <span class="math">\(y = \Phi x^*\)</span>, <span class="math">\(x^*\)</span> is the unique solution of the exact convex program with probability at least <span class="math">\(1 - \exp( -\frac{1}{2}( \lambda_n - w(\Omega))^2 )\)</span>, if <span class="math">\(n \ge w(\Omega)^2 + 1\)</span>.</p></li> <li><p>Given measurements <span class="math">\(y = \Phi x^* + \omega\)</span>, <span class="math">\(\|\omega\|_2 \le \delta\)</span>, the solution <span class="math">\(\hat{x}\)</span> of the noisy convex program satisfies <span class="math">\(\| x^* - \hat{x} \|_2 \le \frac{2\delta}{\epsilon}\)</span> when <span class="math">\(\lambda_n \ge w(\Omega) + \sqrt{n}\,\epsilon\)</span>.</p></li> </ol> Plugging in <span class="math">\(\epsilon = 0\)</span>, we get the first condition as well. </div>
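<p>As a sanity check of the theorem, here is a recovery experiment for the classic sparse-vector instance (atoms = signed standard basis vectors, so the atomic-norm program is basis pursuit). This instance is not worked out in this excerpt, and all the sizes below are my own choices; the exact convex program is solved as a linear program:</p>

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
p, s, n = 60, 3, 40  # ambient dimension, sparsity, number of measurements

# A random s-sparse x*.
x_star = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

Phi = rng.standard_normal((n, p))  # rows i.i.d. Gaussian, as in the theorem
y = Phi @ x_star

# Basis pursuit, min ||x||_1 s.t. Phi x = y, as an LP via x = u - v, u, v >= 0.
res = linprog(np.ones(2 * p), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
              bounds=(0, None))
x_hat = res.x[:p] - res.x[p:]

print(np.linalg.norm(x_hat - x_star))  # should be ~0 at these sizes
```

<p>With <span class="math">\(n\)</span> comfortably above the width-squared of the l1 tangent cone (on the order of <span class="math">\(s \log(p/s)\)</span>), recovery is exact with overwhelming probability; shrinking <span class="math">\(n\)</span> toward that threshold makes the experiment start to fail.</p>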

<p><strong>Relationship to Rademacher complexity.</strong> In the empirical risk minimization framework, the sample complexity is almost entirely determined by the <a href="http://en.wikipedia.org/wiki/Rademacher_complexity">Rademacher complexity</a>, which is analogous to the Gaussian width, but with expectations taken over binary random variables instead of a Gaussian.</p> <h3 id="properties-of-gaussian-widths">Properties of Gaussian Widths</h3> <p>The main theorem shows that the sample complexity is sharply bounded by the Gaussian width <span class="math">\(w(\Omega)\)</span>. The Gaussian width of a set is a very well behaved object with many useful properties. We will use these properties in the applications section, but you should feel free to skip the details here for now.</p> <h4 id="relationship-to-intrinsic-volume">Relationship to Intrinsic Volume</h4> <p> A large number of properties follow from the fact that <span class="math">\(w\)</span> is a <a href="http://en.wikipedia.org/wiki/Valuation_(measure_theory)">valuation</a>, an analogue of a measure. To show this, we’ll use a clever fact that an isotropic Gaussian vector can be separated into two independent parts, length and direction. The direction component integrates over the unit sphere: <span class="math">\[ w(S) = \mathbb{E}[\|g\|_2] \, \mathbb{E}_{u \sim \mathrm{Unif}(\mathbb{S}^{p-1})}\left[ \sup_{z \in S} u^T z \right] = \lambda_p \, b(S), \]</span> where <span class="math">\(b(S)\)</span> is the <strong>intrinsic volume</strong> of a set (a deterministic property).
It turns out (and this is intuitive) that the uniform measure on the sphere, <span class="math">\(u\)</span>, is a Haar measure, giving us the following properties for free.</p> <ul> <li><span class="math">\(w(S)\)</span> is invariant to translations and unitary transformations.</li> <li>Scaling: <span class="math">\(w(tK) = t\, w(K)\)</span> for <span class="math">\(t > 0\)</span>.</li> <li>Monotonicity: if <span class="math">\(S_1 \subseteq S_2 \subseteq \mathbb{R}^p\)</span>, then <span class="math">\(w(S_1) \le w(S_2)\)</span>.</li> <li>Valuation: <span class="math">\(w( S_1 \cup S_2 ) + w( S_1 \cap S_2 ) = w(S_1) + w(S_2)\)</span>.</li> <li><span class="math">\(w( S ) = w(\conv(S))\)</span> (because of the <span class="math">\(\sup\)</span>).</li> <li>If <span class="math">\(S = S_1 \oplus S_2\)</span>, then <span class="math">\(w( S \cap \mathbb{S}^{p-1} ) \le w( S_1 \cap \mathbb{S}^{p-1} ) + w( S_2 \cap \mathbb{S}^{p-1} )\)</span>.</li> </ul> <h4 id="other-properties">Other Properties</h4> <p>There are a couple of other cool properties that I won’t prove here:</p> <ul> <li>If <span class="math">\(V\)</span> is a vector space, <span class="math">\(w( V \cap \mathbb{S}^{p-1} ) = \lambda_{\dim(V)} \le \sqrt{\dim(V)}\)</span>.</li> <li>For a cone <span class="math">\(C\)</span>, <span class="math">\(w(C \cap \mathbb{S}^{p-1}) \le \mathbb{E}_g[\, \mathrm{dist}(g, C^\circ) \,]\)</span>, where <span class="math">\(C^\circ\)</span> is the polar cone.</li> </ul> <h2 id="applications">Applications</h2> <p>We have covered some fairly complex results in this article. To summarize, we showed that the convex optimization problem converges exponentially after <span class="math">\(O(w(\Omega)^2)\)</span> samples, where <span class="math">\(w(\Omega)\)</span> is the Gaussian width of the tangent cone of the atomic norm, <span class="math">\(T_{\mA}(x^*)\)</span>, intersected with the unit sphere. The Gaussian width has several useful compositional properties (just like the Rademacher complexity). To instantiate the framework, we just need to bound the Gaussian width of the atomic set. In this section, we will do so for two examples: orthogonal matrices and low-rank matrices.</p> <h3 id="orthogonal-matrices">Orthogonal Matrices</h3> <div class="lemma"> (3. Orthogonal Recovery) Let <span class="math">\(x^*\)</span> be an <span class="math">\(m \times m\)</span> orthogonal matrix. Then, letting <span class="math">\(\mA\)</span> be the set of all orthogonal matrices, <span class="math">\(w(T_{\mA}(x^*) \cap \mathbb{S}^{m^2 -1})^2 \le \frac{3m^2 - m}{4}\)</span>. </div>

]</span> where <span class="math">\(\mathcal{P}_m\)</span> is the set of positive semi-definite <span class="math">\(m \times m\)</span> matrices. The positive semidefinite cone is self-dual (i.e. the dual cone <span class="math">\(\mathcal{P}_m^* = \mathcal{P}_m\)</span>) and hence contributes <span class="math">\(\frac{m(m+1)}{4}\)</span>.</p> <p>Putting this together, we have that <span class="math">\(w(\Omega)^2 \le \frac{m(m-1)}{2} + \frac{m(m+1)}{4} = \frac{3m^2 - m}{4}\)</span>.</p> <h3 id="low-rank-matrices">Low-Rank Matrices</h3> <div class="lemma"> (4. Low-rank Recovery) Let <span class="math">\(x^*\)</span> be an <span class="math">\(m_1 \times m_2\)</span> rank-<span class="math">\(r\)</span> matrix (<span class="math">\(m_1 \le m_2\)</span>). Then, letting <span class="math">\(\mA\)</span> be the set of unit-norm rank-one matrices, <span class="math">\(w(T_{\mA}(x^*) \cap \mathbb{S}^{m_1 m_2 -1})^2 \le 3 r (m_1 + m_2 - r)\)</span>. </div>

]</span></p> <h2 id="conclusions">Conclusions</h2> <p>I hope that through this article I’ve made a case for atomic norms. The framework reduces the complexity of a recovery problem for an arbitrary atomic set to a Gaussian width computation. Computing Gaussian widths can still be a hard problem (just like computing the Rademacher complexity), but their many useful properties make these calculations easier. From the few examples I have seen, the Gaussian width calculation requires you to fairly mechanically decompose the tangent cone into spaces that are easy to describe.</p>

\| A \|_{*} = \sum_i \sigma_i(A)\)</span>, where <span class="math">\(\sigma_i(A)\)</span> is the <span class="math">\(i\)</span>-th singular value of <span class="math">\(A\)</span>.</p> <p>In the rest of the article, I’ll describe the properties of the subgradients of a matrix norm, and the exact form for the nuclear norm.</p> <h1 id="notation">Notation</h1> <p>I will usually use capitalised variables for matrices, e.g. <span class="math">\(A\)</span>, boldface lower case variables for vectors, e.g. <span class="math">\(\mathbf{v}\)</span>, and normal lower case variables for scalar constants. To keep the discussion simple, I’m going to make the assumption that all my matrices are real-valued, though most of this discussion readily extends to any field. I will assume that if you know what a field is, you know how to make the mapping yourself.</p> <h1 id="matrix-norms">Matrix Norms</h1> <p><a href="http://en.wikipedia.org/wiki/Matrix_norm">Matrix norms</a>, <span class="math">\(\|A\|\)</span>, have the following properties, most of which are analogous to the conventional vector norms you would likely have come across, for example the Euclidean norm, <span class="math">\(\|\mathbf{v}\| = \sqrt{v_i^2 + v_j^2 + v_k^2}\)</span>, or the Manhattan norm, <span class="math">\(\|\mathbf{v}\| = |v_i| + |v_j| + |v_k|\)</span>:</p> <ol> <li><span class="math">\(\|A\| \ge 0\)</span> for any <span class="math">\(A\)</span>, and <span class="math">\(\|A\| = 0\)</span> iff <span class="math">\(A = 0\)</span>.</li> <li><span class="math">\(\|c A\| = |c| \|A\|\)</span> for any scalar <span class="math">\(c\)</span>.</li> <li><span class="math">\(\|A + B\| \le \|A\| + \|B\|\)</span> for any <span class="math">\(A\)</span> and <span class="math">\(B\)</span>. This is the triangle inequality that makes <span class="math">\(\|\cdot\|\)</span> a norm.</li> <li>Some norms have the further “sub-multiplicative” property, <span class="math">\(\|AB\| \le \|A\| \|B\|\)</span>, for all <span class="math">\(A\)</span>, <span class="math">\(B\)</span>.</li> </ol> <h1 id="subgradients">Subgradients</h1> <h1 id="subgradients-of-a-matrix-norm">Subgradients of a matrix norm</h1> <p>The entrywise inner product of two matrices, <span class="math">\(\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}\)</span>, can be succinctly written as <span class="math">\(\mathrm{tr}(A^T B)\)</span>; this is an easy fact that I’ll leave to you to verify. The plus side is that it lets us express the subgradient property in the following nice way.</p> <section class="footnotes"> <hr /> <ol> <li id="fn1"><p>This sort of vague definition has, of course, led anything and everything to be called regularization.<a href="#fnref1">↩</a></p></li> </ol> </section>
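<p>The trace identity and the norm axioms above are easy to check numerically. A small sketch with randomly chosen matrices (numpy; all names are mine):</p>

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((4, 5))

# Entrywise inner product <A, B> = sum_ij A_ij B_ij equals tr(A^T B).
inner = (A * B).sum()
assert np.isclose(inner, np.trace(A.T @ B))

# The nuclear norm is the sum of singular values.
nuc = lambda X: np.linalg.svd(X, compute_uv=False).sum()

# Triangle inequality and absolute homogeneity for the nuclear norm.
assert nuc(A + B) <= nuc(A) + nuc(B) + 1e-9
assert np.isclose(nuc(-2.0 * A), 2.0 * nuc(A))

print(inner, nuc(A))
```

<p>The same assertions hold with any matrix norm in place of <code>nuc</code>; only the sub-multiplicative property is norm-specific.</p>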

<p>You stand here, solemn, silent, by my side,<br /> Whispering softly of a time gone by;<br />The waves of could’ves flow in with the tide,<br /> The wind murmuring should’ves with a sigh</p> <p>Somewhere in time, unknowing, unheeding, uncaring,<br /> it was I who jumped ship, making instead,<br />a choice of warmth temporary, more comforting then<br /> than this, the narrow river of reason you tread.</p> <p>Your lines were sharper, your principles ever<br /> more principled in a singular face.<br />Mistaken, was I, that they our thread would sever,<br /> while forgetting all those decisions misplaced.</p> <p>Dear Sir, upon this road I can’t return,<br /> Were that I had taken yours, old friend, and that<br />In your company I could partake, than this<br /> ghost’s. Sir, could it be that I you regret?</p> </div>

<p>There is, bubbling inside me,<br /> Something I must tell you.<br />But you seem busy;<br /> I’ll wait, no worries,<br />It’s just a matter of timing, you see,<br /> And patient waiting is no stranger to me.</p> <p>What needed doing is finally done,<br /> The ships have set sail. In the twilight,<br />You seem tired, justifiably so.<br /> I’ll wait, no worries,<br />The bubbles can bubble over -<br /> Nothing I haven’t handled before</p> <p>Waves come of the morning tide,<br /> Wash ashore dead wood, left at sea;<br />In your eyes, distress I see,<br /> Tell me of the distant shores<br />As I push you adrift knowing,<br /> One day, you’ll come back for me</p> <p>The water has finally rocked itself asleep<br /> And the calm that follows, bliss.<br />For once I see in you peace, solitary;<br /> However fraught this moment may be,<br />A lake amidst a vast ocean.<br /> In my silence, this silence I’ll share.</p> <p>Time will take its own course<br /> One day this ship will run aground<br />O’er bubbled waters<br /> There was once a thing bubbling inside of me<br />In a time since gone by,<br /> It was a matter of timing, you’ll see.</p> </div>