arun.chagantys.org

Arun Chaganty’s Personal Website http://arun.chagantys.org

Murmurs

<div class=“poem”>

<p>It’s a Friday evening, when you hear, the careless whispers<br />     Looking around, bewildered, you see but a cat’s whiskers<br />Turning ’round the bend, to find a voice you swear you’d heard,<br />   The street is empty, save for a fence and its lonely lonesome bird<br />Walking solemn, walking slight, a crackling cackle gives you a fright<br />   Yet, behind you there is naught a sight?<br />Turn around, whence you came,<br />    sprint a sprint, for walking’s lame,<br />And as you stop to catch your breath,<br />    there’s only silence upon your wreath.<br />Shut the door and door the shutters,<br />     no more will you be troubled by its stuttering stutters.<br />And when the night is nigh,<br />      beneath your covers, sigh a sigh.<br />In your dreams you’ll see the truth,<br />    and make believe what was, was for sooth!<br />What made the murmurs murmur,<br />    was just the zoo’s giant le(r)mur<br />And the mumbling mumbles,<br />     oh, a baby with the grumbles<br />Only one thing could’ve made those cantankerous peals,<br />     the laughter, it must’ve come from the circus seals!<br />Now that that’s been put to rest,<br />     it’s time this mattress was put to test!</p> </div>

Thu, 05 Feb 2015 00:00:00 -0500 http://arun.chagantys.org/literature/2015/02/05/murmurs.html http://arun.chagantys.org/literature/2015/02/05/murmurs.html

And so it begins…

<div class=“left”> <img src=“/images/blog/2014-12-02-stuff.jpg” alt=“Stuff.” /> </div>

<div class=“right poem”>

<p>With neither a whimper nor a bang,<br />       but the steady accumulation of “stuff”<br />Encroaching on all things. With a pang,<br />  surrounded on four sides, we cannot rebuff<br />and must accept at long last:<br />A marriage is happening and is already almost past.</p> </div>

<div style=“clear:both;” />

<p><em>For the joyous occasion of my brother’s marriage.</em></p> Tue, 02 Dec 2014 00:00:00 -0500 http://arun.chagantys.org/literature/2014/12/02/it-begins.html http://arun.chagantys.org/literature/2014/12/02/it-begins.html

Fall.

<div class=“left”> <img src=“/images/blog/2014-11-12-fall.jpg” alt=“Leaves from trees” /> </div>

<div class=“right vcenter” style=“height:200px;”>

<p>Hark ye the fall! And minutes drop from high<br />  They twirl and twang as they gently settle<br />and quietly amongst their friends rustle<br />  and remember till winds blow them aside.</p> </div>

Wed, 12 Nov 2014 00:00:00 -0500 http://arun.chagantys.org/literature/2014/11/12/fall.html http://arun.chagantys.org/literature/2014/11/12/fall.html

Bright Shoes

<p>These white shoes were a terrible idea; they didn’t stay white for long. Look, how mud encroaches from the sides, advancing its hidden campaign every day. Soon, I will be surrounded, and with no Russia in this young rubber to keep wear at bay, it’ll be an easy victory. But while they last, these shoes feel good. And they came cheap.</p> <p>4 minutes into this run and I can already see the future. It happens as I caress the soft dirt below me, feeling the slightest undulations through my thin soles. Life is full of ups and downs, what matters is how you approach them. Am I tilted forward at the right angle, or are my priorities askew? I should have bent my knees there; it’s easier for me to land through the heel but ten years from now, I’ll regret taking the easy road. I am my harshest critic and most intimate confidante at once. With every step, the ground brushes against the crystal ball at my feet, and all that I will yet feel is laid bare before me.</p> <p>Right now, I feel invincible. I could go on and on, without pause or break. There’s a very clear phase transition when you run. When you start, your body refuses to cooperate. You push forward purely on sheer willpower. Don’t be fooled though; it’s simply a rite of passage because if you push on enough you’ll reach a state of zen. Once there, your legs will move forward of their own accord and leave your mind free to wander. It doesn’t last forever, this zen, but each time you run, it stays for longer. I imagine that if you’ve run enough, the zen will never go away, and you’ll finally be able to catch the horizon.</p> <p>When did I first get this spiritual about running? If I had to point at a time, it’d probably be when I ran my first marathon in Monterey. It was a beautiful, quaint town sprawled out upon the map, tanning in the Californian summer. There is an image captured in my head. It’s really a couple that have coalesced into one. In any case, while running, I stared out to my left where a sharp cliff overlooks the Pacific ocean. On the rocky shores below, a herd of dusky black seals form a demure face with velvety skin. The sparkling blue waters wrap this imaginary person in a blue evening gown, embroidered with bright white seagulls. Her arms are stretched out along the coast, laced with flowers of every which colour, yellows, whites, greens and a most curious purple. I was never the kind for love at first sight, least of all with figments of my imagination, but I can’t describe it otherwise. I’d fallen head over heels, all but literally.</p> <p>Ah, the air, it is cool. The freshness should be satisfying, but however deeply I inhale, it’s not enough. There is a light drizzle now and that unique scent of wet mud, petrichor, wafts into the air. If no one screws up, sixty years from now, we’ll be in space. Sure, we’ll simulate rain, but replicating the complex ecosystem that leads to this smell just won’t be economically viable. Instead, we’ll all just sniff petrichor perfume like glue and get high on nostalgia.</p> <p>Wait, now, I smell something different; equally fresh, but it stirs up different memories. There are delicate floral undertones; vaguely familiar but out of place. I’m instantly suspicious of this reality I perceive. Honestly, do we really sense reality? Or do we truly control our own actions? Isn’t it but the flow of chemicals, puppet strings a million years in the making? One day, we’ll have mastery over those strings.
Can you ponder what we’ll ponder or imagine what we’ll imagine?</p> <p>Let me tell you a secret. I’m not imagining what fascinating possibilities lie in computer-aided dreams; I’m trying to change the topic to avoid my emotions. The sweet fragrance of her hair; her hair, soft yet ticklish to the touch. It’s all product, she tells me; why can’t you take a compliment at face value? Her head rests on my shoulder and I feel… calmness. Time is unhinged. I don’t know whether it is passing slow or fast or even at all. I’d like to be frozen in time. Someone else told me that once.</p> <p>I’ve a lot of memories. They collect like cups on a shelf. Sometimes you buy them, other times someone gives them to you. In either case, once you have them, you’re sunk. They’ll keep filling your shelves until something breaks. And then you’ll have to clean up the shattered remnants of a cup you may or may not have been responsible for.</p> <p>Unfortunately, time and tide stop for no man. And here I am, running away from a filled cupboard. And honestly, I can’t stop now. For better or for worse, running is a drug. Once you’re high, your aches just fade away. Your back straightens, your breath deepens and you feel just lighter from your head to your toes, often through your bowels. Your mood perks up, and if you’ve snuffed enough, you’re guaranteed a religious experience™. It even comes with its own set of withdrawal symptoms. A few days without a fix, your legs stiffen, your once sumptuous behind bulges blobberously and your stomach feels forever forward of its rightful place. Is there anything we won’t get addicted to? It’s a part of human nature. We throw out chains to every single thing we do, to every single person we meet. Some chains are just stronger than others.</p> <p>I’m getting closer to the shore. The tree cover is growing more sparse, but everything is two shades more vibrant a green than before the rain. A lot of birds are perched in the branches, taking shelter. Every now and then they chirp a sweet shrill song that reminds me of “To Kill a Mockingbird”. The variety is amazing; I’m not used to the brilliant reds and blues where I grew up in a dusty, dirty Indian city. Chidiya is the Hindi word for birds, and just like in English, also a euphemism for pretty girls. Chi-di-ya, it has a nice ring to it. Sort of like A-li-ya. It’s very apt.</p> <p>Oh dear, I’ve done it now. Aliyah, Aliyah, I can’t help but repeat her name, letting the syllables roll gently off my tongue, savouring the sound with a guilty pleasure. What can I say? I wanted to give in, to indulge myself with this warm feeling. It’s been a good run after all. But what foolish teenager-y. This is an infatuation gone rogue, taken flight upon wax wings towards the sun, impervious to the jagged rocks below.</p> <p>Oh, this will not end well. Have you ever felt sure you would regret doing something but couldn’t help yourself? It’s like somewhere in our ancestry we crossed genes with lemmings with their crazy urge to throw themselves off cliffs. Here I am, young, filled with a boiling passion and unjustified optimism. I hardly know this woman, yet all my guards are down. I knot myself in all sorts of ways, tricking my better judgement through fragile arguments. “Yes, I’ve torn down my walls for her, but really, I’ve torn my walls down for just about everyone and I’m a new open person, see?” “On scout’s oath, I promise I’ll keep my self-respect and be as equals, just you wait and watch”.
How very cute, but I know, lurking in the depths is a desire to fold. A weak moment and I’ll give an arm and a leg. This is such a bad idea.</p> <p>Still, Aliyah, what a pretty name.</p> <p>Now that I’ve bluntly ignored my spider senses, self-doubt crawls up my back. “She doesn’t feel the same”, the tiny gnomes inside me whisper. It’s such a predictable story it’s probably a Bollywood movie. I go through the motions, taking a deep breath and reminding myself, I’m not that special; everyone feels like you do. In the interest of self-preservation, I think I should work out a repeatable cure to my emotional foibles. Maybe I’ll even have a cute acronym to remember in case of emergencies. The truth is, sometimes, you just need to accept these feelings, face them, and watch silently as they pass through, leaving only you behind. When I’m feeling ballsy, I even wave goodbye. Most of the time, I don’t know whether to smother them with reason or not.</p> <p>It’s almost the end of the road. Cliffs line this last stretch. The waves smash applause into the rock-face; I don’t know if it’s for the running or for getting past this little emotional hiccup. In either case, I thank them. Peering over the cliff, I remember Monterey and am filled with a sense of déja vu. End in sight, something primordial is unleashed in me and I break into a full sprint. My mind, already reverted to that of a Neanderthal, caves in to the desire to voice those three magical syllables again. The sloshing of the waves echoes back to me with a roar. I am satisfied.</p> Wed, 18 Sep 2013 00:00:00 -0400 http://arun.chagantys.org/literature/2013/09/18/bright-shoes.html http://arun.chagantys.org/literature/2013/09/18/bright-shoes.html

On Atomic Norms <p>How many <em>linear measurements</em> do you need to (efficiently) recover a low-rank matrix? What about a sparse vector or an orthogonal matrix? Given that we know our object of interest has some ‘structure’, can we answer this question in a general manner?</p> <p>In this article, I will show you one approach to do so: regression using atomic norms. Most of the material I will cover was presented in the paper, <a href=“http://arxiv.org/abs/1012.0621”>“The Convex Geometry of Linear Inverse Problems” by Venkat Chandrasekaran et al.</a>.</p> <p>I will begin by describing the problem of structured recovery with a few motivating examples. I will then review some facts about convex geometry and what atomic norms are in this context. Next, I will show how these properties translate to sharp recovery guarantees, instantiating the framework with two examples: the recovery of low-rank matrices and orthogonal matrices.</p> <p>There are several more examples described in the paper that I will not talk about. Notably, I will <em>not</em> cover the details about theta bodies, approximations that the authors present, which trade off sample complexity for computational efficiency.</p> <p><strong>Note.</strong> Unfortunately, I have not been able to manipulate Pandoc and MathJax to do equation and theorem referencing. A PDF version of this article can be found <a href=“http://arun.chagantys.org/files/blog/atomic-norms.pdf”>here</a>.</p> <p><strong>Edit (27th May, 2013)</strong>: I had used <span class=“math”>()</span> in several places where I had really meant <span class=“math”>()</span>.</p> <h2 id=“motivation-recovering-under-determined-objects-using-structure”>Motivation: Recovering under-determined objects using structure</h2> <p>The problem of recovering sparse but high-dimensional vectors from data (“compressed sensing”) has seen recent success in applications to <a href=“http://nuit-blanche.blogspot.com/2009/09/cs.html”>genotyping</a>, <a href=“http://www.theengineer.co.uk/news/news-analysis/picture-of-health/1001485.article”>medical imaging</a> and comes up commonly when dealing with sensor data. Similarly, the recovery of low-rank matrices and tensors has been used to <a href=“http://www.ncbi.nlm.nih.gov/pubmed/15734364”>analyze fMRI data</a>. It also comes up in the recovery of a signal from only magnitude measurements (i.e. without phase), using the <a href=“http://arxiv.org/abs/1109.4499”>PhaseLift algorithm</a>.</p> <h3 id=“the-mixture-of-linear-regressions”>The Mixture of Linear Regressions</h3> <p><img src=“http://arun.chagantys.org/images/blog/atomic-norms/mlr-cartoon.png” /></p> <p>Personally, I came to be interested in this formulation from my work on applying the method of moments to the mixture of linear regressions, with Percy Liang. In that algorithm, we provided a consistent estimator for the discriminative model where we observe data <span class=“math”>\((y_i, x_i)\)</span>, where <span class=“math”>\(y_i = \beta_h^T x_i + \epsilon_i\)</span> and <span class=“math”>\(\beta_h\)</span> is one of <span class=“math”>\(\beta_1, \beta_2, \ldots, \beta_k\)</span>; we do not observe which and do not know what the <span class=“math”>\(\beta_h\)</span> are. Of course, <span class=“math”>\(x_i\)</span> could actually be features so the data could describe curves instead of lines, as depicted in the picture above.</p> <p>The problem is to learn the <span class=“math”>\(\{\beta_h\}_{h=1}^k\)</span>, and we do so by observing that <span class=“math”>\(y_i^2\)</span> are linear measurements of <span class=“math”>\(\mathbb{E}_h[\beta_h^{\otimes 2}]\)</span>, etc.
After recovering these moments, we apply <a href=“http://arxiv.org/abs/1203.0683”>standard techniques</a> to recover <span class=“math”>\(\{\beta_h\}\)</span>. In these regression problems, we exploited the low-rank structure to recover the moments with just <span class=“math”>\(O(kd)\)</span> samples instead of potentially <span class=“math”>\(O(d^3)\)</span> samples!</p> <p>The results of this paper allow us, conceptually at least, to extend this approach to efficiently recover the moments from some linear measurements.</p> <h2 id=“the-structured-recovery-problem”>The Structured Recovery Problem</h2> <p>I will now talk about the exact problem we are looking at. This needs some definitions.</p> <p>Suppose that the object (matrix, tensor, etc.) we wish to recover is the convex combination of a few “atoms” in the <strong>atomic set</strong> <span class=“math”>\(\mathcal{A}\)</span>, <span class=“math”>\[ x^* = \sum_{i} c_i a_i, \quad a_i \in \mathcal{A},\ c_i \ge 0. \]</span></p> <p>We can define the <strong>atomic norm</strong> of this set to be, <span class=“math”>\[ \|x\|_{\mathcal{A}} = \inf\{ t &gt; 0 : x \in t \operatorname{conv}(\mathcal{A}) \}. \]</span> Intuitively, this function tells us how much we need to scale <span class=“math”>\(\mathcal{A}\)</span> so that its convex hull contains <span class=“math”>\(x\)</span>. In principle, <span class=“math”>\(\|\cdot\|_{\mathcal{A}}\)</span> is only a norm when <span class=“math”>\(\mathcal{A}\)</span> is “centrally symmetric” (<span class=“math”>\(a \in \mathcal{A} \Rightarrow -a \in \mathcal{A}\)</span>). The results in the paper extend to non-centrally symmetric <span class=“math”>\(\mathcal{A}\)</span> as well, so we will just assume that <span class=“math”>\(\|\cdot\|_{\mathcal{A}}\)</span> is a valid norm here.</p>
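<p>To make the definition concrete, here is a minimal numerical sketch (assuming <code>numpy</code> and <code>cvxpy</code> are available, with sizes chosen purely for illustration) of atomic-norm minimization for the simplest atomic set, the signed standard basis vectors, whose atomic norm is the <span class=“math”>\(\ell_1\)</span> norm discussed next:</p> <pre><code>import numpy as np
import cvxpy as cp

# Sketch: recover a sparse vector by minimizing its atomic norm (here, the l1 norm)
# subject to exact linear measurements y = Phi x*.
p, n, k = 100, 40, 5                      # ambient dimension, measurements, sparsity
rng = np.random.default_rng(0)

x_true = np.zeros(p)
x_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)

Phi = rng.standard_normal((n, p))         # random Gaussian measurement operator
y = Phi @ x_true

x = cp.Variable(p)
problem = cp.Problem(cp.Minimize(cp.norm(x, 1)), [Phi @ x == y])
problem.solve()

print(np.linalg.norm(x.value - x_true))   # close to zero when n is large enough
</code></pre>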

<p><strong>Examples.</strong> Two standard choices for <span class=“math”>\(\mathcal{A}\)</span> are the set of 1-sparse vectors (for sparse recovery) and the set of rank-one matrices (for low-rank recovery). The corresponding norms are, as expected, the <span class=“math”>\(L_1\)</span> norm, <span class=“math”>\(\|\cdot\|_{1}\)</span>, and the nuclear norm, <span class=“math”>\(\|\cdot\|_{*}\)</span>.</p> <p>Let <span class=“math”>\(\Phi : \mathbb{R}^p \to \mathbb{R}^n\)</span> be a linear measurement operator. We wish to ask, how many linear measurements <span class=“math”>\(y = \Phi x^*\)</span> are required to exactly recover <span class=“math”>\(x^*\)</span> from the convex program, <span class=“math”>\[ \hat{x} = \operatorname{arg\,min}_{x} \|x\|_{\mathcal{A}} \quad \text{subject to} \quad \Phi x = y. \]</span></p> <p>The approach also handles the case where the observations are noisy, <span class=“math”>\(y = \Phi x^* + \omega\)</span>, where <span class=“math”>\(\|\omega\|_2 \le \delta\)</span>; <span class=“math”>\[ \hat{x} = \operatorname{arg\,min}_{x} \|x\|_{\mathcal{A}} \quad \text{subject to} \quad \|\Phi x - y\|_2 \le \delta. \]</span> In this case, we will ask instead for the number of samples required so that <span class=“math”>\(\|\hat{x} - x^*\|\)</span> is small for any such <span class=“math”>\(\omega\)</span>.</p> <h3 id=“convex-geometry-preliminaries”>Convex Geometry Preliminaries</h3> <p>Before we proceed, I will need to review a few simple but crucial concepts from convex geometry.</p> <p><img src=“http://arun.chagantys.org/images/blog/atomic-norms/cones.png” /></p> <p>The <strong>tangent cone</strong>, <span class=“math”>\(T_{\mathcal{A}}(x)\)</span>, gives the directions that <em>decrease</em> <span class=“math”>\(\|\cdot\|_{\mathcal{A}}\)</span>, <span class=“math”>\[ T_{\mathcal{A}}(x) = \operatorname{cone}\{ z - x : \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} \}. \]</span> Similarly, we can define the <strong>normal cone</strong>, <span class=“math”>\[ N_{\mathcal{A}}(x) = \{ s : \langle s, z - x \rangle \le 0 \text{ for all } z \text{ such that } \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} \}. \]</span> Finally, the <strong>polar</strong> <span class=“math”>\(C^*\)</span> of a cone <span class=“math”>\(C\)</span> is the set of vectors that are obtuse with respect to all vectors in <span class=“math”>\(C\)</span>, <span class=“math”>\[ C^* = \{ s : \langle s, z \rangle \le 0 \text{ for all } z \in C \}. \]</span></p> <p>Note that the polar cone of the tangent cone is the normal cone, and vice versa.</p> <p><strong>Examples.</strong> In the <span class=“math”>\(L_2\)</span> case, i.e. when the elements of <span class=“math”>\(\mathcal{A}\)</span> have no structural constraints, the tangent cone is a half-space, and the normal cone is the normal of this half-space. In the <span class=“math”>\(L_1\)</span> case, the tangent cone is the cone extending from the edges of the <span class=“math”>\(\ell_1\)</span> ball; the normal cone is the cone bounded by the normals of these two edges (shown in the figure).</p> <h2 id=“recovering-x”>Recovering <span class=“math”>\(x^*\)</span></h2> <p>We have finally come to the meat of this article; in this section, we will study how the properties of the atomic set, <span class=“math”>\(\mathcal{A}\)</span>, relate to the difficulty of recovering <span class=“math”>\(x^*\)</span>.</p> <p>The first result should illuminate why this approach will work.</p> <div class=“lemma”> (1. Exact Recovery) <span class=“math”>\(x^*\)</span> is the unique solution to the exact optimization problem iff the tangent cone of the atomic norm at the minimizer and the null-space of the observation operator are transverse, <span class=“math”>\[ T_{\mathcal{A}}(x^*) \cap \operatorname{null}(\Phi) = \{ 0 \}. \]</span> </div>

<p>Recall that the tangent cone <span class=“math”>\(T_{\mathcal{A}}(x^*)\)</span> gives the descent directions of the objective function (<span class=“math”>\(\|\cdot\|_{\mathcal{A}}\)</span>) and if <span class=“math”>\(z \in T_{\mathcal{A}}(x^*) \cap \operatorname{null}(\Phi)\)</span>, then <span class=“math”>\(\Phi z = 0\)</span>. Thus, we could move <span class=“math”>\(x^*\)</span> along these directions and reduce the objective <em>without violating the constraint <span class=“math”>\(y = \Phi x\)</span></em>.</p> <p>Now, when we have noisy measurements, we need a stronger condition; that the measurement operator have a “noticeable” projection on the descent directions in <span class=“math”>\(T_{\mathcal{A}}(x^*)\)</span>.</p> <div class=“lemma”> (2. Noisy Recovery) Let <span class=“math”>\(y = \Phi x^* + \omega\)</span>, <span class=“math”>\(\|\omega\|_2 \le \delta\)</span>, be <span class=“math”>\(n\)</span> noisy measurements and <span class=“math”>\(\hat{x}\)</span> be the optimal solution to the noisy optimization problem. If, for all <span class=“math”>\(z \in T_{\mathcal{A}}(x^*)\)</span>, the measurements are bounded below, <span class=“math”>\[ \|\Phi z\|_2 \ge \epsilon \|z\|_2, \]</span> then <span class=“math”>\(\|\hat{x} - x^*\|_2 \le \frac{2\delta}{\epsilon}\)</span>. </div> <div class=“proof”> <p>As <span class=“math”>\(\hat{x}\)</span> minimizes <span class=“math”>\(\|\cdot\|_{\mathcal{A}}\)</span> in the program, we have that <span class=“math”>\(\|\hat{x}\|_{\mathcal{A}} \le \|x^*\|_{\mathcal{A}}\)</span> and thus that <span class=“math”>\(\hat{x} - x^* \in T_{\mathcal{A}}(x^*)\)</span>. Using the linearity of <span class=“math”>\(\Phi\)</span> and the triangle inequality, we derive that, <span class=“math”>\[ \|\Phi(\hat{x} - x^*)\|_2 \le \|\Phi \hat{x} - y\|_2 + \|y - \Phi x^*\|_2 \le 2\delta, \]</span> where the first term is upper bounded by <span class=“math”>\(\delta\)</span> by the optimization program, and the second term is upper bounded by <span class=“math”>\(\delta\)</span> from assumptions on <span class=“math”>\(\omega\)</span>.</p> Finally, using the assumption on <span class=“math”>\(\Phi\)</span>, <span class=“math”>\(\|\Phi(\hat{x} - x^*)\|_2 \ge \epsilon \|\hat{x} - x^*\|_2\)</span>, giving us the final result. </div>
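<p>The noisy program from Lemma 2 is just as easy to try numerically. The following sketch (same assumptions and illustrative sizes as before) solves the constrained problem with a noise ball of radius <span class=“math”>\(\delta\)</span> and prints the recovery error, which Lemma 2 bounds by <span class=“math”>\(2\delta/\epsilon\)</span>:</p> <pre><code>import numpy as np
import cvxpy as cp

# Sketch: noisy atomic-norm recovery,  min ||x||_1  s.t.  ||Phi x - y||_2 <= delta.
p, n, k, delta = 100, 60, 5, 0.1
rng = np.random.default_rng(1)

x_true = np.zeros(p)
x_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((n, p))

w = rng.standard_normal(n)
w *= delta / np.linalg.norm(w)            # noise with norm exactly delta
y = Phi @ x_true + w

x = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm(x, 1)),
           [cp.norm(Phi @ x - y, 2) <= delta]).solve()

# Lemma 2: the error is at most 2*delta/epsilon once the minimum gain exceeds epsilon.
print(np.linalg.norm(x.value - x_true))
</code></pre>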

<p><span class=“math”>\(\Phi\)</span> is a random quantity, so we will have to show that with sufficient samples, the transversality conditions hold with high probability. Note that the assumptions for Lemma 2 subsume those for Lemma 1; it is sufficient for us to estimate a lower bound on <span class=“math”>\[ \min_{z \in T_{\mathcal{A}}(x^*),\ \|z\|_2 = 1} \|\Phi z\|_2. \]</span> This quantity will be referred to as the <strong>minimum gain</strong> of the measurement operator restricted to the cone <span class=“math”>\(T_{\mathcal{A}}(x^*)\)</span>.</p> <p><strong>Aside: Atomic norms are the “best” convex heuristic.</strong> One intuition to use the atomic norm is as follows. A basic condition we’d like for a heuristic penalty is that it be constant for each atom in the set <span class=“math”>\(\mathcal{A}\)</span>. Consequently, <span class=“math”>\(a - a'\)</span> should be a descent direction for any <span class=“math”>\(a, a' \in \mathcal{A}\)</span>. Requiring that the penalty be convex implies that the set of descent directions <span class=“math”>\(\{ a - a' : a, a' \in \mathcal{A} \}\)</span> should be a cone. This is precisely the tangent cone at <span class=“math”>\(a \in \mathcal{A}\)</span> with respect to <span class=“math”>\(\operatorname{conv}(\mathcal{A})\)</span>.</p> <h3 id=“gaussian-widths-and-bounding-minimum-gain”>Gaussian widths and bounding minimum gain</h3> As we noted above, the number of samples required for recovery is determined by the minimum gain of <span class=“math”>\(\Phi\)</span>. We will now characterize this quantity in terms of the <em>Gaussian width</em> of the tangent cone, defined below, <div class=“definition”> (Gaussian Width) The Gaussian width of a set <span class=“math”>\(S \subseteq \mathbb{R}^p\)</span> is, <span class=“math”>\(w(S) = \mathbb{E}_{g \sim \mathcal{N}(0, I_p)} \left[ \sup_{z \in S} g^T z \right]\)</span>. </div>
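<p>The Gaussian width is also easy to estimate by Monte Carlo. As a sanity check (a sketch assuming only <code>numpy</code>), for a <span class=“math”>\(k\)</span>-dimensional subspace intersected with the unit sphere the supremum is just the norm of the projected Gaussian vector, so the estimate should come out close to <span class=“math”>\(\sqrt{k}\)</span>, a property listed further below:</p> <pre><code>import numpy as np

# Monte Carlo estimate of w(S) = E[ sup over z in S of g^T z ] for
# S = (k-dimensional subspace) intersected with the unit sphere.
p, k, trials = 200, 10, 2000
rng = np.random.default_rng(0)

V, _ = np.linalg.qr(rng.standard_normal((p, k)))   # orthonormal basis of a random subspace
g = rng.standard_normal((trials, p))
sups = np.linalg.norm(g @ V, axis=1)               # sup over unit z in V of g^T z = ||V^T g||_2

print(sups.mean(), np.sqrt(k))                     # the two numbers should be close
</code></pre>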

<p>Of course, because of the linearity of <span class=“math”>\(\Phi\)</span>, the minimum gain is independent of the length of <span class=“math”>\(z\)</span>, so bounding <span class=“math”>\(\|\Phi z\|_2\)</span> on the unit sphere, <span class=“math”>\(\Omega = T_{\mathcal{A}}(x^*) \cap S^{p-1}\)</span>, is sufficient to bound <span class=“math”>\(\|\Phi z\|_2\)</span> on <span class=“math”>\(T_{\mathcal{A}}(x^*)\)</span>.</p> <a href=“http://link.springer.com/chapter/10.1007%2FBFb0081737?LI=true”>Yehoram Gordon</a> (<a href=“http://www.math.uiuc.edu/~mjunge/Comps/Gordonm.pdf”>pdf</a>) proved the following key result on when a random subspace escapes a “mesh” in <span class=“math”>\(\mathbb{R}^p\)</span>, <div class=“theorem”> (1. Gaussian Width) Let <span class=“math”>\(\Omega \subseteq S^{p-1}\)</span> and <span class=“math”>\(\Phi : \mathbb{R}^p \to \mathbb{R}^n\)</span> be a <em>random map</em> with i.i.d. standard Gaussian entries. Then, <span class=“math”>\[ \mathbb{E}\left[ \min_{z \in \Omega} \|\Phi z\|_2 \right] \ge \lambda_n - w(\Omega), \]</span> where <span class=“math”>\(\lambda_n\)</span> is the expected length of an <span class=“math”>\(n\)</span>-dimensional Gaussian vector, <span class=“math”>\(\lambda_n = \mathbb{E}_{g \sim \mathcal{N}(0, I_n)}[ \|g\|_2 ]\)</span>. </div>

<p>The proof follows from a generalization of the Sudakov-Fernique inequality. It is also useful to note that <span class=“math”>\(\lambda_k = \sqrt{2}\, \Gamma(\frac{k+1}{2}) / \Gamma(\frac{k}{2})\)</span>, which is tightly bounded, <span class=“math”>\(\frac{k}{\sqrt{k+1}} \le \lambda_k \le \sqrt{k}\)</span>.</p> <h3 id=“main-theorem-exponential-convergence-to-x”>Main Theorem: Exponential Convergence to <span class=“math”>\(x^*\)</span></h3> <p>We are finally ready to state and prove the main theorem which shows that we need about <span class=“math”>\(w(\Omega)^2\)</span> samples to converge exponentially to <span class=“math”>\(x^*\)</span>. Theorem 1 gives us a bound on the minimum gain that holds in expectation. To extend it to the finite sample regime, we’ll exploit the property that <span class=“math”>\(\Phi \mapsto \min_{z \in \Omega} \|\Phi z\|_2\)</span> is Lipschitz (with constant <span class=“math”>\(1\)</span>) and use the concentration of Gaussian measures to show convergence.</p> <div class=“theorem”> <p>(2. Recovery) Let <span class=“math”>\(\Phi\)</span> be a random map with rows drawn i.i.d. from <span class=“math”>\(\mathcal{N}(0, I_p)\)</span> and <span class=“math”>\(\Omega = T_{\mathcal{A}}(x^*) \cap S^{p-1}\)</span>. Then</p> <ol type=“1”> <li><p>Given measurements <span class=“math”>\(y = \Phi x^*\)</span>, <span class=“math”>\(x^*\)</span> is the unique solution of the exact convex program with probability at least <span class=“math”>\(1 - \exp\left( -\tfrac{1}{2} (\lambda_n - w(\Omega))^2 \right)\)</span>, if <span class=“math”>\(n \ge w(\Omega)^2 + 1\)</span>.</p></li> <li><p>Given measurements <span class=“math”>\(y = \Phi x^* + \omega\)</span>, <span class=“math”>\(\|\omega\|_2 \le \delta\)</span>, the solution <span class=“math”>\(\hat{x}\)</span> of the noisy convex program satisfies <span class=“math”>\(\|x^* - \hat{x}\|_2 \le \frac{2\delta}{\epsilon}\)</span> with probability at least <span class=“math”>\(1 - \exp\left( -\tfrac{1}{2} (\lambda_n - w(\Omega) - \epsilon)^2 \right)\)</span>, if <span class=“math”>\(n \ge (w(\Omega) + \epsilon)^2 + 1\)</span>.</p></li> </ol> </div> Note that in both cases, the convergence rates are exponential, <span class=“math”>\(O(e^{-n})\)</span>. <div class=“proof”>

<p>This follows directly by using the property that, for any Lipschitz function <span class=“math”>\(f\)</span> of a standard Gaussian vector, with constant <span class=“math”>\(L\)</span>, <span class=“math”>\[ \Pr\left[ f(\Phi) \le \mathbb{E}[f(\Phi)] - t \right] \le \exp\left( -\frac{t^2}{2 L^2} \right). \]</span></p> <p>To see that the map <span class=“math”>\(\Phi \mapsto \min_{z \in \Omega} \|\Phi z\|_2\)</span> is Lipschitz, observe, <span class=“math”>\[ \left| \min_{z \in \Omega} \|\Phi z\|_2 - \min_{z \in \Omega} \|\Psi z\|_2 \right| \le \sup_{z \in \Omega} \|(\Phi - \Psi) z\|_2 \le \|\Phi - \Psi\|_F. \]</span></p> Thus, we have, <span class=“math”>\[ \Pr\left[ \min_{z \in \Omega} \|\Phi z\|_2 \le \epsilon \right] \le \exp\left( -\tfrac{1}{2} (\lambda_n - w(\Omega) - \epsilon)^2 \right), \]</span> when <span class=“math”>\(\lambda_n \ge w(\Omega) + \epsilon\)</span>. Plugging in <span class=“math”>\(\epsilon = 0\)</span>, we get the first condition as well. </div>
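<p>The sharpness of this threshold is easy to see empirically. The sketch below (again assuming <code>numpy</code> and <code>cvxpy</code>, with small, purely illustrative problem sizes) sweeps the number of measurements <span class=“math”>\(n\)</span> and prints the fraction of trials in which <span class=“math”>\(\ell_1\)</span> minimization recovers a sparse vector exactly; the success probability jumps from near 0 to near 1 over a narrow range of <span class=“math”>\(n\)</span>, as the theorem predicts:</p> <pre><code>import numpy as np
import cvxpy as cp

# Sketch: empirical phase transition for exact sparse recovery via l1 minimization.
p, k, trials = 60, 3, 20
rng = np.random.default_rng(0)

def recovered(n):
    x_true = np.zeros(p)
    x_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
    Phi = rng.standard_normal((n, p))
    x = cp.Variable(p)
    cp.Problem(cp.Minimize(cp.norm(x, 1)), [Phi @ x == Phi @ x_true]).solve()
    return np.linalg.norm(x.value - x_true) < 1e-4

for n in range(5, 41, 5):
    print(n, np.mean([recovered(n) for _ in range(trials)]))
</code></pre>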

<p><strong>Relationship to Rademacher complexity.</strong> In the empirical risk minimization framework, the sample complexity is almost entirely defined by the <a href=“http://en.wikipedia.org/wiki/Rademacher_complexity”>Rademacher complexity</a>, which is analogous to the Gaussian width, but with expectations taken over binary random variables instead of a Gaussian.</p> <h3 id=“properties-of-gaussian-widths”>Properties of Gaussian Widths</h3> <p>The main theorem shows that the sample complexity is pretty sharply bounded by the Gaussian width <span class=“math”>\(w(\Omega)\)</span>. The Gaussian width of a set is a very well behaved object with many useful properties. We will use these properties in the applications section, but you should feel free to skip the details here for now.</p> <h4 id=“relationship-to-intrinsic-volume”>Relationship to Intrinsic Volume</h4> <p>A large number of properties follow from the fact that <span class=“math”>\(w\)</span> is a <a href=“http://en.wikipedia.org/wiki/Valuation_(measure_theory)”>valuation</a>, an analogue of a measure. To show this, we’ll use a clever fact that an isotropic Gaussian vector can be separated into two independent parts, length and direction. The direction component integrates over the unit sphere, <span class=“math”>\[ w(S) = \lambda_p \, \mathbb{E}_{u \sim \operatorname{Unif}(S^{p-1})}\left[ \sup_{z \in S} u^T z \right] = \lambda_p \, b(S), \]</span> where <span class=“math”>\(b(S)\)</span> is the <strong>intrinsic volume</strong> of a set (a deterministic property). It turns out (and this is intuitive) that the uniform measure on the sphere is a Haar measure, giving us the following properties for free.</p> <ul> <li><span class=“math”>\(w(S)\)</span> is invariant to translations and unitary transformations.</li> <li>Scaling: <span class=“math”>\(w(tK) = t\, w(K)\)</span> for <span class=“math”>\(t &gt; 0\)</span>.</li> <li>Monotonic: If <span class=“math”>\(S_1 \subseteq S_2 \subseteq \mathbb{R}^p\)</span>, then <span class=“math”>\(w(S_1) \le w(S_2)\)</span>.</li> <li>Valuation: <span class=“math”>\(w(S_1 \cup S_2) + w(S_1 \cap S_2) = w(S_1) + w(S_2)\)</span>.</li> <li><span class=“math”>\(w(S) = w(\operatorname{conv}(S))\)</span> (because of the <span class=“math”>\(\sup\)</span>).</li> <li>If <span class=“math”>\(S = S_1 \oplus S_2\)</span>, <span class=“math”>\(w(S \cap S^{p-1}) \le w(S_1 \cap S^{p-1}) + w(S_2 \cap S^{p-1})\)</span>.</li> </ul> <h4 id=“other-properties”>Other Properties</h4> <p>There are a couple of other cool properties that I won’t prove here.</p> <ul> <li>If <span class=“math”>\(V\)</span> is a vector space, <span class=“math”>\(w(V \cap S^{p-1}) \le \sqrt{\dim(V)}\)</span>.</li> <li><span class=“math”>\(w(C \cap S^{p-1}) \le \mathbb{E}_{g \sim \mathcal{N}(0, I_p)}[ \operatorname{dist}(g, C^*) ]\)</span> if <span class=“math”>\(C\)</span> is a non-empty convex cone in <span class=“math”>\(\mathbb{R}^p\)</span>. The proof follows from convex duality between <span class=“math”>\(\sup_{z \in C,\, \|z\|_2 \le 1} g^T z\)</span> and <span class=“math”>\(\operatorname{dist}(g, C^*)\)</span>.</li> <li>A simple corollary: <span class=“math”>\(w(C \cap S^{p-1})^2 + w(C^* \cap S^{p-1})^2 \le p\)</span>.</li> <li>Let <span class=“math”>\(N(S, \epsilon)\)</span> be the <span class=“math”>\(\epsilon\)</span>-covering number of <span class=“math”>\(S\)</span>. Then, by Dudley’s inequality, <span class=“math”>\[ w(S) \le c \int_0^{\infty} \sqrt{\log N(S, \epsilon)}\, d\epsilon \]</span> for a universal constant <span class=“math”>\(c\)</span>.</li> </ul> <h2 id=“applications”>Applications</h2> <p>We have covered some fairly complex results in this article. To summarize, we showed that the convex program recovers <span class=“math”>\(x^*\)</span> with exponentially small failure probability once we have <span class=“math”>\(O(w(\Omega)^2)\)</span> samples, where <span class=“math”>\(w(\Omega)\)</span> is the Gaussian width of the tangent cone of the atomic norm, <span class=“math”>\(T_{\mathcal{A}}(x^*)\)</span>, intersected with the unit sphere. The Gaussian width has several useful compositional properties (just like the Rademacher complexity). To instantiate the framework, we just need to bound the Gaussian width of the atomic set. In this section, we will do so for two examples: orthogonal matrices and low-rank matrices.</p> <h3 id=“orthogonal-matrices”>Orthogonal Matrices</h3> <div class=“lemma”> (3. Orthogonal Recovery) Let <span class=“math”>\(x^*\)</span> be an <span class=“math”>\(m \times m\)</span> orthogonal matrix. Then, letting <span class=“math”>\(\mathcal{A}\)</span> be the set of all orthogonal matrices, <span class=“math”>\(w(T_{\mathcal{A}}(x^*) \cap S^{m^2 - 1})^2 \le \frac{3m^2 - m}{4}\)</span>. </div>
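<p>Before walking through the width calculation, here is a quick numerical check of the lemma (a sketch assuming <code>numpy</code> and <code>cvxpy</code>; sizes are illustrative, and the solver tolerance means the error is small rather than exactly zero). Minimizing the spectral norm, which we will see below is the atomic norm for this set, recovers a random orthogonal matrix from a number of Gaussian measurements comfortably above <span class=“math”>\((3m^2 - m)/4\)</span>:</p> <pre><code>import numpy as np
import cvxpy as cp

# Sketch: recover an m x m orthogonal matrix from random trace measurements
# by minimizing the spectral norm (sigma_max).
m, n = 6, 32                    # n is somewhat above (3*m**2 - m)/4 = 25.5
rng = np.random.default_rng(0)

Q, _ = np.linalg.qr(rng.standard_normal((m, m)))           # a random orthogonal matrix
A_list = [rng.standard_normal((m, m)) for _ in range(n)]   # measurement matrices
y = np.array([np.trace(A.T @ Q) for A in A_list])          # y_i = trace(A_i^T Q)

X = cp.Variable((m, m))
constraints = [cp.trace(A_list[i].T @ X) == y[i] for i in range(n)]
cp.Problem(cp.Minimize(cp.sigma_max(X)), constraints).solve()

print(np.linalg.norm(X.value - Q))                         # typically small
</code></pre>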

<p>Let’s start by defining the atomic norm, which should be <span class=“math”>\(1\)</span> for all elements in <span class=“math”>\(\mathcal{A}\)</span> and <span class=“math”>\(\le 1\)</span> for everything in the convex hull <span class=“math”>\(\operatorname{conv}(\mathcal{A})\)</span>. Note that the set of orthogonal matrices is <em>not convex</em>. However, it is not hard to see that every matrix in the convex hull of all <span class=“math”>\(m \times m\)</span> orthogonal matrices has singular values <span class=“math”>\(\le 1\)</span>; let <span class=“math”>\(U\)</span> and <span class=“math”>\(V\)</span> be two orthogonal matrices; <span class=“math”>\(V = A U\)</span>, where <span class=“math”>\(A\)</span> is also orthogonal. So, <span class=“math”>\[ \|\alpha U + (1 - \alpha) V\|_2 = \|(\alpha I + (1 - \alpha) A) U\|_2 \le \alpha \|I\|_2 + (1 - \alpha) \|A\|_2 = 1, \]</span> where <span class=“math”>\(\|U\|_2 = \sigma_1(U)\)</span> is the spectral norm, the largest singular value. We have just shown that the spectral norm is an atomic norm!</p> <p>Using the property that <span class=“math”>\(w\)</span> is rotationally invariant, we just need to consider the tangent cone at <span class=“math”>\(x^* = I\)</span>, <span class=“math”>\[ T_{\mathcal{A}}(I) = \operatorname{cone}\{ M - I : \|M\|_2 \le 1 \}. \]</span></p> <p>Every matrix <span class=“math”>\(M\)</span> can be represented as the sum of a symmetric (<span class=“math”>\(A = A^T\)</span>) and skew-symmetric (<span class=“math”>\(A = -A^T\)</span>) matrix, so we can partition <span class=“math”>\(T_{\mathcal{A}}(I)\)</span> into these two spaces. The Lie algebra of skew-symmetric matrices is the tangent space of the orthogonal group <span class=“math”>\(O(m)\)</span>, so we’ll have to consider the full subspace. Using the symmetry property, this subset is isomorphic to the <span class=“math”>\(\frac{m(m-1)}{2}\)</span>-dimensional vector space that defines the entries above (or below) the diagonal.</p> <p>The remaining component lies in the subspace of symmetric matrices, <span class=“math”>\[ T_{\mathcal{A}}(I) \cap \{ M : M = M^T \} = \{ M : M = M^T,\ M \preceq 0 \} = -\mathcal{P}_m, \]</span> where <span class=“math”>\(\mathcal{P}_m\)</span> is the set of positive semi-definite <span class=“math”>\(m \times m\)</span> matrices. The positive semidefinite cone is self-dual (i.e. the polar cone <span class=“math”>\(\mathcal{P}_m^* = -\mathcal{P}_m\)</span>) and hence contributes <span class=“math”>\(\frac{1}{2} \cdot \frac{m(m+1)}{2} = \frac{m(m+1)}{4}\)</span> to the squared width.</p> <p>Putting this together, we have that <span class=“math”>\(w(\Omega)^2 \le \frac{m(m-1)}{2} + \frac{m(m+1)}{4} = \frac{3m^2 - m}{4}\)</span>.</p> <h3 id=“low-rank-matrices”>Low-Rank Matrices</h3> <div class=“lemma”> (4. Low-rank Recovery) Let <span class=“math”>\(x^*\)</span> be an <span class=“math”>\(m_1 \times m_2\)</span> rank-<span class=“math”>\(r\)</span> matrix (<span class=“math”>\(m_1 \le m_2\)</span>). Then, letting <span class=“math”>\(\mathcal{A}\)</span> be the set of unit-norm rank-one matrices, <span class=“math”>\(w(T_{\mathcal{A}}(x^*) \cap S^{m_1 m_2 - 1})^2 \le 3 r (m_1 + m_2 - r)\)</span>. </div>
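<p>This lemma can likewise be checked numerically before we derive it (a sketch with the same assumptions and illustrative sizes; nuclear-norm minimization is the atomic-norm program here, as explained next):</p> <pre><code>import numpy as np
import cvxpy as cp

# Sketch: recover a rank-r matrix from roughly 3*r*(m1 + m2 - r) random Gaussian
# measurements by nuclear-norm minimization.
m1, m2, r = 15, 15, 2
n = 3 * r * (m1 + m2 - r)                       # 168 measurements for 225 unknowns
rng = np.random.default_rng(0)

X_true = rng.standard_normal((m1, r)) @ rng.standard_normal((r, m2))
A_list = [rng.standard_normal((m1, m2)) for _ in range(n)]
y = np.array([np.sum(A * X_true) for A in A_list])

X = cp.Variable((m1, m2))
constraints = [cp.sum(cp.multiply(A_list[i], X)) == y[i] for i in range(n)]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()

print(np.linalg.norm(X.value - X_true, "fro"))  # typically small
</code></pre>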

<p>Once again, the set of rank-1 matrices is not convex; the sum of two rank-1 matrices is rarely going to stay rank-1! The nuclear norm, <span class=“math”>\(\|M\|_{*} = \sum_{i=1}^{m_1} \sigma_i(M)\)</span> (i.e. the sum of singular values), is a suitable atomic norm for this set. Firstly, <span class=“math”>\(\|\cdot\|_{*}\)</span> is 1 for every element in <span class=“math”>\(\mathcal{A}\)</span> and is also <span class=“math”>\(\le 1\)</span> for every element in <span class=“math”>\(\operatorname{conv}(\mathcal{A})\)</span>. While there may be other choices for <span class=“math”>\(\|\cdot\|_{\mathcal{A}}\)</span>, the nuclear norm is a natural choice to minimize the size of the tangent cone, as we’ll shortly see.</p> <p>Let <span class=“math”>\(x^* = U \Sigma V^T\)</span>, where <span class=“math”>\(U \in \mathbb{R}^{m_1 \times r}\)</span>, <span class=“math”>\(\Sigma \in \mathbb{R}^{r \times r}\)</span> and <span class=“math”>\(V \in \mathbb{R}^{m_2 \times r}\)</span>. The tangent cone at <span class=“math”>\(x^*\)</span> is, <span class=“math”>\[ T_{\mathcal{A}}(x^*) = \operatorname{cone}\{ z - x^* : \|z\|_{*} \le \|x^*\|_{*} \}. \]</span></p> <p>To bound the width of this set, we will proceed by bounding the distance to the normal cone. The normal cone is described by the sub-differentials at <span class=“math”>\(x^*\)</span>, <span class=“math”>\[ N_{\mathcal{A}}(x^*) = \operatorname{cone}\{ U V^T + W : U^T W = 0,\ W V = 0,\ \|W\|^{*}_{\mathcal{A}} \le 1 \}, \]</span> where <span class=“math”>\(\|W\|^{*}_{\mathcal{A}}\)</span> is the <em>dual norm</em>; in our case it’s just the operator norm <span class=“math”>\(\|\cdot\|_2\)</span>. Let <span class=“math”>\(\Delta\)</span> be the directions in the subspaces of <span class=“math”>\(U\)</span> and <span class=“math”>\(V\)</span> and <span class=“math”>\(\Delta^{\perp}\)</span> be the directions orthogonal to <span class=“math”>\(U\)</span> and <span class=“math”>\(V\)</span>.</p> <p>For any matrix <span class=“math”>\(G\)</span>, a point in the normal cone is the projection, <span class=“math”>\(\|P_{\Delta^{\perp}}(G)\|_2\, U V^T + P_{\Delta^{\perp}}(G)\)</span>. Finally, <span class=“math”>\[ \mathbb{E}\left[ \operatorname{dist}(G, N_{\mathcal{A}}(x^*))^2 \right] \le \mathbb{E}\left[ \|P_{\Delta}(G)\|_F^2 \right] + \mathbb{E}\left[ \|P_{\Delta^{\perp}}(G)\|_2^2 \right] \|U V^T\|_F^2. \]</span> The first term contributes <span class=“math”>\(r(m_1 + m_2 - r)\)</span> because the dimension of <span class=“math”>\(\Delta\)</span> is <span class=“math”>\(r(m_1 + m_2 - r)\)</span>, and <span class=“math”>\(\|U V^T\|_F^2 = r\)</span>. The second term is calculated by observing that <span class=“math”>\(P_{\Delta^{\perp}}(G)\)</span> is an isotropic Gaussian <span class=“math”>\((m_1 - r) \times (m_2 - r)\)</span> matrix. We have good concentration results on the singular values of such random matrices, <span class=“math”>\[ \Pr\left[ \sigma_1(P_{\Delta^{\perp}}(G)) \ge \sqrt{m_1 - r} + \sqrt{m_2 - r} + t \right] \le \exp(-t^2 / 2). \]</span> It’s not hard to show from here that <span class=“math”>\(\mathbb{E}[ \|P_{\Delta^{\perp}}(G)\|_2^2 ] \le (\sqrt{m_1 - r} + \sqrt{m_2 - r})^2 + 2\)</span>.</p> <p>Putting this all together, we get our desired result; <span class=“math”>\[ w(T_{\mathcal{A}}(x^*) \cap S^{m_1 m_2 - 1})^2 \le r(m_1 + m_2 - r) + r \left[ (\sqrt{m_1 - r} + \sqrt{m_2 - r})^2 + 2 \right] \le 3 r (m_1 + m_2 - r). \]</span></p> <h2 id=“conclusions”>Conclusions</h2> <p>I hope through this article I’ve made a case for atomic norms. The framework allows us to reduce the complexity of a recovery problem for arbitrary atomic sets to a Gaussian width computation. Computing the Gaussian widths can still be a hard problem (just like computing the Rademacher complexity), but they have several useful properties that make these calculations easier. From the few examples I have seen, the Gaussian width calculation requires you to fairly mechanically decompose the tangent cone into spaces that are easy to describe.</p> Sat, 25 May 2013 00:00:00 -0400 http://arun.chagantys.org/technical/2013/05/25/on-atomic-norms.html http://arun.chagantys.org/technical/2013/05/25/on-atomic-norms.html

On Regression

<p>In this article, I will try to demonstrate why regression works at all (particularly with different regularizations) and how regularization should be controlled to have convergent answers. I will operate in the fixed design setting, i.e. when bounds are assumed on the input as opposed to assuming the random distribution from whence it came (the random design setting).</p> Mon, 13 May 2013 00:00:00 -0400 http://arun.chagantys.org/technical/2013/05/13/on-regression.html http://arun.chagantys.org/technical/2013/05/13/on-regression.html

A Day in Monterey

<h1 id=“ive-come-here-to-find-some-answers-but-find-questions-instead.”>I’ve come here to find some answers, but find questions instead.</h1> <p>I sit upon a bench, my feet, they itch to leave the world behind. Life these days; so much choice, yet all kitsch And today, when to answers I’ve devoted my mind, starring into the distance but asks me questions to twitch. Oh, what I’ll prove, my fingers eagerly twitch I’ll find the answers, apply my mind But not today, I’m lost in a X, thoughts are kitsch</p> <p>My jacket flutters in the cool breeze. In the calm serenity, meditation beckons me. The theorems I set out to prove will yet find their answers. Here, now, I lose myself to questions.</p> <h1 id=“theme-slice-into-these-questions-from-looking-at-the-world”>(Theme: slice into these questions from looking at the world)</h1> <h1 id=“birds”>Birds</h1> <h1 id=“squirrels”>Squirrels</h1> <h1 id=“waves-and-plants”>Waves and plants</h1> <p>A fleet of birds swoop down, kissing the waves as they fly by A white albatross emerges from the horizon long wings spread out, a single flap carries it across the shore whilst its caws roll in before receding.</p> <p>A squirrel hops around in circles, Its tail flits from side to side. Here it nibbles on grass, there it sniffs questioningly, searching for something, I haven’t the voice to ask “What?”</p> <p>The bright blue bed of ice plants reminds me of an alien world.</p> <p>All in the quiet backdrop of the sea Gentle waves breaking the silence in the gentlest of ways A whisper to keep you calm A whisper to that keeps you alarmed.</p> <p>I forget the theorems I had set out to prove But am reminded same. static equillibrium. All the down at long last. Dynamic equillibirum, and of this universal truth. To learn, to understand, there is always a question. Be it in how the waves work, or why the algorithms settle And always the quest for answers.</p> <ul> <li>Theme, questions – nature – questions.</li> </ul> Sat, 27 Apr 2013 00:00:00 -0400 http://arun.chagantys.org/literature/2013/04/27/a-day-in-monterey.html http://arun.chagantys.org/literature/2013/04/27/a-day-in-monterey.html

The Subgradient of a Matrix Norm

<p>Suitably encouraged by friend and colleague <a href=“http://jsteinhardt.wordpress.com”>Jacob Steinhardt</a> to write a technical post on recent work, I am going to talk about a simple derivation that I had some difficulty finding online: the subgradient or subderivatives of a matrix norm, and more precisely that of the nuclear norm.</p> <h1 id=“the-big-picture”>The big picture</h1> <p>There are a number of scenarios where you are looking for the right matrix to solve a problem while satisfying some constraints. To take a concrete example, albeit one that’s been beaten to death, the popular Netflix competition provided competitors with the ratings each user gave to the small set of movies he/she watched, with the task of predicting the user’s ratings for the remaining movies. This problem could be represented as an <span class=“math”>\(N \times M\)</span> matrix, <span class=“math”>\(R\)</span>, where <span class=“math”>\(N\)</span> and <span class=“math”>\(M\)</span> are respectively the number of users and movies. Each entry describes the rating a user gave for a particular movie. There are infinitely many matrices that could satisfy the given data; one possible scheme would be to fix those entries to given values and fill the rest of the matrix with randomness.</p> <p>Without having any information whatsoever on the unspecified entries of <span class=“math”>\(R\)</span>, we need to make some assumptions about the way users behave. A common assumption made is that the number of factors, <span class=“math”>\(F\)</span>, that differentiates which users like which movies is fairly small, maybe on the order of 100s. This sort of assumption allows you to <em>decompose</em> your <span class=“math”>\(N \times M\)</span> matrix into an <span class=“math”>\(N \times F\)</span> matrix, <span class=“math”>\(U\)</span>, describing these <span class=“math”>\(F\)</span> factors that each person has and an <span class=“math”>\(F \times M\)</span> matrix, <span class=“math”>\(V\)</span>, describing how those factors influence ratings of movies. Of course, you might recognise this to look suspiciously similar to the <a href=“http://youtu.be/R9UoFyqJca8”>singular value decomposition</a>, <span class=“math”>\(R = U \Sigma V^T\)</span>, where <span class=“math”>\(\Sigma\)</span> is a diagonal matrix whose entries relate the features in <span class=“math”>\(U\)</span> to those in <span class=“math”>\(V\)</span>. This is no doubt a naive model but the approach has worked extremely well in the Netflix challenge and many similar problems.</p> <p>The optimization problem we’re looking at here looks for a matrix that is very similar to the ratings we’ve seen, with an additional constraint that <span class=“math”>\(F\)</span> be small, which, now that we’ve seen the SVD picture, translates to requiring the rank of the matrix to be small. This turns out to be a non-smooth constraint which makes our problem a bit more interesting.</p> <p>Now, to get into the details of how we’d actually optimize this problem. The answer usually is gradient descent. However, when you use a non-smooth objective function, there are points where the gradient isn’t defined. This is a problem; gradients are clearly an integral enough part of gradient descent to figure in the name. In order to work around this obstacle, people use <a href=“http://en.wikipedia.org/wiki/Subgradient_method”>a generalisation of gradients</a>, called <a href=“http://en.wikipedia.org/wiki/Subderivative”>subgradients</a>, which I will briefly describe in the sequel.</p> <p>The second half of my title mentions matrix norms.
A common part of any objective function in machine learning is a “regularization” term. Regularization plays the role of preventing the solution you get from your optimisation problem from “<a href=“http://en.wikipedia.org/wiki/Overfitting”>overfitting</a>” your data<sup><a href=“#fn1” class=“footnoteRef” id=“fnref1”>1</a></sup>. One of the most effective ways to regularize your solution is by placing a penalty on it for being “too large”. Using a norm helps you quantify that statement in a more mathematically precise way; you penalize your solution for having too large a norm. In our case, we’d like to penalize having too large a rank, which is equivalent to placing a limit on the number of non-zero singular values of <span class=“math”>\(R\)</span>, i.e. placing the <span class=“math”>\(L_{0}\)</span> norm on the singular values. <span class=“math”>\(L_0\)</span> norms require you to search the combinatorial space of solutions, so a common relaxation that works is the <span class=“math”>\(L_1\)</span> norm, or the sum of the absolute values of the singular values. This is the nuclear norm or the trace norm, <span class=“math”>\(\|A\|_{*} = \sum_i \sigma_i(A)\)</span>, where <span class=“math”>\(\sigma_i(A)\)</span> is the i-th singular value of <span class=“math”>\(A\)</span>.</p> <p>In the rest of the article, I’ll describe the properties of the subgradients of a matrix norm, and the exact form for the nuclear norm.</p> <h1 id=“notation”>Notation</h1> <p>I will usually use capitalised variables for matrices, e.g. <span class=“math”>\(A\)</span>, boldface lower case variables for vectors, e.g. <span class=“math”>\(\mathbf{v}\)</span>, and normal lower case variables for scalar constants. To keep the discussion simple, I’m going to make the assumption that all my matrices are real-valued, though most of this discussion readily extends to any field. I will assume that if you know what a field is, you know how to make the mapping yourself.</p> <h1 id=“matrix-norms”>Matrix Norms</h1> <p><a href=“http://en.wikipedia.org/wiki/Matrix_norm”>Matrix norms</a>, <span class=“math”>\(\|A\|\)</span>, have the following properties, most of which are analogous to the conventional vector norms you would have likely come across, for example the Euclidean norm, <span class=“math”>\(\|v\| = \sqrt{v_i^2 + v_j^2 + v_k^2}\)</span>, or the Manhattan norm, <span class=“math”>\(\|v\| = |v_i| + |v_j| + |v_k|\)</span>,</p> <ol> <li><span class=“math”>\(\|A\| \ge 0\)</span> for any <span class=“math”>\(A\)</span> and <span class=“math”>\(\|A\| = 0\)</span> iff <span class=“math”>\(A = 0\)</span>.</li> <li><span class=“math”>\(\|c A\| = |c|\, \|A\|\)</span> for any scalar <span class=“math”>\(c\)</span>.</li> <li><span class=“math”>\(\|A + B\| \le \|A\| + \|B\|\)</span> for any <span class=“math”>\(A\)</span> and <span class=“math”>\(B\)</span>. This is the triangle inequality that makes <span class=“math”>\(\|\cdot\|\)</span> a norm.</li> <li>Some norms have the further “sub-multiplicative” property, <span class=“math”>\(\|AB\| \le \|A\|\, \|B\|\)</span>, for all <span class=“math”>\(A\)</span>, <span class=“math”>\(B\)</span>.</li> </ol> <h1 id=“subgradients”>Subgradients</h1> <h1 id=“subgradients-of-a-matrix-norm”>Subgradients of a matrix norm</h1> <p>The inner product of every entry of two matrices, <span class=“math”>\(\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}\)</span>, can be succinctly written down as <span class=“math”>\(\operatorname{tr}(A^T B)\)</span>; this is an easy fact that I’ll leave to you to verify. The plus side is that it lets us express the subgradient property in the following nice way.</p>
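<p>As a preview of where this is going, the subgradient inequality is easy to poke at numerically. For the nuclear norm it is known that, writing the thin SVD <span class=“math”>\(A = U \Sigma V^T\)</span>, the matrix <span class=“math”>\(U V^T\)</span> is a subgradient at <span class=“math”>\(A\)</span>; the sketch below (assuming only <code>numpy</code>) checks the defining inequality <span class=“math”>\(\|B\|_{*} \ge \|A\|_{*} + \langle G, B - A \rangle\)</span> on a few random matrices:</p> <pre><code>import numpy as np

# Sketch: G = U V^T (from the thin SVD A = U diag(s) V^T) is a subgradient of the
# nuclear norm at A; verify the subgradient inequality on a few random matrices B.
rng = np.random.default_rng(0)

def nuclear_norm(M):
    return np.linalg.svd(M, compute_uv=False).sum()

A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))   # a rank-3 matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int((s > 1e-10).sum())
G = U[:, :r] @ Vt[:r, :]

for _ in range(5):
    B = rng.standard_normal(A.shape)
    lhs = nuclear_norm(B)
    rhs = nuclear_norm(A) + np.sum(G * (B - A))    # inner product = tr(G^T (B - A))
    print(lhs >= rhs - 1e-8)                       # expect True every time
</code></pre>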
<section class=“footnotes”> <hr /> <ol> <li id=“fn1”><p>This sort of vague definition has, of course, led anything and everything to be called regularization.<a href=“#fnref1”>↩</a></p></li> </ol> </section> Wed, 19 Dec 2012 00:00:00 -0500 http://arun.chagantys.org/technical/2012/12/19/subgradient-nuclear-norm.html http://arun.chagantys.org/technical/2012/12/19/subgradient-nuclear-norm.html

Memory and Forgetting

<p><em>(Dedicated to a Monsieur Le Coq)</em></p> <div class=“poem”>

<p>You stand here, solemn, silent, by my side,<br />  Whispering softly of a time gone by;<br />The waves of could’ves flow in with the tide,<br />  The wind murmuring should’ves with a sigh</p> <p>Somewhere in time, unknowing, unheeding, uncaring,<br />  it was I who jumped ship, making instead,<br />a choice of warmth temporary, more comforting then<br />  than this, the narrow river of reason you tread.</p> <p>Your lines were sharper, your principles ever<br />  more principled in a singular face.<br />Mistaken, was I, that they our thread would sever,<br />  while forgetting all those decisions misplaced.</p> <p>Dear Sir, upon this road I can’t return,<br />  Were that I had taken yours, old friend, and that<br />In your company I could partake, than this<br />  ghost’s. Sir, could it be that I you regret?</p> </div>

Mon, 05 Nov 2012 00:00:00 -0500 http://arun.chagantys.org/literature/2012/11/05/memory-forgetting.html http://arun.chagantys.org/literature/2012/11/05/memory-forgetting.html

Left Unsaid

<div class=“poem”>

<p>There is, bubbling inside me,<br />  Something I must tell you.<br />But you seem busy;<br />  I’ll wait, no worries,<br />It’s just a matter of timing, you see,<br />  And patient waiting is no stranger to me.</p> <p>What needed doing is finally done,<br />  The ships have set sail. In the twilight,<br />You seem tired, justifiably so.<br />  I’ll wait, no worries,<br />The bubbles can bubble over -<br />  Nothing I haven’t handled before</p> <p>Waves come of the morning tide,<br />  Wash ashore dead wood, left at sea;<br />In your eyes, distress I see,<br />  Tell me of the distant shores<br />As I push you adrift knowing,<br />  One day, you’ll come back for me</p> <p>The water has finally rocked itself asleep<br />  And the calm that follows, bliss.<br />For once I see in you peace, solitary;<br />  However fraught this moment may be,<br />A lake amidst a vast ocean.<br />  In my silence, this silence I’ll share.</p> <p>Time will take its own course<br />  One day this ship will run aground<br />O’er bubbled waters<br />  There was once a thing bubbling inside of me<br />In a time since gone by,<br />  It was a matter of timing, you’ll see.</p> </div>

Fri, 24 Aug 2012 00:00:00 -0400 http://arun.chagantys.org/literature/2012/08/24/left-unsaid.html http://arun.chagantys.org/literature/2012/08/24/left-unsaid.html