Counterintuitive Properties of High Dimensional Space (2018)

(people.eecs.berkeley.edu)

246 points | by nabla9 3 days ago ago

60 comments

  • gcanyon 3 days ago

    One that isn't listed here, and which is critical to machine learning, is the idea of near-orthogonality. When you think of 2D or 3D space, you can only have 2 or 3 orthogonal directions, and allowing for near-orthogonality doesn't really gain you anything. But in higher dimensions, you can reasonably work with directions that are only somewhat orthogonal, and "somewhat" gets pretty silly large once you get to thousands of dimensions -- like 75 degrees is fine (I'm writing this from memory, don't quote me). And the number of orthogonal-enough dimensions you can have scales as maybe as much as 10^sqrt(dimension_count), meaning that yes, if your embeddings have 10,000 dimensions, you might be able to have literally 10^100 different orthogonal-enough dimensions. This is critical for turning embeddings + machine learning into LLMs.

    • user070223 2 days ago

      That's what illustrated in the paper Toy Models of superposition

      https://arxiv.org/pdf/2209.10652

      • gcanyon 2 days ago

        That's an awesome paper!

    • phreeza 2 days ago

      By orthogonal-enough dimensions, do you mean vectors whose dot product is close to zero?

    • sigmoid10 2 days ago

      This is actually just another way to see the third example (concentration of measure). As you increase the number of dimensions, the contribution of each base vector component in the calculation of, say, the cosine angle (i.e. via the scalar product) becomes less important. So in three dimensions you'll have a pretty high angle if one vector component points along a different base vector. But in 10,000 dimensions, the angle will be tiny.

    • westurner 3 days ago

      Does distance in feature space require orthogonality?

      With real space (x,y,z) we omit the redundant units from each feature when describing the distance in feature space.

      But distance is just a metric, and often the space or paths through it are curvilinear.

      By Taxicab distance, it's 3 cats, 4 dogs, and 5 glasses of water away.

      Python now has math.dist() for Euclidean distance, for example.

      • epistasis 2 days ago

        Near-orthogonality allows fitting in more directions for distinct concepts than the dimension of the space. So even though the dimension of an LLM might be <2000, far far more than 2000 distinct directions can fit into that space.

        The term most often used is "superposition." Here's some material on it that I'm working through right now:

        https://arena3-chapter1-transformer-interp.streamlit.app/%5B...

  • crazygringo 2 days ago

    > The volume of the unit d-sphere goes to 0 as d grows! A high dimensional unit sphere encloses almost no volume!

    This feels misleading to me.

    Directly comparing volumes in different dimensions doesn't make any sense because the units are different. It doesn't make sense to say that a quantity in m^3 is larger or smaller than a quantity in m^4. Because it doesn't make any sense to compare the area of a circle with the volume of a sphere.

    > More accurate pictorial representations of high dimensional cubes (left) and spheres (right).

    The cube one is arguably accurate -- e.g. in 100 dimensions, if the distance from the center of a cube to the center of a face is 1, then the distance from the center of the cube to a corner is 10.

    But the sphere one, I don't know. Every point on a 100-dimensional sphere is still the same distance away from its center. The sphere is staying spherical in an intuitive way, it's just that the corners of the enclosing cube have gotten so much further away.

    So what is accurate to say is that the proportion of volume of a sphere relative to that of its bounding cube keeps decreasing. Which, rather than being supposedly "counterintuitive", makes perfect intuitive sense -- because every time you add a dimension, you can think of it as "extruding" the previous sphere into the new dimension and then shaving it round, the way a 2D circle can be extruded into a cylinder in 3D and then shaved down to make it into a sphere. Every time you add a dimension, you shave off more.

    The article suggests that a 3D sphere has greater volume than a 2D circle -- with a unit radius, the sphere is 4/3π while the circle is just π. But again, they're in different units, so it's a meaningless statement. It makes much more sense to say that a 2D circle takes up (1/4)π≈0.79 of its bounding square, a 3D sphere takes of (1/6)π≈0.52 of its bounding cube, a 4D sphere takes up (π/32)π≈=0.31, and so forth. So no, the volume doesn't go up and then down -- it just goes down every time when taken as a unitless proportion (and proportions are comparable).

    • jvanderbot 2 days ago

      Your image of extruding a cylinder in higher dimensions inside its bounding box then rounding it off was really insightful as a teaching tool. I've always struggled to visualize these "counterintuitive" results, which are only counterintuitive because they are harder to visualize or seem to "change" after D=3. But now they don't. Thanks!

    • jsenn 2 days ago

      I don't think this is right. If you're worried about units you can calculate the (generalized) surface area to volume ratio, which turns out to be exactly D/r. In other words, as D increases, the ratio goes to infinity.

      I think this fact can fairly be interpreted to mean that a high-dimensional unit sphere encloses almost no volume. The 2D cartoon drawing of a hypersphere also helps capture this: you can imagine the "spikes" stretching out and squeezing the interior portion, until it's all outside and no inside.

      EDIT: another argument I've seen involves calculating the ratio of the volume of a thin shell surrounding the n-sphere's surface to its total volume. You can prove that the limit of the ratio as the dimension goes to infinity is 1. In other words, in high dimensions almost all of the volume of the sphere is concentrated near its surface.

      • aatd86 2 days ago

        Another simplistic way to see it is that it is a ratio of contained information. In higher dimensional spaces, the space is so big that below the unit spheres contain exponentially less information.

        It's just something between 0 and 1 exponentized to d where d is the dimension after all (i.e. the number of eigenvectors).

        d is an exponential scale factor in a sense.

    • yatopifo 2 days ago

      I think it all depends on how you define hypervolume. If you say it’s a positive real number constructed by means of integration, then you can certainly compare them across objects of various dimensions. When you say “units” I immediately think of stuff like bivectors and trivectors where you can’t reduce one to another without losing important geometric properties. But here we are talking about just the scalar part which is as “unitless” as can be.

      • shwouchk 17 hours ago

        Integration when extrapolated to many dimensions has many nuances, and be careful that you don't have a circular definition of hypervolume in terms of integration.

        For a simple example of difficulties consider comparing the volume of two distinct k-unit spheres embedded in R^n where n>k.

  • rectang 3 days ago

    Time to share my favorite quote from Symbols, Signals and Noise by John R. Pierce, where he discusses how Shannon achieved a breakthrough in Information Theory:

    > This chapter has had another aspect. In it we have illustrated the use of a novel viewpoint and the application of a powerful field of mathematics in attacking a problem of communication theory. Equation 9.3 was arrived at by the by-no-means-obvious expedient of representing long electrical signals and the noises added to them by points in a multidimensional space. The square of the distance of a point from the origin was interpreted as the energy of the signal represented by a point.

    > Thus a problem in communication theory was made to correspond to a problem in geometry, and the desired result was arrived at by geometrical arguments.

    • CoastalCoder 2 days ago

      Anyone know if Pierce's book (dated 1961) is still a good intro to the topic?

      My background is in CS, and this would just be evening reading out of general interest.

      • hotspot_one 2 days ago

        I would be willing to read a few chapters just on spec. There is real value in understanding how people used to think about a problem, and where the source ideas came from.

  • remcob 2 days ago

    The distance between two uniform random points on an n-sphere clusters around the equator. The article shows a histogram of the distribution in fig. 11. While it looks Gaussian, it is more closely related to the Beta distribution. I derived it in my notes, as (surprisingly) I could not find it easily in literature:

    https://xn--2-umb.com/21/n-sphere

    • zombot 2 days ago

      > The distance between two uniform random points on an n-sphere clusters around the equator.

      This sentence makes no sense to me.

      • p1esk 2 days ago

        He means it clusters around the distance from a pole to the equator.

        • remcob 2 days ago

          Correct. I was too short in my comment. It's explained in the article: without loss of generality you can call one of the two points the 'north pole' and then the other one will be distributed close to the equator.

      • isoprophlex 2 days ago

        Pick an equator on an n-sphere. It is a hyperplane of dimensions (n-1) through the center, composed of all but one dimensions of your sphere. The xy plane for a unit sphere in xyz, for example.

        Uniformly distribute points on the sphere. For high n, all points will be very near the equator you chose.

        Obviously, in ofder for a point to be not close to this chosen equator, it projects close to 0 on all dimensions spanning the equatorial hyperplane, and not close to 0 on the dimension making up the pole-to-pole axis.

        • oersted 2 days ago

          My first thought is that it's rather obvious, but I'm probably wrong, can you help me understand?

          The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen dice being high value and the rest being low value is obviously rather small.

          I guess that the consequence is still interesting, that most random points in a high-dimensional n-sphere will be close to the equator. But they will be close to all arbitrary chosen equators, so it's not that meaningful.

          If the equator is defined as containing n-1 dimensions, then as n goes higher you'd expect it to "take up" more of the space of the sphere, hence most random points will be close to it. It is a surprising property of high-dimensional space, but I think it's mainly because we don't usually think about the general definition of an equator and how it scales to higher dimensions, once you understand that it's not very surprising.

          • isoprophlex 2 days ago

            > The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen dice being high value and the rest being low value is obviously rather small.

            You're exactly right, this whole thing is indeed a bit of an obvious nothingburger.

      • akdor1154 2 days ago

        "clusters" is acting as a verb here, not a noun.

    • 7fYZ7mJh3RNKNaG 2 days ago

      beautiful visualizations, how did you make them?

      • remcob 2 days ago

        The first one IIRC with Geogebra, all the rest with Matplotlib. The design goal was to maximize on 'data-ink ratio'.

  • FabHK 3 days ago

    For high-dimensional spheres, most of the volume is in the "shell", ie near the boundary [0]. This sort of makes sense to me, but I don't know how to square that with the observation in the article that most of the surface area is near the equator. (In particular, by symmetry, it's near any equator; so, one would think, in their intersection. That is near the centre, though, not the shell.)

    Anyway. Never buy a high-dimensional orange, it's mostly rind.

    [0] https://www.math.wustl.edu/~feres/highdim

    • hansvm 3 days ago

      It's basically the same idea in both cases. Power laws warp anything "slightly bigger" into dominating everything else when the power is big enough. There's a bit more stuff near the outside than the inside, so with a high enough dimension the volume is in the rind. Similarly, the equator is a bit bigger than the other slices, so with enough dimensions its surface area dominates.

      • WiSaGaN 2 days ago

        Yes, this seems to be the result of the standard Euclidean metric rather than the high dimension itself. I guess most people assuming the metric to be Euclidean, so it's ok.

    • youoy 2 days ago

      If you like ML, this is also related with the results of this paper [0], where they show that learning in high dimensions amounts to extrapolation, as opposed to interpolation. Intuitively I think of this as the fact that points in the sphere are convexly independent, and most of the volume of the ball is near the boundary.

      [0] https://arxiv.org/abs/2110.09485

  • brazzy 2 days ago

    > The volume of the unit -sphere goes to 0 as grows! A high dimensional unit sphere encloses almost no volume! The volume increases from dimensions one to five, but begins decreasing rapidly toward 0 after dimension six.

    What the absolute fuck?

    That one caught me truly off guard. I don't think "counterintuitive" is a strong enough word.

  • bmitc 3 days ago

    Actually, the most counterintuitive is 4-dimensional space. It is rather mathematically unique, often exhibiting properties no other dimension does.

    • dullcrisp 3 days ago

      Well I’m sure 2- and 3- dimensional space are also mathematically unique and interesting by the same token, but they’re nearer to our experience and intuition.

      • ngruhn 2 days ago

        I‘ve heard that knots only exist in 3 dimensions. In 2D you can’t entangle anything and in 4D+ you can always untangle everything.

        • madcaptenor 2 days ago

          It's been a while since I studied any topology, but if I'm remembering correctly you can knot an (n-2)-dimensional surface in n-dimensions.

    • NL807 3 days ago

      >often exhibiting properties no other dimension does.

      Isn't that true for some other dimensions as well? There is a whole much of mathematical concepts that is constrained for a specific dimension. For example the cross product only makes sense in 3D. The perpendicular dot product (a special case of the determinant) only makes sense in 2D.

      • immibis 2 days ago

        Apparently there's also a 7D cross product - and no others!

      • justsid 2 days ago

        The cross product is a generalization of the wedge product which does exist in higher dimensions.

    • elcritch 3 days ago

      How so?

      • hansvm 3 days ago

        The intuitive way to think about it is that with very few dimensions you have very few degrees of freedom, so it's easy to prove things possible or impossible. With lots of dimensions, you have enough wiggle room to prove most things possible. Somewhere in between, you have enough complexity to not trivialize the problems but not enough wiggle room to be able to easily circumvent the issue.

        Often in practice, that boundary is around 3-4 dimensions. See the poincaré conjecture, various sphere packing shenanigans, graph embeddings, ....

      • bmitc 3 days ago

        There's a section here about phenomena in 4 dimensions: https://en.wikipedia.org/wiki/4-manifold

        One of the most surprising is that all smooth manifolds of dimension not equal to four only have a finite number of unique smooth structures. For dimension four, there are countably infinite number of unique smooth structures. It's the only dimension with that property.

        • elcritch 3 days ago

          Fascinating that higher dimension manifolds are more restrictive!

          Though in a _very_ handwavy way it seems intuitive given properties like that in TFA where 4-d is the only dimension where the edges of the bounding cube and inner spheres match. Especially given that that property seems related to the possible neighborhoods of points in d-4 manifolds. Though I quickly get lost in the specifics of the maths on manifolds. :)

          > However in four dimensions something very interesting happens. The radius of the inner sphere is exactly 1/2, which is just large enough for the inner sphere to touch the sides of the cube!

        • ashishb 3 days ago

          > One of the most surprising is that all smooth manifolds of dimension not equal to four only have a finite number of unique smooth structures. For dimension four, there are countably infinite number of unique smooth structures. It's the only dimension with that property.

          Can you give some intuition on smooth structure and manifold? I read Wikipedia articles a few times but still can't grasp them.

          • bmitc 3 days ago

            Applying a smooth structure to a manifold to make it a smooth manifold is like a patching process that makes it look like a Eucliden space.

            Most of calculus and undergraduate math, engineering, and physics takes place in Euclidean space R^n. So all the curves and surfaces directly embed into R^n, usually where n = 2 or n = 3. However, there are more abstract spaces that one would like to study and those are manifolds. To do calculus on them, they need to be smooth manifolds. A smooth structure is a collection of "patches" (normally called charts) such that each patch (chart) is homeomorphic (topologically equivalent) to an open set in R^n. Such a manifold is called an n-dimensional manifold. The smoothness criterion is a technicality such that the coordinates and transformation coordinates are smooth, i.e., infinitely differentiable. Smooth manifolds is basically the extension of calculus to more general and abstract dimensions.

            For example, a circle is a 1-dimensional manifold since it locally looks like a line segment. A sphere (the shell of the sphere) is a 2-dimensional manifold because it locally looks like an open subset of R^2, i.e., it locally looks like a two dimensional plane. Take Earth for example. Locally, a Euclidean x-y coordinate system works well.

          • aithrowawaycomm 2 days ago

            I am not sure the other comment was especially intuitive. Here is my understanding:

            Euclidean space is a vector space and therefore pretty easy to work with in computations (especially calculus) compared to something like the surface of a sphere, but the sphere doesn't simply abandon Euclidean vector structure. We can take halves of the sphere and "flatten them out," so instead of working with the sphere we can work with two planes, keeping in mind that the flattening functions define the boundary of those planes we're allowed to work within. Then we can do computations on the plane and "unflatten" them to get the result of those computations on the sphere.

            Manifolds are a generalization of this idea: you have a complicated topological structure S, but also some open subsets of S, S_i, which partition S, and smooth, invertible functions f_i: S_i -> R^n that tell you how to treat elements of S locally as if they were vectors in Euclidean space (and since the functions are invertible, it tells you how to map the vectors back to S, which is what you want).

            The manifold is a pair, the space S and the smooth functions f_i. The smoothness is important because ultimately we are interested in doing calculus on S, so if the mapping functions have "sharp edges" then we're introducing sharp edges into S that are entirely a result of the mapping and not S's own geometry.

  • mattxxx 3 days ago

    Yea - high dimensional spaces are weird and hard to reason about... and we're working very frequently in them, especially when dealing with ML.

    • l33t7332273 3 days ago

      Luckily if you do enough math it becomes much easier to reason about such spaces

      • JBiserkov 3 days ago

        - How do you even visualize an 11-dimensional space?

        - oh that's easy - you just visualize an N-dimensional space and then set N equal to 11.

        • rectang 2 days ago

          I think of high-dimensional spaces in terms of projection. Projecting a 3-dimensional space onto a 2-dimensional space loses information and the results depend on perspective. Same with an 11-dimensional space being projected onto a 10-dimensional space.

          I find that this metaphor works pretty well for visualizing how a vector-space search engine represents how two documents can be "similar" in N-dimensional term-space: look at them from the right angle and they appear close together.

        • marcosdumay 2 days ago

          Yeah, stopping that need to visualize everything is one of the mechanisms usually adopted for working in high-dimensional space.

  • nyc111 2 days ago

    I don't understand what is meant here by "dimension." Is the definition of "dimension" consistent for 3-dimensional figures and n-dimensional figures? In other words, what is the importance of the orthogonality of the axes? The axes of dimensions beyond 3-d are not orthogonal, does this change the definition of "dimension".

    I cannot conceive a geometrical image of higher dimensions. Algebraically, yes, but not geometrically.

    • travisjungroth 2 days ago

      > The axes of dimensions beyond 3-d are not orthogonal, does this change the definition of "dimension".

      They are orthogonal.

      > I cannot conceive a geometrical image of higher dimensions.

      This is normal, and essential to the point of the article. If you could visualize 10-dimensional space, it wouldn’t be so counterintuitive.

      Try looking up images and videos of 4D objects projected into 3D and 2D. That might help. Hypercubes are maybe the easiest.

      • nyc111 2 days ago

        From Wikipedia: https://en.m.wikipedia.org/wiki/Hypercube "4 – If one moves the cube one unit length into the fourth dimension, it generates a 4-dimensional unit hypercube (a unit tesseract)."

        How do we draw an orthogonal line to the three orthogonal linas that we have?

        • travisjungroth a day ago

          In 3 dimensional space, you can’t.

          Draw a square on paper. The lines will be orthogonal. Draw a cube. The third dimension won’t be orthogonal. It can’t be. But, that doesn’t mean a third dimension doesn’t exist or can’t exist.

          The same thing happens with a hypercube. The fourth dimension won’t be orthogonal on paper. It won’t be orthogonal in three dimensions (you can build a 3D projection of a hypercube). This doesn’t mean it isn’t real or can’t be real.

          Whether you think of 4D space as something real we don’t have access to or something imaginary isn’t really too important. It’s just very helpful to realize it won’t have the concreteness of 3D objects in 3D space for you because you don’t have direct access to it.

          This video might help. https://youtu.be/UnURElCzGc0?si=MQa2JKT_CMmM-_JL

  • derbOac 3 days ago

    I love this stuff because it's so counterintuitive until you've worked through some of it. There was an article linked to on HN a while back about high-dimensional Gaussian distributions that was similar in message, and probably mathematically related at some level. It has so many implications for much of the work in deep learning and large data, among other things.

    • vqv 2 days ago

      Statistician here.

      I agree that some of this stuff seems counterintuitive on the surface. Once you make the connection with high-dimensional Gaussians, it can become more "obvious": if Z is standard n-dimensional Gaussian random vector, i.e. one with iid N(0,1) coordinates, then normalizing Z by its norm, say W, gives a random vector U that is uniformly distributed on an n-Sphere. Moreover, U is independent of W --- this is related to the fact that the sample mean and variance are independent for a random sample from a Normal population --- and W^2 has Chi-squared distribution on n degrees of freedom. So for example a statement about concentration of volume of the n-Sphere about an equatorial slice is equivalent to a statement about the probability that the dot product between U and a fixed unit norm vector is close to 0, and that probability is easy to approximate using undergraduate-level probability theory.

      Circling back to data: it is very easy to be mislead when working with high-dimensional data, i.e. data with many, many features.

  • ljouhet 2 days ago

    Thank you! I love this article, but I couldn't find it.

    I always search "Curse of dimensionality" instead of "Counterintuitive properties..."