Counterintuitive Properties of High Dimensional Space (2018)

(people.eecs.berkeley.edu)

248 points | by nabla9 9 months ago

76 comments

gcanyon 9 months ago

One that isn't listed here, and which is critical to machine learning, is the idea of near-orthogonality. When you think of 2D or 3D space, you can only have 2 or 3 orthogonal directions, and allowing for near-orthogonality doesn't really gain you anything. But in higher dimensions, you can reasonably work with directions that are only somewhat orthogonal, and "somewhat" gets pretty silly large once you get to thousands of dimensions -- like 75 degrees is fine (I'm writing this from memory, don't quote me). And the number of orthogonal-enough dimensions you can have scales as maybe as much as 10^sqrt(dimension_count), meaning that yes, if your embeddings have 10,000 dimensions, you might be able to have literally 10^100 different orthogonal-enough dimensions. This is critical for turning embeddings + machine learning into LLMs.
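
A quick numerical check of this idea, as a sketch rather than anything from the comment itself: sample a handful of random unit vectors and report the largest |cosine| between any pair. The helper names `random_unit_vector` and `max_abs_cosine` are made up for illustration, and random vectors are only one (not the densest) way to pack near-orthogonal directions.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Return a uniformly random direction: normalize a vector of iid Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_cosine(dim, count, seed=0):
    """Largest |cos(angle)| over all pairs among `count` random unit vectors."""
    rng = random.Random(seed)
    vectors = [random_unit_vector(dim, rng) for _ in range(count)]
    worst = 0.0
    for i in range(count):
        for j in range(i + 1, count):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot))
    return worst


if __name__ == "__main__":
    # Compare a low-dimensional space with two higher-dimensional ones.
    for dim in (3, 100, 2000):
        print(dim, round(max_abs_cosine(dim, count=30), 3))
```

In 3 dimensions some pair among 30 vectors is always far from orthogonal; in a few thousand dimensions every pair should come out close to 90 degrees.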

user070223 9 months ago

That's what is illustrated in the paper Toy Models of Superposition

https://arxiv.org/pdf/2209.10652

gcanyon 9 months ago

That's an awesome paper!

phreeza 9 months ago

By orthogonal-enough dimensions, do you mean vectors whose dot product is close to zero?

gcanyon 9 months ago

yes

sigmoid10 9 months ago

This is actually just another way to see the third example (concentration of measure). As you increase the number of dimensions, the contribution of each basis vector component in the calculation of, say, the cosine of the angle (i.e. via the scalar product) becomes less important. So in three dimensions you'll have a pretty high angle if one vector component points along a different basis vector. But in 10,000 dimensions, the angle will be tiny.

afiori 9 months ago

Thinking about it I would guess that the kissing number covered in the article is quite similar to the concept of near orthogonality.

Indeed I believe that you can redefine the n-th kissing number as: the maximum number of points on an (n-1)-sphere such that no pair of points subtends an angle of less than 60° at the center.

westurner 9 months ago

Does distance in feature space require orthogonality?

With real space (x,y,z) we omit the redundant units from each feature when describing the distance in feature space.

But distance is just a metric, and often the space or paths through it are curvilinear.

By Taxicab distance, it's 3 cats, 4 dogs, and 5 glasses of water away.

Python now has math.dist() for Euclidean distance, for example.
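
For concreteness, here is a small sketch of the two distances mentioned above, using the "3 cats, 4 dogs, 5 glasses of water" offsets as a point. `math.dist` is in the standard library from Python 3.8; the taxicab sum is written out by hand here since the `math` module has no dedicated function for it.

```python
import math

origin = (0, 0, 0)
point = (3, 4, 5)  # 3 cats, 4 dogs, 5 glasses of water

# Euclidean (straight-line) distance between the two points.
euclidean = math.dist(origin, point)

# Taxicab (L1) distance: sum of the per-axis absolute differences.
taxicab = sum(abs(a - b) for a, b in zip(origin, point))

print(euclidean)  # sqrt(50), about 7.07
print(taxicab)    # 12
```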

epistasis 9 months ago

Near-orthogonality allows fitting in more directions for distinct concepts than the dimension of the space. So even though the dimension of an LLM might be <2000, far far more than 2000 distinct directions can fit into that space.

The term most often used is "superposition." Here's some material on it that I'm working through right now:

https://arena3-chapter1-transformer-interp.streamlit.app/%5B...

westurner 9 months ago

Skew coordinates aren't orthogonal.

Skew coordinates: https://en.wikipedia.org/wiki/Skew_coordinates

Are the features described with high-dimensional spaces really all 90° geometrically orthogonal?

How does the distance metric vary with feature order?

Do algorithmic outputs diverge or converge given variance in sequence order of all orthogonal axes? Does it matter which order the dimensions are stated in; is the output sensitive to feature order, but does it converge regardless?

Re: superposition in this context, too

Are there multiple particles in the same space, or is it measuring a point-in-time sampling of the possible states of one particle?

(Can photons actually occupy the same point in spacetime? Can electrons? But the plenoptic function describes all light passing through a point or all of the space)

Expectation values are or are not good estimators of wave function outputs from discrete quantum circuits and real quantum systems.

To describe the products of the histogram PDFs

westurner 9 months ago

> Are the [features] described with high-dimensional spaces really all 90° geometrically orthogonal?

If the features are not statistically independent, I don't think it's likely that they're truly orthogonal; which might not affect the utility of a distance metric that assumes that they are all orthogonal.

gcanyon 9 months ago

Nice, thanks!

crazygringo 9 months ago

> The volume of the unit d-sphere goes to 0 as d grows! A high dimensional unit sphere encloses almost no volume!

This feels misleading to me.

Directly comparing volumes in different dimensions doesn't make any sense because the units are different. It doesn't make sense to say that a quantity in m^3 is larger or smaller than a quantity in m^4. Because it doesn't make any sense to compare the area of a circle with the volume of a sphere.

> More accurate pictorial representations of high dimensional cubes (left) and spheres (right).

The cube one is arguably accurate -- e.g. in 100 dimensions, if the distance from the center of a cube to the center of a face is 1, then the distance from the center of the cube to a corner is 10.

But the sphere one, I don't know. Every point on a 100-dimensional sphere is still the same distance away from its center. The sphere is staying spherical in an intuitive way, it's just that the corners of the enclosing cube have gotten so much further away.

So what is accurate to say is that the proportion of volume of a sphere relative to that of its bounding cube keeps decreasing. Which, rather than being supposedly "counterintuitive", makes perfect intuitive sense -- because every time you add a dimension, you can think of it as "extruding" the previous sphere into the new dimension and then shaving it round, the way a 2D circle can be extruded into a cylinder in 3D and then shaved down to make it into a sphere. Every time you add a dimension, you shave off more.

The article suggests that a 3D sphere has greater volume than a 2D circle -- with a unit radius, the sphere is (4/3)π while the circle is just π. But again, they're in different units, so it's a meaningless statement. It makes much more sense to say that a 2D circle takes up (1/4)π≈0.79 of its bounding square, a 3D sphere takes up (1/6)π≈0.52 of its bounding cube, a 4D sphere takes up (π/32)π≈0.31, and so forth. So no, the volume doesn't go up and then down -- it just goes down every time when taken as a unitless proportion (and proportions are comparable).
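
The proportions above can be reproduced with the standard closed form for the volume of a unit ball, pi^(d/2) / Gamma(d/2 + 1), divided by the volume 2^d of its bounding cube. This is a sketch; the function names are illustrative.

```python
import math


def unit_ball_volume(d):
    """Volume of the d-dimensional ball of radius 1: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def fraction_of_bounding_cube(d):
    """Share of the side-2 bounding cube occupied by the unit ball."""
    return unit_ball_volume(d) / 2 ** d


if __name__ == "__main__":
    for d in (2, 3, 4, 10, 20):
        print(d, fraction_of_bounding_cube(d))
```

The printed fractions shrink at every step, which is the "shave off more each time" picture in numbers.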

jvanderbot 9 months ago

Your image of extruding a cylinder in higher dimensions inside its bounding box then rounding it off was really insightful as a teaching tool. I've always struggled to visualize these "counterintuitive" results, which are only counterintuitive because they are harder to visualize or seem to "change" after D=3. But now they don't. Thanks!

ricksunny 9 months ago

Hypercubist Math has the ambitious goal of giving people an intuitive sense of 4 dimensions. Currently at Vol. 2 of a several-volume series; it will be interesting to see whether they succeed, if user feedback is anything to go by.

https://m.youtube.com/watch?v=SwGbHsBAcZ0&t=509s&pp=ygUQaHlw...

jsenn 9 months ago

I don't think this is right. If you're worried about units you can calculate the (generalized) surface area to volume ratio, which turns out to be exactly D/r. In other words, as D increases, the ratio goes to infinity.
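
One way to spell out where D/r comes from, as a sketch: write the volume of a ball of radius r in D dimensions as a constant times r^D, where the symbol C_D (the volume of the unit ball, depending only on D) is notation introduced here, and take the surface area as the derivative of the volume with respect to r.

```latex
V_D(r) = C_D\, r^{D}, \qquad
S_D(r) = \frac{\mathrm{d}V_D}{\mathrm{d}r} = D\, C_D\, r^{D-1}, \qquad
\frac{S_D(r)}{V_D(r)} = \frac{D}{r}.
```

For the unit sphere (r = 1) the ratio is just D, which grows without bound.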

I think this fact can fairly be interpreted to mean that a high-dimensional unit sphere encloses almost no volume. The 2D cartoon drawing of a hypersphere also helps capture this: you can imagine the "spikes" stretching out and squeezing the interior portion, until it's all outside and no inside.

EDIT: another argument I've seen involves calculating the ratio of the volume of a thin shell surrounding the n-sphere's surface to its total volume. You can prove that the limit of the ratio as the dimension goes to infinity is 1. In other words, in high dimensions almost all of the volume of the sphere is concentrated near its surface.
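
The shell argument is easy to check numerically. Since volume scales as r^d, the inner ball of radius 1 - eps holds (1 - eps)^d of the total, and the shell of thickness eps holds the rest. A minimal sketch (the name `shell_fraction` is illustrative):

```python
def shell_fraction(dim, thickness):
    """Fraction of a unit ball's volume within `thickness` of its surface.

    Volume scales as r**dim, so the inner ball of radius (1 - thickness)
    holds (1 - thickness)**dim of the total; the shell holds the rest.
    """
    return 1.0 - (1.0 - thickness) ** dim


if __name__ == "__main__":
    # A shell that is 1% of the radius thick, in increasing dimension.
    for dim in (2, 3, 10, 100, 1000):
        print(dim, shell_fraction(dim, 0.01))
```

For any fixed thickness the fraction tends to 1 as the dimension grows.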

aatd86 9 months ago

Another simplistic way to see it is that it is a ratio of contained information. In higher dimensional spaces, the space is so big that the unit spheres contain exponentially less information.

It's just something between 0 and 1 raised to the power d, where d is the dimension after all (i.e. the number of eigenvectors).

d is an exponential scale factor in a sense.

yatopifo 9 months ago

I think it all depends on how you define hypervolume. If you say it’s a positive real number constructed by means of integration, then you can certainly compare them across objects of various dimensions. When you say “units” I immediately think of stuff like bivectors and trivectors where you can’t reduce one to another without losing important geometric properties. But here we are talking about just the scalar part which is as “unitless” as can be.

shwouchk 9 months ago

Integration when extrapolated to many dimensions has many nuances, and be careful that you don't have a circular definition of hypervolume in terms of integration.

For a simple example of the difficulties, consider comparing the volumes of two distinct unit k-spheres embedded in R^n, where n>k.

rectang 9 months ago

Time to share my favorite quote from Symbols, Signals and Noise by John R. Pierce, where he discusses how Shannon achieved a breakthrough in Information Theory:

> This chapter has had another aspect. In it we have illustrated the use of a novel viewpoint and the application of a powerful field of mathematics in attacking a problem of communication theory. Equation 9.3 was arrived at by the by-no-means-obvious expedient of representing long electrical signals and the noises added to them by points in a multidimensional space. The square of the distance of a point from the origin was interpreted as the energy of the signal represented by a point.

> Thus a problem in communication theory was made to correspond to a problem in geometry, and the desired result was arrived at by geometrical arguments.

CoastalCoder 9 months ago

Anyone know if Pierce's book (dated 1961) is still a good intro to the topic?

My background is in CS, and this would just be evening reading out of general interest.

hotspot_one 9 months ago

I would be willing to read a few chapters just on spec. There is real value in understanding how people used to think about a problem, and where the source ideas came from.

remcob 9 months ago

The distance between two uniform random points on an n-sphere clusters around the equator. The article shows a histogram of the distribution in fig. 11. While it looks Gaussian, it is more closely related to the Beta distribution. I derived it in my notes, as (surprisingly) I could not find it easily in literature:

https://xn--2-umb.com/21/n-sphere

zombot 9 months ago

> The distance between two uniform random points on an n-sphere clusters around the equator.

This sentence makes no sense to me.

p1esk 9 months ago

He means it clusters around the distance from a pole to the equator.

remcob 9 months ago

Correct. I was too short in my comment. It's explained in the article: without loss of generality you can call one of the two points the 'north pole' and then the other one will be distributed close to the equator.
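
A rough simulation of this, as a sketch: treat the first coordinate as the pole-to-pole axis, draw uniform points on the unit sphere in R^dim (by normalizing Gaussian vectors), and count how many land within a thin band around the equator. The function name and the band width are arbitrary choices for illustration.

```python
import math
import random


def fraction_near_equator(dim, band, samples=2000, seed=0):
    """Estimate the share of uniform points on the unit sphere in R^dim
    whose 'latitude' coordinate (the first one) satisfies |x_1| < band."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if abs(v[0] / norm) < band:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    for dim in (3, 100, 1000):
        print(dim, fraction_near_equator(dim, band=0.1, samples=500))
```

In 3 dimensions only about a tenth of the points fall in the band; by a thousand dimensions nearly all of them do.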

isoprophlex 9 months ago

Pick an equator on an n-sphere. It is a hyperplane of dimension (n-1) through the center, composed of all but one of the dimensions of your sphere. The xy plane for a unit sphere in xyz, for example.

Uniformly distribute points on the sphere. For high n, all points will be very near the equator you chose.

Obviously, in order for a point not to be close to this chosen equator, it must project close to 0 on all dimensions spanning the equatorial hyperplane, and not close to 0 on the dimension making up the pole-to-pole axis.

oersted 9 months ago

My first thought is that it's rather obvious, but I'm probably wrong, can you help me understand?

The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen die being high value and the rest being low value is obviously rather small.

I guess that the consequence is still interesting, that most random points in a high-dimensional n-sphere will be close to the equator. But they will be close to all arbitrarily chosen equators, so it's not that meaningful.

If the equator is defined as containing n-1 dimensions, then as n goes higher you'd expect it to "take up" more of the space of the sphere, hence most random points will be close to it. It is a surprising property of high-dimensional space, but I think it's mainly because we don't usually think about the general definition of an equator and how it scales to higher dimensions; once you understand that, it's not very surprising.

isoprophlex 9 months ago

> The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen die being high value and the rest being low value is obviously rather small.

You're exactly right, this whole thing is indeed a bit of an obvious nothingburger.

akdor1154 9 months ago

"clusters" is acting as a verb here, not a noun.

7fYZ7mJh3RNKNaG 9 months ago

beautiful visualizations, how did you make them?

remcob 9 months ago

The first one IIRC with Geogebra, all the rest with Matplotlib. The design goal was to maximize on 'data-ink ratio'.

FabHK 9 months ago

For high-dimensional spheres, most of the volume is in the "shell", ie near the boundary [0]. This sort of makes sense to me, but I don't know how to square that with the observation in the article that most of the surface area is near the equator. (In particular, by symmetry, it's near any equator; so, one would think, in their intersection. That is near the centre, though, not the shell.)

Anyway. Never buy a high-dimensional orange, it's mostly rind.

[0] https://www.math.wustl.edu/~feres/highdim

hansvm 9 months ago

It's basically the same idea in both cases. Power laws warp anything "slightly bigger" into dominating everything else when the power is big enough. There's a bit more stuff near the outside than the inside, so with a high enough dimension the volume is in the rind. Similarly, the equator is a bit bigger than the other slices, so with enough dimensions its surface area dominates.

WiSaGaN 9 months ago

Yes, this seems to be the result of the standard Euclidean metric rather than the high dimension itself. I guess most people assume the metric to be Euclidean, so it's ok.

hansvm 9 months ago

The conditions you need for that to be true are substantially weaker than being Euclidean (though, when people are talking about "weird" behavior in high-dimensional spaces nowadays, it's in the context of ML and Euclidean stuff anyway). If you have a meaningful notion of dimension (basic properties like <1,0> being different from <0,1> and <1,0> being closer to <1,1> than <0,1>) and don't have discretized shenanigans (which would collapse the inequality I'm exploiting to a strict equality, with some sort of 1^n=1 behavior) then the natural measure induced by the metric in question will exhibit the described behavior. You can easily verify that for every metric represented in scipy or whatnot, and a proper proof isn't too much more work.

youoy 9 months ago

If you like ML, this is also related with the results of this paper [0], where they show that learning in high dimensions amounts to extrapolation, as opposed to interpolation. Intuitively I think of this as the fact that points in the sphere are convexly independent, and most of the volume of the ball is near the boundary.

[0] https://arxiv.org/abs/2110.09485

brazzy 9 months ago

> The volume of the unit d-sphere goes to 0 as d grows! A high dimensional unit sphere encloses almost no volume! The volume increases from dimensions one to five, but begins decreasing rapidly toward 0 after dimension six.

What the absolute fuck?

That one caught me truly off guard. I don't think "counterintuitive" is a strong enough word.
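
The rise and fall can be checked with the standard two-step recurrence for the volume of the unit ball, V_d = (2π/d) · V_(d-2). This is a sketch, not code from the article, and `unit_ball_volumes` is an illustrative name. The recurrence also shows why the turnaround happens where it does: once d exceeds 2π ≈ 6.28, each step multiplies by a factor below 1.

```python
import math


def unit_ball_volumes(max_dim):
    """Volumes of the radius-1 ball for d = 0..max_dim (max_dim >= 1),
    built from the recurrence V_d = (2 * pi / d) * V_{d-2}."""
    volumes = [1.0, 2.0]  # V_0 = 1 (a point), V_1 = 2 (the segment [-1, 1])
    for d in range(2, max_dim + 1):
        volumes.append(2 * math.pi / d * volumes[d - 2])
    return volumes


if __name__ == "__main__":
    for d, volume in enumerate(unit_ball_volumes(20)):
        print(d, volume)
```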

Anotheroneagain 9 months ago

I don't think this is actually true - the actual volume of the sphere grows, but the volume of one grows even faster.

bmitc 9 months ago

Actually, the most counterintuitive is 4-dimensional space. It is rather mathematically unique, often exhibiting properties no other dimension does.

dullcrisp 9 months ago

Well I’m sure 2- and 3- dimensional space are also mathematically unique and interesting by the same token, but they’re nearer to our experience and intuition.

ngruhn 9 months ago

I‘ve heard that knots only exist in 3 dimensions. In 2D you can’t entangle anything and in 4D+ you can always untangle everything.

madcaptenor 9 months ago

It's been a while since I studied any topology, but if I'm remembering correctly you can knot an (n-2)-dimensional surface in n-dimensions.

NL807 9 months ago

>often exhibiting properties no other dimension does.

Isn't that true for some other dimensions as well? There is a whole bunch of mathematical concepts that are constrained to a specific dimension. For example the cross product only makes sense in 3D. The perpendicular dot product (a special case of the determinant) only makes sense in 2D.
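
For reference, the two operations written out in plain Python, as a sketch with vectors as tuples (the function names are illustrative):

```python
def perp_dot(a, b):
    """2D 'perpendicular dot product': determinant of the 2x2 matrix with rows a, b."""
    return a[0] * b[1] - a[1] * b[0]


def cross(a, b):
    """3D cross product, returning a vector orthogonal to both inputs."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


if __name__ == "__main__":
    print(perp_dot((1, 0), (0, 1)))
    print(cross((1, 0, 0), (0, 1, 0)))
```

Note that the 2D version returns a scalar and the 3D version returns a vector, which is part of why neither carries over unchanged to other dimensions.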

immibis 9 months ago

Apparently there's also a 7D cross product - and no others!

justsid 9 months ago

The wedge product is a generalization of the cross product, and it does exist in higher dimensions.

elcritch 9 months ago

How so?

hansvm 9 months ago

The intuitive way to think about it is that with very few dimensions you have very few degrees of freedom, so it's easy to prove things possible or impossible. With lots of dimensions, you have enough wiggle room to prove most things possible. Somewhere in between, you have enough complexity to not trivialize the problems but not enough wiggle room to be able to easily circumvent the issue.

Often in practice, that boundary is around 3-4 dimensions. See the Poincaré conjecture, various sphere packing shenanigans, graph embeddings, ....

bmitc 9 months ago

There's a section here about phenomena in 4 dimensions: https://en.wikipedia.org/wiki/4-manifold

One of the most surprising is that all smooth manifolds of dimension not equal to four only have a finite number of unique smooth structures. For dimension four, there are countably infinite number of unique smooth structures. It's the only dimension with that property.

elcritch 9 months ago

Fascinating that higher dimension manifolds are more restrictive!

Though in a _very_ handwavy way it seems intuitive given properties like that in TFA where 4-d is the only dimension where the edges of the bounding cube and inner sphere match. Especially given that that property seems related to the possible neighborhoods of points in 4-d manifolds. Though I quickly get lost in the specifics of the maths on manifolds. :)

> However in four dimensions something very interesting happens. The radius of the inner sphere is exactly 1/2, which is just large enough for the inner sphere to touch the sides of the cube!
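
A sketch of the arithmetic behind that quote, under an assumption about the construction that is not spelled out in this thread: a unit cube in d dimensions with a sphere of radius 1/2 centred on each corner, and an inner sphere at the cube's centre just touching the corner spheres. The inner radius is then the half-diagonal minus 1/2, and it reaches the cube's faces (at distance 1/2 from the centre) exactly when d = 4.

```python
import math


def inner_sphere_radius(d):
    """Radius of the central sphere, ASSUMING a unit cube in d dimensions with
    a sphere of radius 1/2 centred on each corner (assumed construction)."""
    half_diagonal = math.sqrt(d) / 2  # distance from cube centre to a corner
    return half_diagonal - 0.5


if __name__ == "__main__":
    for d in (2, 3, 4, 9, 100):
        # The cube's faces sit at distance 0.5 from its centre.
        print(d, inner_sphere_radius(d))
```

Beyond four dimensions the inner sphere pokes out of the cube, under the same assumption.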

ashishb 9 months ago

> One of the most surprising is that all smooth manifolds of dimension not equal to four only have a finite number of unique smooth structures. For dimension four, there are countably infinite number of unique smooth structures. It's the only dimension with that property.

Can you give some intuition on smooth structure and manifold? I read Wikipedia articles a few times but still can't grasp them.

bmitc 9 months ago

Applying a smooth structure to a manifold to make it a smooth manifold is like a patching process that makes it look like a Euclidean space.

Most of calculus and undergraduate math, engineering, and physics takes place in Euclidean space R^n. So all the curves and surfaces directly embed into R^n, usually where n = 2 or n = 3. However, there are more abstract spaces that one would like to study and those are manifolds. To do calculus on them, they need to be smooth manifolds. A smooth structure is a collection of "patches" (normally called charts) such that each patch (chart) is homeomorphic (topologically equivalent) to an open set in R^n. Such a manifold is called an n-dimensional manifold. The smoothness criterion is a technicality such that the coordinates and coordinate transformations are smooth, i.e., infinitely differentiable. Smooth manifolds are basically the extension of calculus to more general and abstract spaces.

For example, a circle is a 1-dimensional manifold since it locally looks like a line segment. A sphere (the shell of the sphere) is a 2-dimensional manifold because it locally looks like an open subset of R^2, i.e., it locally looks like a two dimensional plane. Take Earth for example. Locally, a Euclidean x-y coordinate system works well.

aithrowawaycomm 9 months ago

I am not sure the other comment was especially intuitive. Here is my understanding:

Euclidean space is a vector space and therefore pretty easy to work with in computations (especially calculus) compared to something like the surface of a sphere, but the sphere doesn't simply abandon Euclidean vector structure. We can take halves of the sphere and "flatten them out," so instead of working with the sphere we can work with two planes, keeping in mind that the flattening functions define the boundary of those planes we're allowed to work within. Then we can do computations on the plane and "unflatten" them to get the result of those computations on the sphere.
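
To make "flatten" and "unflatten" concrete, here is a sketch using stereographic projection from the north pole. That is one standard choice of chart for the sphere, not necessarily the one the comment has in mind; it covers everything except the north pole itself, so a second chart (for example, projecting from the south pole) is still needed to cover the whole sphere.

```python
def flatten(point):
    """Stereographic chart: map a point (x, y, z) on the unit sphere, other
    than the north pole (0, 0, 1), to plane coordinates (u, v)."""
    x, y, z = point
    return (x / (1 - z), y / (1 - z))


def unflatten(coords):
    """Inverse chart: map plane coordinates (u, v) back onto the unit sphere."""
    u, v = coords
    s = u * u + v * v
    return (2 * u / (s + 1), 2 * v / (s + 1), (s - 1) / (s + 1))


if __name__ == "__main__":
    p = (0.6, 0.0, -0.8)  # a point on the unit sphere
    print(flatten(p))
    print(unflatten(flatten(p)))  # should recover p up to rounding
```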

Manifolds are a generalization of this idea: you have a complicated topological structure S, but also some open subsets of S, S_i, which cover S, and smooth, invertible functions f_i: S_i -> R^n that tell you how to treat elements of S locally as if they were vectors in Euclidean space (and since the functions are invertible, it tells you how to map the vectors back to S, which is what you want).

The manifold is a pair, the space S and the smooth functions f_i. The smoothness is important because ultimately we are interested in doing calculus on S, so if the mapping functions have "sharp edges" then we're introducing sharp edges into S that are entirely a result of the mapping and not S's own geometry.

mattxxx 9 months ago

Yea - high dimensional spaces are weird and hard to reason about... and we're working very frequently in them, especially when dealing with ML.

l33t7332273 9 months ago

Luckily if you do enough math it becomes much easier to reason about such spaces

JBiserkov 9 months ago

- How do you even visualize an 11-dimensional space?

- oh that's easy - you just visualize an N-dimensional space and then set N equal to 11.

rectang 9 months ago

I think of high-dimensional spaces in terms of projection. Projecting a 3-dimensional space onto a 2-dimensional space loses information and the results depend on perspective. Same with an 11-dimensional space being projected onto a 10-dimensional space.

I find that this metaphor works pretty well for visualizing how a vector-space search engine represents how two documents can be "similar" in N-dimensional term-space: look at them from the right angle and they appear close together.

marcosdumay 9 months ago

Yeah, stopping that need to visualize everything is one of the mechanisms usually adopted for working in high-dimensional space.

nyc111 9 months ago

I don't understand what is meant here by "dimension." Is the definition of "dimension" consistent for 3-dimensional figures and n-dimensional figures? In other words, what is the importance of the orthogonality of the axes? The axes of dimensions beyond 3-d are not orthogonal; does this change the definition of "dimension"?

I cannot conceive a geometrical image of higher dimensions. Algebraically, yes, but not geometrically.

travisjungroth 9 months ago

> The axes of dimensions beyond 3-d are not orthogonal, does this change the definition of "dimension".

They are orthogonal.

> I cannot conceive a geometrical image of higher dimensions.

This is normal, and essential to the point of the article. If you could visualize 10-dimensional space, it wouldn’t be so counterintuitive.

Try looking up images and videos of 4D objects projected into 3D and 2D. That might help. Hypercubes are maybe the easiest.

nyc111 9 months ago

From Wikipedia: https://en.m.wikipedia.org/wiki/Hypercube "4 – If one moves the cube one unit length into the fourth dimension, it generates a 4-dimensional unit hypercube (a unit tesseract)."

How do we draw an orthogonal line to the three orthogonal lines that we have?

travisjungroth 9 months ago

In 3 dimensional space, you can’t.

Draw a square on paper. The lines will be orthogonal. Draw a cube. The third dimension won’t be orthogonal. It can’t be. But, that doesn’t mean a third dimension doesn’t exist or can’t exist.

The same thing happens with a hypercube. The fourth dimension won’t be orthogonal on paper. It won’t be orthogonal in three dimensions (you can build a 3D projection of a hypercube). This doesn’t mean it isn’t real or can’t be real.
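
Algebraically, the fourth direction is no harder to write down than the first three, even though it cannot be drawn at right angles on paper. A sketch (the function names are illustrative): build the unit hypercube's vertices as 0/1 coordinate tuples, join vertices that differ in exactly one coordinate, and check that the distinct edge directions have zero dot product with one another.

```python
from itertools import product


def hypercube(dim):
    """Vertices and edges of the unit hypercube in `dim` dimensions.

    Vertices are all 0/1 coordinate tuples; an edge joins two vertices that
    differ in exactly one coordinate.
    """
    vertices = list(product((0, 1), repeat=dim))
    edges = [
        (a, b)
        for i, a in enumerate(vertices)
        for b in vertices[i + 1:]
        if sum(x != y for x, y in zip(a, b)) == 1
    ]
    return vertices, edges


def edge_direction(edge):
    """Direction vector of an edge, from its first endpoint to its second."""
    a, b = edge
    return tuple(y - x for x, y in zip(a, b))


if __name__ == "__main__":
    vertices, edges = hypercube(4)
    directions = sorted({edge_direction(e) for e in edges})
    print(len(vertices), len(edges))
    print(directions)
```

For dim = 4 the directions come out as the four coordinate axes, pairwise orthogonal by dot product.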

Whether you think of 4D space as something real we don’t have access to or something imaginary isn’t really too important. It’s just very helpful to realize it won’t have the concreteness of 3D objects in 3D space for you because you don’t have direct access to it.

This video might help. https://youtu.be/UnURElCzGc0?si=MQa2JKT_CMmM-_JL

nyc111 9 months ago

"This video might help. https://youtu.be/UnURElCzGc0?si=MQa2JKT_CMmM-_JL"

Sagan's video proves that it's always helpful to investigate abstract concepts with an experiment.

But the video also raises an important question: Can we derive true conclusions from wrong assumptions?

Here the assumption that the "flatlanders" are really flat is wrong.

Sagan notices that too and he says that his objects are not really flat, but they have thickness.

  Now these little cutouts have some little height 
  but let’s ignore that, let’s imagine that these 
  are absolutely flat.

So the thickness of his objects is never zero. His flatlanders have a notion of third dimension because they have the third dimension. Their height is not zero.

This is the typical rhetorical sophistry widely used in physics. We may also call it equivocation on the word zero. Here, Sagan defined the word zero both as nothing and something and he wants us to go along with this equivocation.

You cannot have non-zero height and zero height at the same time. If you have zero height you do not exist. A door is either open or closed.

Instead of flatlanders, Sagan may have used shadowlanders. Shadow is closer to two dimensional objects but even shadow has thickness.

So, Sagan's assumption that his objects have no height is definitely wrong. Can we arrive at a correct conclusion from this wrong assumption? I guess not.

Also it is clear that when we try to transform a 3-D object into a 4-D object, the object's morphology changes. A cube becomes something else. A cube does not cross the dimensions as a cube.

This also raises questions about physicists' claim that we are living in a 4 dimensional continuum called spacetime. If so, our current morphology in the 3-D world and 4-D world of spacetime cannot be the same.

Then the question is, is our present anatomy 3-D anatomy or 4-D anatomy?

nyc111 9 months ago

"The third dimension won’t be orthogonal."

Are you sure? In a 3-D coordinate system all axes are orthogonal because they make 90-degree angles with each other.

Edit: I see that you mean "cannot be drawn as orthogonal lines on paper." But, in reality they are orthogonal e.g., when I construct a 3-D model of a cube.

derbOac 9 months ago

I love this stuff because it's so counterintuitive until you've worked through some of it. There was an article linked to on HN a while back about high-dimensional Gaussian distributions that was similar in message, and probably mathematically related at some level. It has so many implications for much of the work in deep learning and large data, among other things.

vqv 9 months ago

Statistician here.

I agree that some of this stuff seems counterintuitive on the surface. Once you make the connection with high-dimensional Gaussians, it can become more "obvious": if Z is standard n-dimensional Gaussian random vector, i.e. one with iid N(0,1) coordinates, then normalizing Z by its norm, say W, gives a random vector U that is uniformly distributed on an n-Sphere. Moreover, U is independent of W --- this is related to the fact that the sample mean and variance are independent for a random sample from a Normal population --- and W^2 has Chi-squared distribution on n degrees of freedom. So for example a statement about concentration of volume of the n-Sphere about an equatorial slice is equivalent to a statement about the probability that the dot product between U and a fixed unit norm vector is close to 0, and that probability is easy to approximate using undergraduate-level probability theory.
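
A sketch of that approximation: since U is Z divided by its norm W, and W^2/n is close to 1 for large n, the dot product of U with a fixed unit vector e behaves roughly like a N(0, 1/n) variable. The probability of landing within t of the equator is then approximately erf(t·sqrt(n/2)). The function name is illustrative, and the approximation is only meant for large n.

```python
import math


def prob_near_equator(n, t):
    """Normal approximation to P(|U . e| <= t) for U uniform on the unit sphere
    in R^n and e a fixed unit vector, treating U . e as roughly N(0, 1/n)."""
    return math.erf(t * math.sqrt(n / 2))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, prob_near_equator(n, 0.1))
```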

Circling back to data: it is very easy to be misled when working with high-dimensional data, i.e. data with many, many features.

ljouhet 9 months ago

Thank you! I love this article, but I couldn't find it.

I always search "Curse of dimensionality" instead of "Counterintuitive properties..."
