Phased Array Microphone (2023)

(benwang.dev)

410 points | by bglazer 12 hours ago

148 comments

  • frankus 11 hours ago

    "As part of the calibration, the speed of sound is also a parameter which is optimized to obtain the best model of the system, which allows this whole procedure to act as a ridiculously overengineered thermometer."

    Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."

    • analog31 an hour ago

      > Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."

      A corollary that's one of my rules to live by: Never measure anything over time without also measuring the ambient temperature.

    • danielheath 6 hours ago

      Back in high school, I built (with some parental assistance) an apparatus to measure how quickly the pressure would drop (in a pressurized cylinder) when a very small hole allowed air to leak out.

      Turns out, not only can you measure temperature that way, but can extrapolate the graph out to find absolute zero (IIRC my result was out by about 20 kelvin, which I think is pretty damn good for a high-school-garage project).

    • Marthinwurer 11 hours ago

      I love these kinds of inadvertent measurements. One of my favorite examples is that a sufficiently accurate IMU can get you relatively accurate longitude measurements from the Coriolis effect.

      • nielsole 8 hours ago

        Asahi Linux (and likely macOS too) uses the resistance of the speaker coils to detect overheating of those speakers and reduce the volume.

        • squarefoot 3 hours ago

          That's the same principle used by cheap solder stations to regulate the tip temperature without employing a thermal sensor: they measure the heater resistance, presumably during the off state of the PWM signal that drives the heater. In that case the measurement is less accurate than using a real sensor, still good enough for cheap solder stations where a few degrees don't make a big difference.

        • derhuerst 5 hours ago
          • CamperBob2 4 hours ago

            Interesting. If the voltage across the speaker voice coil can be sampled with enough sensitivity at a fast-enough rate, you have an undocumented microphone.

      • 01HNNWZ0MV43FF 9 hours ago

        Is that the same thing where a flat-earther tried to measure something with an expensive laser gyro and kept finding that Earth was rotating?

        • adolph 8 hours ago

          I think the most you can tell from an IMU or gyro is that there is a change in velocity in a direction aligning with East-West when there is a change in location and that the change in velocity is greater when the location changes in line with North-South. The change in velocity would be greater as one approaches the poles and lesser at the equator.

          Thought experiment: suppose I zeroed my IMU at the North Pole and traveled in a straight line away from the pole along longitude zero, following the guidance of the IMU. By the time I got to 45° latitude, I’d be traveling westward at 1,180 kph (Mach 0.95) to keep the IMU at zero.
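
          The 1,180 kph figure can be sanity-checked as the eastward speed of the ground at a given latitude. This is only a rough sketch: it assumes a spherical Earth with the WGS 84 equatorial radius, and the function name is made up for illustration.

```python
import math

EQUATORIAL_RADIUS_KM = 6378.137  # WGS 84 equatorial radius (sphere assumed)
SIDEREAL_DAY_H = 23.9345         # one full rotation relative to the stars


def ground_speed_kph(latitude_deg):
    """Eastward speed of a surface point due to Earth's rotation."""
    # The circle traced in one rotation shrinks with cos(latitude).
    circle_km = 2 * math.pi * EQUATORIAL_RADIUS_KM * math.cos(math.radians(latitude_deg))
    return circle_km / SIDEREAL_DAY_H


for lat in (0, 45, 90):
    print(f"{lat:2d} deg latitude: {ground_speed_kph(lat):7.1f} kph")
```

          At 45° this comes out near 1,184 kph, in line with the number quoted above.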

          • trueshape 8 hours ago

            The flat earther used a fibre optic gyro. You don't "zero" it, it continuously outputs a measurement of its own angular rate around its sensitive axis. For a 3-axis gyro placed still on earth, it will read about 15 degrees/hour around wherever the axis of earth is oriented.

      • adolph 9 hours ago

        Slight correction, latitude, not longitude.

        The earth’s surface closer to the poles has less distance to travel for any rotation than the surface closer to the equator. As a result the inertial navigation systems of long distance systems must be adjusted. Iirc, this is also the case for artillery firing computations.

        https://www.oxts.com/blog/going-round-circles-earth-rotation...

        https://www.britannica.com/science/latitude

        • billyjmc 9 hours ago

          Coriolis corrections are thrown into sniper ballistic calculations, too. Not a huge effect in most conditions, but not zero, and there have been a lot of long shots in the past two decades.

      • psunavy03 7 hours ago

        I believe this is one of the initial steps an aircraft INS uses to find north while it is aligning, but it's been too long since I had aircraft systems theory in the front of my brain.

        • t0mas88 6 hours ago

          Yes, from earth rotation the INS could figure out true north if the latitude is known. Or figure out the latitude if current heading is known. But normally it's aligned with a starting position from pilot input or GPS.
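
          A minimal sketch of that idea, assuming an ideal, noise-free, perfectly level and stationary gyro triad (real INS alignment is far more involved). The axis convention (x forward, y right, z down) and the function names are assumptions for illustration only.

```python
import math

SIDEREAL_DAY_S = 86164.0905  # one Earth rotation relative to the stars
EARTH_RATE_DEG_PER_H = 360.0 / SIDEREAL_DAY_S * 3600.0  # about 15.04 deg/h


def ideal_gyro_reading(latitude_deg, heading_deg):
    """Rates (deg/h) on x=forward, y=right, z=down for a level, still unit.

    heading_deg is the direction of the x axis, clockwise from true north.
    """
    lat = math.radians(latitude_deg)
    hdg = math.radians(heading_deg)
    north = EARTH_RATE_DEG_PER_H * math.cos(lat)  # horizontal part points north
    up = EARTH_RATE_DEG_PER_H * math.sin(lat)     # vertical part
    return (north * math.cos(hdg), -north * math.sin(hdg), -up)


def latitude_and_heading(wx, wy, wz):
    """Invert the model above: recover latitude and heading in degrees."""
    horizontal = math.hypot(wx, wy)
    latitude = math.degrees(math.atan2(-wz, horizontal))
    heading = math.degrees(math.atan2(-wy, wx)) % 360.0
    return latitude, heading


print(latitude_and_heading(*ideal_gyro_reading(45.0, 30.0)))
```

          The horizontal part of the measured rate points at true north, and the ratio of vertical to horizontal gives the latitude. Near the poles the horizontal part shrinks toward zero, so the heading estimate degrades.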

    • Bearsilber 10 hours ago

      I just learned how the Duracell Powercheck© worked, which was done with temperature.

      https://youtu.be/zsA3X40nz9w?si=oGg2wdUlLXSDxpsN

    • user_7832 10 hours ago

      Is there one saying “All electronic devices are smoke machines, some can compute too”?

      • jaggederest 9 hours ago

        Similarly, diesel engines come with a reserve fuel supply that you can accidentally use once. (diesel engines will happily run on engine oil when warm)

        • TheSpiceIsLife 2 hours ago

          This happened to me once in a Peugeot 306 2L turbo diesel.

          Overfilled it and kinda had to do one 1600m trip.

          Fortunately it was manual so I was able to stall it fairly swiftly in third gear with my foot on the brake.

          Didn't seem to have any impact on the engine as far as normal operating and how it sounded. I didn't do any internal inspection.

      • qskousen 2 hours ago

        The one I've heard is "Every machine is a smoke machine, if you operate it wrongly enough."

      • ChuckMcM 7 hours ago

        "Inside every amplifier is an oscillator trying to get out."

      • frabert 9 hours ago

        "All diodes are light-emitting if you try hard enough"

        • sitkack 8 hours ago

          All diodes are also light SENSING if you try hard enough.

          • immibis 8 hours ago

            You don't have to try hard. Just use it as a photodiode and it magically works. However, if it's inside a plastic case that blocks light, it doesn't.

            Due to some law about entropy, efficient processes are necessarily reversible. That's why electric motors - some of the most efficient machines ever invented - are also generators.

            • sitkack 6 hours ago

              All diodes are photodiodes, one has to be esp careful of glass encapsulated diodes. I have had that bite me before.

            • biot 6 hours ago

              > However, if it's inside a plastic case that blocks light, it doesn't.

              You want an ordinary diode to allow current to flow easily when it senses light? Simple: shine a powerful laser at the plastic-encased diode and it will melt the plastic and liquify the metal, fusing it together and allowing current to flow again. See? You just needed to try harder.

              • Moru 3 hours ago

                Or if the hammer don't work, the sledgehammer is over there.

        • gavinsyancey 9 hours ago

          "All diodes are light-emitting at least once"

          • dmoy 8 hours ago

            Hahaha yea

            I've seen that in electronics lab a few times. The "temporarily light emitting diode"

            • Moru 3 hours ago

              I have a temporarily light-emitting hard drive cable. Really old 40 MB HDD connected to an old computer with a cheap power supply that most likely couldn't handle the slightly lower than standard power in a friend's house.

        • bsder 8 hours ago

          Ah, the light emitting resistor. The moment when you realize why it's called Ohm's Law.

      • qingcharles 8 hours ago

        "All electronics are hand-warmers if miscalibrated correctly enough."

    • cushychicken 10 hours ago

      > Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."

      I wanna say that’s a Bob Pease quote but I can’t find an attribution to it.

      • frankus 10 hours ago

        I first encountered it in Elecia White's book Making Embedded Systems, but the attribution is anonymous and whom it's attributed to may have heard it elsewhere.

    • kqr 9 hours ago

      Oh yeah. I realised this the day I discovered my fancy digital SLR was a thermometer: https://entropicthoughts.com/does-my-dslr-have-dead-pixels

      • glitchc an hour ago

        Yup, it's called dark noise. Random generation of electrons which sometimes find their way into the depletion region.

      • djmips 8 hours ago

        A lot of people like myself consider heat a form of light but I guess a photographer would be just thinking visible light. They say that about 50% of the sun's light emissions come in the infrared frequencies.

        • kqr 7 hours ago

          That seems like a mistake since heat can transfer e.g. via contact without any electromagnetic emission. In fact, that is what I think happens with the sensor also, given that there is an IR filter in front of it.

          But I may misunderstand your comment.

    • entropicdrifter 10 hours ago

      It does act as a thermometer, if and only if the altitude remains constant. The speed of sound fluctuates with both temperature and altitude

      • amluto 9 hours ago

        I’m not sure how the speed of sound could depend on altitude, even in principle. The air doesn’t know where it is!

        Putting that aside, in an ideal gas, the speed of sound depends on the composition of the gas and the temperature and, interestingly, does not depend on pressure, and pressure is the main way that the altitude would affect the speed of sound. So measuring the speed of sound in air actually makes for a pretty good thermometer.

        https://en.wikipedia.org/wiki/Speed_of_sound
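
        To make the "thermometer" point concrete, here is a small sketch of the ideal-gas relation c = sqrt(gamma * R * T / M) and its inverse. The constants are textbook values for dry air, and the function names are made up for illustration; humidity and non-ideal effects are ignored.

```python
import math

GAMMA_AIR = 1.4                  # heat capacity ratio of dry air (assumed)
GAS_CONSTANT = 8.314462618       # J/(mol*K)
MOLAR_MASS_AIR = 0.0289647       # kg/mol, dry air (assumed)


def speed_of_sound(temperature_k):
    """Ideal-gas speed of sound in m/s; note that pressure does not appear."""
    return math.sqrt(GAMMA_AIR * GAS_CONSTANT * temperature_k / MOLAR_MASS_AIR)


def temperature_from_speed(speed_m_s):
    """Invert the relation: use a measured speed of sound as a thermometer."""
    return speed_m_s ** 2 * MOLAR_MASS_AIR / (GAMMA_AIR * GAS_CONSTANT)


c = speed_of_sound(293.15)  # 20 degrees Celsius
print(f"{c:.1f} m/s -> {temperature_from_speed(c) - 273.15:.1f} C")
```

        Near room temperature the speed changes by roughly 0.6 m/s per kelvin, so the temperature resolution depends on how precisely the speed of sound can be fitted.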

        • pants2 9 hours ago

          In liquids the speed of sound is related to the density, I would have thought similar for air but I see your point. Very insightful!

        • KennyBlanken 6 hours ago

          From your own link:

          "The speed has a weak dependence on frequency and pressure in ordinary air, deviating slightly from ideal behavior."

          "The speed of sound is raised by humidity. The difference between 0% and 100% humidity is about 1.5 m/s at standard pressure and temperature, but the size of the humidity effect increases dramatically with temperature."

          "Slight" can matter significantly in an application like this.

        • adolph 8 hours ago

          Can an ideal gas of same volume, mass and temperature be brought to different pressures?

          https://courses.lumenlearning.com/suny-physics/chapter/13-3-...

          • amluto 7 hours ago

            Not unless you change the average mass of the molecules.

            An ideal gas’ pressure is a function of number of particles per unit volume, its temperature, and nothing else. If you do anything involving adding or removing heat or changing the volume or pressure, you probably also need to know the specific heat at constant volume and the specific heat at constant pressure or, frequently, their ratio. That ratio is called the adiabatic index or the heat capacity ratio, it’s written as gamma, and it’s the last parameter in the speed of sound of an ideal gas. Interestingly, it doesn’t vary all that much between different gases.

      • _0ffh 10 hours ago

        Right, it gets even worse: Air pressure is not only altitude-dependent but fluctuates even at constant altitude. The pressure (altitude) dependence is comparatively weak, though.

        • KeplerBoy 10 hours ago

          one might say air pressure changes constantly as we speak.

          • adammarples 9 hours ago

            Isn't air pressure the only thing that microphones actually measure?

            • KeplerBoy 9 hours ago

              By definition, sure. But one always needs some effect which changes some electrical property. We can't just hook up an ADC (analog-to-digital converter) to thin air and hope for the best.

              In practice most microphones measure the displacement of microscopic membranes, which are deformed by the air pressure. The next question then becomes how to measure microscopic movements of a tiny membrane. Turns out the membrane forms part of a capacitor and the electrical characteristics of capacitors depend on their geometry.

              • jpc0 7 hours ago

                That is not necessarily true.

                There are at least 4 different types of microphones: condenser, which does in fact form part of a capacitor; dynamic, which is effectively a linear generator (coil attached to membrane); ribbon, where a thin metal ribbon moving in a magnetic field generates a voltage; and piezoelectric, which is some black magic with crystals.

                • KeplerBoy 7 hours ago

                  Sure, that's why I wrote most microphones.

                  There are also some exotic principles like laser or radar microphones using interferometry.

                  https://en.m.wikipedia.org/wiki/Laser_microphone

                  https://ieeexplore.ieee.org/document/7808865

                  • jpc0 7 hours ago

                    I think popular is very situational though.

                    For me I see a lot more dynamic than condensers but I guess if you are talking about what is in like every single IOT thingamabob then you might be right there.

                • sanderjd 6 hours ago

                  Fascinating. Is there a book about the history of microphones?

                  I find this to all be in the realm of "I don't believe you that any of this works at all" if I didn't have a lifetime of experience with the fruits of successfully-functioning microphones.

            • sojsurf 9 hours ago

              Air pressure differentials, to be precise!

            • immibis 8 hours ago

              Many types measure the derivative of air pressure. One that measures absolute air pressure can be used for calibration.

      • t0mas88 6 hours ago

        The speed of sound fluctuates with density. Altitude and temperature both change density.

  • dllu 11 hours ago

    I once did a project to do multilateration of bats (the flying mammal) using an array of 4 microphones arranged in a big Y shape on the ground. Using the time difference of arrival at the four microphones, we could find the positions of each bat that flew over the array, as well as identify the species. It was used for an environmental study to determine the impact of installing wind turbines. Fun times.
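
    For readers curious how time differences of arrival turn into a position, here is a minimal, noise-free sketch. It is not the method used in that project: it assumes all four microphones lie on flat ground (z = 0) with the reference microphone at the origin, a constant speed of sound, and a source above the ground. Under those assumptions the problem reduces to a 3x3 linear system in x, y and the range r0 to the reference microphone. All names and the 5 m arm length are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def locate(mics_xy, tdoas):
    """Estimate (x, y, z) of a source above the ground.

    mics_xy: (x, y) of three microphones; the reference mic sits at (0, 0).
    tdoas:   arrival time at each of those mics minus arrival at the reference.
    Uses |p - m_i| = r0 + d_i, which expands to
    2*m_i.p + 2*d_i*r0 = |m_i|^2 - d_i^2 when every mic has z = 0.
    """
    matrix, rhs = [], []
    for (mx, my), dt in zip(mics_xy, tdoas):
        d = SPEED_OF_SOUND * dt  # range difference in metres
        matrix.append([2 * mx, 2 * my, 2 * d])
        rhs.append(mx * mx + my * my - d * d)
    x, y, r0 = solve3(matrix, rhs)
    z = math.sqrt(max(r0 * r0 - x * x - y * y, 0.0))  # keep the "above ground" root
    return x, y, z


# Hypothetical Y-shaped layout: centre mic at the origin, three arms of 5 m.
ARM = 5.0
MICS = [(ARM * math.cos(math.radians(a)), ARM * math.sin(math.radians(a)))
        for a in (90, 210, 330)]


def simulate_tdoas(source):
    sx, sy, sz = source
    r0 = math.sqrt(sx * sx + sy * sy + sz * sz)
    return [(math.sqrt((sx - mx) ** 2 + (sy - my) ** 2 + sz ** 2) - r0) / SPEED_OF_SOUND
            for mx, my in MICS]


print(locate(MICS, simulate_tdoas((3.0, -2.0, 10.0))))
```

    With real recordings the time differences are noisy, so a least-squares fit over more microphones or repeated calls would be the more realistic choice.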

    • lscharen 9 hours ago

      Reminds me of Intellectual Ventures' Optical Fence developed to track and kill mosquitoes with short laser pulses.

      As a side-effect of the precision needed to spatially locate the mosquitoes, they could detect different wing beat frequencies that allowed target discrimination by sex and species.

      • redblacktree 7 hours ago

        Where can I buy one?

        • dleary 5 hours ago

          This laser mosquito killer is, and always has been, a PR whitewashing campaign for Intellectual Ventures' reputation.

          This device has never been built, never been purchasable, and it is ALWAYS brought up whenever IV wants to talk about how cool they are.

          And I say this as someone who loosely knows and was friends with a few people that worked there. They brought up this same invention when they were talking about their work. They eventually soured on the company, once they saw the actual sausage being made.

          IV is a patent troll, shaking down people doing the real work of developing products.

          They trot out this invention, and a handful of others, to appear like they are a public benefit. Never mind that most of these inventions don't really exist, have never been manufactured.

          They hide the extent of their holdings, they hide the byzantine network of shell companies they use to mask their holdings, and they spend a significant amount of their money lobbying (bribing).

          Why do they need to hide all of this?

          Look at their front page, prominently featuring the "Autoscope", for fighting malaria. Fighting malaria sounds great, they're the good guys, right? Now do a bit of web searching to try to find out what the Autoscope is and where it's being used. It's vaporware press release articles going back 8 years.

          Look at their "spinouts" page, and try to find any real substance at all on these companies. It is all gossamer, marketing speak with nothing behind it when you actually go looking for it.

          Meanwhile, they hold a portfolio of more than 40,000 patents, and they siphon off billions from the real economy. Part of their "licensing agreement" is that you can't talk badly about them after they shake you down, or else the price goes up.

          They are rent-seeking parasites.

        • cyberax 6 hours ago

          I don't think you can. This kind of laser device is wildly dangerous.

          • noobface 2 hours ago

            Correct your vision and your mosquito problem in one easy purchase.

    • bafe 11 hours ago

      I did a similar project at 18. Needless to say I didn't have enough HW and SW skills to do much since I implemented the most naive form of the TDOA algorithms as well as the most inefficient way of estimating the time difference through cross correlation. I still learnt a lot and it led me to eventually getting a PhD in SAR systems, which are actually beamformers using the movement of the platform instead of an array
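
      As an illustration of the "naive" approach described above, here is a direct time-domain cross-correlation that scans candidate lags and keeps the best match. It is a sketch with made-up names, not anyone's actual project code; an FFT-based correlation would be the usual faster alternative.

```python
import random


def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples at which `delayed` best matches `reference`.

    Brute force: O(len(reference) * max_lag) multiplications.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += x * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: white noise delayed by 7 samples (zero-padded at the start).
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(200)]
delayed = [0.0] * 7 + reference[:-7]

SAMPLE_RATE = 48_000  # Hz, assumed for the conversion to seconds
lag = estimate_delay(reference, delayed, max_lag=20)
print(lag, "samples =", lag / SAMPLE_RATE, "s")
```

      The time difference is the best lag divided by the sample rate, which is why a higher sample rate gives finer delay resolution.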

    • jessetemp 11 hours ago

      What were the results of your study? I’ve heard that bat lungs are so sensitive that when they fly across the pressure differential of large turbines their capillaries basically explode

    • mywacaday 11 hours ago

      I would love to do something like that to track the bats in my garden, how feasible would it be for an amateur to do as a personal project? Any good references on where to start?

    • FredPret 11 hours ago

      I had no idea they were mammals until this comment. I thought they were furry birds!

      • repiret 9 hours ago

        It is not unreasonable to think of bats as flying mice.

        • unwind 7 hours ago

          In Swedish that is almost exactly what they are called, bat translates to "fladdermus" which is "fladder" (flutter) and "mus" (mouse).

    • neumann 7 hours ago

      That sounds super interesting. Is there a write up somewhere of the project?

    • NL807 7 hours ago

      That sounds like a fun project. Was it part of a research grant?

    • isatty 9 hours ago

      > bats (the flying mammal)

      As opposed to?

    • ryandvm 11 hours ago

      Honestly, that sounds like amazing work. I wish I could afford to get out of enterprise software engineering and just do academic software development like that.

  • dchichkov 8 hours ago

    I'm curious, why haven't you used TDM I2S microphones for your array and used PDM?

    I understand that ICS-52000 is a relatively low cost ($2/100pcs) and there are even breakout boards available with 4 microphones, which can be chained to 8 or 16, like https://www.cdiweb.com/datasheets/notwired/ds-nw-aud-ics5200...

    Then you can take Jetson (or any I2S capable hardware with DSP or GPU on it) and chain 16 microphones per I2S port. It would seem a lot easier to assemble and program, compared to an FPGA setup.

    • tverbeure an hour ago

      I've considered making a phased array myself, but never got around to sending out the PCB. But here are two reasons why I2S is not the best option:

      * I2S requires 3 instead of the 2 pins of PDM. However, in the datasheet that you provided, it shows how you can daisy-chain microphones which is really cool (even if not standard I2S.) So that argument goes away.

      * PDM gives you access to way higher sample rates which in turn gives you more flexibility in choosing the delay for a delay-and-sum operation. For example, if the PDM clock is 2MHz, you could theoretically delay with a precision of 0.5us. In practice, you'll do that with lower precision, but with I2S, the clock will typically max out at 192kHz.

      * PDM microphones tend to be cheaper.
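
      A quick back-of-the-envelope check of the delay precision mentioned above: one clock or sample period sets the smallest delay step, and multiplying by the speed of sound gives the matching step in acoustic path length. The 343 m/s value and the function names are assumptions; 192 kHz is treated here as the I2S sample rate.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_step_us(rate_hz):
    """Smallest whole-sample delay step, in microseconds."""
    return 1e6 / rate_hz


def path_step_mm(rate_hz):
    """Distance sound travels during one sample period, in millimetres."""
    return SPEED_OF_SOUND / rate_hz * 1000.0


for label, rate in (("PDM 2 MHz", 2e6), ("I2S 192 kHz", 192e3)):
    print(f"{label}: {delay_step_us(rate):.2f} us, {path_step_mm(rate):.2f} mm")
```

      So the raw PDM stream allows steps of about 0.17 mm of path length, versus about 1.8 mm at 192 kHz, unless fractional-delay interpolation is used.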

    • kindiana an hour ago

      (OP here) tverbeure hit most of the main points, but mostly cost ($2/mic vs $0.5/mic adds up when there are 192 microphones), difficulty of finding things with enough I2S interfaces (even with 16-way daisy chaining, that's still more than most/all things will have). The FPGA/custom hardware was part of the fun as well!

    • morcheeba 5 hours ago

      Not OP, but I looked into this a few years ago. It was more expensive then, and only went to 20 kHz. Higher frequencies are helpful if you're listening for the hiss of leaking gas, or corona discharge of an electric arc.

      The Orin has 6xI2S ports internally, so that would work up to 16*6 = 96 microphones, which is a good number. But it looks like maybe only 3 are brought out & on different dev board connectors [1]? As with a lot of design, the devil is in the details. An FPGA could be easier to configure if you need more than 96 microphones.

      My notes:

      ICS-52000 $3.50, 20 kHz

      ICS-41350 $1.05, 40 kHz

      SPH0641LU4H-1 $1.45, 80 kHz+

      [1] https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide...

  • jcims 10 hours ago

    Look up acoustic cameras on YouTube, there are some pretty impressive demonstrations of their capability. This is one of the companies I've been watching for a while, but it looks like FLIR and some other big names are getting into it: https://www.youtube.com/@gfaitechgmbh

    The one use case that is both creepy and interesting to me is recording a public space and then after the fact 'zooming in' to conversations between individuals.

    • sipjca 2 hours ago

      I am very interested in how small these arrays can be. From talking with a friend with cochlear implants, I would assume this could help dramatically with the right signal processing to help him hear.

  • brunosan 11 hours ago

    Armchair comment. I would LOVE to be a grad student again and try to pair it with ultrasound speaker arrays, for medical applications. Essentially a super HIFU (High-Intensity Focused Ultrasound) with live feedback. https://en.wikipedia.org/wiki/Focused_ultrasound

    • zipy124 4 hours ago

      I do my PhD in in-air ultrasound with phased arrays and talk to the medical guys at conferences/labs that we talk to and it's soooo much harder in solids/liquids. The frequency is significantly higher, think 1-10MHz instead of like 40khz, so any normal electronics are out the window.

    • brudgers 10 hours ago

      Then, why not be a grad student again?

      • 01100011 10 hours ago

        Maybe they want to afford dinner?

        • polishdude20 9 hours ago

          Hey saw your message a while back in a thread talking about continuous glucose meters and feeling tired and fatigued etc. Mind contacting me? I'd love to chat. My email is in my profile

        • brudgers 8 hours ago

          TANSTAAFL, but student loans too.

    • always_swapping 10 hours ago

      I may be the FUS grad student you seek. Reach out via profile email if you want to chat. Cheers!

    • etrautmann 10 hours ago

      Medical applications would presumably require contact coupling and not through air?

  • adamcharnock 11 hours ago

    I would love to see this come to our various mobile devices in a nicely packaged form. I think part of what is holding back assistants, universal-translators, etc, is poor audio. Both reducing noise and being able to detect direction has a huge potential to help (I want to live-translate a group conversation around a dining table, for example).

    Firstly it would be great if my phone + headphones could combine the microphones to this end. But what if all phones in the immediate vicinity could cooperate to provide high quality directional audio? (Assuming privacy issues could be addressed).

    • abecedarius 10 hours ago

      For the hard of hearing like me the killer application would be live transcription in a noisy setting like a meetup or party, with source separation and grouping of speech from different speakers. Could be life-changing.

      (Android's Live Transcribe is very good now but doesn't even try to separate which words are from different speakers.)

      • adolph 8 hours ago

        * Automatic speech recognition (ASR) systems have progressed to the point where humans can interact with computing devices using speech. However, the distance between a device and the speaker will cause a loss in speech quality and therefore impact the effectiveness of ASR performance. As such, there is a greater need to have reliable voice capture for far-field speech recognition. The launch of Amazon Echo devices prompted the use of far-field ASR in the consumer electronics space, as it allows its users to interact with the device from several meters away by using microphone array processing techniques.*

        https://assets.amazon.science/da/c2/71f5f9fa49f585a4616e49d5...

    • spaceywilly 2 hours ago

      This is known as the Cocktail Party Problem. It turns out our brains do an incredible amount of processing to allow us to understand a person talking to us in a noisy room.

      https://en.wikipedia.org/wiki/Cocktail_party_effect?wprov=sf...

    • MVissers 10 hours ago

      I believe modern MacBook Pros already have multiple microphones that probably do some phased-array magic.

      • refulgentis 5 hours ago

        Pretty much every device does, the trick always was if it actually worked, which Apple is assuredly great at. (source: worked on Google Assistant)

    • quantadev 4 hours ago

      In general the position of the microphones in space must be known precisely for the phase shifting math to be done well, and also the clocks on the phones would need to be in sync at high precision like 10x the highest frequency sound you're picking up. In other words within 10s of thousands of a second. Also if the array mic locations are not a simple straight line, circle, or other simple geometry the computer code (i.e. math) to milk out an improved signal becomes very difficult.

      • NavinF 34 minutes ago

        > 10s of thousands of a second

        10ms? That's a very long time. Phone clocks are much more accurate than that because they're synced to the atomic clocks in cell towers and GPS satellites.

        Hell even NTP can do 1ms over the internet. AFAIK the only modern devices with >10ms inaccurate clocks by default are Windows desktops. I complained about that before because it screwed up my one-way latency measurements: https://github.com/microsoft/WSL/issues/6310

        I solved that problem by RTFM and toggling some settings until I got the same accuracy as Linux: https://learn.microsoft.com/en-us/windows-server/networking/...

        Anyway I dunno why the math would be too complicated, GPUs are great at this kind of signal processing

    • hatsunearu 11 hours ago

      It's already kind of implemented.

  • hinkley 10 hours ago

    Boeing ginned up a spherical version of these and used it on 787 prototypes to identify candidates for sound deadening material.

    Apparently in loud situations like airplanes, audio illusions can make a sound appear to come from a different spot than it really is. And when you have a weight budget for sound dampening material it matters if you hit the 80/20 sweet spot or not.

  • kindiana an hour ago

    OP here, cool to see so many people are interested in this project! Happy to answer any questions (and I'll go around to reply to any questions already here)

  • Salmonfisher11 11 hours ago

    If somebody wants to play around with Zynq 7010's - have a look at the EBAZ4205 board. They can be bought from Aliexpress (20-30€). These are former Bitcoin Mining controllers.

    Some people reverse engineered the entire thing. It can be found in GitHub. And there's an adapter plate available for getting to the GPIOs.

    For a less complex entry there are also Chinese FPGAs ("Sipeed" boards, which use a GoWin FPGA). They are quite capable and the IDE is free.

    • telgareith 9 hours ago

      Xilinx tool chain is also no-cost.

      • scottapotamas 3 hours ago

        For some/smaller parts. Once you start going higher than Artix or the token Kintex parts you need to pay up.

  • crote 11 hours ago

    I'm a bit surprised by those long "arm" PCBs. They are already doing calibration to account for some relatively large offsets: why not place each sensor on its own PCB, mount them to some carrier structure, and let calibration deal with the rest?

    • elictronic 10 hours ago

      Pcb manufacturing is cheap. I put 20 parts 1.5 inch by 24 inch into pcbway and ended up with final delivered cost of 240 dollars.

      Not having to deal with wiring that many individual boards and all the days of headaches tracking down issues is well worth it in my book.

      • crote 6 hours ago

        Huh, you're right. I expected 24-inch-long PCBs to be quite a bit more expensive, but even 4-layer boards at those sizes are still available at discount prices. I guess such thin boards could be used to fill in edges of mixed-order panels? It does make me wonder why they say "the array" was $700. Maybe assembly was extremely expensive

        It doesn't seem they were really able to benefit from it all that much, though: half of them arrived defective, and they had to do quite a lot of debugging to fix them.

        • kindiana an hour ago

          (OP here) the $700 was for 50 arm boards and 5 hub boards, fully assembled and shipped including all the parts (enough for 2 full arrays, with some spares). $350 @ qty 2 is pretty good, considering just the microphones is ~$100 for each array!

          Unfortunately the assembly/DFM didn't work out well, but with some better design and foresight it should be much less work/wiring compared to wiring them manually.

  • jsharf 4 hours ago

    Wow, you can refocus the direction after the audio is recorded!

    This would be cool to mix with VR, so you could hear different conversations as you move around a virtual room

  • gravypod 11 hours ago

    I was just doing research and landed on this exact page last night! I was wondering if anyone knows how someone could mic a room and record audio from only a specific area. For my use case I want to record a couch so I can watch TV with my friends online and remove their speech + show noise from the audio. Setting up some array of mics and using them for beam steering would probably work but there's not a lot of examples I could find on GitHub with code that works in real time.

    • aspenmayer 5 hours ago

      You might look into OBS and/or VoiceMeeter to see how streamers selectively route audio while livestreaming/recording video/audio streams.

      https://obsproject.com/

      https://voicemeeter.com/

    • imbusy111 11 hours ago

      From the article "The simplest method of beamforming is delay-and-sum (DAS)". Measure the distance from a point (the couch) to each microphone, delay each signal in the time domain to compensate for the difference in the time the sound takes to travel from that point to each microphone (the nearest microphone gets the longest delay), and add up the signals. Pretty trivial. Basically you want the microphones to receive the couch signal at the same time, even though they are different distances away.

      Make sure there is enough variation in microphone distances for this method to be effective.
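
      A bare-bones sketch of delay-and-sum along those lines, using whole-sample delays only. This is not the article's implementation: the names, the 343 m/s speed of sound, and the rounding to integer samples are all assumptions, and a real-time version would need buffering and probably fractional delays.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_delays(mic_positions, focus_point, sample_rate):
    """Whole-sample delay per microphone that aligns sound from focus_point.

    The farthest microphone hears the sound last, so it gets zero delay and
    every closer microphone is delayed by the difference in travel time.
    """
    distances = [math.dist(m, focus_point) for m in mic_positions]
    farthest = max(distances)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays):
    """Delay each channel by its number of samples, then average them."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, delay in zip(channels, delays):
        for n in range(delay, length):
            out[n] += samples[n - delay]
    return [value / len(channels) for value in out]


# Toy example: three mics on a line, focus point 4 m from the first one.
SAMPLE_RATE = 34_300  # chosen so that 1 cm of travel is exactly one sample
mics = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0)]
couch = (4.0, 0.0)
print(steering_delays(mics, couch, SAMPLE_RATE))
```

      Sound from the focus point adds up coherently after the delays, while sound from elsewhere arrives misaligned and is partly averaged out.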

    • crazygringo 11 hours ago

      Loud show noise and your online friends' nearby audio is going to be reflected around the room as well as off of your bodies.

      What you want isn't microphone or beamforming tech, it's echo cancellation the same as every videoconferencing software uses.

      You just need to feed the show audio and friend audio in, and apply echo cancellation to each.
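
      As a rough illustration of the echo-cancellation idea, here is a sketch of a normalised LMS (NLMS) adaptive filter, one common building block for this. It assumes NumPy and that the reference (show or friend audio) is sample-aligned with the microphone signal; `nlms_echo_cancel` and its parameters are hypothetical names, and production cancellers add things like double-talk detection that are omitted here.

```python
import numpy as np


def nlms_echo_cancel(mic, reference, n_taps=64, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    mic: samples picked up by the microphone (speech plus echoed playback).
    reference: the samples that were sent to the loudspeaker.
    Returns the residual, i.e. mic with the estimated echo removed.
    """
    mic = np.asarray(mic, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(n_taps)   # running estimate of the room's echo path
    history = np.zeros(n_taps)   # most recent reference samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        history = np.roll(history, 1)
        history[0] = reference[n]
        echo_estimate = weights @ history
        residual[n] = mic[n] - echo_estimate
        # Normalised LMS update: step size scaled by reference power.
        weights += step * residual[n] * history / (history @ history + eps)
    return residual
```

      Running it once per reference source (show audio, then friend audio) matches the "apply echo cancellation to each" suggestion.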

  • beambot 9 hours ago

    Starting to see more & more of this with drones. In some cases, it's for military to detect drones nearby. In others, it's being used by drone delivery companies to detect other planes in the sky in a way that is cheaper, works in low-visibility, and doesn't use the same power requirements as radar.

  • amelius 11 hours ago

    Nice. It would be cool if this project could cleanly separate sources based on location.

    That would be a bit like a lightfield camera, where you can edit the focusing parameters after the image has already been taken, but now with sound.

    https://en.wikipedia.org/wiki/Light_field_camera

    • miloignis 9 hours ago

      I believe it can, there's a demo under the "Directional Audio" section, unless I misunderstand you.

    • hinkley 10 hours ago

      I’m still sad these didn’t become a thing. I don’t need a 48MP camera phone. No seriously. I do not.

      • sitkack 6 hours ago

        If you can get a microlens array in front of that 48MP imager, you can have the light field camera you seek.

  • proee 11 hours ago

    What is the most practical application for this technology? Could you use it to pinpoint sounds coming from a car like a squeak?

    • Salmonfisher11 11 hours ago

      A similar technique is very popular in industrial automation to spot leaks in compressed air pipes and their connections from far away. These leaks are extremely loud in the ultrasonic range. The result is overlaid on a camera picture.

      That's ultra expensive gear.

    • spankalee 11 hours ago

      I've always wanted this for a videoconferencing room. A microphone array around the screen should be able to dynamically focus on the active talkers and cancel out background noise and echoes to get much better sound quality than the muddy crap we usually get.

      If there were a speaker array around the screens too, you might be able to localize the audio for each person so that it seems like the sound is coming from where their head is on the screen.

      • icegreentea2 11 hours ago

        Shure sells a variety of array microphones (and the software) that handles similar things. I've never used one, but heh.

        https://www.shure.com/en-US/products/microphones/mxa710

        https://www.shure.com/en-US/products/microphones/mxa920

      • Salmonfisher11 11 hours ago

        Beamforming is standard in modern conference room gear. It's being used for making a video focus on the active speaker and optimizing his audio.

        Have a look at the "Meeting Owl" for example.

        It works great up to a limit (around 5m) then you will need additional microphones closer to the speaker.

      • markedathome 10 hours ago

        Microsoft Research had papers on speaker arrays that allowed speaker focus and noise cancelling a couple of decades ago. I think the technology eventually ended up in the Kinect.

        I think Cisco had something similar in their large screen meeting room video conferencing systems that could do positional audio tracking of multiple people. Could be wrong, but I think that was at least 10 years or so ago, if not more.

      • bongodongobob 9 hours ago

        You just need to buy actual video conferencing gear, this is par for the course.

    • __MatrixMan__ 10 hours ago

      I wish I could rent one to figure out which device in my office has a squealing capacitor. I can hear it well enough to be driven crazy by it, but not well enough to find it. I start disconnecting things to narrow it down but then convince myself that it's my ears ringing.

      I'm unsure if I'll age out of this problem, or if worse hearing will just recreate it at different thresholds.

      • adrianmonk 9 hours ago

        You might have some luck with a spectrum analyzer app[1]. A fixed-pitch whine should show up as a line on the waterfall graph. If you move the phone around to different locations, you might see the line getting stronger or weaker. You can also try rotating the phone to different orientations to see if it is coming from a particular direction.

        I used this to locate an annoying squeal coming from some equipment at work once. And to confirm that it wasn't imaginary.

        ---

        [1] On Android, I like these two:

        Spectroid (https://play.google.com/store/apps/details?id=org.intoorbit....). If you use this, consider turning on the waterfall display in the settings.

        Spectral Audio Analyzer (https://play.google.com/store/apps/details?id=radonsoft.net....). This has more color options for the waterfall display.
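
        If you'd rather do this from a recording than an app, here is a small sketch of the same idea, assuming NumPy and a mono recording already loaded as an array. `dominant_frequency` is an illustrative name; it just reports the strongest FFT peak, which is what the waterfall line corresponds to at one instant.

```python
import numpy as np


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest spectral peak, ignoring DC."""
    samples = np.asarray(samples, dtype=float)
    # A Hann window reduces leakage so a steady whine shows as a narrow peak.
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # drop the DC bin
    return freqs[np.argmax(spectrum)]
```

        Comparing the peak's magnitude across recordings made at different spots is the offline equivalent of watching the line get stronger or weaker.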

        • billyjmc 5 hours ago

          Phyphox is a great sensor suite app for undergrad Physics experiments, and it includes a spectrum analyzer. Also, it supports both iOS and Android.

    • hammock 10 hours ago

      The tech is beamforming .. the applications are AV conferencing, camera tracking, voice lift, or sound reinforcement

  • gizajob 11 hours ago

    What about a soundfield microphone? Does about the same thing and the electronics can be done in the analogue domain.

    • radiowave 7 hours ago

      At a rough guess from the audio samples, that array is producing an acceptance angle much narrower than any Soundfield mic is capable of. The noise source is only 45 degrees off-axis; I'd say any first-order microphone polar pattern (i.e. those a Soundfield mic is capable of) would capture more of the noise than is demonstrated here.

      Of course, you can improve on the rejection of off-axis sound by instead using a microphone with a more specialized polar pattern (e.g. a shotgun mic), but then you lose the property of the pattern being steerable merely by signal processing.

      Lastly, such an array of dirt cheap pressure sensitive mic capsules with some clever computation behind them strikes me as the sort of thing you could throw Moore's law at, if you could justify the quantity. Whereas, Soundfield mics don't make much sense unless you're working with very precisely machined pressure-gradient capsules.

      Still, I get the feeling it'll be a while yet before this technique starts looking viable for audio production work, but it's very interesting.

  • cushychicken 11 hours ago

    This is more or less the same principle of how Amazon Echo devices work, but on steroids.

    Very neat. I would be surprised if you aren’t seeing some diminished marginal returns from all those extra mics, but I guess you’re trying to capture azimuth deltas that Echo devices don’t really care about.

  • killjoywashere 11 hours ago

    I wonder how well this would work with laser microphones on a pane of glass. Can you infer keystrokes with near infrared laser? That is, can you identify the heatmap of keystroke events to infer which keyboard they're using, then replay the tape to identify the strings of characters being typed? Can you localize the turning of pages with UV?

    • quantadev 4 hours ago

      This beamforming effect only works well when each sensor is getting a dramatic enough "different angle" on the signal that each one can use phase shifting to cancel out other noise, but with a laser there's not really any noise to cancel out (I mean, you're just monitoring a vibrational spot on a window), and you also don't have a far enough "different angle" to shine from, if you're monitoring from one spot.

      However having multiple lasers from multiple different locations might be able to create an improved signal if all signals are averaged, but it wouldn't really be due to the phase shifting that's used in beamforming.

    • Salmonfisher11 10 hours ago

      Didn't Israeli students show that you can recover audio from the vibrations of bulb filament with a fast photo diode?

      I'd test that with a CCD line sensor plus a wide aperture lens and reading it out with 8kHz. Then you have 128 audio pixels that can cover an entire city.

      • killjoywashere 9 hours ago

        Line of sight might be an issue there. I'm thinking more high-end clandestine eavesdropping. Fun fact: curtains are a pretty good defeat for laser microphones, but if the building is really old and made of solid stone, you can point at the rock instead!

        • 0cf8612b2e1e 8 hours ago

          The rock?! That’s incredible. I would have guessed it was too dense to pick up normal speaking volume. Then again, even the window glass vibration seems pretty magical to me.

  • djmips 8 hours ago

    This has been on my to-do list since forever! Nice work Ben Wang.

  • pftburger 9 hours ago

    I wonder if there is a meaningful limit to number of listening zones. I’m imagining a 3d grid of virtual mics in a space, each with an AI behind it

    Heck, train the model on the raw sensor data and you get the most awesome conference mics

  • jensenbox 5 hours ago

    Why a radial pattern and not a grid?

    • kindiana an hour ago

      (OP here) Primary reason is that you can make a big array with only 2 boards, a small board in the middle and a bunch of long boards around it.

      Radial pattern of linear arrays with exponential spacing should also be pretty close to optimal for the distribution of pairwise microphone distances to maximize the gain with a fixed number of microphones.

    • quantadev 4 hours ago

      Because the distance between the mics needs to be 1) large and 2) consistent. It would work with a grid but the mics near the middle would be "underutilized" (not maximally taken advantage of), and also in a grid the mathematics is horrendous, but with a circle it's simple.

  • jojobas 3 hours ago

    Using crab rave for demo is top notch.

  • holyknight 3 hours ago

    damn, this is so cool

  • cma 7 hours ago

    Could this be combined with a smaller number of high quality mics and then machine learning or something else incorporating them to boost the overall quality while maintaining all the other features?

    • markhahn 7 hours ago

      afaik, it really depends on the spatial structure of the audio field.

      think nyquist sampling rates, applied to space, and you can't apply a low-pass filter just because you don't care about higher-order signals. that means that for any given audio environment, there will be some "spatial spectrum" of signal, and you need to sample it densely enough to avoid aliasing.
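
      To put a number on the spatial sampling point: for a uniformly spaced array, the usual rule of thumb is that the spacing should not exceed half the shortest wavelength of interest. A tiny sketch, assuming a speed of sound of 343 m/s; `max_mic_spacing` is an illustrative name.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def max_mic_spacing(max_frequency_hz, speed_of_sound=SPEED_OF_SOUND):
    """Largest spacing (m) that keeps a uniform array free of spatial aliasing."""
    wavelength = speed_of_sound / max_frequency_hz
    return wavelength / 2.0
```

      For example, `max_mic_spacing(8000)` comes out at roughly 2.1 cm, which is why covering the full audio band without aliasing takes dense (or deliberately non-uniform) spacing.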