Am I missing something other commenters are seeing about this not being an ad? The domain is on Burla, which hosted the compute needed for this. There's a giant airbnb x burla logo at the top. People are saying there's a lawsuit pending, it's against guidelines, what's the point, etc.
It's content marketing, plain and simple, from Burla, aimed at people who read this site. It was very likely done by employees at both Burla and Airbnb together as a joint project.
One of the Burla founders here. Not a joint project with Airbnb. I’ve been experimenting with giving agents access to Burla clusters and letting them run with analysis ideas I find interesting. This was one of the results.
The branding is a bit much, fair call, but the intent here was just to explore what these agents can actually build when you give them access to large amounts of compute.
What a waste of energy (money/resources)... Scraping and AI-scanning 2 million photos to identify animals in the advertisement pictures? What's the point?
As an exercise, a sample of 1,000 photos would've been enough. As a database, knowing a listing has a cat in the picture or a funny review doesn't offer any real value.
I wonder what the footprint is of such an exercise.
The pet detection part isn’t the point, that’s just a visible output. The actual goal was to stress test agents + distributed compute on something non-trivial.
"Looking at every public Airbnb listing in Inside Airbnb's open data dump, all at once, on Burla"
This Inside Airbnb?
Community Guidelines
Please:
Only take the data you need. Do not scrape data from the site, if you would like to subscribe to the data directly, please email data@insideairbnb.com
>Everything was parallelized on Burla, on a single dynamic cluster that scaled to ~1.7K CPU workers for photo download and CLIP, with 20 A100 GPUs running embedding clusters in parallel on the same cluster.
That's a lot of budget - would have been nice if they'd made an actual donation to the project, instead of pounding the project's servers and bandwidth when there are much better ways to interact with the data.
This seems like an advertisement for an open source package
>Scale Python across 1,000 CPUs or GPUs in 1 second. Burla is a high-performance parallel processing library with an extremely fast developer experience. Scale batch processing, vector embeddings, inference, or build pipelines with dynamic hardware.
Edit: Author comment was flagged dead. They work at Burla, which is a managed cloud service for parallelizing Python.
Looks like it was hit by some sort of automated ChatGPT detector.
Airbnb was actually started by two guys who created an opium den for Obama's convention so this doesn't surprise me.
This thing is ripe for a lawsuit and has terrible methodology as far as I can tell.
These are amazing! Some are probably offensive, because I saw a cozy, if kitschy, British den labeled as "did-someone-just-leave" vibes which... unfair.
This vanity scraping is fucking up the internet for everyone else.
It's hardly the only thing, but it's part of the problem.
Ah yes, let's price the world out of the real estate market and then use insanely powerful AI models to systematically mock the living conditions of the poors.
This is pretty great, the reviews at the bottom are the best part. I'm impressed they were able to scrape so much data.