Internet Archive breached again through stolen access tokens

(bleepingcomputer.com)

335 points | by vladyslavfox 10 hours ago

177 comments

  • bn-l an hour ago

    > "It's dispiriting to see that even after being made aware of the breach weeks ago, IA has still not done the due diligence of rotating many of the API keys that were exposed in their gitlab secrets," reads an email from the threat actor.

    With everything that’s going on, it’s highly suspicious that this is happening right after they upset some very rich rent seekers.

  • trompetenaccoun 10 hours ago

    We need archives built on decentralized storage. Don't get me wrong, I really like and support the work Internet Archive is doing, but preserving history is too important to entrust it solely to singular entities, which means singular points of failure.

    • jdiff 9 hours ago

      This seems to get brought up at least once in the comments for every one of these articles that pops up.

      The IA has tried distributing their stores, but nowhere near enough people actually put their storage where their mouths are.

      • creer 7 hours ago

        And it's guaranteed not to happen if the efforts don't continue.

        • acdha 4 hours ago

          You could say the same thing about perpetual motion. Being realistic about why past efforts have failed is key to doing better in the future: for example, people won’t mirror content which could get them in trouble and most people want to feel some kind of benefit or thanks. People should be thinking about how to change dynamics like those rather than burning out volunteers trying more ideas which don’t change the underlying game.

      • zelphirkalt 6 hours ago

        Perhaps one idea is to let people choose what they want to protect. This way people wanting to support it can have their mission.

        • card_zero 6 hours ago

          I want it to protect all sorts of random obscure documents, mostly kind of crappy, that I can't predict in advance, so I can pursue my hobby of answering random obscure questions. For instance:

          * What is a "bird famine", and did one happen in 1880?

          * Did any astrologer ever claim that the constellations "remember" the areas of the sky, and hence zodiac signs, that they belonged to in ancient times before precession shifted them around?

          * Who first said "psychology is pulling habits out of rats", and in what context? (That one's on Wikiquote now, but only because I put it there after research on IA.)

          Or consider the recently rediscovered Bram Stoker short story. That was found in an actual library, but only because the library kept copies of old Irish newspapers instead of lining cupboards with them.

          The necessary documents to answer highly specific questions are very boring, and nobody has any reason to like them.

          • oxygen_crisis 2 hours ago

            You could let users choose what to mirror, and one of those choices could be a big bucket of all the least available stuff, for pure preservationists who don't want to focus on particular segments of the data.

            Sort of like the bittorrent algorithm that favors retrieving and sharing the least-available chunks if you haven't assigned any priority to certain parts.
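
            A minimal sketch of the "least-available first" idea, assuming a hypothetical swarm described as a mapping from peer name to the set of piece ids it holds. The function name `pick_rarest` and the sample data are illustrative only; real BitTorrent clients add randomisation and other heuristics on top of this.

```python
from collections import Counter


def pick_rarest(wanted, peer_holdings):
    """Return the wanted piece held by the fewest peers (ties: lowest id).

    wanted        -- set of piece ids we still need
    peer_holdings -- dict mapping peer name -> set of piece ids it holds
    """
    counts = Counter()
    for pieces in peer_holdings.values():
        counts.update(pieces)  # each peer contributes 1 per piece it holds

    # Only pieces that at least one peer holds can actually be fetched.
    available = [p for p in wanted if counts[p] > 0]
    if not available:
        return None
    return min(available, key=lambda p: (counts[p], p))


# Hypothetical swarm: pieces 2 and 3 each have a single holder.
peers = {
    "a": {0, 1, 2},
    "b": {0, 1},
    "c": {0, 3},
}
print(pick_rarest({0, 1, 2, 3}, peers))
```

            A mirror that follows this rule with no user-assigned priorities would drift toward whatever is least replicated, which is the "big bucket" behaviour described above.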

        • dawnerd 5 hours ago

          You already can, they have torrents for everything.

          • diggan 4 hours ago

            > they have torrents for everything

            Including the index itself? That would be awesome.

          • tourmalinetaco 3 hours ago

            Their torrents suck and IME don’t update to changes in the archive.

            • addandsubtract 12 minutes ago

              Aren't torrents terrible at handling updates in general? If you want to make a change to the data, or even just add or remove data, you have to create a new torrent and somehow get people to update their torrent and data as well.

            • vundercind an hour ago

              This is accurate, their torrent-generating system is basically broken to the point of being useless.

      • WarOnPrivacy 7 hours ago

        > nowhere near enough people actually put their storage where their mouths are.

        Typically because most people who have the upload, don't know that they can. And if they come to the notion on their own, they won't know how.

        If they put the notion to a search engine, the keywords they come up with probably don't return the needed ELI5 page.

        As in: How do I [?] for the Internet Archive?, most folks won't know what [?] needs to be.

        • TZubiri 7 hours ago

          This is literally torrents. Just give up

          • WarOnPrivacy 3 hours ago

            > This is literally torrents. Just give up

            Most casual visitors to IA don't know that. Which is the point.

            Giving up is for others.

          • briandear 6 hours ago

            The problem with torrents is they have a bad reputation since people use it to steal and redistribute other people’s content without their consent.

            • ycombinatrix an hour ago

              The problem with websites is they have a bad reputation since people use it to steal and redistribute other people’s content without their consent.

            • thwarted 3 hours ago

              The problem with file transfer is they have a bad reputation since people use it to [insert illegal or immoral activity here].

              Then rename it from "torrent" to something else.

              • TZubiri 3 hours ago

                I'm not sure what the argumentative line is here. But file uploading and downloading needs to have accountability for hosting, which p2p obscures.

                The bad reputation is inherent to the tech, not a random quirk.

            • card_zero 6 hours ago

              Is there any form of torrent where you can do a full text search? That, to me, is the more important problem with torrents.

              • TZubiri 4 hours ago

                But internet archive doesn't do this? It's a key based search (url keys)

            • tourmalinetaco 3 hours ago

              Torrents have a bad reputation due to malicious executables, I have never met someone who genuinely saw piracy as stealing, only as dangerous. In fact, stealing as a definition cannot cover digital piracy, as stealing is to take something away, and to take is to possess something physically. The correct term is copying, because you are duplicating files. And that’s not even getting into the cultural protection piracy affords in today’s DRM and license-filled world.

            • AlienRobot 6 hours ago

              Give it a good reputation then.

              What are some legal torrent trackers?

              • seam_carver 3 hours ago

                Humble Bundle. Various Linux iso

              • unleaded 5 hours ago

                archive.org to name one

                • boomboomsubban 4 hours ago

                  That's debatable. Most of their torrents are for things under copyright, though any other decentralized archive would have the same problem.

                  • tourmalinetaco 3 hours ago

                    That’s a copyright problem. 99% of things made in the last 100 years fall under copyright.

                    • trod123 4 minutes ago

                      and a good number of things that were going to pass into the public domain had their copyright further extended to 2053.

              • ranger_danger 4 hours ago

                What is your definition of a legal torrent tracker? I was not aware there were even any illegal ones.

            • ranger_danger 4 hours ago

              To me this is like saying you shouldn't use a knife because they are also used by criminals.

              • John_Cena 4 hours ago

                This kind of talk is simply modern politik-speak. I can't stand it and the people who fall for their deception. Stretch the truth to disarm the constituents

                • jonhohle 2 hours ago

                  In what way? Torrents are used all over for content delivery. Battle.net uses a proprietary version of BitTorrent. It’s now owned by Microsoft. There’s many more legitimate uses as commented by many others.

                  Criminals using tools does not make the tools criminal.

      • immibis 9 hours ago

        Keep in mind the IA archives a lot of garbage. If it could be more focused it would be more likely to work.

        • Blackthorn 6 hours ago

          The IA only works because it archives everything. You don't know what you need until you need it.

        • Spooky23 5 hours ago

          Archives generally purposefully don’t have a strong editorial streak. My trash is your treasure.

        • db48x 8 hours ago

          The attempts have actually been focused on specific types of content, such as historical videos.

        • unleaded 5 hours ago

          personally I love all the random crap on IA!

    • MattPalmer1086 9 hours ago

      Lots of Copies Keeps Stuff Safe

      https://www.lockss.org/

      This is a brilliant system relying on a randomised consensus protocol. I wanted to do my info sec dissertation on it, but its security model is extremely well thought out. There wasn't anything I felt I could add to it.

      • ChadNauseam 7 hours ago

        I wish IPFS wasn't so wasteful with respect to storage. I tried pinning a 200mb PDF on IPFS and doing so ended up taking almost a gigabyte of disk space altogether. It's also relatively slow. However its implementation of global deduplication is super cool – it means that I can host 5 pages and you can host 50, and any overlap between them means we can both help one another keep them available even if we don't know about one another beforehand.

        For a large-scale archival project, it might not be ideal. Maybe something based on erasure coding would be better. Do you know how LOCKSS compares?
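
        A rough sketch of why overlap between two hosts helps, assuming fixed-size chunks keyed by their SHA-256 digest. This is not IPFS's actual chunker or CID format; `ChunkStore` and the tiny chunk size are hypothetical and chosen only to make the effect visible.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; real systems use far larger chunks


class ChunkStore:
    """Stores each distinct chunk once, keyed by the hash of its bytes."""

    def __init__(self):
        self.chunks = {}

    def add(self, data, size=CHUNK_SIZE):
        """Split data into chunks, store new ones, return the list of chunk ids."""
        ids = []
        for i in range(0, len(data), size):
            piece = data[i:i + size]
            cid = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(cid, piece)  # identical chunks are stored once
            ids.append(cid)
        return ids

    def get(self, ids):
        """Reassemble the original bytes from a list of chunk ids."""
        return b"".join(self.chunks[c] for c in ids)


store = ChunkStore()
a = store.add(b"AAAABBBBCCCC")
b = store.add(b"BBBBCCCCDDDD")
print(len(store.chunks))    # distinct chunks actually stored
print(len(set(a) & set(b)))  # chunks shared by both files
```

        The two inputs contain six chunks between them but only four distinct ones, so anyone holding either file is also holding part of the other.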

        • diggan 4 hours ago

          > I tried pinning a 200mb PDF on IPFS and doing so ended up taking almost a gigabyte of disk space altogether

          Was that any file in particular? I just tried it myself with a 257mb PDF (as reported by `ls -lrth`) and doesn't seem to add that much overhead:

              $ du -sh ~/.ipfs
              84K     /home/user/.ipfs
          
              $ ipfs add ~/Downloads/large\ PDF\ File.pdf
              added QmSvbEgCuRNZpkKyQm6nA5vz5RTHW1nxb6MJdR4cZUrnDj large PDF File.pdf
               256.58 MiB / 256.58 MiB [============] 100.00%
          
              $ du -sh ~/.ipfs
              264M    /home/user/.ipfs
      • Kinrany 3 hours ago

        Is there a high level explanation of the model?

      • TZubiri 7 hours ago

        High Costs Makes Lots of Copies Unfeasible

        • MattPalmer1086 7 hours ago

          That was actually one of the key constraints in the LOCKSS system, since it was designed to be run by libraries that don't have big budgets.

          The design is really very good.

    • __MatrixMan__ 7 hours ago

      To make the web distributed-archive-friendly I think we need to start referencing things by hash and not by a path which some server has implied it will serve consistently but which actually shows you different data at different times for a million different reasons.

      If different data always gets a different reference, it's easy to know if you have enough backups of it. If the same name gets you a pile of snapshots taken under different conditions, it's hard to be sure which of those are the thing that we'd want to back up for that particular name.
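
      To make that concrete, here is a small sketch assuming SHA-256 as the reference and a hypothetical set of mirrors modelled as plain dictionaries. `content_ref` and `replica_count` are illustrative names, not part of any existing system.

```python
import hashlib


def content_ref(data):
    """The reference for a blob is simply the hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


def replica_count(ref, mirrors):
    """Count mirrors whose stored copy really hashes to the given reference."""
    return sum(
        1
        for store in mirrors.values()
        if ref in store and content_ref(store[ref]) == ref
    )


page_v1 = b"<html>version 1</html>"
page_v2 = b"<html>version 2</html>"
ref = content_ref(page_v1)

# Hypothetical mirrors; mirror-c claims to hold `ref` but has different bytes.
mirrors = {
    "mirror-a": {ref: page_v1},
    "mirror-b": {ref: page_v1},
    "mirror-c": {ref: page_v2},
}
print(replica_count(ref, mirrors))
```

      Because the reference is derived from the bytes, a changed or corrupted copy cannot count as a backup of the original, and "how many good copies exist" becomes a simple count.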

      • Cheer2171 6 hours ago

        Done. It is called IPFS. The IA already supports it.

        https://github.com/internetarchive/dweb-archive/blob/master/...

        • Groxx 4 hours ago

          Which has a rather lengthy section explaining why it's currently a failed experiment: https://github.com/internetarchive/dweb-archive/blob/master/...

          (this doc is 5-6 years old though, and I'm not sure what may have changed since then)

          In my own (toy-scale) IPFS experiments a couple years ago it has been rather usable, but also the software has been utterly insane for operators and users, and if I were IA I would only consider it if I budgeted for a from-scratch rewrite (of the stuff in use). Nearly uncontrollable and unintrospectable and high resource use for no apparent reason.

        • majorchord 5 hours ago

          IPFS has shown that the protocol is fundamentally broken at the level of growth they want to achieve and it is already extremely slow as it is. It often takes several minutes to locate a single file.

          • diggan 4 hours ago

            The beauty is that IA could offer their own distribution of IPFS that uses their own DHT for example, and they could allow only public read access to it. This would solve the slow part of finding a file, for IA specifically. Then the actual transfers tend to be pretty quick with IPFS.

            What's the point of using IPFS then? Others can still spread the file elsewhere and verify it's the correct one, by using the exact same ID of the file, although on two different networks. The beauty of content-addressing I guess.

            • acdha 4 hours ago

              That isn’t solving the problem, it’s just giving them more of it to work on. IA has enough material that I’d be surprised if they didn’t hit IPFS’s design limits on their own, and they’d likely need to change the design in ways which would be hard to get upstream.

          • BlueTemplar 4 hours ago

            Several minutes sounds more than fine for this purpose ?

            Especially if it's about having an Internet Archive backup.

            • Aachen 2 hours ago

              I think the point is that it's already slow at the current amount of data, let alone when you stuff dozens more PB into it

        • __MatrixMan__ 5 hours ago

          Right, what I'm saying is that now we need to get the rest of the web (or at least the parts we want to keep) on board.

      • jonhohle an hour ago

        There was a startup called Space Monkey that sold NAS drives where you got a portion of the space and the rest was used for copies of other people’s content (encrypted). The idea was you could lose your device, plug in a new one and restore from the cloud. They ended up folding before any of their resilience claims could be tested (at least by me).

        Would people be willing to buy an IA box that hosted a shard of random content along with the things they wanted themselves?

    • NelsonMinar 2 hours ago

      Is anyone using ArchiveBox regularly? It's a self-hosted archiving solution. Not the ambitious decentralized system I think this comment is thinking of but a practical way for someone to run an archive for themselves. https://archivebox.io/

      • bigiain 2 hours ago

        @nikisweeting the dev of archivebox was active in a thread about it here last week.

        https://news.ycombinator.com/item?id=41860909

        I'd never heard of it, but their responses to questions and comments in that thread were really really good (and I now have "install and configure archivebox on the media server" on my upcoming weekend projects list).

    • oytis 9 hours ago

      We'll need to find even more people willing to expose themselves to legal threats and cyberattacks then.

      • trompetenaccoun 8 hours ago

        The legal side is a big issue, true. The simplest and best workaround that I'm aware of is how the Arweave network handles it. They leave it up to the individual what parts of the data they want to host, but they're financially incentivized to take on rare data that others aren't hosting, because the rarer it is the more they get rewarded. Since it's decentralized and globally distributed, if something is risky to host in one jurisdiction, people in another can take that job and vice versa. The data also can not be altered after it's uploaded, and that's verifiable through hashes and sampling. Main downside in its current form is that decentralized storage isn't as fast as having central servers. And the experience can vary of course, depending on the host you connect to.

        As for technical attacks, I'm not an expert but I'd assume it's more difficult for bad actors to bring down decentralized networks. Has the BitTorrent network ever gone offline because it was hacked for example? That seems like it would be extremely hard to do, not even the movie industry managed to take them down.

        • Aachen 2 hours ago

          > decentralized storage isn't as fast as having central servers.

          With the 30-second "time to first byte" speed we all know and love from IA, I'm pretty sure it'd only get faster when you're the only person accessing an obscure document on a random person's shoebox in Korea as compared to trying to fetch it from a centralised server that has a few thousand other clients to attend to simultaneously

        • jmb99 2 hours ago

          > decentralized storage isn't as fast as having central servers.

          Depending on scale that’s not necessarily true. I find even today there are many services that cannot keep up with my residential fiber connection (3Gbps symmetrical), whereas torrents frequently can. IA in particular is notoriously slow when downloading from their servers, and even taking into account DHT time torrents can be much faster.

          Now if all of their PBs of data were cached in a CDN, yeah that’s probably faster than any decentralized solution. But that will take a heck of a lot more money to maintain than I think is possible for IA.

      • Aachen 2 hours ago

        I collect, archive, and host data. Haven't gotten any threats or attacks. Not one. The average r/selfhosted user hiding their personal OwnCloud behind the DDoS mafia seems more afraid than one needs to be even for hosting all sorts of things publicly. I guess this fearmongering comes from tech news about breaches and DDoS attacks on organisations, similar to regular news impacting your regular worldview regardless of how it's actually going in the world or how things personally affect you

        • trod123 a few seconds ago

          It's not a problem until it suddenly is, and by the time it becomes a problem it's too late. It's not fear mongering, it's risk management, and the laws are draconian and fail the fundamental basis for a "rule of law"; we have a "rule by law".

    • TechSquidTV 6 hours ago

      This has really shown that to be true. I am stuck in a situation right now where I have some lost media I want to upload but they have been down for over a week. I plan to create a torrent in the meantime but that means relying on my personal network connection for the vast majority of downloads up front. I looked into CloudFlare R2, not terrible but not free either.

      I was looking into using R2 as a web seed for the torrent but I don't _really_ want to spend much to upload content that is going to get "stolen" and reuploaded by content farms anyway you know?

      • tourmalinetaco 3 hours ago

        Why not subscribe to a seedbox? They’re about $5/2TB/mo. It protects your IP, you can buy for only the month, and since seedboxes are hosted in DMCA-resistant data centers you can download riskier torrents lightning fast, meaning you’re not just spending money for others, you can get something out of it too.

        • bigiain 2 hours ago

          Any hints or recommendations on how to find a decent seedbox vendor? (working email in profile if you'd rather not name any in public)

    • johndhi 38 minutes ago

      Ipfs

    • Cheer2171 6 hours ago

      You say this as if the IA is not already deeply invested in the DWeb movement. If you go to a DWeb event in the Bay Area, there is a good chance it will be held at the IA.

    • sschueller 5 hours ago

      Yes, I was quite shocked when I found out that all their DCs are within driving distance.

    • delfinom 3 hours ago

      Yea so, who pays for the decentralized storage long term? What happens when someone storing decentralized data decides to exit? Will data be copied to multiple places, who is going to pay for doubling, tripling or more the storage costs for backups?

      Centralized entities emerge to absorb costs because nobody else can do it as efficiently alone.

    • sksxihve 9 hours ago

      There's no real financial incentive for people to archive the data as a singular entity so even less for a distributed collection. Also it's probably easier to fund a single entity sufficiently so they can have security/code audits than a bunch of entities all trying to work together.

      • riiii 9 hours ago

        Some people are motivated by more than just financial incentive.

        • sksxihve 8 hours ago

          That's true, but something like archiving the internet is very costly, IA has an annual budget in the tens of millions.

          • trompetenaccoun 8 hours ago

            Yes, it's a good point. Though they could take that money and reward people for hosting the data as well, couldn't they? They don't have to be in charge of hosting.

            • sksxihve 7 hours ago

              Yes, they could, that's not much different than a single company distributing the archive to multiple storage centers though. My original comment was about it being more cost effective for a single company to do that than coordinating with a bunch of disjoint entities.

              • trompetenaccoun 6 hours ago

                Our digital memory shouldn't be in the hands of a small number of organizations in my view. You're right about cost effectiveness. There are pros and cons to both but it's not just external threats that have to be considered.

                History has always gotten rewritten throughout time. If you have a giant library it's easier for bad actors to gain influence and alter certain books, or remove them. This isn't just theoretical, under external pressure IA has already removed sites from its archive for copyright and political reasons.

                There are also threats that are generally not even considered because they happen rarely, but when they happen they're devastating. The library of Alexandria was burned by Julius Caesar during a war. Likewise, if all your servers are in one country, that's a geographic risk: they can get destroyed in the event of a war or such. No one expects this to happen today in the US, but archives should be robust long term, for decades, ideally even centuries.

                • delfinom 3 hours ago

                  >Our digital memory shouldn't be in the hands of a small number of organizations in my view.

                  I would wager at least 95% of "digital memory" archived is just absolute garbage from SEO spam to just some small websites holding no actual value.

                  The true digital memory of the world is almost entirely behind the walls of reddit, twitter, facebook, and very few other sites. The internet landscape has changed massively from the 90s and 2000s.

          • BlueTemplar 4 hours ago

            So, about $0.01 per person per year ?

            We are talking about an (almost) worldwide archive after all.

  • kleiba 6 hours ago

    People with solid info sec knowledge: this is a good opportunity to offer your expertise pro-bono for a good cause!

    • kyleyeats 5 hours ago

      They're buried in these offers right now.

      • op00to 4 hours ago

        I wonder how many offers are legitimate.

        • TZubiri 4 hours ago

          An org amidst an attack might not be the most open to giving credentials and access to strangers.

          • knowitnone 2 hours ago

            why not? it's already been given away

    • 5jh5j56 2 hours ago

      At this point they should consider a rewrite from scratch. I bet they are running a tech stack from 1992.

  • sirolimus 8 hours ago

    It’s incredibly sad to see threat actors attack something as altruistic as an internet library. Truly demoralizing to see such degeneracy.

    • sim7c00 5 hours ago

      anything with tons of traffic going to it is a target. it has nothing to do with what the entity does, more with what potential reach it has. criminal behaviour is what it is. people pulling loads of visitors need to properly secure their shit, to prevent their customers becoming their victims.

    • userbinator 6 hours ago

      When there are plenty of people who are steeped in the dogma of Imaginary Property, and whose lives depend on it, it's not too surprising.

      • boplicity an hour ago

        FYI: "Money" is imaginary property. Not sure you want to call people supporting "imaginary property" dogmatic. It's what our society is built on.

    • croes 6 hours ago

      Seems like the actor did it only for the street cred and the second breach is only a reminder that IA didn’t properly fix it after the first breach.

      Could be worse.

    • A4ET8a8uTh0 5 hours ago

      Not defending attacker, because I see IA as common good. That said one of the messages from this particular instance reads almost as if they were trying to help by pointing out issues that IA clearly missed:

      "Whether you were trying to ask a general question, or requesting the removal of your site from the Wayback Machine your data is now in the hands of some random guy. If not me, it'd be someone else."

      I am starting to wonder if the chorus of 'maybe one org should not be responsible for all this; it is genuinely too important' has a point.

    • luckylion 5 hours ago

      A different framing is: be grateful that it's these types of people breaching IA and being vocal about it & asking IA to fix their systems. Others might just nuke them, or subtly alter content, or do whatever else bad thing you can think of.

      They're providing a public service by pointing out that a massive organization controlling a lot of PII doesn't care about security at all.

    • codezero 6 hours ago

      There are many state actors that attack targets of opportunity just to cause chaos and asymmetric financial costs.

    • xyst 6 hours ago

      Blame bad leadership.

      • callc 5 hours ago

        Is there a reason to blame the victim, rather than the attackers?

        I’m asking seriously - did IA do shitty things that make them a worthy cause for politically/ideologically motivated hacking?

        • lolinder 5 hours ago

          I imagine they're referring to the fact that the leadership showed extremely bad judgement in deciding to pick a battle with the major publishing companies that everyone knew they would lose before it even began [0].

          I don't think that justifies blaming the victim here, and from what I can see the attacker doesn't seem to be motivated by anything other than funsies, but I absolutely lost a lot of faith in their leadership when they pulled the NEL nonsense. The IA is too valuable for them to act like a young activist org—there's too much for us to lose at this point. They need to hold the ground they've won and leave the activism to other organizations.

          [0] https://www.wired.com/story/internet-archive-loses-hachette-...

          • jampekka 3 hours ago

            > there's too much for us to lose at this point

            Feeling entitled?

            • lolinder 5 minutes ago

              "Us" means all of humankind for hopefully many generations to come. It's not about my personal entitlement, it's that the IA serves a vital role for humanity (one which they fought hard to make permissible).

            • IntelMiner 32 minutes ago

              Only if you don't care about history

  • gweinberg 8 hours ago

    Does anyone know who is targeting the Internet Archive, and why? I get the impression the attacks are too sophisticated for it to just be vandal punks.

    • lolinder 5 hours ago

      > I get the impression the attacks are too sophisticated for it to just be vandal punks.

      What gives that impression? Everything I've seen about the attacker's messaging says "vandal punk(s)" to me, and nothing in what I've seen of the IA's systems screams Fort Knox. It wouldn't surprise me if they actually had a pretty lax approach to security on the assumption that there's very little reason to target them.

    • dokyun 4 hours ago

      The group that claimed to be responsible for the first hack was said to be Russian-based, anti-U.S., pro-Palestine, and their reasoning for the attack was because of IA's violation of copyright....

      I think you should draw your own more informed conclusions, but it smells a lot like feds to me.

      • MathMonkeyMan an hour ago

        What do Palestine, Russia, and the U.S. have to do with the Internet Archive? The Internet Archive is a supremely boring target politically.

        • small_scombrus 25 minutes ago

          That's the point they're making. It's such a seeming non-sequitur that people are suspicious and coming up with fun theories.

    • polytely 3 hours ago

      With the amount of comments calling for a leadership change my tinfoilhat theory is that this is a concerted effort to get a leadership change.

    • xyst 6 hours ago

      Is it sophisticated if IA leaves the door wide open? I blame shit leadership.

    • jrm4 5 hours ago

      It strikes me as reasonable to assume (or at least strongly bet on) -- I'm not sure of the right phrase for it -- but like a mercenary type operation on behalf of some larger old media company?

      There's just too much "means, motive and opportunity" there.

  • _fat_santa 10 hours ago

    I don't know what their funding model looks like but if they have some cash I'd say hiring a security team would be on top of the list of things to invest in.

    • brendoelfrendo 10 hours ago

      I believe that, at this point in time at least, IA's funding model consists of sweating profusely while awaiting a colossal legal judgement.

  • myself248 10 hours ago

    I'd like to imagine a world where every lawyer, when their case is helped by a Wayback Machine snapshot of something, flips a few bucks to IA. They could afford a world-class admin team in no time flat.

    • thaumasiotes 9 hours ago

      That's a terrible solution. The Wayback Machine takes down their snapshots at the request of whoever controls the domain. That's not archival.

      If the state of a webpage in the past matters to you, you need a record that won't cease to exist when your opposition asks it to. This is the concept behind perma.cc.

      • db48x 9 hours ago

        No, they don’t delete the archived content. When the domain’s robots.txt file bans spidering, then the Wayback Machine _hides_ the content archived at that domain. It is still stored and maintained, but it isn’t distributed via the website. The content will be unhidden if the robots.txt file stops banning spiders, or if an appropriate request is made.
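
          The hide-versus-delete rule described here can be sketched with the standard library's robots.txt parser. This only models the behaviour as stated in this comment and is not IA's actual code; the helper name `is_publicly_shown` and the crawler agent string `ia_archiver` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_publicly_shown(robots_txt, url, agent="ia_archiver"):
    """Return True if the site's current robots.txt lets `agent` fetch `url`.

    In this model the archived copy always stays in storage; the result
    only decides whether it is displayed on the public website.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


blocking = "User-agent: *\nDisallow: /\n"    # bans all spiders
permissive = "User-agent: *\nDisallow:\n"    # bans nothing

url = "https://example.com/page"
print(is_publicly_shown(blocking, url))
print(is_publicly_shown(permissive, url))
```

          Under this model, switching the site's robots.txt from the first form to the second flips the page from hidden back to visible without touching the stored snapshot.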

        • speerer 8 hours ago

          In some cases they do appear to delete, on request.

          edit: "Other types of removal requests may also be sent to info@archive.org. Please provide as clear an explanation as possible as to what you are requesting be removed for us to better understand your reason for making the request.", https://help.archive.org/help/how-do-i-request-to-remove-som...

          • db48x 7 hours ago

            Nope. Nothing is deleted, just hidden.

            • rascul 7 hours ago

              How do you know?

              • db48x 7 hours ago

                I worked there for a short while.

                • bombcar 6 hours ago

                  So if the Internet Archive accidentally archived child porn, they wouldn’t delete it?

                  I suspect they DO delete some things.

                  • db48x an hour ago

                    Don't be asinine; of course there are exceptions. But the general rule is that nothing is deleted. Even if you have a fancy expensive lawyer send them a C&D letter asking them to delete something or else, they’ll just hide it. You can’t tell the difference from the outside. In fact there are monitoring alarms that are triggered if something _is_ deleted.

        • Raed667 8 hours ago

          They do delete entire domains from the archive upon request & proof of ownership.

          • db48x 7 hours ago

            Again, no they don’t. They just hide them.

      • myself248 9 hours ago

        Ooo, excellent. Yes, hiding items is imperfect, but I understood that it was legally required or something. (IANAL and IDFK, TBH) I wonder how perma.cc gets around that.

        • berdario 8 hours ago

          I'm afraid that it just hasn't been tested in court yet.

          I haven't read this paper yet, but...

          https://www.tesble.com/10.1080/0270319x.2021.1886785

          from the abstract:

          > The article concludes that Perma.cc's archival use is neither firmly grounded in existing fair use nor library exemptions; that Perma.cc, its "registrar" library, institutional affiliates, and its contributors have some (at least theoretical) exposure to risk

          It seems that the article is about copyright, but of course there are several other reasons that might justify takedown of content stored on perma.cc:

          - Right to be forgotten... perma.cc might be able to ignore it, but could this lead to perma.cc being blocked by European ISPs?

          - ITAR stuff

          - content published by entities recognized by $GOVERNMENT as terrorist organizations

          - revenge porn

          - CSAM

        • immibis 9 hours ago

          Most likely by breaking the law.

      • speerer 8 hours ago

        That's correct, but only for present evidence - what about the past evidence, that you didn't know you needed until it was too late? IA is broad enough to cover the past five times out of ten.

  • notmysql_ 8 hours ago

    I sent them a resume almost a year ago, and got nothing back in response until yesterday. Looks like they are going through their backlog right now to find more hands.

    • TZubiri 7 hours ago

      Interesting, for a security position?

      • notmysql_ 6 hours ago

        It was a while ago; I think it was for their general position option, though I did talk about sec experience in it.

  • udev4096 10 hours ago

    Is it the same Zendesk email spoofing attack vector that was disclosed last week?

    • steffanA 10 hours ago

      Article says API token was stolen in original breach.

  • RcouF1uZ4gsC 3 hours ago

    The Library of Congress should be archiving the Internet and it should have the budget required to do so.

    This is in line with its mission as the "Library of Congress". Being able to have an accurate record of what was on the Internet at a specific point in time would be helpful when discussing legislation or potential regulation involving the internet.

  • butz 6 hours ago

    Is there any way IA could be mirrored in read-only mode, while security concerns are addressed?

  • 999900000999 7 hours ago

    Do any organizations have a mirror of this?

    Even if it's not publicly available...

  • wkat4242 10 hours ago

    Ouch. Once can happen, twice in a row...

    • fallingknife 10 hours ago

      Once makes the second time more likely. Shows you are a soft target.

  • anthk 7 hours ago

    The Internet Archive had legal gems such as the Jamendo Album Collection, a huge CC haven. Yes, most of it under NC licenses, but for non-commercial streaming radio with podcasts, these have been invaluable.

    Do you know Nanowar? They began there.

    Also, as commercial music has been deliberately dumbed down for the masses (on paper, not by cheap talking), discovering Jamendo and Magnatune in the late '00s has been like crossing into a parallel universe.

  • TheFreim 10 hours ago

    > "It's dispiriting to see that even after being made aware of the breach weeks ago, IA has still not done the due diligence of rotating many of the API keys that were exposed in their gitlab secrets," reads an email from the threat actor.

    This is quite embarrassing. One of the first things you do when breached at this level is to rotate your keys. I seriously hope that they make some systemic changes; it seems that there were a variety of different bad security practices.

    • ghostly_s 8 hours ago

      IA is in bad need of a leadership change. The content of the archive is immensely valuable (largely thanks to volunteers) but the decisions and priorities of the org have been far off base for years.

      • fngjdflmdflg 8 hours ago

        Do you have any examples?

        • wkat4242 6 hours ago

          Putting the organisation at risk by playing chicken with large publishing corporations. Trying to stretch fair use a little too far so they had to go to court.

      • superkuh 7 hours ago

        It's the least worst option. Remember when that happened with Mozilla? Now they're an ad company. Take the bad (some bad mis-steps re:multiple lending during the pandemic, not rotating keys immediately after a hack) with the good (staying true to the human centric mission and not the money flows).

      • ranger_danger 5 hours ago

        The content of the archive is 90% mass piracy and Jason Scott is demonstrably complicit in encouraging users to upload copyrighted content without permission.

        Edit: Downvoting doesn't change the truth.

      • echelon 8 hours ago

        I support archival of films, books, and music, but those items need to be write-only until copyright expires. The purpose of the Internet Archive is to achieve a wide-reaching, comprehensive archival, not provide easy and free read access to commercial works.

        Website caches can be handled differently, but bulk collection of commercial works can't have this same public access treatment. It's crazy to think this wouldn't be a huge liability.

        Battling for copyright changes is valiant, but orthogonal. And the IA by trying to do both puts its main charter--archival--at risk.

        The IA should let some other entity fight for copyright changes.

        I say this as an IA proponent and donor.

        • withinboredom 8 hours ago

          I'd agree with you if you live in a country where you can walk into your local library and read these for "free." For people who live where there may not even be a library, your argument makes no sense except to make the publishers richer. They typically price some of these books at "library prices" so normal people won't be able to afford them, but libraries will.

          • sieabahlpark 7 hours ago

            Copyright is copyright. Not liking the idea of a publisher owning the rights to content they published doesn't mean you have a right to their content. Let alone worldwide distribution of that content.

            What makes you feel entitled to the content of the publisher before the copyright expires? Do you feel that you deserve access to everything because you've deemed the concept of ownership around book publishing immoral?

            You can't just take a digital copy of a physical book and give it to everyone worldwide. That isn't your choice or decision to make nor is it ethical to ascribe malice to simply retaining distribution rights to content they own.

            "Make publishers richer", it's actually just honoring the concept of ownership...

        • absence5875 4 hours ago

          > but bulk collection of commercial works can't have this same public access treatment

          And it doesn't.

        • giantrobot 7 hours ago

          > I support archival of films, books, and music, but those items need to be write-only until copyright expires.

          Which means no one alive today would ever be able to see them out of copyright. It also requires an unfounded belief that major copyright owning companies won't extend copyright lengths beyond current lengths which are effectively "forever".

    • galleywest200 10 hours ago

      > "It's dispiriting to see that even after being made aware of the breach weeks ago..."

      These people are not dispirited whatsoever, if anything they are half-cocked that these script kiddies found an easy target.

      • chrisrhoden 8 hours ago

        The words came from a message written by the people you are calling script kiddies, rather than being editorializing by bleepingcomputer, as you seem to believe.

        • compootr 8 hours ago

          Script kiddie or blackhat hacker is irrelevant. IA has shit security practices, and that's a fact regardless of who figures that out.

      • EasyMark 7 hours ago

        I highly doubt they are script kiddies. More than likely they are state actors or mercenaries of state actors attempting to bring down the free transmittal of information between regular folks. IA evidently has not-so-good security, and Wikipedia must be doing pretty well, I guess? I can’t recall the last time one of these attacks worked on Wikipedia.

        • luckylion 5 hours ago

          Why would they publicly call them out and lay open the way they breached them if they were "attempting to bring down the free transmittal of information between regular folks"?

          They could have done much worse but they chose not to and instead made it public. Which state actor does that?

      • Aachen 2 hours ago

        Subtitling: half-cocked means not fully prepared

    • tgsovlerkhgsel 6 hours ago

      There are many "first things" you need to do if breached, and good luck identifying and doing them all in a timely fashion if you're a small organization, likely heavily relying on volunteers and without a formal security response team...

  • pessimizer 10 hours ago

    The Internet Archive has a management problem. They seem to be more comfortable disrupting libraries than managing an online, publicly accessible database of disputed, disorganized material.

    Despite all of the positive self-talk, I don't know if they realize how important they are, or how easy it would be for them to find good help and advice if their management were transparent and everything was debated in public. That may have protected it to some extent; as a counterexample, Wikipedia has been extremely fragile due to its transparency and accessibility to everyone. With IA being driven by its creator's ideology, maybe that ideology should be formalized and set in stone as bylaws, and the torch passed to people openly debating how IA should be run, its operations, and what it should be taking on.

    I don't mean they should be run by the random set of Confucian-style libertarian aphorisms that is running the credibility of Wikipedia into the ground, but Debian is a good model to follow. Or maybe do better than both?

    • mrweasel 8 hours ago

      > Debian is a good model to follow.

      While I have no idea how Debian is actually funded, I'd agree. One issue might be that The Internet Archive actually needs to have people on staff; not sure if Debian has that requirement. You're not going to get people to man scanners or VHS players 8 hours a day without pay, at least not at this scale.

      The Internet Archive needs a better funding strategy than asking for money on their own site. People aren't visiting them frequently enough for that to work. They need a fundraising team, and a good one.

      Finding managers is probably even worse. They can't get a normal CEO-type person, because they aren't a company, and the type of people who apply to or are attracted to running a non-profit, serve-the-community, don't-be-evil organisation are frequently bat-shit crazy.

    • badlibrarian 9 hours ago

      Don't forget the time Brewster tried to run a bank -- Internet Archive Federal Credit Union. Or that the physical archives are stored on an active fault line and unlikely to receive prompt support during an emergency. Or that, when someone told him that archives are often stored in salt mines he replied, "cool, where can I buy one?"

    • kmeisthax 8 hours ago

      > Confucian-style libertarian aphorisms that is running the credibility of Wikipedia

      Can you elaborate? I'm aware of Wikipedia having very particular rules and lots of very territorial editors, but I'm not sure how this runs their credibility into the ground aside from pissing off the far right when they come in with an agenda to push.

    • avazhi 9 hours ago

      https://www.wired.com/story/internet-archive-memory-wayback-...

      I appreciate their ethos and I've used the site many times (and donated!), but clearly it's at the point where Kahle et al just aren't equipped either personally (as a matter of technical expertise) or collectively (they are just a handful of people) to be dealing with what are probably in many cases nation-state attacks. Kahle's attitude towards (and misunderstanding of) copyright law is IMO proof that he shouldn't be running things, because his legal gambles (gambles that a first-year law student could have predicted would fail spectacularly) have put IA at long term risk (see: Napster). And this information coming out over the past few weeks about their technical incompetence is arguably worse, because the tech side of things is what he and his team are actually supposed to be good at.

      It's true that Google and Microsoft and others should be propping up the IA financially but that isn't going to solve the IA's lack of technical expertise or its delusional hippie ethos.

  • badlibrarian 10 hours ago

    Restating my love for Internet Archive and my plea to put a grownup in charge of the thing.

    Washington Post: The organization has “industry standard” security systems, Kahle said, but he added that, until this year, the group had largely stayed out of the crosshairs of cybercriminals. Kahle said he’d opted not to prioritize additional investments in cybersecurity out of the Internet Archive’s limited budget of around $20 million to $30 million a year.

    https://archive.ph/XzmN2

    • semicolon_storm 10 hours ago

      In security, industry standard seems to be about the same as military grade: the cheapest possible option that still checks all the boxes for SOC.

      • EasyMark 7 hours ago

        Military grade has different meanings. I’ve worked in the electronics industry a long time and will say with confidence that the PCBs and chips we sent to the military were our best. Higher temperature ranges, much more thorough environmental testing, many more thermal and humidity cycles, lots more vibration testing. However, we also sell them for 5-10x our regular prices, but in much lower quantities. It’s a failed meme in many instances as the internet uses it, though.

      • incahoots 9 hours ago

        Basically, whatever the liability insurance wants for you to be in compliance, then that’s the standard.

      • Spivak 8 hours ago

        Hot take, this is the way it should be. If you want better security then you update the requirements to get your certification.

        Security by its very nature has a problem of knowing when to stop. There's always better security for an ever increasing amount of money and companies don't sign off on budgets of infinity dollars and projects of indefinite length. If you want security at all you have to bound the cost and have well-defined stopping points.

        And since 5 security experts in a room will have 10 different opinions on what those stopping points should be (what constitutes "good enough"), they only become meaningful when there's industry-wide agreement on them.

        • abadpoli 8 hours ago

          There never will be an adequate industry-wide certification. There is no universal “good enough” or “when to stop” for security. What constitutes “good enough” is entirely dependent on what you are protecting and who you are protecting it from, which changes from system to system and changes from day to day.

          The budget that it takes to protect against a script kiddie is a tiny fraction of the budget it takes to protect from a professional hacker group, which is a fraction of what it takes to protect from nation state-funded trolls. You can correctly decide that your security is “good enough” one day, but all it takes is a single random news story or internet comment to put a target on your back from someone more powerful, and suddenly that “good enough” isn’t good enough anymore.

          The Internet Archive might have been making the correct decision all this time to invest in things that further its mission rather than burning extra money on security, and it seems their security for a long time was “good enough”… until it wasn’t.

        • db48x 8 hours ago

          Yep. And worse, no matter how much you pay for security it is still possible for someone to make a mistake and publish a credential somewhere public.

        • goodpoint 6 hours ago

          > since 5 security experts in a room will have 10 different opinions

          If that happens you need to seriously rethink your hiring process.

        • gjsman-1000 8 hours ago

          This ^

          We can’t all have the latest EPYC processors with the latest bug fixes using Secure Enclaves and homomorphic encryption for processing user data while using remote attestation of code running within multiple layers of virtualization. With, of course, that code also being written in Rust, running on a certified microkernel, and only updatable when at least 4 of 6 programmers, 1 from each continent, unite their signing keys stored on HSMs to sign the next release. All of that code is open source, by the way, and has a ratio of 10 auditors per programmer with 100% code coverage and 0 external dependencies.

          Then watch as a kid fakes a subpoena using a hacked police account and your lawyers, who receive dozens every day, fall for it.

  • alexey-salmin 7 hours ago

    A genuine question to commenters asking to "put a grownup in charge of the thing" and saying that "Kahle shouldn't be running things": he built the thing, so why exactly can't he run it the way he sees fit?

    • et-al 7 hours ago

      He is. But at the cost of the greater good.

      Most of us care mainly about the Wayback Machine and archiving webpages, not borrowing books still under copyright and fighting publishers.

    • pvg 6 hours ago

      A good place to direct that question might be in a reply to the person who made that comment.