S3 as a Git remote and LFS server

(github.com)

197 points | by kbumsik 9 months ago

53 comments

  • mdaniel 9 months ago

    All this mocking when moto exists is just :-( https://github.com/awslabs/git-remote-s3/blob/v0.1.19/test/r...

    Actually, moto is just one bandaid for that problem - there are SO MANY s3 storage implementations, including the pre-license-switch Apache 2 version of minio (one need not use a bleeding edge for something as relatively stable as the S3 Api)

    • notpushkin 9 months ago

      > there are SO MANY s3 storage implementations

      I suppose given this is under the AWS Labs org, they don’t really care about non-AWS S3 implementations.

      • mdaniel 9 months ago

        Well, I look forward to their `docker run awslabs/the-real-s3:latest` implementation then. Until such time, monkeypatching api calls to always give the exact answer the consumer is looking for is damn cheating

        • notpushkin 9 months ago

          Agreed, haha. Well, I think it should work with Minio & co. just as well, but be prepared to have your issues closed as unsupported. (Personally, I might give it a go with Backblaze B2 just to play around, yeah)

        • chrsig 9 months ago

          it wouldn't be unprecedented. dynamodb-local exists.

    • SahAssar 9 months ago

      Do you mean boto (the python SDK for AWS)?

      EDIT: They probably do not, I'm guessing they mean https://docs.getmoto.org/en/latest/index.html ?

      • flakes 9 months ago

        moto server for testing S3 is pretty great. It’s about the same experience as using a minio container to run integration tests against.

        I use this, and testing.postgresql for unit testing my api servers with barely any mocks used at all.

      • mdaniel 9 months ago

        Happy 10,000th Day to you :-D Yes, moto and its friend localstack are just fantastic for being able to play with AWS without spending money, or to reproduce kabooms that only happen once a month with the real API

        I believe moto has an "embedded" version such that one need not even have it listen on a network port, but I find it much, much less mental gymnastics to just supersede the "endpoint" address in the actual AWS SDKs to point to 127.0.0.1:4566 and off to the races. The AWS SDKs are even so friendly as to not mandate TLS or have allowlists of endpoint addresses, unlike their misguided Azure colleagues
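
        A minimal sketch of the endpoint override described above (an illustration, not code from git-remote-s3): boto3's `client()` accepts an `endpoint_url` argument, so a test can point the SDK at a local moto or localstack server instead of AWS. The helper names, the dummy credentials, and the default port 4566 (taken from the address above) are assumptions. boto3 is imported lazily so the kwargs helper itself has no dependencies.

```python
def local_s3_kwargs(host="127.0.0.1", port=4566, use_tls=False):
    """Build keyword arguments for boto3.client("s3", ...) aimed at a local server.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    scheme = "https" if use_tls else "http"
    return {
        "endpoint_url": f"{scheme}://{host}:{port}",
        # Dummy credentials: an assumption that the local emulator does not
        # validate them. Never put real keys here.
        "aws_access_key_id": "testing",
        "aws_secret_access_key": "testing",
        "region_name": "us-east-1",
    }


def make_local_s3_client(**overrides):
    """Create an S3 client that talks to the local endpoint instead of AWS."""
    import boto3  # imported here so local_s3_kwargs works without boto3 installed

    return boto3.client("s3", **{**local_s3_kwargs(), **overrides})


if __name__ == "__main__":
    print(local_s3_kwargs()["endpoint_url"])
```

        With a local server listening on that port, `make_local_s3_client().list_buckets()` would go to the emulator rather than to AWS.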

    • remram 9 months ago

      Unfortunately there have been a few vulnerabilities since that old Minio release. For something you expose to users, it's a problem.

      • mdaniel 9 months ago

        I would hope my mentioning moto made it clear my comment was about having an S3 implementation for testing. Presumably one should not expose moto to users, either

  • Scribbd 9 months ago

    This is something I was trying to implement myself. I am surprised it can be done with just an s3 bucket. I was messing with API Gateways, Lambda functions and DynamoDB tables to support the s3 bucket. It didn't occur to me to implement it client side. I might have stuck a bit too much to the lfs test server implementation. https://github.com/git-lfs/lfs-test-server

  • CGamesPlay 9 months ago

    If you are interested in using S3 as a git remote but are concerned with privacy, I built a tool a while ago to use S3 as an untrusted git remote using Restic. https://github.com/CGamesPlay/git-remote-restic

  • zmmmmm 9 months ago

    Just remember, the minimum billing increment for file size is 128KB in real AWS S3. So your Git repo may be a lot more expensive than you would think if you have a giant source tree full of small files.
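
    To make the arithmetic concrete, here is a rough sketch that takes the 128 KB figure above as given. Whether the minimum applies depends on the storage class, so check the current S3 pricing page. The sketch also assumes one S3 object per small file, which may not match how a given tool lays out data in the bucket. The function name and the sample sizes are made up for illustration.

```python
MIN_BILLABLE_BYTES = 128 * 1024  # the 128 KB figure from the comment above (assumed)


def billable_bytes(object_sizes, minimum=MIN_BILLABLE_BYTES):
    """Sum object sizes, charging each object for at least `minimum` bytes."""
    return sum(max(size, minimum) for size in object_sizes)


if __name__ == "__main__":
    # Hypothetical source tree: 10,000 files of 2 KB each.
    sizes = [2 * 1024] * 10_000
    stored = sum(sizes)
    billed = billable_bytes(sizes)
    print(f"stored: {stored / 2**20:.1f} MiB, billed as: {billed / 2**20:.1f} MiB")
```

    Under those assumptions, about 19.5 MiB of real data would be billed as 1250 MiB, a factor of 64.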

  • doctorpangloss 9 months ago

    https://alanedwardes.com/blog/posts/serverless-git-lfs-for-g...

    I’ve used this guy’s CloudFormation template since forever for LFS on S3.

    GitHub has to lower its egregious LFS pricing.

  • x3n0ph3n3 9 months ago

    Wow, AWS really wants to get rid of CodeCommit.

  • Evidlo 9 months ago

    For the LFS part there is also dvc which works better than git-lfs and natively supports S3.

    • matrss 9 months ago

      There is also git-annex, which supports S3 as well as a bunch of other storage backends (and it is very easy to implement your own, it just has to loosely resemble a key-value store). Git-annex can use any of its special remotes as git remotes, like what the presented tool does for just S3.

    • kernelsanderz 9 months ago

      Also worth checking out https://github.com/jasonwhite/rudolfs

      Been using it to store datasets via lfs. Written in rust and has been very reliable.

    • bagavi 9 months ago

      Dvc is a great tool!

      • lenova 9 months ago

        I haven't heard of dvc, so I had to google it, which took me to: https://dvc.org/

        But I'm still confused as to what dvc is after a cursory glance at their homepage.

        • chatmasta 9 months ago

          It was on the front page contemporaneously with this comment that recommended it, so you know it was an unbiased recommendation. :)

  • milkey_mouse 9 months ago

    You can also do this with Cloudflare Workers for fewer setup steps/moving parts:

    https://github.com/milkey-mouse/git-lfs-s3-proxy

  • philsnow 9 months ago

    I'm surprised they just punt on concurrent updates [0] instead of locking with something like dynamodb, like terraform does.

    [0] https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
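
    For illustration only, a sketch of the kind of DynamoDB-backed lock mentioned above. This is not something git-remote-s3 does. It relies on a conditional write: `put_item` with `ConditionExpression="attribute_not_exists(LockID)"` only succeeds if no lock item exists yet. The attribute name `LockID` mirrors the one Terraform's S3 backend uses; the table name, the `Holder` attribute, and the helper names are hypothetical. The in-memory `FakeDynamoDB` class is a stand-in so the flow can be run without AWS; a real boto3 DynamoDB client would be passed in its place.

```python
import time


def lock_item(lock_id, holder):
    """Item to write into the (hypothetical) lock table."""
    return {
        "LockID": {"S": lock_id},
        "Holder": {"S": holder},
        "Created": {"N": str(int(time.time()))},
    }


def try_acquire(client, table, lock_id, holder):
    """Return True if we created the lock item, False if it already exists."""
    try:
        client.put_item(
            TableName=table,
            Item=lock_item(lock_id, holder),
            # The write only succeeds when no item with this key exists yet.
            ConditionExpression="attribute_not_exists(LockID)",
        )
        return True
    except client.exceptions.ConditionalCheckFailedException:
        return False


def release(client, table, lock_id):
    """Delete the lock item so another writer can acquire it."""
    client.delete_item(TableName=table, Key={"LockID": {"S": lock_id}})


class _FakeExceptions:
    class ConditionalCheckFailedException(Exception):
        pass


class FakeDynamoDB:
    """In-memory stand-in for a DynamoDB client, for demonstration only."""

    exceptions = _FakeExceptions

    def __init__(self):
        self.items = {}

    def put_item(self, TableName, Item, ConditionExpression=None):
        key = (TableName, Item["LockID"]["S"])
        # Emulates only the attribute_not_exists(LockID) condition.
        if ConditionExpression and key in self.items:
            raise self.exceptions.ConditionalCheckFailedException()
        self.items[key] = Item

    def delete_item(self, TableName, Key):
        self.items.pop((TableName, Key["LockID"]["S"]), None)


if __name__ == "__main__":
    client = FakeDynamoDB()
    print(try_acquire(client, "git-locks", "my-repo/main", "alice"))
    print(try_acquire(client, "git-locks", "my-repo/main", "bob"))
    release(client, "git-locks", "my-repo/main")
    print(try_acquire(client, "git-locks", "my-repo/main", "bob"))
```

    A pusher would acquire the lock before uploading and release it afterwards. Handling a crashed lock holder (stale locks) is left out of this sketch.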

  • kernelsanderz 9 months ago

    I’ve been using https://github.com/jasonwhite/rudolfs - which is written in rust. It’s high performance but doesn’t have all the features (auth) that you might need.

  • fortran77 9 months ago

    Amazon has deprecated Amazon Code Commit, so this may be an interesting alternative.

    • adobrawy 9 months ago

      In what use case can it be an interesting alternative?

      Limited access control (e.g. CI pass required), so not very useful for end users. For machine-to-machine it's an additional layer of abstraction when a regular tarball is fine.

  • tonymet 9 months ago

    how does it handle incremental changes? If it’s writing your entire repo on a loop, I could see why AWS would promote it.

  • WhyNotHugo 9 months ago

    git-annex also has native support for s3.

    • matrss 9 months ago

      I think this is more about storing the entire repository on s3, not just large files as git-lfs and git-annex are usually concerned with. But coincidentally, git-annex somewhat recently got the feature to use any of its special remotes as normal git remotes (https://git-annex.branchable.com/git-remote-annex/), including s3, webdav, anything that rclone supports, and a few more.

  • xena 9 months ago

    How do you install this? Homebrew broke global pip install. Is there a homebrew package or something?

    • mdaniel 9 months ago

      FWIW, their helpers make it pretty cheap to create a new Formula yourself

          $ brew create --python --set-license Apache-2 https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
          Formula name [git-remote-s3]:
          ==> Downloading https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
          ==> Downloading from https://codeload.github.com/awslabs/git-remote-s3/tar.gz/refs/tags/v0.1.19
          ##O=-#   #
          Warning: Cannot verify integrity of '84b0a9a6936ebc07a39f123a3e85cd23d7458c876ac5f42e9f3ffb027dcb3a0f--git-remote-s3-0.1.19.tar.gz'.
          No checksum was provided.
          For your reference, the checksum is:
            sha256 "3faa1f9534c4ef2ec130fac2df61428d4f0a525efb88ebe074db712b8fd2063b"
          ==> Retrieving PyPI dependencies for "https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz"...
          ==> Retrieving PyPI dependencies for excluded ""...
          ==> Getting PyPI info for "boto3==1.35.44"
          ==> Getting PyPI info for "botocore==1.35.44"
          ==> Excluding "git-remote-s3==0.1.19"
          ==> Getting PyPI info for "jmespath==1.0.1"
          ==> Getting PyPI info for "python-dateutil==2.9.0.post0"
          ==> Getting PyPI info for "s3transfer==0.10.3"
          ==> Getting PyPI info for "six==1.16.0"
          ==> Getting PyPI info for "urllib3==2.2.3"
          ==> Updating resource blocks
          Please run the following command before submitting:
            HOMEBREW_NO_INSTALL_FROM_API=1 brew audit --new git-remote-s3
          Editing /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/g/git-remote-s3.rb
      
      They also support building from git directly, if you want to track non-tagged releases (see the "--head" option to create)
  • mattxxx 9 months ago

    This seems wrong, since you can't push transactionally + consistently in S3.

    They address this directly in their section on concurrent writes: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...

    And in their design: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...

    But it seems like this is just the wrong tool for the job (hosting git repos).

  • Havoc 9 months ago

    Does this work with other s3 implementations like minio?
