Open-Sourcing SEC Edgar on Hugging Face

(twitter.com)

5 points | by EnricoShippole 7 hours ago

5 comments

  • sammysidhu 41 minutes ago

    Amazing work leveraging Daft for this!

  • goodmythical 6 hours ago
  • EnricoShippole 7 hours ago

    Given the increasingly closed-source nature of the U.S. AI ecosystem, it is now more important than ever to push for more open model and dataset releases. Datamule, TeraflopAI, and Daft collaborated to release 43 billion tokens of SEC EDGAR data.

  • jgfriedman1999 7 hours ago

    Neat! Surprised at how cheap it was.

    • jaychia 5 hours ago

      Very cool that this kind of work can now be done at this price point. 24 hours for 8M filings on just 12 cores :)

      Excited for unstructured/multimodal data processing to become increasingly commoditized and abstracted away, so that more such datasets can be built.
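
For a rough sense of scale, the figures quoted above (24 hours, 8M filings, 12 cores) can be turned into a throughput estimate. This is only a back-of-the-envelope sketch: it assumes "8M" means exactly 8,000,000 filings and that all 12 cores were busy for the full 24 hours, neither of which the thread confirms. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope throughput from the figures quoted in the thread.
# Assumptions (not confirmed by the thread): "8M" is exactly 8,000,000
# filings, and all 12 cores ran for the full 24 hours.
FILINGS = 8_000_000
HOURS = 24
CORES = 12


def throughput(filings: int, hours: float, cores: int) -> dict:
    """Return simple rate estimates for a batch job."""
    seconds = hours * 3600
    per_second = filings / seconds
    return {
        "filings_per_hour": filings / hours,
        "filings_per_second": per_second,
        "filings_per_second_per_core": per_second / cores,
        "core_hours": hours * cores,
    }


stats = throughput(FILINGS, HOURS, CORES)
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

Under those assumptions the job works out to roughly 93 filings per second overall, or just under 8 per second per core, for 288 core-hours in total.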