Show HN: OSS AI agent that indexes and searches the Epstein files

(epstein.trynia.ai)

96 points | by jellyotsiro 9 hours ago ago

33 comments

axegon_ 2 minutes ago

As many others pointed out, the released files are nearly nothing compared to the full dataset. Personally I've been fiddling a lot with OSINT and analytics over the publicly available Reddit data(a considerable amount of my spare time over the last year) and the one thing I can say is that LLMs are under-performing(huge understatement) - they are borderline useless compared to traditional ML techniques. But as far as LLMs go, the best performers are the open source uncensored models(the most uncensored and unhinged), while the worst performers are the proprietary and paid models, especially over the last 2-3 months: they have been nerfed into oblivion - to the extent where simple prompts like "who is eligible to vote in US presidential elections" is considered a controversial question. So in the unlikely event that the full files are released, I personally would look at the traditional NLP techniques long before investing any time into LLMs.

wartywhoa23 2 hours ago

The question is not how to analyze that, it's how to prosecute those who are above the law.

Imustaskforhelp 33 minutes ago

Please create a way to share conversations. I think that can be really relevant here

I am not a huge fan of AI but I allow this use case. This is really good in my opinion

Allowing the ability to share convo's, I hope you can also make those convo's be able to archived in web.archive.org/wayback machine

So I am thinking it instead of having some random UUID, it can have something like https://duckduckgo.com/?q=hello+test (the query parameter for hello test)

Maybe its me but archive can show all the links archived by it of a particular domain, so if many people asks queries and archives it, you almost get a database of good queries and answers. Archive features are severely underrated in many cases

Good luck for your project!

andy_ppp 4 hours ago

I keep thinking that the lack of children’s faces in the blacked out rectangles make the files much less shocking. I wonder if AI could put back fake images to make clearer to people how sick all this is.

[-]

13hunteo 21 minutes ago

I understand the sentiment, but I'm always very concerned when it comes to AI generating pictures of children.

yuppiepuppie an hour ago

When first reading OSS, I thought this was going to be an Office of Strategic Services AI [0] agent :)

[0] https://en.wikipedia.org/wiki/Office_of_Strategic_Services

iowemoretohim 6 hours ago

Those are going to be some spicy hallucinations.

wutsthat4 7 hours ago

And what did you learn?

[-]

jellyotsiro 6 hours ago

Trump famously told New York Magazine in 2002: "I've known Jeff for 15 years. Terrific guy. He's a lot of fun to be with. It is even said that he likes beautiful women as much as I do, and many of them are on the younger side."

Trump and Epstein were social acquaintances in Palm Beach and New York circles during the 1990s-early 2000s. They socialized together at Mar-a-Lago and other venues

[-]

TowerTall 6 hours ago

Interesting. It is my impression that almost everyone globally already knew this. What else did you learn?

[-]

jellyotsiro 6 hours ago

ill take like 1 hour in the evening to dive deeper, i was never familiar with epstein stuff until i built the agent to simplify things for me.

ishtanbul 6 hours ago

This is one of the most widey quoted phrases by trump on the topic of epstein

subzero06 4 hours ago

In 2024, Trump used Epstein's former private jet for campaign appearances

sschueller 3 hours ago

Is it able to handle a much larger dataset? Only a tiny fraction of data has been release from what is looks like.

nubg 6 hours ago

Does this work with vector embeddings?

[-]

jellyotsiro 6 hours ago

it uses semantic search so yes

thecopy 2 hours ago

Reminder that only 1-2% of the files have been released.

[-]

Terr_ an hour ago

Yep: Breaking his campaign promises, in violation of the deadlines imposed by US Federal law, and with unlawful levels of redaction.

tehjoker 7 hours ago

This is a good idea. One thing I never understand about these kinds of projects though: why are the standard questions provided to the user as prompts never cached?

[-]

jampekka 3 hours ago

Outputs are usually generated with random sampling, so the same prompt may get different outputs.

jellyotsiro 6 hours ago

oh forgot about it, thanks. just a funny project i build in couple hours so didnt really sweat haha

[-]

tehjoker 6 hours ago

This agent is really interesting! Learning a lot. Thanks!

dfxm12 6 hours ago

can search the entire Epstein files

It's worth noting that only about 1% of the files have been released, according to the DOJ.

Of the released files, many have redactions.

[-]

Terr_ 44 minutes ago

Yep, they failed to meet the deadlines required by law, and it's not just any redactions either, but unlawful redactions.

King-Aaron 3 hours ago

If the Lake Michigan thing is just in the first 1%, then whatever's in the other 99% is going to be absolutely disgusting.

[-]

Terr_ 43 minutes ago

I would expect a large portion of the remaining records to be internal emails about memos about the process of building a case around evidence, rather than the root evidence itself.

Not that that would excuse the administration's unlawful behavior so far, or indicate the unreleased 99% can't have some big bombshells.

Tom1380 2 hours ago

I searched it with the tool but nothing came up about Lake Michigan. What happened?

[-]

King-Aaron 2 hours ago

https://www.justice.gov/epstein/files/DataSet%208/EFTA000250...

"He participated regularly in paying money to force me to ___ with him and he was present when my uncle murdered my newborn child and disposed of the body in Lake Michigan. "

The uncle is allegedly referring to Trump

jellyotsiro 6 hours ago

sorry all publicly available files *

p0w3n3d 2 hours ago

No need to use Claude Sonnet here. Here's much cheaper solution in Python3:

  import sys
  import random
  import re
  
  print('> ',end='', flush=True)
  input_text = sys.stdin.readline().strip()
  
  text = re.sub(r'[^a-zA-Z ]', '', input_text)
  words = text.split()
  random.shuffle(words)
  
  mixed_words = []
  for word in words:
    n = random.randint(10, 500)
    mixed_words.append(word + ' ' + '' * n)
  
  print(' '.join(mixed_words))

[-]

dmos62 2 hours ago

Ah, yes. Post is an LLM-something project: top comment is a general critique of LLMs. Waiting for this to get old. Meanwhile, at least you get points for being funny.

sebastiennight 2 hours ago

    > + '' * n

This looks like what you'd get from using text-davinci-003 as the model in your AI-assisted IDE

[-]

flexagoon an hour ago

I think it looks like what you get by writing code and making a typo.