For fun, I spun up Crush and gave it the link to a subreddit. I told it that it was a data analyst and that we wanted to build a profile of this community based on its ‘top’ and ‘hot’ posts and comments.

What you need: just vanilla Crush and the LLM you connect it to. It can access the Internet out of the box if you give it a link, and it will try different scraping methods.

It fetched a few posts and then produced this visualisation with Python, which also gets saved automatically as a PNG for later reference. Python is widely used for data analysis, so I asked the agent to use it from the get-go.

The analysis behind the image, however, is all DeepSeek. It took some time to fetch the content, but it produced all of this by itself. Because it is an LLM, it could read the actual content of each post and assign qualitative categories, such as whether a post was a meme or a personal story, or whether a comment was supportive, defensive, etc.
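To give an idea of the counting step behind a chart like that, here is a minimal sketch. I don't know what code the agent actually wrote, so the function name `category_shares` and the sample rows are made up for illustration; the only assumption is that the LLM has already attached a category label to each post.

```python
from collections import Counter

# Hypothetical labelled rows: in the real session the LLM assigned the labels
# after reading each post. These values are invented for illustration.
labelled_posts = [
    {"title": "example post 1", "category": "meme"},
    {"title": "example post 2", "category": "personal story"},
    {"title": "example post 3", "category": "meme"},
    {"title": "example post 4", "category": "news"},
]


def category_shares(posts):
    """Return {category: share of posts}, most common category first."""
    counts = Counter(p["category"] for p in posts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.most_common()}


shares = category_shares(labelled_posts)
for cat, share in shares.items():
    print(f"{cat:15s} {share:.0%}")
```

The qualitative part (deciding that something is a meme) stays with the LLM; Python only does the arithmetic and, in the real run, the plotting.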

I also had it write a report of its findings to a file. Once the data has been analysed, you can query the LLM further (even switching to a better model mid-session) and ask questions such as: what is the best post to make in this community? (It’s a meme posted on Tuesday or Thursday between 6 and 8pm for upvotes, or a personal story for comments.)
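A question like "what is the best post to make" boils down to grouping posts by category and posting time and comparing averages. The sketch below shows one way that could be done; it is not the agent's code. The field names (`created_utc`, `upvotes`, `comments`), the function `best_slot` and the sample rows are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def best_slot(posts, metric):
    """Return the (category, weekday, hour) group with the highest mean metric."""
    groups = defaultdict(list)
    for p in posts:
        ts = datetime.fromtimestamp(p["created_utc"], tz=timezone.utc)
        groups[(p["category"], DAYS[ts.weekday()], ts.hour)].append(p[metric])
    return max(groups.items(), key=lambda kv: mean(kv[1]))[0]


def utc(year, month, day, hour):
    """Helper to build a Unix timestamp for the made-up sample rows."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()


# Invented sample data, only there to make the sketch runnable.
sample_posts = [
    {"category": "meme", "created_utc": utc(2024, 1, 2, 19), "upvotes": 120, "comments": 10},
    {"category": "meme", "created_utc": utc(2024, 1, 9, 19), "upvotes": 80, "comments": 5},
    {"category": "personal story", "created_utc": utc(2024, 1, 6, 10), "upvotes": 30, "comments": 40},
    {"category": "news", "created_utc": utc(2024, 1, 4, 12), "upvotes": 50, "comments": 8},
]

print(best_slot(sample_posts, "upvotes"))
print(best_slot(sample_posts, "comments"))
```

With only a few dozen posts, most groups contain one or two posts, so an answer like this is a hint rather than a finding.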

Note that DeepSeek doesn’t have vision, so it doesn’t read the image directly; it reads the output of the Python code instead.

One thing I didn’t ask about was the left-right distribution. There are a lot of lefties on that subreddit, so it would be interesting to know.

Limitations: it didn’t scrape many posts, and scraping takes a while. You also need to tell it what sort of posts you want to scrape. I think top of all time above, say, 100 upvotes plus the top 20 of today would have been better.
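That selection rule is easy to spell out, either in the prompt or as a filter over whatever the agent fetched. Here is a rough sketch of it as a filter, assuming each post is a dict with hypothetical `id` and `upvotes` fields; `select_posts` is a name I made up, and the two input lists stand in for the "top of all time" and "top of today" listings.

```python
def select_posts(top_all_time, top_today, min_upvotes=100, today_limit=20):
    """Keep all-time posts above min_upvotes plus the top posts of today.

    Posts that appear in both lists are kept only once (matched on "id").
    """
    selected = {}
    for post in top_all_time:
        if post["upvotes"] > min_upvotes:
            selected[post["id"]] = post
    todays_best = sorted(top_today, key=lambda p: p["upvotes"], reverse=True)
    for post in todays_best[:today_limit]:
        selected.setdefault(post["id"], post)
    return list(selected.values())


# Invented sample data to show the behaviour.
all_time = [
    {"id": "a", "upvotes": 500},
    {"id": "b", "upvotes": 40},
    {"id": "c", "upvotes": 150},
]
today = [
    {"id": "c", "upvotes": 150},
    {"id": "d", "upvotes": 12},
    {"id": "e", "upvotes": 3},
]

chosen = select_posts(all_time, today, today_limit=2)
print([p["id"] for p in chosen])
```

The thresholds are just the numbers from my guess above, so adjust `min_upvotes` and `today_limit` to taste.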

Other uses: from there you can basically analyse any data anywhere, as long as the agent can connect to it.

It’s very scary when the graph suddenly appears on your screen and you realize a machine made all of this without your involvement… Lol

  • -6-6-6-
    4 months ago

    sample size is only about 20-30 posts

    Are there AI programs currently capable of doing this? If not, a dollar sign just appeared in my head.

    • CriticalResist8 (OP)
      4 months ago

      Crush (and probably most agents) handles web browsing natively; I just had to give it the link to the subreddit. It also finds other ways to scrape if one of them fails. I think it stopped at 20 posts because I originally told it to take only 10 posts or so and then to double that. The process was a bit awkward, so you can probably get it to fetch a lot more posts, but it was also a bit opaque: I wasn’t entirely sure what exactly it was scraping, for example. With more prompting (such as storing the posts and comments in a database first) it will probably work much better.

      I didn’t want to make it scrape 100 posts just because of costs lol, though it probably wouldn’t have cost too much on the API either way…

      • -6-6-6-
        4 months ago

        What would be necessary to run something like this? I’m completely unfamiliar. However, I’m always looking for ways to scam rich people.