I am trying to archive lefty reddit in the likely event that these subs get nuked. Any more subs to add to the list? Any advice on going about archiving it? What should I do with embedded links?

  • @ImARabbit
    4 years ago

    Okay so I tried and I can get data from MTC. Some of the images hosted by Reddit are taken down but not all.

    I’d propose injecting all this data into existing subs using a user called archive_bot or data_horder or something. The original author’s name or a unique obfuscation of it could be put at the top of each comment so threads stay readable.
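One way the "unique obfuscation" could work is a keyed hash of the username, so the same author always gets the same stand-in and threads stay followable. This is only a sketch: `SECRET_KEY`, `pseudonym`, `with_attribution`, and the header wording are all made up for illustration, nothing from the thread.

```python
import hashlib
import hmac

# Hypothetical secret. Keep it private, otherwise someone could hash a list
# of known usernames and match them against the pseudonyms.
SECRET_KEY = b"replace-with-a-private-random-value"


def pseudonym(author: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable stand-in for a username (same input, same output)."""
    digest = hmac.new(key, author.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:10]


def with_attribution(author: str, body: str) -> str:
    """Put the obfuscated author name at the top of a comment body."""
    return f"[originally posted by {pseudonym(author)}]\n\n{body}"
```

Truncating to 10 hex characters keeps the tag short; a longer slice would lower the (already small) chance of two authors sharing a pseudonym.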

    Reddit probably wouldn’t like this and I dunno if they’d take legal action. Their TOS says data can be displayed by third parties but you’re supposed to abide by takedown requests, though the original author and not Reddit is the “owner” of the data. Maybe those posts should require a login to view, to keep search engines from finding them.

    Edit: one can’t simply upload all this to Lemmy because all the posts and comments will need to link to each other through foreign keys in the database, and those won’t be the same keys Reddit was using.
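The key problem above can be handled by keeping a map from Reddit IDs to whatever IDs the new database hands out during the import. A rough sketch using SQLite follows. The two-table schema is a stand-in invented for the example, not Lemmy's real schema, and `import_thread` is a hypothetical helper. It assumes each comment carries Reddit-style `id` and `parent_id` fields (`t1_` prefix for a comment parent, `t3_` for the post) and that parents come before their replies in the list.

```python
import sqlite3

# Stand-in schema for illustration only; Lemmy's actual tables differ.
SCHEMA = """
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE comment (
    id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES post(id),
    parent_id INTEGER REFERENCES comment(id),
    body TEXT
);
"""


def import_thread(conn, post, comments):
    """Insert one post and its comments, remapping Reddit IDs to local keys.

    `comments` must be ordered so every parent appears before its replies.
    """
    cur = conn.cursor()
    cur.execute("INSERT INTO post (title, body) VALUES (?, ?)",
                (post["title"], post["body"]))
    post_id = cur.lastrowid

    id_map = {}  # Reddit comment id -> newly assigned local comment id
    for c in comments:
        parent_local = None  # top-level comments have no parent comment
        if c["parent_id"].startswith("t1_"):
            parent_local = id_map[c["parent_id"][3:]]
        cur.execute(
            "INSERT INTO comment (post_id, parent_id, body) VALUES (?, ?, ?)",
            (post_id, parent_local, c["body"]))
        id_map[c["id"]] = cur.lastrowid

    conn.commit()
    return post_id, id_map
```

The returned `id_map` could also be saved alongside the archive so old Reddit permalinks can be translated later.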

    • @ClosesniperOP
      4 years ago

      I will look into making some sort of system or website like you suggested, but I will probably need some help doing this, because when I said I was gonna archive everything I didn’t expect it to be this difficult. I am gonna reach out to some people for help, maybe make a forum or something, IDK.

      I will make a new post with updates very soon.

      • @ImARabbit
        4 years ago

        I have MTC archived and a script to do it for any sub.

        • @ClosesniperOP
          4 years ago

          damn dude, comments and everything? I already have the subs listed archived, so just in case u get swatted by the feds or some shit we have a backup at least lol.

          • @ImARabbit
            4 years ago

            Yeah comments and 2.5 gigs of images Reddit was hosting just for MTC. That’s a lot of memes haha

            • @ClosesniperOP
              4 years ago

              Jesus dude, I’ll let you do the backup cause I can’t even get comments and can only get 1k-2k posts off of each sub.

    • Muad'DibberMA
      4 years ago

      Once you get some sample .json we could use, we could write and test a pretty simple importer.

      • @ImARabbit
        4 years ago

        What’s the best way to do this? Are you on GitHub? I could make a private repo (with the code for archiving and a sample of the json) and share it with you.

        • Muad'DibberMA
          4 years ago

          Ya I’m on GitHub, that’d be a pretty easy way to do it. Also make sure some comments come with it too, that’d probably be the most difficult part to script.
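The tricky part with comments is mostly the nesting. Assuming the archived JSON keeps the shape Reddit's API returns (a Listing whose `t1` children hold their own `replies` Listing, or an empty string when there are none), a small recursive generator can flatten a thread so parents always come before their replies. The archive script's real output isn't shown in this thread, so treat the field names as an assumption; `flatten_comments` is a hypothetical helper.

```python
def flatten_comments(listing):
    """Yield comment dicts from a Reddit-style Listing, parents before replies."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            # Skip "more" placeholders and anything that is not a comment.
            continue
        data = child["data"]
        yield {
            "id": data["id"],
            "parent_id": data["parent_id"],
            "author": data.get("author"),
            "body": data.get("body", ""),
        }
        replies = data.get("replies")
        if isinstance(replies, dict):  # an empty string means no replies
            yield from flatten_comments(replies)
```

Because every parent is yielded before its children, an importer can walk the result once and look up each parent's new ID as it goes.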