Hi there, I’m looking for an alternative to news.google.com that simply isn’t a Google product. I know it’s not open source per se, but I’m just curious.

  • ProfessorYakkington@lemmy.ml
    arrow-up
    6
    ·
    3 years ago

    On my phone I use Feeder (Android; not sure if it is on iOS). On my computer I use newspaper3k (https://newspaper.readthedocs.io/en/latest/). I built out some additional summary tools and NLTK tools that let me find articles on similar topics from sources with different biases, plus some named entity extraction that joins easily into DBpedia. I intend to contribute the additional features I’ve added but haven’t done so yet, as the code is rough.

    • ree@lemmy.ml
      arrow-up
      3
      ·
      3 years ago

      I’ve been thinking about using NLP to deal with my feeds.

      Are you happy with your solution? Can you share a bit more about your pipeline?

      • ProfessorYakkington@lemmy.ml
        arrow-up
        4
        ·
        edit-2
        3 years ago

        I am not happy with it yet, but only because I want it to be perfect and it never will be. I do find that I engage with more content, from more varied sources, than when I go to a single source. I am using the NLTK features from newspaper for keyword extraction, plus the trending sources, to monitor a few hundred sources. Currently I store all the metadata, links (URLs), and Wikipedia links in a pandas DataFrame (which is becoming a problem) and visualize trends and data about the news in a Jupyter notebook. For the enhanced summaries and named entity extraction I am using spaCy (https://spacy.io/). From there I use SPARQL (https://en.wikipedia.org/wiki/SPARQL) to query DBpedia (https://en.wikipedia.org/wiki/DBpedia) to augment entity knowledge (e.g. adding data about the size and industry of a company, or summary explanations of scientific concepts). The named entity matching and augmentation is the portion that needs the most work. Newspaper has some nice caching features, so I query all sources every day but only pull in new articles.
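
        For anyone curious what the DBpedia augmentation step could look like, here is a minimal sketch using only the standard library. It is not the code from the pipeline above: the function names are made up for illustration, it assumes the public endpoint at https://dbpedia.org/sparql, and it assumes you already know the DBpedia resource name for an entity (the matching step, which is the hard part, is not covered).

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the public DBpedia SPARQL endpoint is reachable from your machine.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_abstract_query(resource_name: str, lang: str = "en") -> str:
    """Return a SPARQL query asking for the abstract of one DBpedia resource."""
    # Percent-encode the name so odd characters cannot break out of the <...> IRI.
    iri = "http://dbpedia.org/resource/" + urllib.parse.quote(
        resource_name, safe="_(),"
    )
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{iri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "} LIMIT 1"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request that asks for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_abstract(resource_name: str) -> Optional[str]:
    """Network call: return the abstract text, or None if nothing matched."""
    url = build_request_url(build_abstract_query(resource_name))
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    bindings = data["results"]["bindings"]
    return bindings[0]["abstract"]["value"] if bindings else None
```

        Calling fetch_abstract("Apple_Inc.") would hit the network, so the query building is kept separate and can be checked offline. Swapping dbo:abstract for another property is how you would pull in things like a company's industry.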

        I might play around with moving portions of the data into a graph DB and with better ways to query based on concepts. Right now I just write Python code to query the pandas DataFrame based on different parameters.

        • ree@lemmy.ml
          arrow-up
          3
          ·
          3 years ago

          Wow, that’s quite developed.

          So you consume content in a Jupyter notebook? Or are you interfacing this with an RSS reader?

          From what I read the next step is to run it in a real database.

          • ProfessorYakkington@lemmy.ml
            arrow-up
            2
            ·
            3 years ago

            I consume analytics and identify topics I am interested in via Jupyter; sometimes I just use IPython if I don’t want to leave the terminal. I need to build more of a frontend, but I’ve not got there yet. I mostly read the articles in the terminal. And yup, my plan is to find a good DB, but I am not sure what to use yet.
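
            One low-effort candidate, offered as an assumption rather than anything settled in this thread, is SQLite through Python's built-in sqlite3 module. The sketch below uses a hypothetical two-table schema (articles and keywords) and made-up function names. It shows two things a growing DataFrame makes awkward: skipping articles that are already stored, and querying by keyword across sources.

```python
import sqlite3

# Hypothetical schema: one row per article, one row per (article, keyword) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url       TEXT PRIMARY KEY,
    source    TEXT NOT NULL,
    title     TEXT NOT NULL,
    published TEXT
);
CREATE TABLE IF NOT EXISTS keywords (
    url     TEXT NOT NULL REFERENCES articles(url),
    keyword TEXT NOT NULL,
    PRIMARY KEY (url, keyword)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_article(conn, url, source, title, published, keywords):
    """Store an article once; return False if the URL was already known."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?)",
            (url, source, title, published),
        )
        if cur.rowcount == 0:  # the insert was ignored, so the URL already exists
            return False
        conn.executemany(
            "INSERT OR IGNORE INTO keywords VALUES (?, ?)",
            [(url, kw.lower()) for kw in keywords],
        )
    return True


def sources_for_keyword(conn, keyword):
    """Return (source, article_count) pairs for one keyword, sorted by source."""
    rows = conn.execute(
        "SELECT a.source, COUNT(*) FROM articles a "
        "JOIN keywords k ON k.url = a.url "
        "WHERE k.keyword = ? GROUP BY a.source ORDER BY a.source",
        (keyword.lower(),),
    )
    return rows.fetchall()
```

            Passing a file path to open_db instead of the default ":memory:" keeps the data between runs. A graph DB may still suit the concept queries better; this only covers the metadata and links.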

            • ree@lemmy.ml
              arrow-up
              3
              ·
              3 years ago

              You could probably repackage your upgraded feed into an RSS format that you serve locally. But that may be more hassle than it’s worth.

              Thanks for the info, it encouraged me to try that sometime :)
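
              The RSS repackaging idea above can be sketched with the standard library alone. This is a rough illustration, not a tested integration with any particular reader: build_rss is a made-up name, and the item dictionaries (title, link, summary, published) are an assumed shape for whatever the pipeline produces.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(feed_title, feed_link, items):
    """Build an RSS 2.0 document (UTF-8 bytes) from a list of item dicts.

    Each item is assumed to have the keys: title, link, summary, published
    (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Locally generated feed"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "guid").text = item["link"]  # the URL doubles as the ID
        ET.SubElement(node, "description").text = item["summary"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    example = [{
        "title": "Example article",
        "link": "https://example.org/article",
        "summary": "Summary text plus any extra entity notes.",
        "published": datetime(2021, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    }]
    with open("feed.xml", "wb") as handle:
        handle.write(build_rss("My feed", "http://localhost:8000/feed.xml", example))
```

              Running `python -m http.server 8000 --bind 127.0.0.1` in the same directory would then serve feed.xml at http://localhost:8000/feed.xml for a local RSS reader to subscribe to.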