Hi there, I’m looking for an alternative to news.google.com that simply isn’t a Google product. I know it’s not open source per se, but I’m just curious.
On my phone I use Feeder (Android; not sure if it is on iOS). On my computer I use newspaper3k (https://newspaper.readthedocs.io/en/latest/). I built out some additional summary tools and NLTK tools that let me find articles on similar topics from sources with different biases, plus some named entity extraction that joins easily into DBpedia. I intend to contribute the additional features I’ve added, but I haven’t done so yet as the code is rough.
I’ve been thinking about using NLP to deal with my feeds.
Are you happy with your solution? Can you share a bit more about your pipeline?
I am not happy with it yet, but that is because I want it to be perfect and it never will be. I do find that I engage with more content, and more varied content, than I do when I go to a single source. I am using the NLTK features from newspaper for keyword extraction, plus its trending features, to monitor a few hundred sources. Currently I store all the metadata, links (URLs) and Wikipedia links in a pandas DataFrame (which is becoming a problem) and visualize trends and data about the news in a Jupyter notebook. For the enhanced summaries and named entity extraction I am using spaCy (https://spacy.io/). From there I use SPARQL (https://en.wikipedia.org/wiki/SPARQL) to query DBpedia (https://en.wikipedia.org/wiki/DBpedia) to augment entity knowledge (e.g. adding data about the size and industry of a company, or summary explanations of scientific concepts). The named entity matching and augmentation is the portion that needs the most work. Newspaper has some nice caching features, so I query all sources every day but only pull in new articles.
I might play around with moving portions of the data into a graph DB and with better ways to query based on concepts. Right now I just write Python code to query the pandas DataFrame based on different parameters.
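For anyone curious what the DBpedia augmentation step might look like, here is a minimal sketch using only the Python standard library. It is not the poster's code. The function names (`build_abstract_query`, `build_request_url`, `fetch_abstract`) are hypothetical, and the choice of `rdfs:label` and `dbo:abstract` is an assumption about which properties are useful; exact-label matching is naive and would miss many entities, which fits the comment above that entity matching needs the most work.

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_abstract_query(label: str, lang: str = "en") -> str:
    """Build a SPARQL query that looks up the abstract of an entity by exact label."""
    # Escape backslashes and double quotes so the label is a valid string literal.
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?entity ?abstract WHERE {\n"
        f'  ?entity rdfs:label "{safe}"@{lang} ;\n'
        "          dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "} LIMIT 1"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_abstract(label: str, timeout: float = 10.0):
    """Network call: return the first matching abstract, or None if nothing matched."""
    url = build_request_url(build_abstract_query(label))
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = json.load(resp)
    rows = data["results"]["bindings"]
    return rows[0]["abstract"]["value"] if rows else None
```

An entity string produced by the named entity step (for example an organization name) could be passed to `fetch_abstract` and the result stored next to the article metadata.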
Wow that’s quite developed.
So you consume content in a Jupyter notebook? Or are you interfacing this with an RSS reader?
From what I read, the next step is to run it in a real database.
I consume analytics and identify topics I am interested in via Jupyter; sometimes I just use IPython if I don’t want to leave the terminal. I need to build more of a frontend, but I haven’t got there yet. I mostly read the articles in the terminal. And yup, my plan is to find a good DB, but I am not sure what to use yet.
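As one possible starting point before committing to a particular database, a rough sketch with SQLite (bundled with Python as `sqlite3`) could cover the "only pull in new articles" and "query by parameters" parts. The table layout, column names and helper functions below are all assumptions made for illustration, not anything from the setup described above.

```python
import sqlite3

# Hypothetical schema: one row per article, keyed by URL so duplicates are rejected.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url       TEXT PRIMARY KEY,
    source    TEXT NOT NULL,
    title     TEXT NOT NULL,
    published TEXT,
    keywords  TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the article store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_article(conn, url, source, title, published, keywords):
    """Insert an article; return True if it was new, False if the URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?, ?)",
        (url, source, title, published, ",".join(sorted(keywords))),
    )
    conn.commit()
    return cur.rowcount == 1


def find_by_keyword(conn, keyword):
    """Return (source, title, url) rows tagged with the keyword, newest first."""
    # Keywords are stored comma-separated; wrapping both sides in commas
    # makes the LIKE pattern match whole keywords only.
    rows = conn.execute(
        "SELECT source, title, url FROM articles "
        "WHERE ',' || keywords || ',' LIKE ? ORDER BY published DESC",
        (f"%,{keyword},%",),
    )
    return rows.fetchall()
```

Storing `published` as an ISO date string keeps the `ORDER BY` chronological. A comma-separated keyword column is the simplest thing that works; a separate keyword table (or the graph DB mentioned earlier) would probably scale better for concept queries.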
You could probably repackage your upgraded feed into an RSS format that you serve locally. But that may be more hassle than it is worth.
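If someone does want to try the RSS route, a minimal sketch with the standard library might look like the following. The `build_rss` function and the item dictionary keys (`title`, `url`, `summary`, `published`) are hypothetical names chosen for this example; it assumes each item carries a timezone-aware `datetime` and requires Python 3.8 or later for `xml_declaration`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Return an RSS 2.0 document (as UTF-8 bytes) for a list of article dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["url"]
        # The article URL doubles as the unique id readers use to detect new items.
        ET.SubElement(node, "guid").text = item["url"]
        ET.SubElement(node, "description").text = item["summary"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    feed = build_rss(
        "My filtered news",
        "http://localhost:8000/feed.xml",
        "Articles selected by my pipeline",
        [{
            "title": "Example article",
            "url": "https://example.com/article",
            "summary": "Short generated summary.",
            "published": datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc),
        }],
    )
    with open("feed.xml", "wb") as fh:
        fh.write(feed)
```

Running `python -m http.server` in the folder containing `feed.xml` would then let a reader on the same machine subscribe to `http://localhost:8000/feed.xml`.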
Thanks for the info, it encouraged me to try that sometime :)