• 7 Posts
  • 37 Comments
Joined 4 years ago
Cake day: June 30th, 2020


  • Once I had to use the internet without an ad blocker (shiver). It was horrible. I still have nightmares.

    Joking aside, I couldn’t believe how crammed-full and chaotic sites were without an ad blocker. I have no evidence to support this other than my own experience, but I think, for me, ad blockers are good for my mental health. Being constantly exposed to all those messages trying to exploit insecurities can’t be good for people.

    Anyways, ad blockers are the best.


  • This seems very obvious to me, not that it isn’t worth highlighting, particularly in a world with open models and weights, which we should desperately want. The “don’t worry, watermarks will be a thing” response just seems like an attempt to have some answer that dampens concerns. I don’t imagine most people working in the AI space actually think this would work. I could be wrong.


  • Yeah, I 100% understand and to a large extent agree with this. I think money should be involved; creators should get paid. I don’t think PeerTube has become “the answer” yet, and there is some combination of market-level event and technology/feature set that needs to be in place to create enough momentum for people to move off YouTube. It will happen eventually (I think), but what exists today isn’t enough of a pull to overcome the momentum YouTube has. That doesn’t mean that “we” should give up.






  • I’ve used Standard Notes for years. They are great: very privacy-friendly, with lots of good features. I’ve also used Obsidian, like others have mentioned, but I didn’t use 95% of the features in either Standard Notes or Obsidian. Nowadays I just use plain markdown files and store them in a git repo: low complexity, and I like the simplicity of it. 100% recommend.



  • Hey, I am a machine learning engineer who works with people data. Generally you measure bias in the training data, the validation sets, and the outcomes (in an ongoing fashion; AIF360 is a common library and approach). There are lots of ways to measure bias and/or fairness. Just checking whether a feature was used isn’t considered “enough” by any standard or practitioner. There are also ways to detect and mitigate some of the proxy relationships you’re pointing to. That being said, I am 100% skeptical that any hiring algorithm isn’t going to be extremely biased. A lot of big companies have tried and quit because, despite following all the right steps, the models were still biased: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G. Also, many of the metrics used to report fairness have some deep flaws (e.g., disparate impact).

    All that being said, the current state is that there are no reporting requirements, so 90% of the time vendors don’t do even the minimum; if they did, it would cost a lot more and get in the way of the “AI will solve all your problems with no effort” narrative they want to put forward. So I am happy to see any regulation coming into place, even if it won’t be perfect.
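    For anyone wondering what the disparate impact metric actually computes, here is a toy sketch (the data and group labels are made up for illustration; this is not any vendor’s implementation). It is just the ratio of selection rates between an unprivileged and a privileged group, which is part of why it’s flawed: it says nothing about why the rates differ.

```python
def disparate_impact(outcomes, groups, privileged):
    """Selection rate of the unprivileged group divided by the
    selection rate of the privileged group. Ratios below 0.8 are
    commonly flagged under the 'four-fifths rule'."""
    priv = [o for o, g in zip(outcomes, groups) if g == privileged]
    unpriv = [o for o, g in zip(outcomes, groups) if g != privileged]
    return (sum(unpriv) / len(unpriv)) / (sum(priv) / len(priv))

# Toy hiring data: 1 = hired, 0 = rejected; group "a" treated as privileged.
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(round(disparate_impact(outcomes, groups, privileged="a"), 2))  # 0.33
```

    Here group “a” was hired at a 75% rate and group “b” at 25%, so the ratio is about 0.33 — well below the 0.8 threshold, and yet the number alone tells you nothing about the cause.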




  • I use Kagi. I think it depends on your level of concern, as it does with most things. Kagi has a pretty nicely written privacy policy. They do require an account, but I signed up with a masked email and CC. For my use I find their privacy policy enough, given the other measures I take, but the main reason I like Kagi is zero ads or prioritized placement. Experiencing search without ads is a pretty awesome experience, in my opinion. There are other ways to get free search with the ads stripped out, but this feels fundamentally different from a service purpose-built to be ad-free and private. I am happy to pay for ad-free platforms versus using platforms that are trying to do privacy-preserving ads, but this is more of a personal stance and preference. I know your question was more about privacy than ads, but I find the two closely linked. I’ve attached a summary of their privacy policy below:

    • Searches are anonymous and private to you. Kagi does not see what you are searching at all.
    • We do not log or store your IP address. Your IP address is used only temporarily when enriching location/maps searches, and is not shared with any other party.
    • We only store cookies needed for site functionality. We do not use any web browser analytics or other frontend telemetry.
    • We do not display any ads, or have any first-party or third-party tracking in service of ads.
    • We do not share customer data with third parties, except as needed to perform explicitly accessed services. In those cases, we will share the minimum amount of data needed to provide the service, and will do so in an anonymous way.
    • We collect only the data needed to provide and protect the service.
    • We proxy all images to prevent tracking from third parties.
    • We use HTTPS encryption everywhere. All passwords are hashed and salted.

    https://kagi.com/privacy






  • I think this is probably true for most providers: they could add logs if they were legally required to, but don’t actively keep them. I think way too much stock is put in the “we don’t log” claims that are common among privacy tools. Most VPN providers can log if they have to, and often do log some data for service-abuse and load monitoring, but quibble over what “we don’t log” means.

    I used to work for a VPN provider where we kept statements in our privacy policy about some logging, and users ripped us apart, despite those statements being truthful and other providers being dishonest (or at least confusing). Because so many providers project false confidence by plastering “we don’t log” all over their sites, the user base buys into those statements as 100% true (and unchangeable), and providers that try to give a realistic view of what can happen get slammed.

    I am happy to see that Proton put the statement up. I would have preferred they had statements up already, but just because another provider says they don’t log, I wouldn’t trust those statements. For me, I am not too worried if a provider can log some data, like IP addresses, when they receive an unavoidable court order ( https://en.wikipedia.org/wiki/United_States_Foreign_Intelligence_Surveillance_Court ), as I generally expect this to be true for all services, and my threat model isn’t avoiding three-letter agencies. If your threat model requires avoiding three-letter agencies, then trusting almost any service provider is going to be difficult. Obviously you should be using Tor to connect to anything, but you would have to assume almost everything with a server is either compromised or can be served certain court orders. Services like Briar seem like your best bet ( https://briarproject.org/ ).


  • I am not happy with it yet, but that is because I want it to be perfect and it never will be. I do find that I engage with content at a larger scale, and with more variety, than when I go to a single source. I am using the NLTK-based features from newspaper for keyword extraction, plus the trending-sources feature, to monitor a few hundred sources. Currently I store all the metadata + links (URLs) + Wikipedia links in a pandas DataFrame (which is becoming a problem) and visualize trends and data about the news in a Jupyter notebook. For the enhanced summaries + named entity extraction I am using spaCy (https://spacy.io/); from there I use SPARQL ( https://en.wikipedia.org/wiki/SPARQL ) to query DBpedia (https://en.wikipedia.org/wiki/DBpedia) to augment entity knowledge (e.g., adding data about a company’s size and industry, or summary explanations of scientific concepts). The named entity matching and augmentation is the portion that needs the most work. Newspaper has some nice caching features, so I query all sources every day but only pull in new articles.
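    To give a rough idea of the DBpedia augmentation step, it mostly boils down to building a SPARQL query per matched entity. This is just an illustrative sketch (the entity name and query shape are simplified stand-ins, not my actual pipeline code):

```python
def dbpedia_abstract_query(entity: str) -> str:
    """Build a SPARQL query fetching the English abstract of a
    DBpedia resource. DBpedia resource names use underscores in
    place of spaces (a simplification; real matching is messier)."""
    resource = entity.strip().replace(" ", "_")
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> dbo:abstract ?abstract .\n"
        '  FILTER (lang(?abstract) = "en")\n'
        "} LIMIT 1"
    )

print(dbpedia_abstract_query("Alan Turing"))
```

    Actually executing the query goes through the public endpoint at https://dbpedia.org/sparql (e.g., via the SPARQLWrapper library); I’ve left that out to keep the sketch offline.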

    I might play around with moving portions of the data into a graph DB, and with better ways to query based on concepts. Right now I just write Python code to query the pandas DataFrame based on different parameters.
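    To make “query the pandas DataFrame based on different parameters” concrete, here is a stripped-down sketch of that layer (the columns and rows are made up for illustration; my actual schema has a lot more in it):

```python
import pandas as pd

# Made-up stand-in for the article-metadata store described above.
articles = pd.DataFrame([
    {"title": "Story A", "source": "feed1", "keywords": ["ai", "policy"]},
    {"title": "Story B", "source": "feed2", "keywords": ["privacy"]},
    {"title": "Story C", "source": "feed1", "keywords": ["ai", "privacy"]},
])

def by_keyword(df: pd.DataFrame, kw: str) -> pd.DataFrame:
    """Rows whose extracted keyword list contains kw."""
    return df[df["keywords"].apply(lambda ks: kw in ks)]

print(by_keyword(articles, "privacy")["title"].tolist())  # ['Story B', 'Story C']
```

    This works fine at small scale, but list-valued columns like `keywords` are part of why the DataFrame approach is becoming a problem — it’s the kind of relationship a graph DB would model more naturally.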

    Are you happy with your solution? Can you share a bit more about your pipeline?