• rufus@discuss.tchncs.de
    7 months ago

    Can we get a bit more info? Does it run locally? What specs does it need? Which technology does it use: something open-ended like Whisper, or something faster with a predefined set of sentences like Vosk? Which TTS engine does it use? Does it support languages other than English?

  • Daniel Quinn@lemmy.ca
    7 months ago

    Nifty! I wrote something similar a couple of years ago using Vosk for the STT side. My project went a little further though, automating navigation of the programs you start. So you could say: “play the witcher” and it’d check if The Witcher was available in a local Kodi instance, and if not, then figure out which streaming service was running it and launch the page for it. It’d also let you run arbitrary commands and user plugins too!
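
    For anyone curious what that “check locally, then fall back to streaming” flow might look like, here is a rough Python sketch. It is not the project’s actual code: `Action`, `route_command`, the title list and the example URL are all made-up names for illustration, and a real version would query Kodi and a streaming lookup service instead of plain in-memory collections.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical result type: what the assistant should do next."""
    kind: str                     # "local", "stream", or "unknown"
    target: Optional[str] = None  # title to play locally, or URL to open


def route_command(
    utterance: str,
    local_titles: Iterable[str],
    streaming_urls: Dict[str, str],
) -> Action:
    """Route a transcribed "play <title>" command.

    Assumption: local_titles stands in for a media-library query and
    streaming_urls for a title-to-page lookup; both are illustrative.
    """
    words = utterance.strip().lower().split(maxsplit=1)
    if len(words) != 2 or words[0] != "play":
        return Action("unknown")

    title = words[1]

    # 1. Prefer a copy in the local library (case-insensitive match).
    if title in {t.lower() for t in local_titles}:
        return Action("local", title)

    # 2. Otherwise fall back to a streaming page, if one is known.
    for name, url in streaming_urls.items():
        if name.lower() == title:
            return Action("stream", url)

    # 3. Nothing matched: report the title so the caller can say so.
    return Action("unknown", title)
```

    The local check comes first so that a title you already own never gets sent to a streaming page.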

    I ran into two big problems though that more-or-less killed my enthusiasm for developing it: (1) parts of it relied on pyautogui, and with the Linux desktop’s transition to Wayland, the features I depended on were disappearing. (2) I wanted to package it for Flatpak, and it turns out that Flatpak doesn’t play well with Python. I was also trying to support both arm64 and amd64, which it turns out is also really hard (omg the pain of doing this for the Pi).

    Anyway, maybe the project will serve as some inspiration.

    • suoko@feddit.itOP
      7 months ago

      Some years ago I was able to configure Mycroft and its Plasma widget, and it was working very well. But then all was lost, unfortunately. It should have become the Kvoice control, but it didn’t.

      • Daniel Quinn@lemmy.ca
        7 months ago

        Don’t get me started with Mycroft. I bought the 1st gen device and invested a year of my life writing the first incarnation of Majel built on top of it. When it was ready to share I announced it in their internal developers group and was attacked repeatedly for using the AGPL instead of a licence that’d let them steal and privatise it. Here I was offering a year’s worth of free labour (and publicity, the project exploded on Reddit), and all they could say was: “use the MIT license so we don’t have to contribute anything”.

        I’m still bitter.

          • Daniel Quinn@lemmy.ca
            7 months ago

            I’m not sure. https://mycroft.ai/ appears to be gone, redirected to https://community.openconversational.ai/. Since the Mycroft devices depended on a central server for configuration (you pushed your config to their website which in turn relayed environment variables to your code), my guess is that the project is dead, but like all good Free software, still out there.

    • suoko@feddit.itOP
      7 months ago

      I’d go for AppImage; it’s spreading more than Flatpaks or snaps.

  • geoma@lemmy.ml
    7 months ago

    But why a proprietary AI chat like ChatGPT and not an open one like the ones on huggingface.co/chat (Mixtral, Gemma, Llama, etc.)? Each time you query something on ChatGPT you help strengthen it and give more power to a private company.

  • acockworkorange@mander.xyz
    7 months ago

    Is this something that can be accelerated with a TPU module? I’d love to self-host a server with this stuff and have my family use it from their phones.

  • dohpaz42@lemmy.world
    7 months ago

    I was just looking for something like this yesterday. Thank you!

    Edit: would this work on a Raspberry Pi?