Please explain.

  • Vlyn@lemmy.zip · 10 months ago

    Why are humans so bad with drawing hands?

    Hands are tough. The AI isn’t building a logical model of a human when it draws one; it’s more like taking a best guess at where the pixels should go. So it’s not “thinking”: alright, I’m drawing a human, a human has two hands, each hand has five fingers, the fingers are posed like this, …

    It’s drawing a human, so it roughly throws a human shape on there: the shape has a head, a torso with two arms coming out of it (roughly), and something at the end of each arm, but what that something is is complicated and looks different every time. It’s all approximation, extremely well done, but in the end the AI is just guessing where to put things.

    If you trained a model on just a single type of hand and finger position, it would replicate it perfectly. But every hand is different, and each hand has a nearly unlimited number of positions it can be in (including each finger). So it’s usually a mess.

    I saw one way to get better results, but that’s pretty much giving the AI a pose beforehand (like a stick figure) so it already knows where things should go; a rough sketch of that follows below. If you just freely generate “human male, holding hands up” you’ll probably get a mess with six fingers on one hand and maybe a third arm going nowhere in the back.
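
    For anyone curious what that pose-conditioning looks like in practice, here is a rough sketch using the diffusers library with a ControlNet OpenPose model. The checkpoint names are just common community ones and the pose file is a made-up placeholder, so treat this as an illustration, not a recipe:

```python
# Sketch: pose-conditioned generation with diffusers + ControlNet (OpenPose).
# The checkpoint names are common community ones and the pose image is a
# made-up local file; adjust both to whatever you actually use.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# A pre-rendered "stick figure" pose image that pins down where the arms
# and hands should go.
pose = load_image("pose_hands_up.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The prompt stays free-form, but the pose image constrains the layout,
# which is what tends to clean up hands and stray limbs.
image = pipe("human male, holding hands up", image=pose).images[0]
image.save("hands_up.png")
```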

    • loathesome dongeater (OP) · 10 months ago

      “Why are humans so bad with drawing hands?”

      The rest of your answer makes sense but this rhetorical question is not helpful IMO. There are lots of things that humans are not good at but at which computers excel.

      • Vlyn@lemmy.zip · 10 months ago

        That’s mostly true, but not fully. Models learn from human-drawn images and photos, so if you feed in millions of drawn images and the hands aren’t perfect in all of them, you can mess up the model too. That’s why negative prompts like “malformed”, “bad quality”, “misformed hands” and so on are popular when playing with image generation.
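
        Roughly, this is how those negative prompts get passed in with the diffusers library (the checkpoint name here is just a common example, not necessarily what any given service runs):

```python
# Sketch: passing negative prompts with a Stable Diffusion pipeline, to
# steer the sampler away from concepts the data associates with bad hands.
# The checkpoint name is just a common example.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of a man waving at the camera",
    negative_prompt="malformed, bad quality, misformed hands, extra fingers",
).images[0]
image.save("waving.png")
```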

        • loathesome dongeater (OP) · 10 months ago

          How? Humans are not good at finding the square root of numbers but computers are much better at it. Human limitations are not relevant in cases like this.

          • bobman@unilem.org · 10 months ago

            We’re not talking about square roots of numbers, though; we’re talking about drawing hands.

            This happens to be one of the cases where humans and AI both struggle, because drawing hands is complicated for both.

            • loathesome dongeater (OP) · 10 months ago

              Yes, but can’t you train ML models on photographs of hands to bypass that limitation?

              • bobman@unilem.org · 10 months ago

                Of course. Just like you can train humans to bypass their limitations.

                The problem is training. There’s nothing intrinsic to AI art that prevents it from making perfect hands. It just takes time, and a lot of data.
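
                As a hedged sketch of what “a lot of data” could mean in practice, you could filter a raw image dump down to pictures where a hand detector actually finds a clearly visible hand before fine-tuning on them. Using MediaPipe for that is just my assumption of one convenient tool, and the paths are made up:

```python
# Sketch: build a hand-focused fine-tuning set by keeping only images in
# which a hand detector finds at least one clearly visible hand.
# Assumes the mediapipe and opencv-python packages; paths are made up.
import glob
import os
import shutil

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)
os.makedirs("hand_dataset", exist_ok=True)

kept = 0
for path in glob.glob("raw_images/*.jpg"):
    image = cv2.imread(path)
    if image is None:
        continue
    result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:  # at least one detected hand
        shutil.copy(path, "hand_dataset/")
        kept += 1

print(f"kept {kept} images with visible hands")
```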

  • simple@lemm.ee · 10 months ago

    Hands are really complicated, even to draw. Everything else is relatively easy for an AI to guess: faces are usually looking at the camera or off to the side, but hands can be in a thousand different positions and poses. It’s hard for the AI to guess what the hands should look like and where the fingers should be. It doesn’t help that people are historically bad at drawing hands, so there’s a lot of garbage in the data.

    • loathesome dongeater (OP) · 10 months ago

      That’s true, but I would have thought the models would be able to “understand” hands by now, since I’m assuming they have seen millions of photographs with hands in them.

      • queermunist she/her@lemmy.ml · 10 months ago

        I think it’s helpful to remember that the model doesn’t have a skeleton; it’s literally skin deep. It doesn’t understand hands, it understands pixels. Without an understanding of the actual structure, all the AI can do is guess where the pixels go based on neighboring pixels.

      • SheeEttin@lemmy.world · 10 months ago

        Sure, and if they were illustrative of hands, you’d get good hands for output. But they’re random photos from random angles, possibly only showing a few fingers. Or maybe with hands clasped. Or worse, two people holding hands. If you throw all of those into the mix and call them all hands, a mix is what you’re going to get out.

        Look at this picture: https://petapixel.com/assets/uploads/2023/03/SD1131497946_two_hands_clasped_together-copy.jpg

        You can sort of see where it’s coming from. Some parts look like a handshake, some parts look like two people standing side by side holding hands (both with and without fingers interlaced), some parts look like one person’s hands on their knee. It all depends on how you’re constructing the image, and what your input data and labeling are.

        Stable Diffusion works by iteratively nudging the image (strictly, a compressed latent version of it) until it looks reasonable enough, rather than reasoning about the macro-scale structure of the whole scene. Other methods, like whatever DALL·E 2 uses, seem to work better.
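
        As a toy illustration of that iterative, local guessing: the loop below is deliberately not real Stable Diffusion code (a real sampler uses a neural network on a compressed latent, not a pixel blur), it just shows that nothing in the loop ever reasons about a whole hand.

```python
# Toy stand-in for the iterative "guess locally, nudge, repeat" loop of a
# diffusion sampler. A real model predicts the denoised image with a neural
# network on a compressed latent; here a simple neighbourhood average plays
# that role, just to show that nothing ever reasons about global structure.
import numpy as np

def predicted_clean(image: np.ndarray) -> np.ndarray:
    """Guess each pixel from itself and its four neighbours."""
    p = np.pad(image, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] + image) / 5.0

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))      # start from pure noise
for _ in range(50):                    # reverse-diffusion-style loop
    guess = predicted_clean(image)
    image += 0.2 * (guess - image)     # move a little toward the guess

print(image.shape, round(float(image.std()), 4))
```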

  • silvercove@lemdro.id · 10 months ago

    Probably because it’s a complicated 3D shape. The 2D projection of the hand in a photo can change a lot depending on the camera angle, the position of the hand, and what the person is doing.

    I’ve also noticed that AI has difficulty when different features are close to one another, for example when someone crosses their legs or holds an object. Maybe the AI is competent at drawing the objects in isolation, but their combination is much more difficult. This is often the case with hands.
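
    A tiny sketch of the projection point, with a made-up set of 3D fingertip positions and a bare-bones pinhole camera, just to show how much the 2D footprint changes as the viewing angle changes:

```python
# Sketch: the same rigid 3D points give very different 2D shapes depending
# on the camera angle. The "fingertips" are made-up coordinates in cm,
# not real anatomy.
import numpy as np

fingertips = np.array([
    [0.0, 9.0, 1.0],   # thumb
    [2.0, 10.0, 0.0],  # index
    [3.0, 10.5, 0.0],  # middle
    [4.0, 10.0, 0.0],  # ring
    [5.0, 9.0, 0.0],   # pinky
])

def project(points: np.ndarray, yaw_deg: float, distance: float = 40.0) -> np.ndarray:
    """Rotate around the vertical axis, then do a simple pinhole projection."""
    a = np.radians(yaw_deg)
    rot = np.array([
        [np.cos(a), 0.0, np.sin(a)],
        [0.0, 1.0, 0.0],
        [-np.sin(a), 0.0, np.cos(a)],
    ])
    cam = points @ rot.T
    z = cam[:, 2] + distance           # push the hand in front of the camera
    return cam[:, :2] / z[:, None]     # perspective divide

for yaw in (0, 45, 80):
    flat = project(fingertips, yaw)
    width = flat[:, 0].max() - flat[:, 0].min()
    print(f"yaw {yaw:2d} deg: projected hand width ~ {width:.3f}")
```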