• theluddite@lemmy.ml
    English
    arrow-up
    59
    ·
    11 months ago

    The real problem with LLM coding, in my opinion, is something much more fundamental than whether it can code correctly or not. One of the biggest problems coding faces right now is code bloat. In my 15 years writing code, I write so much less code now than when I started, and spend so much more time bolting together existing libraries, dealing with CI/CD bullshit, and all the other hair that software projects have started to grow.

    The amount of code is exploding. Nowadays, every website uses ReactJS. Every single tiny website loads god knows how many libraries. Just the other day, I forked and built an open source project that had a simple web front end (a list view, some forms – basic shit), and after building it, npm informed me that it had over a dozen critical vulnerabilities, and dozens more of high severity. I think the total was something like 70?

    Until now, all code has had to be written at least once. With ChatGPT, it doesn’t even need to be written once! We can generate arbitrary amounts of code all the time whenever we want! We’re going to have so much fucking code, and we have absolutely no idea how to deal with that.

    • space_comrade [he/him]@hexbear.net
      English
      arrow-up
      12
      ·
      11 months ago

      I don’t think it’s gonna go that way. In my experience, the bigger the chunk of code you make it generate, the more wrong it’s gonna be, and not just in proportion to the size of the chunk: it’s gonna be exponentially more wrong.

      It’s only good for generating small chunks of code at a time.

      • FunkyStuff [he/him]@hexbear.net
        English
        arrow-up
        7
        ·
        11 months ago

        It won’t be long (maybe 3 years max) before industry adopts some technique for automatically prompting an LLM to generate code to fulfill a certain requirement, then iteratively improving it using test data until it passes all test cases. And I’m pretty sure there already are ways to get LLMs to generate test cases. So this could go nightmarishly wrong very, very fast if industry adopts that technology and starts integrating hundreds of unnecessary libraries or pieces of code that the AI just learned to “spam” everywhere, so to speak. These things are way dumber than we give them credit for.
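
        To make the loop described above concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: `fake_llm` just returns canned strings instead of calling a model, and `absolute` is a made-up target function. It only shows the shape of "generate, run the tests, feed the failures back".

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[int, int]  # (input, expected output)


def fake_llm(requirement: str, feedback: List[str]) -> str:
    """Hypothetical stand-in for a model call; returns candidate source code."""
    if not feedback:
        # First attempt: plausible but wrong (ignores negative numbers).
        return "def absolute(x):\n    return x\n"
    # Later attempt, produced after seeing at least one failing test.
    return "def absolute(x):\n    return -x if x < 0 else x\n"


def run_tests(source: str, tests: List[TestCase]) -> List[str]:
    """Execute the candidate and return one message per failing test."""
    namespace: dict = {}
    exec(source, namespace)  # unsafe outside a sandbox; acceptable for a toy
    func = namespace["absolute"]
    failures = []
    for arg, expected in tests:
        got = func(arg)
        if got != expected:
            failures.append(f"absolute({arg}) returned {got}, expected {expected}")
    return failures


def generate_until_green(
    requirement: str,
    tests: List[TestCase],
    generate: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Regenerate code until every test passes or the round budget runs out."""
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        source = generate(requirement, feedback)
        failures = run_tests(source, tests)
        if not failures:
            return source, round_number
        feedback = failures  # the next prompt would include these messages
    return None, max_rounds


source, rounds = generate_until_green(
    "return the absolute value of x", [(3, 3), (-4, 4), (0, 0)], fake_llm
)
print(rounds)
```

        Note that the loop only ever checks the tests. Nothing in it penalizes extra dependencies or dead code, which is exactly the "spam" failure mode.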

        • space_comrade [he/him]@hexbear.net
          English
          arrow-up
          7
          ·
          edit-2
          11 months ago

          Oh that’s definitely going to lead to some hilarious situations but I don’t think we’re gonna see a complete breakdown of the whole IT sector. There’s no way companies/institutions that do really mission critical work (kernels, firmware, automotive/aerospace software, certain kinds of banking/finance software etc.) will let AI write that code any time soon. The rest of the stuff isn’t really that important and it isn’t that big of a deal if it breaks for a few hours/days because the AI spazzed out.

          • FunkyStuff [he/him]@hexbear.net
            English
            arrow-up
            3
            ·
            11 months ago

            Agreed, don’t expect it to break absolutely everything but I expect that software development is going to get very hairy when you have to use whatever bloated mess AI is creating.

          • SmoothIsFast@citizensgaming.com
            arrow-up
            1
            ·
            11 months ago

            If you have seen the crunch before demos for military projects, you might start to think the other way. I doubt the bigger vendors will change much, but you definitely could see contracts being won for shit that will just be AI generated because they got some base manager to eat up their proposal filled with buzzwords. I’d be more worried about it causing more contract bloat and wasted resources in critical systems going to these vaporware solutions. Then you take general government contracts, which go to the lowest bidder, and you are gonna see a ton of AI bullshit start cropping up and bloating our systems because some high-school kid got ChatGPT to make a basic website and now thinks he is the AI website God. Plus, I work in the financial sector now and they have been eating up all the AI buzzwords like fucking hot cakes. The devs all know it will be a shit show, but the executives’ egos won’t hear any of it, because think of the efficiency and bonuses they could get if they cut the implementation timeline down to a quarter. They don’t realize that the vulnerabilities, maintenance cost, and the LLM’s lack of understanding will cause massive long-term issues, regardless of whether they can get a buggy alpha created.

      • theluddite@lemmy.ml
        English
        arrow-up
        2
        ·
        11 months ago

        Yes I agree. I meant the fundamental problem with the idea of LLMs doing more and more of our code, even if they get quite good.

    • BloodyDeed@feddit.ch
      arrow-up
      12
      ·
      edit-2
      11 months ago

      This is so true. I feel like my main job as a senior software engineer is to keep the bloat low and delete unused code. It’s very easy to write code - maintaining it and focusing on the important bits is hard.

      This will be one of the biggest and most challenging problems Computer Science will have to solve in the coming years and decades.

      • floofloof@lemmy.ca
        English
        arrow-up
        6
        ·
        edit-2
        11 months ago

        It’s easy and fun to write new code, and it wins management’s respect. The harder work of maintaining and improving large code bases and data goes mostly unappreciated.

    • DefinitelyNotAPhone [he/him]@hexbear.net
      English
      arrow-up
      9
      ·
      11 months ago

      There’s the other half of this problem, which is that the kind of code that LLMs are relatively good at pumping out with some degree of correctness is almost always the code that isn’t difficult to begin with. A sorting algorithm on command is nice, but if you’re working on any kind of novel implementation then the hard bits are the business logic, which in all likelihood has never been written before and is either sensitive information or just convoluted enough to make turning it into a prompt difficult. Even with LLMs, you still need coders who understand architecture and can convert requirements into raw logic.

    • AlexWIWA@lemmy.ml
      English
      arrow-up
      6
      ·
      11 months ago

      Makes the Adeptus Mechanicus look like a realistic future. Really advanced tech, but no one knows how it works

    • CriticalResist8 [he/him]@hexbear.net
      English
      arrow-up
      6
      ·
      11 months ago

      I’ve had some success with it if I’m giving it small tasks and describing them in as much detail as possible. By design (from what I gather) it can only work on stuff it was able to use in training, which means the language needs to be documented extensively for it to work.

      It’s generally good at stuff like WordPress or MediaWiki code, and it actually helped me make the modules and templates I needed on MediaWiki, but for both of those there’s like a decade of forum posts, documentation, papers and other material that it could train with. Fun fact: for one specific problem (using a MediaWiki template to display a different message depending on whether you are logged in or not), it systematically gives me the same answer no matter how I ask. It’s only after enough probing that GPT tells me that, because of cache issues, this is not possible lol. I figure someone must have asked about this same template somewhere and it’s the only thing it can work off of from its training set to answer that question.

      I also always double-check the code it gives me for any error or things that don’t exist.

  • SirGolan@lemmy.sdf.org
    arrow-up
    20
    ·
    edit-2
    11 months ago

    Wait a second here… I skimmed the paper and GitHub and didn’t find an answer to a very important question: is this GPT3.5 or 4? There’s a huge difference in code quality between the two and either they made a giant accidental omission or they are being intentionally misleading. Please correct me if I missed where they specified that. I’m assuming they were using GPT3.5, so yeah those results would be as expected. On the HumanEval benchmark, GPT4 gets 67% and that goes up to 90% with reflexion prompting. GPT3.5 gets 48.1%, which is exactly what this paper is saying. (source).

          • SirGolan@lemmy.sdf.org
            arrow-up
            2
            ·
            11 months ago

            Oh ok! Got it. I read it as you saying ChatGPT doesn’t use GPT 4. It’s still unclear what they used for part of it because of the bit before the part you quoted:

            For each of the 517 SO questions, the first two authors manually used the SO question’s title, body, and tags to form one question prompt and fed that to the Chat Interface [45] of ChatGPT.

            It doesn’t say if it’s 4 or 3.5, but I’m going to assume 3.5. Anyway, in the end they got the same result for GPT 3.5 that it gets on HumanEval, which isn’t anything interesting. Also, GPT 4 is much better, so I’m not really sure what the point is. Their stuff on the analysis of the language used in the questions was pretty interesting though.

            Also, thanks for finding their mention of 3.5. I missed that in my skim through obviously.

            • DPRK_Chopra [comrade/them]@hexbear.net
              English
              arrow-up
              2
              ·
              11 months ago

              For sure, no worries. I had the same questions as you when reading it. Fwiw, the paper is really kind of sloppy. I think it’s maybe a case of poor students not wanting to pay for GPT-4? Maybe they’ll clean it up and respond to some of the criticisms when it comes out of draft, but it doesn’t seem like very rigorous scholarship to me.

              • SirGolan@lemmy.sdf.org
                arrow-up
                2
                ·
                11 months ago

                Yeah I think you’re right on about the students not being able to afford GPT4 (I don’t blame them. The API version gets expensive quick). I agree though that it doesn’t seem super well put together.

    • floofloof@lemmy.ca
      English
      arrow-up
      3
      ·
      11 months ago

      Whatever GitHub Copilot uses (the version with the chat feature), I don’t find its code answers to be particularly accurate. Do we know which version that product uses?

      • SirGolan@lemmy.sdf.org
        arrow-up
        4
        ·
        11 months ago

        If we are talking Copilot then that’s not ChatGPT. But I agree it’s ok. Like it can do simple things well but I go to GPT 4 for the hard stuff. (Or my own brain haha)

  • s20@lemmy.ml
    arrow-up
    21
    arrow-down
    1
    ·
    11 months ago

    If I’m going to use AI for something, I want it to be right more often than I am, not just as often!

    • space_comrade [he/him]@hexbear.net
      English
      arrow-up
      14
      arrow-down
      1
      ·
      11 months ago

      It actually doesn’t have to be. For example, the way I use GitHub Copilot is I give it a code snippet to generate, and if it’s wrong I just write a bit more code and then it usually gets it right after 2-3 iterations, and it still saves me time.

      The trick is you should be able to quickly determine if the code is what you want which means you need to have a bit of experience under your belt, so AI is pretty useless if not actively harmful for junior devs.

      Overall it’s a good tool if you can get your company to shell out $20 a month for it, not sure if I’d pay it out of my own pocket tho.

      • s20@lemmy.ml
        arrow-up
        12
        arrow-down
        1
        ·
        11 months ago

        It… it was a joke. I was implying that 52% was better than me.

        • space_comrade [he/him]@hexbear.net
          English
          arrow-up
          9
          ·
          11 months ago

          Ah ok I guess I misread that. My point is that by itself it’s not gonna help you write either better or shittier code than you already do.

      • jvisick@programming.dev
        arrow-up
        6
        ·
        11 months ago

        GitHub Copilot is just intellisense that can complete longer code blocks.

        I’ve found that it can somewhat regularly predict a couple lines of code that generally resemble what I was going to type, but it very rarely gives me correct completions. By a fairly wide margin, I end up needing to correct a piece or two. To your point, it can absolutely be detrimental to juniors or new learners by introducing bugs that are sometimes nastily subtle. I also find it getting in the way only a bit less frequently than it helps.

        I do recommend that experienced developers give it a shot because it has been a helpful tool. But to be clear - it’s really only a tool that helps me type faster. By no means does it help me produce better code, and I don’t ever see it full on replacing developers like the doomsayers like to preach. That being said, I think it’s $20 well spent for a company in that it easily saves more than $20 worth of time from my salary each month.

    • GBU_28@lemm.ee
      English
      arrow-up
      6
      ·
      11 months ago

      The trick is you have to correct for the hallucinations, and teach it to revert to a healthy path when going off course. This isn’t possible with current consumer tools.

  • r00ty@kbin.life
    arrow-up
    13
    ·
    11 months ago

    I used ChatGPT once. It created non-functional code. But the general idea did help me get to where I wanted. Maybe it works better as a rubber duck substitute?

    • GBU_28@lemm.ee
      English
      arrow-up
      2
      ·
      11 months ago

      Use it as a boilerplate blaster, for shit you could write yourself

    • dom@lemmy.ca
      arrow-up
      1
      ·
      edit-2
      11 months ago

      I did my first game jam with the help of ChatGPT. It didn’t write any code in the game, but I was able to ask it how to accomplish certain things generally and it would give me ideas, and it would be up to me to implement them.

      There were other things I knew my engine could do but I couldn’t figure out using the documentation, so I would ask ChatGPT “how do you xyz in godot” and it would give me step-by-step instructions. This was especially useful for the things that get done in the engine UI and not in code.

  • Fluffles@pawb.social
    arrow-up
    13
    arrow-down
    1
    ·
    11 months ago

    I believe this phenomenon is called “artificial hallucination”. It’s when a language model exceeds its training and makes info out of thin air. All language models have this flaw. Not just ChatGPT.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
      arrow-up
      20
      arrow-down
      5
      ·
      11 months ago

      The fundamental problem is that at the end of the day it’s just a glorified Markov chain. An LLM doesn’t have any actual understanding of what it produces in a human sense; it just knows that particular sets of tokens tend to go together in the data it’s been trained on. The GPT mechanism could very well be a useful building block for making learning systems, but a lot more work will need to be done before they can actually be said to understand anything in a meaningful way.
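
      For anyone who hasn’t seen one, this is roughly what a plain Markov chain text generator looks like in Python. It’s a deliberately crude toy (word bigrams over a made-up one-line corpus) and not how GPT is actually implemented, since a transformer conditions on a long context rather than just the previous word. It only illustrates the "these tokens tend to go together" idea.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_bigram_model(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that followed it in the text."""
    model: Dict[str, List[str]] = defaultdict(list)
    tokens = text.split()
    for current, following in zip(tokens, tokens[1:]):
        model[current].append(following)
    return model


def generate(model: Dict[str, List[str]], start: str, length: int, seed: int = 0) -> str:
    """Emit up to `length` words by repeatedly sampling a seen follower."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    output = [start]
    for _ in range(length - 1):
        candidates = model.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything
        output.append(rng.choice(candidates))
    return " ".join(output)


# Made-up corpus, purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = build_bigram_model(corpus)
print(generate(model, "the", 8))
```

      Every pair of adjacent words in the output occurred in the corpus, so it looks locally sensible, but the generator has no notion of what a cat or a rug is.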

      I suspect that to make a real AI we have to embody it in either a robot or a virtual avatar where it would learn to interact with its environment the way a child does. The AI has to build an internal representation of the physical world and its rules. Then we can teach it language using this common context where it would associate words with its understanding of the world. This kind of a shared context is essential for having AI understand things the way we do.

      • v_krishna@lemmy.ml
        English
        arrow-up
        4
        ·
        11 months ago

        A lot of semantic NLP tried this and it kind of worked, but meanwhile statistical correlation won out. It turns out that while humans consider semantic understanding to be really important, it actually isn’t required for an overwhelming majority of industry use cases. As a Kantian at heart (and an ML engineer by trade) it sucks to recognize this, but it seems like semantic conceptualization as an epiphenomenon emerging from statistical concurrence really might be the way that (at least artificial) intelligence works.

        • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
          arrow-up
          7
          arrow-down
          4
          ·
          11 months ago

          I don’t see the approaches as mutually exclusive. Statistical correlation can get you pretty far, but we’re already seeing a lot of limitations with this approach when it comes to verifying correctness or having the algorithm explain how it came to a particular conclusion. In my view, this makes a purely statistical approach inadequate for any situation where there is a specific result desired. For example, an autonomous vehicle has to drive on a road and correctly decide whether there are obstacles around it or not. Failing to do that correctly has disastrous consequences, which makes purely statistical approaches inherently unsafe.

          I think things like GPT could be building blocks for systems that are trained to have semantic understanding. I think what it comes down to is simply training a statistical model against a physical environment until it adjusts its internal topology to create an internal model of the environment through experience. I don’t expect that semantic conceptualization will simply appear out of feeding a bunch of random data into a GPT style system though.

          • v_krishna@lemmy.ml
            English
            arrow-up
            2
            ·
            11 months ago

            I fully agree with this, would have written something similar but was eating lunch when I made my former comment. I also think there’s a big part of pragmatics that comes from embodiment that will become more and more important (and wish Merleau-Ponty was still around to hear what he thinks about this)

            • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
              arrow-up
              2
              arrow-down
              5
              ·
              11 months ago

              Indeed, I definitely expect interesting things to start developing on that front, and we may see old ideas getting dusted off because now there’s enough computing power to put them to use. For example, I thought The Society of Mind from Minsky lays out a plausible architecture for a mind. Imagine each agent in that scenario being a GPT system, and the bigger mind being built out of a society of such agents each being concerned with a particular domain it learns about.

              • v_krishna@lemmy.ml
                English
                arrow-up
                1
                ·
                11 months ago

                Many (14?) years back I attended a conference (now I can’t remember what it was for, I think a complex systems department at some DC area university) and saw a lady give a talk about using agent based modeling to do computational sociology planning around federal (mostly navy/army) development in Hawaii. Essentially a sim city type of thing but purpose built to help aid in public planning decisions. Now imagine that but the agents aren’t just sets of weighted heuristics but instead weighted heuristic/prompt driven LLMs with higher level executive prompts to bring them together.

                • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
                  arrow-up
                  3
                  arrow-down
                  5
                  ·
                  edit-2
                  11 months ago

                  I’m really excited to see this kind of stuff experimented with. I find it really useful to think of machine learning agent training in terms of creating a topology, through balancing of the weights and connections, that ends up being a model of the particular domain described by the data it’s being fed. The agent learns patterns in the data it observes and creates an internal predictive model based on that. Currently, most machine learning systems seem to focus on either individual agents or small groups, such as adding a supervisor. It would be interesting to see large graphs of such agents that interact in complex ways, where high level agents only interact with other agents and don’t even need to see any of the external inputs directly. One example would be to have a system trained on working with visual input and another with audio, and then have a high level system that’s responsible for integrating these inputs and doing the actual decision making.

                  and just ran across this https://arxiv.org/abs/2308.00352
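
                  The layered arrangement above can be sketched structurally in a few lines of Python. This is only a toy under loud assumptions: the `Agent` class, the `run_graph` helper, and the threshold rules are all made up for illustration, and the hand-written lambdas stand in for what would really be trained models. The one property it shows is that the high level agent never sees the raw inputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    name: str
    listens_to: List[str]  # raw input names or names of other agents
    decide: Callable[[Dict[str, float]], float]


def run_graph(agents: List[Agent], raw_inputs: Dict[str, float]) -> Dict[str, float]:
    """Evaluate agents in order; each one sees only what it listens to."""
    signals = dict(raw_inputs)
    for agent in agents:  # assumed to be listed in dependency order
        visible = {key: signals[key] for key in agent.listens_to}
        signals[agent.name] = agent.decide(visible)
    return signals


# Hypothetical stand-ins for trained models: simple threshold rules.
agents = [
    Agent("vision", ["brightness"], lambda seen: 1.0 if seen["brightness"] > 0.5 else 0.0),
    Agent("audio", ["loudness"], lambda heard: 1.0 if heard["loudness"] > 0.5 else 0.0),
    # The integrator listens only to the other agents, never to raw inputs.
    Agent("integrator", ["vision", "audio"], lambda senses: max(senses.values())),
]

result = run_graph(agents, {"brightness": 0.9, "loudness": 0.2})
print(result["integrator"])
```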

      • FunkyStuff [he/him]@hexbear.net
        English
        arrow-up
        4
        ·
        11 months ago

        You have a pretty interesting idea that I hadn’t heard elsewhere. Do you know if there’s been any research to make an AI model learn that way?

        In my own time while I’ve messed around with some ML stuff, I’ve heard of approaches where you try to get the model to accomplish progressively more complex tasks but in the same domain. For example, if you wanted to train a model to control an agent in a physics simulation to walk like a humanoid you’d have it learn to crawl first, like a real human. I guess for an AGI it makes sense that you would have it try to learn a model of the world across different domains like vision, or sound. Heck, since you can plug any kind of input to it you could have it process radio, infrared, whatever else. That way it could have a very complete model of the world.

  • Afghaniscran@feddit.uk
    arrow-up
    12
    ·
    11 months ago

    I used it to code small things and it worked eventually whereas if I just decided to learn coding I’d be stuck cos I don’t do computers, I do hvac.

  • Pleonasm@programming.dev
    arrow-up
    8
    ·
    11 months ago

    I was pretty impressed with it the other day: it converted ~150 lines of Python to C pretty flawlessly. I then asked it to extend the program by adding a progress bar, and that segfaulted, but it was immediately able to find the segfault and fix it when I mentioned it. It probably would have taken me an hour or two to write myself, and ChatGPT did it in 5 minutes.