I’ve got a bot, running but still in development, that detects and flags toxic content on Lemmy. I’d like to improve it, as I’m getting quite a few false positives. I think part of the reason is that whether a comment is toxic often depends on the parent comment or post.

During a recent postgrad assignment I was taught (and saw for myself) that a bag of words model usually outperforms LSTM or transformer models for toxic text classification, so I’ve run with that, but I’m wondering if it was the right choice.

Does anyone have any ideas on what kind of model would be most suitable to include a parent as context, but to not explicitly consider whether the parent is toxic? I’m guessing some sort of transformer model, but I’m not quite sure how it might look/work.
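
One possible way to frame this, sketched here as an assumption rather than a recommendation, is to keep the bag-of-words approach but give parent tokens their own feature namespace. The classifier then sees the parent's words as context for the reply, and no separate toxicity score is ever computed for the parent. The function names and the `reply:` / `parent:` prefixes below are hypothetical. The rough transformer equivalent would be encoding the parent and the reply together as a sentence pair.

```python
import re
from collections import Counter

# Simple lowercase word tokenizer; a real pipeline would likely use something richer.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def contextual_features(comment, parent=None):
    """Bag-of-words counts with separate namespaces for the reply and its parent.

    The parent contributes features but is never classified on its own,
    so the label being predicted always belongs to the reply.
    """
    features = Counter(f"reply:{tok}" for tok in tokenize(comment))
    if parent:
        features.update(f"parent:{tok}" for tok in tokenize(parent))
    return features


# Illustrative call with a made-up parent comment.
feats = contextual_features(
    "Because you're a fascist.",
    parent="Why do they keep disagreeing with me?",
)
```

The resulting counts can be fed to any linear model that accepts sparse feature dictionaries. Whether this actually reduces false positives would need testing on labelled reply/parent pairs.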

  • vluz@kbin.social
    1 year ago

    While designing a similar classifier, I’ve considered the idea of giving it the whole thread as “context” of sorts.
    Not just the parent comment, but the whole thread up to the original post.

    I’ve abandoned the idea.
    A comment must stand on its own, and it would put limits on results, the way I was planning to do it.
    I might be very wrong, your insight into this would be very helpful.

    My original idea was to go recursively through the thread and test each comment individually.
    Then I would influence the actual comment’s results with the combined results of its parents.
    No context during inference, just one comment at a time.

    For example, consider the thread OP->C1->C2->C3.
    I want to determine whether C3 is toxic in the context of C2, C1, and OP.
    Test C3, test C2, test C1, test OP. Save the results.
    My current model returns scores in several fields (“toxic”, “severe toxic”, “obscene”, “threat”, “insult”, and “identity hate”).
    The idea was to then combine the results of each into a final result for C3.
    My current model takes milliseconds per test and uses few resources.
    That would be fine even for very large threads, but I would cap the depth to keep response times down.

    How to combine? I haven’t figured it out, but it would be manipulation of the results rather than inference with context.
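
    One possible combination rule, purely as an illustration since no rule has been settled on here, is a weighted average in which each ancestor counts for less the further it is from the target comment. The function name, the `decay` weight, and the `max_depth` cap are all assumptions made for this sketch.

```python
LABELS = ("toxic", "severe toxic", "obscene", "threat", "insult", "identity hate")


def make_scores(values):
    """Build a full per-label score dict, defaulting missing labels to 0.0."""
    return {**dict.fromkeys(LABELS, 0.0), **values}


def combine_thread_scores(chain, decay=0.5, max_depth=5):
    """Blend a comment's scores with those of its ancestors.

    chain[0] holds the scores for the target comment (C3); chain[1:] holds
    its ancestors, nearest first (C2, C1, OP). Each level is weighted by
    decay ** depth, and at most max_depth ancestors are considered.
    """
    if not chain:
        raise ValueError("chain must contain at least the target comment")
    used = chain[: max_depth + 1]
    weights = [decay ** depth for depth in range(len(used))]
    total = sum(weights)
    return {
        label: sum(w * s[label] for w, s in zip(weights, used)) / total
        for label in LABELS
    }


# Made-up per-comment scores for the thread OP->C1->C2->C3.
c3 = make_scores({"toxic": 0.6, "insult": 0.4})
c2 = make_scores({"toxic": 0.2})
c1 = make_scores({})
op = make_scores({})

combined = combine_thread_scores([c3, c2, c1, op])
```

    With this rule, calm ancestors pull a borderline C3 score down and hostile ancestors pull it up. Whether that is the desired behaviour is an open design question, since a toxic reply in a calm thread would also be softened.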

    Edit: Is there any way you can point me at examples that are difficult to classify? It would be a nice real-world test for my stuff.
    The current iteration of the model is very new and has not been tested in the wild.

    • Bluetreefrog@lemmy.worldOP
      1 year ago

      (“toxic”, “severe toxic”, “obscene”, “threat”, “insult”, and “identity hate”)

      You aren’t the author of Detoxify by any chance, are you? It uses the same classifications. I was originally using it but switched to my own model, as I really only needed binary classification and felt a new dataset that better suited Lemmy was needed anyway. I have 2 outputs (toxic and not-toxic).

      I’ve been building my own dataset as the existing ones on Huggingface seemed to contain a lot of content you might see on Twitter, and were a poor match for Lemmy. Having said that, I’ve generally avoided putting that sort of content into the dataset as I figured if I can’t easily decide if it’s toxic, then how could a model.

      Is there any way you can point me at examples that are difficult to classify? It would be a nice real-world test for my stuff. The current iteration of the model is very new and has not been tested in the wild.

      Here are a few where I’ve had to go back to the parent comment or post to try and work out whether it was toxic or not:

      • Do your research on the case and the answers will be very obvious. (What comment prompted this? Is it a trolling comment or a reasonable response to a trolling comment?)
      • Because you’re a fascist. The fact that they disagree with you is secondary (Is the commenter calling another commenter a fascist, or continuing a discussion?)
      • Me tard, you tard, removed nation! (Is this a quote from a movie or TV show or an insult to another commenter? Not sure.)
      • Fuck you shoresy! (pretty sure this is a quote from a tv show)

      A comment must stand on its own, and it would put limits on results, the way I was planning to do it. I might be very wrong, your insight into this would be very helpful.

      I originally thought that, and I’m actively tuning my model to try and get the best results on the comment alone, but I don’t think I’ll ever get better than about 80% accuracy. I’ve come to the conclusion that those cases in the grey zone where toxic ~= not-toxic can only be resolved by looking upstream.
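
      As a rough sketch of that idea, the comment could be scored alone first, with the parent consulted only when the score lands in the grey zone. The `margin` value and the two scoring callables are placeholders invented for this example, not part of either bot.

```python
def needs_context(p_toxic, margin=0.15):
    """True when the comment-only probability is too close to 0.5 to trust."""
    return abs(p_toxic - 0.5) < margin


def classify(comment, parent, score_alone, score_with_parent, margin=0.15):
    """Return True if the comment is judged toxic.

    score_alone(comment) and score_with_parent(comment, parent) are
    hypothetical callables that each return a probability of toxicity.
    The parent is only used when the comment-only score is ambiguous.
    """
    p = score_alone(comment)
    if parent is not None and needs_context(p, margin):
        p = score_with_parent(comment, parent)
    return p >= 0.5
```

      This keeps the fast comment-only model as the default path and limits the slower contextual check to the cases it cannot settle.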