- cross-posted to:
- technology
- opensource@lemmy.ml
Instead of just generating the next response, it simulates entire conversation trees to find paths that achieve long-term goals.
How it works:
- Generates multiple response candidates at each conversation state
- Simulates how conversations might unfold down each branch (using the LLM to predict user responses)
- Scores each trajectory on metrics like empathy, goal achievement, coherence
- Uses MCTS with UCB1 to efficiently explore the most promising paths
- Selects the response that leads to the best expected outcome
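To make the selection step concrete, here is a minimal sketch of UCB1 over a set of response candidates. It is not the project's actual code: `Node`, `ucb1`, `search`, and `rollout_score` are hypothetical names, and `rollout_score` stands in for "simulate the branch with the LLM and score the trajectory". It also only searches one level deep, whereas a full MCTS would keep expanding below each candidate.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One candidate response in the tree (hypothetical structure)."""
    response: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1: mean score plus an exploration bonus for rarely visited nodes."""
    if child.visits == 0:
        return math.inf  # always try unvisited candidates first
    mean = child.total_score / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(candidates: List[str],
           rollout_score: Callable[[str], float],
           iterations: int = 200) -> str:
    """Pick the candidate that UCB1 ends up visiting most often."""
    root = Node(response="<root>")
    root.children = [Node(response=r, parent=root) for r in candidates]

    for _ in range(iterations):
        # Selection: child with the highest UCB1 value.
        child = max(root.children, key=lambda n: ucb1(n, root.visits))
        # Simulation: stand-in for an LLM rollout plus trajectory scoring.
        score = rollout_score(child.response)
        # Backpropagation: update statistics up to the root.
        node: Optional[Node] = child
        while node is not None:
            node.visits += 1
            node.total_score += score
            node = node.parent

    return max(root.children, key=lambda n: n.visits).response


# Toy usage with fixed scores instead of a real LLM judge.
fake_scores = {"a": 0.2, "b": 0.9, "c": 0.5}
best = search(list(fake_scores), fake_scores.__getitem__)
print(best)
```

Returning the most-visited child rather than the one with the highest mean is a common choice because visit counts are less sensitive to a single lucky rollout.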
Limitations:
- Scoring is done by the same LLM that generates responses
- Branch pruning is naive - just threshold-based instead of something smarter like progressive widening
- Memory usage grows with tree size; there is currently no node recycling
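For reference, progressive widening is usually described as capping the number of children a node may have as a function of its visit count, so new branches only open up once the existing ones have been explored. A rough sketch, assuming the common `k * visits ** alpha` form (the names `max_children`, `may_expand`, `k`, and `alpha` are illustrative and not taken from the project):

```python
import math


def max_children(visits: int, k: float = 1.0, alpha: float = 0.5) -> int:
    """Child budget for a node: grows sublinearly with its visit count."""
    return max(1, math.ceil(k * visits ** alpha))


def may_expand(num_children: int, visits: int,
               k: float = 1.0, alpha: float = 0.5) -> bool:
    """True if the node is still under its child budget and may add a branch."""
    return num_children < max_children(visits, k, alpha)


# A node visited 16 times may hold up to 4 children with the defaults.
print(max_children(16), may_expand(3, 16), may_expand(4, 16))
```

Compared with a fixed score threshold, this ties branching to how much evidence a node has accumulated, which also bounds tree growth and could ease the memory issue somewhat.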



I think that’s an interesting approach as well. There are a bunch of research papers on using MCTS with LLMs, a few examples here:
https://arxiv.org/abs/2503.19309
https://arxiv.org/abs/2505.23229
https://arxiv.org/abs/2504.02426
https://arxiv.org/abs/2504.11009
https://arxiv.org/abs/2502.13428