Mastra Bug: Missing Tool Calls In Message Storage

by Lucia Rojas

Hey everyone,

We've got a potential bug on our hands, and I wanted to walk you through the details so we can squash it together. It seems like when we're pulling messages from storage, specifically in the later parts of our assistant conversations, the tool calls from earlier exchanges are going missing. Let's dive into what's happening and how we can fix it.

The Bug: Missing Tool Calls

So, the main issue is that in Langfuse, the traces for later messages in our assistant conversations don't include any of the tool calls from earlier exchanges. It's as if the system forgets which tools were called previously. After some debugging, it looks like Mastra isn't including these tool invocations, or their results, from previous generations in the messages passed to streamText or generateText. This can lead to some pretty confusing behavior, especially when our agents rely on the context of previous tool calls to make decisions.

Digging Deeper

Stepping through the code with the debugger, I noticed that the tool information is retrieved from the database during the setup phase, and it is still present in the messageList at that point. However, the tool calls vanish during the creation of processedMemoryMessages in the agent's memory setup code. This is where things get a bit tricky, so let's break it down further.

The Root Cause

It appears the issue stems from a few key steps in how we process messages. Let's walk through the process:

  1. processedMemoryMessages Creation: processedMemoryMessages is built from messages: messageList.get.remembered.v1(). This is where the tool information starts to go missing: the call returns the remembered messages in V1 format, and that V1 output is what gets processed and added back to the list.
  2. remembered.v1 Conversion: remembered.v1 calls convertToV1Messages(this.remembered.v2()), converting the remembered V2 messages to V1. This conversion changes the structure of the messages and sets up the loss of the tool call information.
  3. convertToV1Messages Splitting: Inside convertToV1Messages, a single message that contains a tool invocation and text parts is split into two separate messages, a tool-call message and a text message, and both end up with the same ID. Two distinct messages sharing one ID is what later allows one to overwrite the other.
  4. Adding Messages to MessageList: The processed messages are then added back with .add(processedMemoryMessages, 'memory'), which calls MessageList.addOne for each message.
  5. MessageList.addOne Replacement: addOne treats a message whose ID is already in the list as a replacement for the existing one. Because the tool-call message and the text message share an ID, the text message replaces the tool-call message. This is the crux of the problem.
  6. Tool Information Loss: From this point on, the tool invocation is gone from the message history, so it is not available to later steps in the conversation or to the messages sent to the model.

So, to summarize, the issue arises because the convertToV1Messages function splits tool invocations into separate messages with the same ID, leading to one message overwriting the other in the MessageList. It's a bit of a chain reaction, but hopefully, this breakdown makes it clearer.
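To make the chain concrete, here is a minimal TypeScript model of steps 3 to 5. It is a sketch under stated assumptions: V2Message, V1Message, toyConvertToV1 and ToyMessageList are hypothetical stand-ins written for illustration, not Mastra's actual types or its convertToV1Messages/MessageList code. The model only reproduces the two properties described above: the split reuses one ID for both messages, and adding by ID replaces.

```typescript
// Hypothetical toy model of steps 3-5; not Mastra's real types or code.
type V2Part =
  | { type: "tool-invocation"; toolName: string; result: unknown }
  | { type: "text"; text: string };

interface V2Message {
  id: string;
  role: "assistant";
  parts: V2Part[];
}

interface V1Message {
  id: string;
  type: "tool-call" | "text";
  content: V2Part[];
}

// Step 3: split one message into a tool-call message and a text message.
// Both outputs reuse the source id.
function toyConvertToV1(msg: V2Message): V1Message[] {
  const tools = msg.parts.filter((p) => p.type === "tool-invocation");
  const texts = msg.parts.filter((p) => p.type === "text");
  const out: V1Message[] = [];
  if (tools.length > 0) out.push({ id: msg.id, type: "tool-call", content: tools });
  if (texts.length > 0) out.push({ id: msg.id, type: "text", content: texts });
  return out;
}

// Steps 4-5: a list keyed by id; adding an existing id replaces the old entry.
class ToyMessageList {
  private byId = new Map<string, V1Message>();

  addOne(msg: V1Message): void {
    this.byId.set(msg.id, msg);
  }

  add(msgs: V1Message[]): this {
    for (const m of msgs) this.addOne(m);
    return this;
  }

  all(): V1Message[] {
    return Array.from(this.byId.values());
  }
}

const remembered: V2Message = {
  id: "msg-1",
  role: "assistant",
  parts: [
    { type: "tool-invocation", toolName: "getWeather", result: { tempC: 21 } },
    { type: "text", text: "It is 21°C." },
  ],
};

const list = new ToyMessageList().add(toyConvertToV1(remembered));
console.log(list.all().map((m) => m.type)); // [ 'text' ]: the tool-call message was overwritten
```

Only the text message survives, which matches the symptom seen in the Langfuse traces.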

Steps to Reproduce

This bug is reliably reproducible in our setup using LibSQLStore for storage and a simple Agent running either .generate or .stream with maxSteps: 5 and a couple of tools. It seems to happen for all instances of the agent that call tools in our setup. If you want to try it out yourself, here’s a quick rundown:

  1. Set Up the Environment: Configure LibSQLStore as the storage for the agent's memory. This is the storage we reproduced the bug with. Since the root cause above lies in the message conversion and MessageList rather than in the store itself, other storage adapters may be affected too, but we have only confirmed it with LibSQLStore.
  2. Create a Simple Agent: A basic Agent with a couple of tools is enough; the agent doesn't need to be complex, because the issue arises with standard tool usage.
  3. Run .generate or .stream: Call either method with maxSteps: 5 so the agent can call tools and then respond within the same generation. Keep the conversation going for several turns so that later turns load the earlier messages from storage.
  4. Call Tools: Make sure the agent calls at least a couple of tools during the conversation. Without tool calls, the issue won't be triggered.
  5. Observe Missing Tool Calls: Check the Langfuse traces for later messages. If the tool calls from earlier exchanges are absent, the bug is reproduced.
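Besides reading Langfuse traces, step 5 can be checked programmatically on any captured copy of the messages sent to the model (for example, the input of a later trace). The sketch below is hypothetical: PromptMessage and countToolParts are illustrative names, and the part shapes (type "tool-call" / "tool-result" with a toolCallId) are an assumption modeled on the AI SDK's message format rather than imported from it.

```typescript
// Local types; assumed to mirror the AI SDK's tool-call / tool-result parts.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool-result"; toolCallId: string; toolName: string; result: unknown };

interface PromptMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | ContentPart[];
}

// Counts tool-call and tool-result parts across a prompt's message history.
function countToolParts(messages: PromptMessage[]): { calls: number; results: number } {
  let calls = 0;
  let results = 0;
  for (const m of messages) {
    if (typeof m.content === "string") continue; // plain text message, no tool parts
    for (const part of m.content) {
      if (part.type === "tool-call") calls++;
      else if (part.type === "tool-result") results++;
    }
  }
  return { calls, results };
}

// A later turn whose history lost the earlier tool call (the buggy shape).
const laterTurn: PromptMessage[] = [
  { role: "user", content: "What's the weather in Lima?" },
  { role: "assistant", content: [{ type: "text", text: "It is 21°C." }] },
  { role: "user", content: "And tomorrow?" },
];

console.log(countToolParts(laterTurn)); // { calls: 0, results: 0 }
```

With the bug present, later turns report zero calls and zero results even though earlier turns invoked tools.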

I can provide more detailed reproduction info if needed, but I thought it best to confirm my understanding before spending more time on that. Let me know if you'd like a more step-by-step guide, and I’ll put something together.

Expected Behavior

The expected behavior is that tool invocations and results from history are sent in the messages to streamText/generateText unless they are explicitly filtered out. In other words, the agent should have access to the complete history of tool calls and their results, so it can base its decisions on the entire conversation context.
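As a rough illustration of what "complete" history means here, the sketch below collects every tool call in a prompt and pairs it with its result by toolCallId. Msg, Part and checkToolHistory are hypothetical names, and the message shape is an assumption loosely modeled on the AI SDK's tool-call and tool-result parts.

```typescript
// Local types; assumed shape, loosely modeled on AI SDK tool parts.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string }
  | { type: "tool-result"; toolCallId: string; toolName: string; result: unknown };

interface Msg {
  role: "user" | "assistant" | "tool";
  content: string | Part[];
}

// Collects every tool-call id and reports the ones with no matching result.
function checkToolHistory(messages: Msg[]): { callIds: string[]; unmatched: string[] } {
  const calls: string[] = [];
  const results = new Set<string>();
  for (const m of messages) {
    if (typeof m.content === "string") continue;
    for (const p of m.content) {
      if (p.type === "tool-call") calls.push(p.toolCallId);
      else if (p.type === "tool-result") results.add(p.toolCallId);
    }
  }
  return { callIds: calls, unmatched: calls.filter((id) => !results.has(id)) };
}

// The history a later turn is expected to carry: the earlier call and its result.
const expectedHistory: Msg[] = [
  { role: "user", content: "What's the weather in Lima?" },
  { role: "assistant", content: [{ type: "tool-call", toolCallId: "call-1", toolName: "getWeather" }] },
  {
    role: "tool",
    content: [{ type: "tool-result", toolCallId: "call-1", toolName: "getWeather", result: { tempC: 21 } }],
  },
  { role: "assistant", content: "It is 21°C." },
  { role: "user", content: "And tomorrow?" },
];

console.log(checkToolHistory(expectedHistory)); // { callIds: [ 'call-1' ], unmatched: [] }
```

An empty callIds list on a turn that follows tool usage is the symptom described in this post; a non-empty unmatched list would mean a call was kept but its result was dropped.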

Why This Matters

When tool invocations and results are missing, the agent loses critical context. It might try to call the same tool multiple times, fail to use the results of previous calls, or generally act in a way that doesn’t make sense given the conversation’s history. This can lead to a frustrating user experience and undermine the agent’s overall effectiveness. Ensuring that all relevant information is available is crucial for the agent to function as intended.

Environment Information

Here’s a snapshot of the environment I’m working in:

  • @mastra/core 0.11.1
  • @mastra/libsql 0.11.2
  • @mastra/loggers 0.10.4
  • @mastra/memory 0.11.5
  • @mastra/pg 0.12.5
  • @mastra/upstash 0.12.3
  • mastra 0.10.15

My versions are a bit out of date, but after reviewing the code in the main branch, these code paths appear to be unchanged. So, I believe the issue is still present in the latest versions. If anyone has tested this on a more recent version and can confirm, that would be super helpful!

Verification

Before posting, I made sure to do a couple of checks:

  • [x] I have searched the existing issues to make sure this is not a duplicate.
  • [x] I have included sufficient information for the team to reproduce and understand the issue.

I want to make sure we’re not duplicating effort and that everyone has enough information to get started on a fix. If there’s anything else I can provide, just let me know!

Next Steps

So, what’s the plan from here, guys? I think the next step is to confirm that this is indeed the root cause and then start thinking about how to fix it. One potential solution might be to ensure that tool call and text messages are not assigned the same IDs, or to adjust how messages are added to the MessageList to prevent overwriting. I’m open to any ideas, though!
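A minimal sketch of the first idea, assuming a split step like the one described in the root cause section: each split-off message gets a distinct, deterministic ID, so a list keyed by ID can no longer overwrite one with the other. Part, SourceMessage, SplitMessage, splitWithDistinctIds and the `:tool-call` / `:text` suffix scheme are all hypothetical, not Mastra's actual types or ID format.

```typescript
// Hypothetical fix sketch; names and id suffixes are illustrative assumptions.
type Part =
  | { type: "tool-invocation"; toolName: string; result: unknown }
  | { type: "text"; text: string };

interface SourceMessage {
  id: string;
  parts: Part[];
}

interface SplitMessage {
  id: string; // unique per split-off message
  sourceId: string; // id of the message it was split from
  type: "tool-call" | "text";
  content: Part[];
}

// Splits like before, but derives a distinct id for each output message.
function splitWithDistinctIds(msg: SourceMessage): SplitMessage[] {
  const tools = msg.parts.filter((p) => p.type === "tool-invocation");
  const texts = msg.parts.filter((p) => p.type === "text");
  const out: SplitMessage[] = [];
  if (tools.length > 0) {
    out.push({ id: `${msg.id}:tool-call`, sourceId: msg.id, type: "tool-call", content: tools });
  }
  if (texts.length > 0) {
    out.push({ id: `${msg.id}:text`, sourceId: msg.id, type: "text", content: texts });
  }
  return out;
}

// A map keyed by id stands in for a list that replaces on id collision.
const byId = new Map<string, SplitMessage>();
for (const m of splitWithDistinctIds({
  id: "msg-1",
  parts: [
    { type: "tool-invocation", toolName: "getWeather", result: { tempC: 21 } },
    { type: "text", text: "It is 21°C." },
  ],
})) {
  byId.set(m.id, m);
}

console.log(Array.from(byId.values()).map((m) => m.type)); // [ 'tool-call', 'text' ]
```

The sourceId field is kept because a persistence layer may need to map the split messages back to the single stored message; new IDs without that link could, for example, produce duplicate rows if the list is later saved. The other idea, merging rather than replacing on an ID collision in MessageList, avoids new IDs but would need care not to break legitimate same-ID updates.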

Let’s discuss the best approach to tackle this bug and get it resolved. Any thoughts or insights would be greatly appreciated!