Case Breakdown 1: Building a "Bash Terminal Agent" with LangChain/LangGraph - An Analysis of the NemoTron-Based Implementation

Published on November 2, 2025

Updated 5 months ago

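About the code in this article

Source: this case study comes from an example that the official LangChain account shared on X; the tweet introduces NVIDIA's Bash Agent implementation.

Important note: this article is based on NVIDIA's original example, with the following changes and extensions: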

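Version upgrade: the original uses the pre-1.0 create_react_agent API from LangGraph's prebuilt module. The main_langgraph.py in this article has been upgraded to the create_agent API that ships with the 1.0 releases (imported from langchain.agents and built on top of LangGraph), to keep up with the latest framework changes.
New implementation: to show how the agent implementation evolves, I also wrote a version based on LangGraph's StateGraph (main_from_langgraph.py), which shows how to build the agent as a hand-written state machine.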

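As a result, this article covers three implementations at different levels of abstraction:

main_from_scratch.py - plain OpenAI API implementation (from the original)
main_from_langgraph.py - hand-written StateGraph implementation (new in this article)
main_langgraph.py - the high-level create_agent wrapper (upgraded for the 1.0 release)

Comparing them side by side gives a deeper understanding of how an agent works and helps you choose the right level of abstraction in a real project.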

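1. Problem and Scenario (The WHY)

1.1 What does the user want to do?

Users want to interact with the terminal (the command-line interface) in natural language instead of typing exact Bash commands by hand. For example, a user can say "list all Python files in the current directory" instead of typing ls *.py. This greatly lowers the barrier to using the terminal, so that users unfamiliar with the command line can still carry out complex operations.

1.2 Why not just use a script?

Traditional scripts (Bash or Python) are "one-off" and "deterministic". They are written for a specific task and lack flexibility and generality. If the user's intent shifts slightly (for example from "list the files" to "count the files and sort them"), the script has to be rewritten. An LLM-based agent, in contrast, can understand varied and ambiguous natural-language instructions and generate and run the matching commands on the fly, which makes it far more adaptable than a traditional script.

1.3 Where are the difficulties?

Accurate intent understanding: how to make sure the LLM turns a vague request (such as "show me what's in this folder") into a correct Bash command (such as ls -l).
Safety and control: running LLM-generated commands directly is very risky and can be destructive (for example rm -rf /). There must be a mechanism that lets the user review and approve a command before it runs.
State management and context memory: terminal work is usually sequential. The agent has to remember the current working directory (cwd), the command history, and earlier output so that it interprets later requests correctly (for example, the user first runs cd my_folder and then says "create a file here").
Tool wrapping and invocation: how to plug Bash commands, an "external tool", into the LangChain/LangGraph framework so that the LLM knows when and how to call it.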

2. Architecture Diagram and Flow (The Big Picture)

2.1 Architecture diagram

              +-------------------+      +----------------------+      +--------------------+
              |   User Input      |----->|   LLM (NemoTron)     |----->|  Tool (Bash Shell) |
              | (e.g., "list     |      |   w/ System Prompt   |      |   (Executor)       |
              | python files")    |      |   & Tool Definition  |      |                    |
              +-------------------+      +----------+-----------+      +----------+---------+
                     ^                              |                         |
                     |                              | (Generated Command)     | (Execution Result)
                     |                              |                         |
              (Final Answer)                 +----------------------+      +--------------------+
                     |                         |  User Confirmation   |----->|   State Management |
                     +---------------------- |   (Allow/Deny)       |      |   (cwd, history)   |
                                             +----------------------+      +--------------------+

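2.2 Step-by-step description of the flow

Input: the user makes a request in natural language, for example "What is my current directory?".
Model: the request is sent to the NemoTron model, which is configured with a specific system_prompt and a tool (the Bash command executor).
Tool: after analyzing the input, the model decides that it needs the tool and generates a matching command, such as pwd.
Human confirmation (Human-in-the-loop): the agent shows the pending command to the user and asks for approval. This is the critical safety step.
Execution and state update (Execution & State): once the user approves, the Bash tool runs the command. The result and state such as the current working directory (cwd) are captured and updated.
Output: the result is returned to the model, which produces the final natural-language answer. The flow then ends or waits for the next input.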

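3. Core Component Analysis (The WHAT)

3.1 Model selection and configuration

This example uses LangChain's ChatOpenAI client, which can connect to any model served behind an OpenAI-compatible API (including NVIDIA's NemoTron family). The advantages of this design are:

API compatibility: setting openai_api_base lets you switch between model providers.
Function calling: modern LLMs (such as NemoTron-8B-Function-Calling) are fine-tuned to understand tool definitions and to turn natural-language instructions into structured tool calls reliably.
Configurability: a Config object manages the model parameters (temperature, top_p, and so on) in one place, which simplifies experimentation and tuning.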
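
As a rough illustration of the last point, here is a minimal sketch of what such a Config object might look like, assuming a plain dataclass. The llm_* field names, system_prompt, and allowed_commands mirror attributes used in this article's snippets; every default value (model name, URL, environment variable, allowlist contents) is a placeholder chosen for illustration and is not the original project's setting.

```python
import os
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class Config:
    """Hypothetical settings object; all defaults below are placeholders."""

    llm_model_name: str = "my-nemotron-model"        # placeholder model id
    llm_base_url: str = "http://localhost:8000/v1"   # placeholder endpoint
    # Read the key from the environment so it never lives in source code.
    llm_api_key: str = field(
        default_factory=lambda: os.environ.get("LLM_API_KEY", "not-set")
    )
    llm_temperature: float = 0.1
    llm_top_p: float = 0.95
    system_prompt: str = "You are a helpful Bash assistant."
    allowed_commands: Tuple[str, ...] = ("ls", "pwd", "cd", "cat", "echo")

    def __post_init__(self) -> None:
        # Fail early on obviously invalid sampling parameters.
        if not 0.0 <= self.llm_temperature <= 2.0:
            raise ValueError("llm_temperature must be within [0, 2]")
        if not 0.0 < self.llm_top_p <= 1.0:
            raise ValueError("llm_top_p must be within (0, 1]")


config = Config(llm_temperature=0.0)
print(config.llm_model_name, config.allowed_commands)
```

Keeping the allowlist next to the model parameters means a deployment can tighten security policy without touching the tool code.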

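3.2 How is the tool (the Bash command executor) wrapped?

This example wraps the Bash executor in a standalone Bash class with the following characteristics:

State management: the class keeps cwd (the current working directory) as internal state, so consecutive commands run in the right context.
Safety checks: it has a built-in command allowlist check and injection protection, so a command is validated more than once before it runs.
Command wrapping: it appends pwd to each command, separated by a marker, so the working directory is refreshed automatically after every execution.
JSON Schema export: it provides a to_json_schema() method so that the LLM can understand the tool's interface.

This encapsulation keeps the dangerous subprocess call behind a controlled interface, which is a sound pattern when designing tools with production use in mind.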

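3.3 How is state (cwd, history) represented?

State is what lets the agent hold a continuous, context-aware conversation. This example uses two layers of state:

Tool-level state - current working directory (cwd): maintained by the Bash class. After each command, cwd is updated by parsing the pwd output, so later commands run in the right directory. This state lives for the whole session.
Conversation-level state - history: managed by LangGraph's InMemorySaver (a checkpointer). It records every round of interaction (user message, model response, tool results) as one message chain, which becomes the context for the next model call. A thread_id keeps a separate history for each user session.

With this split, tool state and conversation state each do one job: the first keeps command execution correct, and the second keeps a multi-turn conversation coherent.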
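
The two layers can be pictured with a toy model. The sketch below is not LangGraph's InMemorySaver; ToolState and ThreadHistory are hypothetical stand-ins, written only to show that cwd lives with the tool while message history is keyed by thread_id.

```python
from collections import defaultdict
from typing import Dict, List


class ToolState:
    """Tool-level state: only the working directory."""

    def __init__(self, cwd: str) -> None:
        self.cwd = cwd


class ThreadHistory:
    """Conversation-level state: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[dict]] = defaultdict(list)

    def append(self, thread_id: str, role: str, content: str) -> None:
        self._threads[thread_id].append({"role": role, "content": content})

    def get(self, thread_id: str) -> List[dict]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._threads[thread_id])


tool_state = ToolState(cwd="/home/user")
history = ThreadHistory()

history.append("cli", "user", "cd my_folder")
tool_state.cwd = "/home/user/my_folder"   # updated by the tool, not by the chat log
history.append("cli", "user", "create a file here")
history.append("other-session", "user", "What is my current directory?")

print(len(history.get("cli")), len(history.get("other-session")), tool_state.cwd)
```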

4. Key Code Walkthrough (The HOW)

4.1 Tool wrapper code

    def _run_bash_command(self, cmd: str) -> Dict[str, str]:
        """
        Runs the bash command and catches exceptions (if any).
        """
        stdout = ""
        stderr = ""
        new_cwd = self.cwd

        try:
            # Wrap the command so we can keep track of the working directory.
            wrapped = f"{cmd};echo __END__;pwd"
            result = subprocess.run(
                wrapped,
                shell=True,
                cwd=self.cwd,
                capture_output=True,
                text=True,
                executable="/bin/bash"
            )
            stderr = result.stderr
            # Find the separator marker
            split = result.stdout.split("__END__")
            stdout = split[0].strip()

            # If no output/error at all, inform that the call was successful.
            if not stdout and not stderr:
                stdout = "Command executed successfully, without any output."

            # Get the new working directory, and change it
            new_cwd = split[-1].strip()
            self.cwd = new_cwd
        except Exception as e:
            stdout = ""
            stderr = str(e)

        return {
            "stdout": stdout,
            "stderr": stderr,
            "cwd": new_cwd,
        }

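Purpose: the core execution method of the Bash tool and the bridge between the agent and the operating system. By wrapping the command (appending pwd), it tracks and updates the working directory.
Input/output: takes the command as a string and returns a dictionary with the execution result (stdout/stderr) and the current working directory cwd.
Connection: cwd is returned so that state is passed along and managed correctly. The __END__ marker separates the command's output from the working-directory information. Note that the code keeps only the text before the first marker, so output that itself contains __END__ would be cut short.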
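
The parsing step can be isolated into a pure function, which makes it easy to test without spawning a shell. The sketch below is a variant and not the original code: wrap_command and parse_wrapped_output are hypothetical helper names, and it splits on the last marker (via rpartition) instead of the first, so command output that happens to contain the marker is preserved.

```python
from typing import Tuple

MARKER = "__END__"


def wrap_command(cmd: str) -> str:
    """Append the marker and pwd, as the original wrapper does."""
    return f"{cmd};echo {MARKER};pwd"


def parse_wrapped_output(raw_stdout: str, fallback_cwd: str) -> Tuple[str, str]:
    """Split raw stdout into (command output, new working directory)."""
    head, sep, tail = raw_stdout.rpartition(MARKER)
    if not sep:
        # Marker missing (for example the shell died early): keep the old cwd.
        return raw_stdout.strip(), fallback_cwd
    new_cwd = tail.strip() or fallback_cwd
    return head.strip(), new_cwd


raw = "a.py\nb.py\n__END__\n/home/user/project\n"
print(parse_wrapped_output(raw, "/tmp"))
```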

4.2 Safety mechanism (1) - command allowlist check

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        """
        Execute the bash command after checking the allowlist.
        """
        if cmd:
            # Prevent command injection via backticks or $. This blocks variables too.
            if re.search(r"[`$]", cmd):
                return {"error": "Command injection patterns are not allowed."}

            # Check the allowlist
            for cmd_part in self._split_commands(cmd):
                if cmd_part not in self.config.allowed_commands:
                    return {"error": "Parts of this command were not in the allowlist."}

            return self._run_bash_command(cmd)

        return {"error": "No command was provided"}

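Purpose: before running any command, perform two safety checks: block command injection (by rejecting backticks and the $ character) and make sure every command is on the allowlist.
Input/output: takes the command string to run and returns either the execution result or an error message.
Connection: this is the first layer of the multi-layer defense. Even if the LLM generates a dangerous command, this check intercepts it.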
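
The snippet above calls a _split_commands helper that is not shown in this article. One plausible shape for it, sketched under the assumption that it returns the program name of each sub-command, is given below; split_commands and is_allowed are hypothetical names, and the parsing is deliberately conservative and far from a full shell grammar.

```python
import re
import shlex
from typing import Iterable, List

# Shell operators that start a new command: ||, &&, ;, |, & and newline.
_SEPARATORS = re.compile(r"\|\||&&|[;|&\n]")


def split_commands(cmd: str) -> List[str]:
    """Return the first token (program name) of each sub-command."""
    names = []
    for part in _SEPARATORS.split(cmd):
        tokens = shlex.split(part)  # raises ValueError on unbalanced quotes
        if tokens:
            names.append(tokens[0])
    return names


def is_allowed(cmd: str, allowed: Iterable[str]) -> bool:
    """Reject injection characters, then check every program name."""
    if re.search(r"[`$]", cmd):
        return False
    try:
        names = split_commands(cmd)
    except ValueError:
        # A separator inside quotes leaves unbalanced fragments: reject.
        return False
    allowed_set = set(allowed)
    return bool(names) and all(name in allowed_set for name in names)


print(split_commands("ls -l | grep py && pwd"))
print(is_allowed("ls; rm -rf /", ["ls"]))
```

Because separators inside quotes are not understood, a harmless command such as echo "a;b" is rejected as well. For an allowlist, erring on the side of rejection is usually the safer trade-off.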

4.3 Safety mechanism (2) - human confirmation

class ExecOnConfirm:
    """
    A wrapper around the Bash tool that asks for user confirmation before executing any command.
    """

    def __init__(self, bash: Bash):
        self.bash = bash

    def _confirm_execution(self, cmd: str) -> bool:
        """Ask the user whether the suggested command should be executed."""
        return input(f"    ▶️   Execute '{cmd}'? [y/N]: ").strip().lower() == "y"

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        """Execute a bash command after confirming with the user."""
        if self._confirm_execution(cmd):
            return self.bash.exec_bash_command(cmd)
        return {"error": "The user declined the execution of this command."}

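Purpose: the second layer of the defense. It adds a human confirmation step before the tool runs, a key safety mechanism for an agent intended for production.
Input/output: takes the command string to run and returns either the execution result or a rejection message.
Connection: this wrapper pattern applies to any tool that needs human review and is a typical "Human-in-the-loop" design. Combined with the allowlist check, it forms a complete chain of safeguards.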
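
To show how the same idea carries over to other tools, here is a minimal sketch of a generic confirmation wrapper. confirm_before is a hypothetical helper that is not part of the original code; the prompt function is injected so the wrapper can be exercised without a real terminal.

```python
import functools
from typing import Callable, Dict


def confirm_before(
    tool: Callable[[str], Dict[str, str]],
    ask: Callable[[str], str] = input,
) -> Callable[[str], Dict[str, str]]:
    """Wrap a single-argument tool so it runs only after the user types 'y'."""

    @functools.wraps(tool)  # keep the wrapped tool's name and docstring
    def guarded(cmd: str) -> Dict[str, str]:
        answer = ask(f"Execute '{cmd}'? [y/N]: ").strip().lower()
        if answer == "y":
            return tool(cmd)
        return {"error": "The user declined the execution of this command."}

    return guarded


def fake_tool(cmd: str) -> Dict[str, str]:
    """Stand-in for a real tool; it only echoes the command."""
    return {"stdout": f"ran {cmd}"}


approved = confirm_before(fake_tool, ask=lambda prompt: "y")
declined = confirm_before(fake_tool, ask=lambda prompt: "n")
print(approved("pwd"))
print(declined("rm notes.txt"))
```

Preserving the name and docstring with functools.wraps is worthwhile here because frameworks that turn plain functions into tools commonly derive the tool's name and description from them.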

4.4 Agent creation and main loop

def main(config: Config):
    # Create the client
    llm = ChatOpenAI(
        model=config.llm_model_name,
        openai_api_base=config.llm_base_url,
        openai_api_key=config.llm_api_key,
        temperature=config.llm_temperature,
        top_p=config.llm_top_p,
    )
    # Create the tool
    bash = Bash(config)
    # Create the agent
    agent = create_agent(
        model=llm,
        tools=[ExecOnConfirm(bash).exec_bash_command],
        system_prompt=config.system_prompt,
        checkpointer=InMemorySaver(),
    )

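Purpose: use LangChain's create_agent function to build a complete agent and bind the Bash tool, wrapped with human confirmation, to the model.
Input/output: takes the configuration object and produces a runnable agent instance.
Connection: checkpointer=InMemorySaver() gives the agent session memory, so it keeps context across turns. The tool is wrapped in ExecOnConfirm, so the user is asked for confirmation before every execution.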

4.5 JSON Schema definition of the tool

    def to_json_schema(self) -> Dict[str, Any]:
        """
        Convert the function signature to a JSON schema for LLM tool calling.
        """
        return {
            "type": "function",
            "function": {
                "name": "exec_bash_command",
                "description": "Execute a bash command and return stdout/stderr and the working directory",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "cmd": {
                            "type": "string",
                            "description": "The bash command to execute"
                        }
                    },
                    "required": ["cmd"],
                },
            },
        }

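Purpose: convert the tool interface into a JSON Schema that the LLM can understand. This is the basis of function calling.
Input/output: takes no arguments and returns a JSON Schema in the OpenAI function-calling format.
Connection: the schema is sent to the LLM and tells it the tool's name, what it does, and the format of its parameters. The LLM uses this information to decide when and how to call the tool. A clear description is essential for the LLM to understand the tool correctly.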
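
The same schema can also be used on the receiving side, to check a tool call before acting on it. The sketch below assumes the model returns a function name plus a JSON string of arguments, as in the OpenAI format; parse_tool_call is a hypothetical helper and only covers the small subset of JSON Schema used here (required keys and string-typed properties).

```python
import json
from typing import Any, Dict

SCHEMA: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "exec_bash_command",
        "description": "Execute a bash command and return stdout/stderr and the working directory",
        "parameters": {
            "type": "object",
            "properties": {
                "cmd": {"type": "string", "description": "The bash command to execute"}
            },
            "required": ["cmd"],
        },
    },
}


def parse_tool_call(name: str, arguments: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a tool call against the schema and return the parsed arguments."""
    function = schema["function"]
    if name != function["name"]:
        raise ValueError(f"unknown tool: {name}")
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError("arguments are not valid JSON") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    params = function["parameters"]
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, spec in params["properties"].items():
        if key in args and spec["type"] == "string" and not isinstance(args[key], str):
            raise ValueError(f"argument {key!r} must be a string")
    return args


print(parse_tool_call("exec_bash_command", '{"cmd": "pwd"}', SCHEMA))
```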

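5. Three Implementations Compared: From Low Level to High Level (The Evolution)

This case study provides three implementations that show the progression from manual management to framework encapsulation. They are useful both for understanding how an agent works and for choosing an appropriate level of abstraction.

5.1 Implementation 1: plain OpenAI API (`main_from_scratch.py`)

Key characteristic: fully manual control, with no high-level framework.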

        while True:
            print("\n[🤖] Thinking...")
            response, tool_calls = llm.query(messages, [bash.to_json_schema()])

            if response:
                response = response.strip()
                # Do not store the thinking part to save context space
                # if "</think>" in response:
                #     response = response.split("</think>")[-1].strip()

                # Add the (non-empty) response to the context
                if response:
                    messages.add_assistant_message(response)

            if tool_calls:
                for tc in tool_calls:
                    function_name = tc.function.name
                    function_args = json.loads(tc.function.arguments)

                    # Ensure it's calling the right tool
                    if function_name != "exec_bash_command" or "cmd" not in function_args:
                        tool_call_result = json.dumps({"error": "Incorrect tool or function argument"})
                    else:
                        command = function_args["cmd"]
                        # Confirm execution with the user
                        if confirm_execution(command):
                            tool_call_result = bash.exec_bash_command(command)
                        else:
                            tool_call_result = {"error": "The user declined the execution of this command."}

                    messages.add_tool_message(tool_call_result, tc.id)
            else:
                # Display the assistant's message to the user.
                if response:
                    print(response)
                    print("-" * 80 + "\n")
                break

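Pros:

Fully transparent: every step is visible, which helps learning and debugging
Flexible control: message flow, loop logic, and error handling can be controlled precisely
Fewest dependencies: only the OpenAI SDK is needed, with no LangChain/LangGraph

Cons:

Verbose: message management, tool-call parsing, and loop control are all manual
Error-prone: state management and error boundaries are your responsibility
Hard to extend: adding a feature means changing the core loop

Best for: learning how agents work, cases that need minimal overhead, or cases where you do not want a framework dependency.

5.2 Implementation 2: hand-written StateGraph (`main_from_langgraph.py`)

Key characteristic: uses LangGraph's state-machine abstraction, but defines the logic of every node and edge by hand.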

# Create the graph
workflow = StateGraph(MessagesState)

# Add nodes
workflow.add_node("agent", call_model)
workflow.add_node("tools", call_tools)

# Set entry point
workflow.set_entry_point("agent")

# Add conditional edges
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "tools": "tools",
        "end": END,
    }
)

# After tools, always go back to agent
workflow.add_edge("tools", "agent")

# Compile the graph
graph = workflow.compile(checkpointer=InMemorySaver())

Key node implementations:

def call_model(state: MessagesState):
    """Call the LLM with the current messages."""
    messages = state["messages"]

    # Add system message if it's not already present
    has_system = any(isinstance(msg, SystemMessage) for msg in messages)
    if not has_system:
        system_message = SystemMessage(content=config.system_prompt)
        messages = [system_message] + messages

    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}

def call_tools(state: MessagesState):
    """Execute tools and return results."""
    messages = state["messages"]
    last_message = messages[-1]

    tool_results = []

    # Process each tool call
    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        for tool_call in last_message.tool_calls:
            # Handle both dict and object formats
            if isinstance(tool_call, dict):
                tool_name = tool_call.get("name", "")
                tool_args = tool_call.get("args", {})
                tool_id = tool_call.get("id", "")
            else:
                # ToolCall object
                tool_name = getattr(tool_call, "name", "")
                tool_args = getattr(tool_call, "args", {})
                tool_id = getattr(tool_call, "id", "")

            if tool_name == "exec_bash_command":
                cmd = tool_args.get("cmd", "") if isinstance(
                    tool_args, dict) else getattr(tool_args, "cmd", "")
                result = bash_tool.exec_bash_command(cmd)
                # Convert result to string format similar to main_from_scratch.py
                result_str = json.dumps(result)
                tool_results.append(
                    ToolMessage(content=result_str, tool_call_id=tool_id)
                )
            else:
                tool_results.append(
                    ToolMessage(
                        content=json.dumps(
                            {"error": "Incorrect tool or function argument"}),
                        tool_call_id=tool_id
                    )
                )

    return {"messages": tool_results}
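
The graph above routes through a should_continue function that is not shown in this article. A minimal sketch of such a router follows, assuming it only needs to inspect the last message for pending tool calls; the return values match the "tools" and "end" keys used in the conditional-edge mapping. The SimpleNamespace objects are stand-ins for real message objects.

```python
from types import SimpleNamespace


def should_continue(state: dict) -> str:
    """Route to the tool node if the last message requests a tool call."""
    last_message = state["messages"][-1]
    # A model reply carries a non-empty tool_calls list when it wants a tool.
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return "end"


# Stand-in messages, used here only to exercise the router.
wants_tool = SimpleNamespace(
    content="",
    tool_calls=[{"name": "exec_bash_command", "args": {"cmd": "pwd"}, "id": "1"}],
)
final_answer = SimpleNamespace(content="You are in /home/user.", tool_calls=[])

print(should_continue({"messages": [wants_tool]}))
print(should_continue({"messages": [wants_tool, final_answer]}))
```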

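Pros:

Clear architecture: the state-machine graph expresses the agent's workflow explicitly
Modular design: each node is an independent function that is easy to test and change
Built-in state management: LangGraph handles passing and updating state
Easy to visualize: the whole flow can be rendered, for example as a Mermaid diagram

Cons:

Requires the graph abstraction: you need to understand nodes, edges, and conditional routing
Boilerplate remains: node functions still implement their own state handling
Slightly harder to debug: errors can occur inside a node or in the routing logic of an edge

Best for: complex control flow (multi-step reasoning, branches, loops), flows that need to be visualized, or cases where you want the framework's convenience while keeping fine-grained control.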

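5.3 Implementation 3: high-level wrapper built on LangGraph (`main_langgraph.py`)

Key characteristic: uses the high-level create_agent API from the 1.0 release (imported from langchain.agents and built on LangGraph); the framework handles the underlying loop.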

def main(config: Config):
    # Create the client
    llm = ChatOpenAI(
        model=config.llm_model_name,
        openai_api_base=config.llm_base_url,
        openai_api_key=config.llm_api_key,
        temperature=config.llm_temperature,
        top_p=config.llm_top_p,
    )
    # Create the tool
    bash = Bash(config)
    # Create the agent
    agent = create_agent(
        model=llm,
        tools=[ExecOnConfirm(bash).exec_bash_command],
        system_prompt=config.system_prompt,
        checkpointer=InMemorySaver(),
    )

Usage:

        # Run the agent's logic and get the response.
        result = agent.invoke(
            {"messages": [{"role": "user", "content": user}]},
            # one ongoing conversation
            config={"configurable": {"thread_id": "cli"}},
        )
        # Show the response (without the thinking part, if any)
        response = result["messages"][-1].content.strip()

        if "</think>" in response:
            response = response.split("</think>")[-1].strip()

        if response:
            print(response)
            print("-" * 80 + "\n")

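Pros:

Minimal code: a few lines create a complete agent
Works out of the box: the tool-calling loop, message state, and streaming support come built in, so you do not write them yourself
Fast iteration: you focus on business logic (tool definitions, prompts) instead of framework details
Maintained upstream: the agent loop is maintained and tested by the framework, which suits quick deployment

Cons:

Black box: the internal logic is encapsulated, which makes deep customization harder
Limited flexibility: the core flow is not easy to change (for example, adding custom conditional branches)
Framework lock-in: a strong dependency on the LangChain/LangGraph ecosystem

Best for: rapid prototyping, standard agent applications, and cases that do not need complex control flow.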

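5.4 Summary comparison of the three implementations

Dimension	Plain OpenAI API	Hand-written StateGraph	create_agent wrapper
Code size	Most (~70 lines of core logic)	Medium (~50 lines of core logic)	Least (~20 lines of core logic)
Learning curve	Low (only the API)	Medium (graph abstraction)	Low (just call the API)
Control granularity	Finest (every step)	Fine (node level)	Coarse (parameters only)
Maintainability	Poor (coupled logic)	Good (clear modules)	Best (maintained by the framework)
Extensibility	Poor (core loop must change)	Good (add nodes/edges)	Medium (limited by the framework)
Best for	Learning, minimal dependencies	Complex flows, custom needs	Fast development, standard applications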

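5.5 How to choose

You are learning how agents work → start with implementation 1 and understand every step
You need a complex multi-step reasoning flow → use implementation 2, the hand-written StateGraph
You want to ship a standard agent quickly → use implementation 3, create_agent
Your project is still evolving → start with implementation 3 and refactor to implementation 2 when you hit its limits

Recommended practice: start with implementation 3 to validate the idea quickly; if you need more control, step "down" to implementation 2 or even implementation 1. This "abstract first, concrete later" strategy balances development speed against system flexibility.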

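6. Reusable Patterns (The SO WHAT)

6.1 Core design patterns

Natural language → tool call: the foundation of every agent. Given a clear JSON Schema tool definition, the LLM acts as an intelligent "router" that decides on its own when and how to call a tool.
Multi-layer safety: a safety design that production environments cannot do without. This example implements three layers:
- Layer 1: command allowlist + injection detection (tool level)
- Layer 2: human confirmation (Human-in-the-loop)
- Layer 3: execution in a separate subprocess (process separation only, not a sandbox)
Two-layer state management: tool state (cwd) is kept apart from conversation state (history). Each has its own job, and together they let the agent handle multi-step tasks that depend on context.
Wrapper pattern: the ExecOnConfirm class wraps the original tool and adds behavior without touching the tool's core logic. The design extends well and also fits logging, performance monitoring, and similar concerns.
Configuration-driven design: a Config object manages model parameters, security policy (the allowlist), the system prompt, and so on in one place, which makes the system easier to adjust and deploy.

6.2 From code to product

Although compact, this case study shows the full line of thinking behind a production-grade agent: the agent has to "work" (call tools), be "pleasant to use" (remember context), and be "safe" (multi-layer protection).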

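7. Engineering Improvements (The WHAT'S NEXT)

7.1 Strengthening the safety mechanisms

Stricter command filtering: in addition to the allowlist, add a denylist inside the tool that blocks high-risk commands such as rm and sudo.
Sandboxing: run commands inside an isolated Docker container and restrict its file-system and network access.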
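
As a starting point for the sandboxing idea, the sketch below builds a docker run argument list with networking disabled and a read-only root file system. docker_argv is a hypothetical helper, and the image name, resource limits, and mount path are placeholder assumptions; the list is only constructed here, not executed.

```python
from typing import List


def docker_argv(cmd: str, workdir: str, image: str = "ubuntu:24.04") -> List[str]:
    """Build the argv for running one command in a throwaway container."""
    return [
        "docker", "run", "--rm",     # remove the container when it exits
        "--network", "none",         # no network access
        "--read-only",               # root file system is read-only
        "--memory", "256m",          # placeholder memory cap
        "--cpus", "0.5",             # placeholder CPU cap
        "-v", f"{workdir}:/work",    # expose only the working directory
        "-w", "/work",
        image,
        "bash", "-c", cmd,
    ]


argv = docker_argv("ls -l", "/home/user/project")
print(" ".join(argv))
# To run it, pass the list to subprocess.run(argv, capture_output=True, text=True)
# without shell=True, so the host shell never interprets the command.
```

Each call starts a fresh container, so the cwd tracking described earlier would have to be passed in explicitly (for example through the -w option) instead of relying on a long-lived shell.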

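8. Summary + Links / Appendix

8.1 Key takeaways

The core idea of this case study: by giving an LLM a limited and safe set of tools, and building an external framework around it (such as LangGraph) that manages state and controls the flow, we can apply the LLM's reasoning ability to real problems safely and reliably.

More importantly, the three implementations (plain API → hand-written StateGraph → high-level wrapper) show the same problem solved at different levels, which helps us pick the level of abstraction that fits a project's complexity and requirements.

8.2 References

LangChain official tweet: LangChainAI on X - Bash Agent case share
NVIDIA original article: Create Your Own Bash Computer Use Agent with NVIDIA Nemotron in One Hour
LangChain docs: LangChain Documentation
LangGraph docs: LangGraph Documentation
Source code for this article: github codebase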