Building a Gemini Cli from Scratch - Step 1

Rosin

2025-09-26

ai, tech

0x00 Background

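The project I'm working on needs to integrate with Kubernetes, and the related material looked rather dry. Since I had just wrapped up the New API work, I decided to slack off a little and write a terminal client similar to Gemini Cli as a learning exercise.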

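With my Cursor quota for the month nearly used up, I wanted to try building this project with the free Gemini Cli before turning to Codex. Pure Vibe Coding probably wouldn't teach me much, so I pushed Gemini into acting as both programmer and teacher while we built the project together. The initial prompt was: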

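You are an expert in artificial intelligence and agent development, proficient in Golang and other languages. You follow Rob Pike's design philosophy and prefer simple, readable code. I am a student with some basic Golang and LLM knowledge. You also need to act as a teacher and help me understand the essential knowledge required as we develop this project.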

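I want to build an LLM API client in Golang. First, it should support configuring OpenAI-compatible LLM API endpoints and support interaction through a TUI. The end goal is a terminal LLM client similar to Gemini Cli, Codex, and Claude Code.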

The tech stack I currently plan to use:
- **CLI framework**: spf13/cobra
- **Config management**: spf13/viper
- **TUI framework**: charmbracelet/bubbletea
- **Markdown rendering**: charmbracelet/glamour

I've been catching up on season two of the SAC TV series lately, and the Ghost in the Shell setting fits this project nicely, so I named the new repository GhOst.

0x01 Calling the LLM API from the Command Line

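OK, let's start by initializing the project structure. The goal of the first phase is to talk to the LLM API from the command line. The planned command is ghost, e.g. ghost -p "hello". Use viper to manage the API and key configuration. The test API is http://localhost:3000/v1 and the key is sk-Ak627xIaxSfgaNNA96aeRdLijcNQyK7HEaafC12HDQRCgNaN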

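Gemini quickly generated the basic Cobra project structure, along with a simple OpenAI-style API call and JSON parsing in internal/llm/client.go. The first call revealed that the model parameter was missing, so it was added afterwards.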

// Completion sends a prompt to the LLM and returns the response.
func (c *Client) Completion(prompt string, model string) (string, error) {
	reqBody := CompletionRequest{
		Model: model,
		Messages: []Message{
			{
				Role:    "user",
				Content: prompt,
			},
		},
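	}
	// ... marshal reqBody, POST it to the chat completions endpoint,
	// decode the JSON response and return the reply (omitted in this excerpt).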
}
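The excerpt above stops after building reqBody. For completeness, here is a minimal, self-contained sketch of the rest of a non-streaming round trip against an OpenAI-compatible `POST {baseURL}/chat/completions` endpoint. The Client fields (`baseURL`, `apiKey`, `httpClient`), `CompletionResponse` and `newFakeServer` are assumptions for illustration, not GhOst's actual code; the fake server lets it run without a real backend or key.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Message is one chat message in the OpenAI chat format.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// CompletionRequest is the request body for POST {baseURL}/chat/completions.
type CompletionRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
}

// CompletionResponse keeps only the fields this sketch reads.
type CompletionResponse struct {
	Choices []struct {
		Message Message `json:"message"`
	} `json:"choices"`
}

// Client is hypothetical: its field names are assumptions, not GhOst's code.
type Client struct {
	baseURL    string // e.g. "http://localhost:3000/v1"
	apiKey     string
	httpClient *http.Client
}

// Completion sends a single-turn prompt and returns the first choice's text.
func (c *Client) Completion(prompt, model string) (string, error) {
	body, err := json.Marshal(CompletionRequest{
		Model:    model,
		Messages: []Message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodPost, c.baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+c.apiKey)

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("API request failed with status %d", resp.StatusCode)
	}
	var out CompletionResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("response contained no choices")
	}
	return out.Choices[0].Message.Content, nil
}

// newFakeServer mimics an OpenAI-compatible endpoint: it checks the key and
// echoes the first message back, so the sketch runs offline.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer sk-test" {
			http.Error(w, "invalid key", http.StatusUnauthorized)
			return
		}
		var req CompletionRequest
		_ = json.NewDecoder(r.Body).Decode(&req)
		reply := Message{Role: "assistant", Content: "echo: " + req.Messages[0].Content}
		_ = json.NewEncoder(w).Encode(map[string]any{
			"choices": []map[string]Message{{"message": reply}},
		})
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	c := &Client{baseURL: srv.URL, apiKey: "sk-test", httpClient: srv.Client()}
	reply, err := c.Completion("hello", "gpt-3.5-turbo")
	fmt.Println(reply, err) // echo: hello <nil>
}
```

Pointing `baseURL` at http://localhost:3000/v1 with a real key would exercise an actual backend instead of the fake one.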

After starting it, a single round trip with the model API works:

D:\Projects\my-project\GhOst git:[main]
go run main.go "Hello"
You: Hello
GhOst: ...
GhOst: Hello! Is there anything I can help you with?

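Here I noticed a problem with Gemini Cli: one of its earlier replies contained a typo, vipet.SetDefault("model", "gpt-3.5-turbo"), which should have been viper. In theory viper should be a single token, so it's strange that it produced a misspelling, and afterwards, because of its cached context, it kept changing my code back to the wrong spelling...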

0x02 Building the TUI

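First, I opened a new context window in AI Studio to learn how bubbletea is designed. bubbletea is based on the Elm Architecture and is used to build interactive applications. Gemini's analogy was a digital clock; the core concepts of the Elm Architecture are: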

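Model: the clock's state. At any moment, its only state is the current time.
Update: the clock's logic. It cares about exactly one thing: when a "tick" event arrives, how to compute the new time from the old one.
View: the clock's face. It is responsible for displaying the time held in the Model on the screen.

The clock is forever in a loop: tick (Update) -> new time (Model) -> redisplay (View) -> wait for the next tick...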

Below is the TUI structure with a basic input box that Gemini defined:

// model is the state of our TUI application.
type model struct {
	textarea textarea.Model
	content  string
	loading  bool
	err      error
}

// Update handles incoming messages and updates the model accordingly.
func (m model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
	var cmds []tea.Cmd
	var cmd tea.Cmd

	switch msg := msg.(type) {
	case tea.KeyMsg:
		switch msg.Type {
		case tea.KeyCtrlC, tea.KeyEsc:
			return m, tea.Quit
			// Later, we will handle tea.KeyEnter to send the prompt.
		}
	}

	// Pass input to the textarea component.
	m.textarea, cmd = m.textarea.Update(msg)
	cmds = append(cmds, cmd)

	return m, tea.Batch(cmds...)
}

// View renders the UI based on the model's state.
func (m model) View() string {
	return fmt.Sprintf(
		"Ask GhOst a question:\n\n%s\n\n%s",
		m.textarea.View(),
		"(Ctrl+C to quit)",
	)
}

Now, on launch, you get an input prompt like this:

D:\Projects\my-project\GhOst git:[main]
go run main.go
Ask GhOst a question:

┃   1 Ask GhOst something...
┃
┃
┃
┃
┃

(Ctrl+C to quit)

0x03 Multi-turn Conversations

Core changes:

The TUI model needs to maintain a message list, []llm.Message, that stores every message exchanged between the user and GhOst; it also gains the llm Client and the model name:

// model is the state of our TUI application.
type model struct {
	llmClient *llm.Client
	llmModel  string
	messages  []llm.Message
	textarea  textarea.Model
	loading   bool
	err       error
}

In the Update method, handle the Enter key and process the response:

switch msg := msg.(type) {
case tea.KeyMsg:
	switch msg.Type {
	case tea.KeyCtrlC, tea.KeyEsc:
		return m, tea.Quit

	case tea.KeyEnter:
		if m.textarea.Value() != "" && !m.loading {
			m.loading = true
			m.messages = append(m.messages, llm.Message{Role: "user", Content: m.textarea.Value()})
			m.textarea.Reset()
			// Return the command to fetch the LLM response.
			return m, m.waitForLLMResponse
		}
	}

// Handle API response
case llmResponseMsg:
	m.loading = false
	m.messages = append(m.messages, llm.Message{Role: "assistant", Content: string(msg)})

// Handle API error
case errorMsg:
	m.loading = false
	m.err = msg.err
}

Update the View to render the multi-turn conversation:

// Render conversation history
for _, msg := range m.messages {
	var roleStyle lipgloss.Style
	var roleText string

	if msg.Role == "user" {
		roleText = "You"
		roleStyle = lipgloss.NewStyle().Bold(true).Foreground(lipgloss.Color("70")) // ANSI 256 color 70: green
	} else {
		roleText = "GhOst"
		roleStyle = lipgloss.NewStyle().Bold(true).Foreground(lipgloss.Color("66")) // ANSI 256 color 66: muted teal
	}

	b.WriteString(roleStyle.Render(roleText) + ":\n")
	b.WriteString(msg.Content + "\n\n")
}
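Later code calls `m.renderConversation()`, which the article doesn't show. A plain-text sketch of what it might look like, assuming it simply wraps the loop above and returns the builder's contents. It drops the lipgloss styling so it runs with the standard library only, and taking the message slice as a parameter is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

type Message struct{ Role, Content string }

// renderConversation turns the history into one string for the viewport.
// Plain-text stand-in: the real code styles the role labels with lipgloss.
func renderConversation(messages []Message) string {
	var b strings.Builder
	for _, msg := range messages {
		roleText := "GhOst"
		if msg.Role == "user" {
			roleText = "You"
		}
		b.WriteString(roleText + ":\n")
		b.WriteString(msg.Content + "\n\n")
	}
	return b.String()
}

func main() {
	fmt.Print(renderConversation([]Message{
		{Role: "user", Content: "hello"},
		{Role: "assistant", Content: "Hello! How can I help you?"},
	}))
}
```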

0x04 Polishing the Interaction Components

Well, it was only while trying to polish the interaction window that I realized Gemini Cli is essentially a REPL rather than a TUI...

Whatever. For now, I used the Viewport component from charmbracelet's bubbles to give the chat history a friendly scrolling window, starting by injecting a viewport into the Model:

// The viewport will be initialized with the correct size via a WindowSizeMsg.
vp := viewport.New(0, 0)
return model{
	llmClient: client,
	llmModel:  modelName,
	textarea:  ti,
	viewport:  vp,
	messages:  []llm.Message{},
}

Then update the Enter handling in Update:

case tea.KeyMsg:
	switch msg.Type {
	case tea.KeyCtrlC, tea.KeyEsc:
		return m, tea.Quit
	case tea.KeyEnter:
		// Get the value, but trim the newline that the component adds by default.
		prompt := strings.TrimSpace(m.textarea.Value())
		if prompt != "" && !m.loading {
			m.loading = true
			m.messages = append(m.messages, llm.Message{Role: "user", Content: prompt})
			m.textarea.Reset()
			m.viewport.SetContent(m.renderConversation())
			m.viewport.GotoBottom()
			return m, m.waitForLLMResponse
		}
	}
}

In View, lipgloss defines the layout:

// View renders the UI based on the model's state.
func (m model) View() string {
	// lipgloss.JoinVertical arranges strings vertically.
	return lipgloss.JoinVertical(
		lipgloss.Left,
		m.viewport.View(),
		m.textarea.View(),
		m.helpView(),
	)
}
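JoinVertical stacks the viewport, the input box and the help line, so the viewport's height has to be whatever rows the terminal has left. A small stdlib-only sketch of that arithmetic: `lineCount` stands in for `lipgloss.Height`, and `viewportHeight` is a hypothetical helper. In a real bubbletea program you would apply the result when a `tea.WindowSizeMsg` arrives, for example by setting `m.viewport.Height` (assuming a bubbles version where `viewport.Model` exposes `Height` as a field).

```go
package main

import (
	"fmt"
	"strings"
)

// lineCount reports how many terminal rows a rendered block occupies
// (a plain stand-in for lipgloss.Height).
func lineCount(s string) int {
	if s == "" {
		return 0
	}
	return strings.Count(s, "\n") + 1
}

// viewportHeight returns the rows left for chat history once the input box
// and help line are stacked under it, never less than one row.
func viewportHeight(windowHeight int, textareaView, helpView string) int {
	h := windowHeight - lineCount(textareaView) - lineCount(helpView)
	if h < 1 {
		return 1
	}
	return h
}

func main() {
	textarea := strings.Repeat("┃\n", 5) + "┃" // a 6-row input box
	help := "(Ctrl+C to quit)"
	fmt.Println(viewportHeight(40, textarea, help)) // 33
}
```

Measuring the rendered views instead of hard-coding their heights keeps the layout correct if the textarea's height or the help text changes later.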

Markdown rendering is handled directly by glamour:

renderer, _ := glamour.NewTermRenderer(glamour.WithAutoStyle())
...
renderedContent, err := renderer.Render(msg.Content)

At this point, thanks to the Charm toolkit, the terminal styling is basically presentable.

0x05 Streaming Responses

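This part required a fair amount of refactoring, mainly because the interaction model changes. First, add structs for the streaming API request and response: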

// CompletionRequest is the request body for a chat completion; setting Stream to true requests a streaming (SSE) response.
type CompletionRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream,omitempty"`
}

// StreamChoice is a single choice in a streaming chat completion response.
type StreamChoice struct {
	Delta struct {
		Content string `json:"content"`
	} `json:"delta"`
	FinishReason string `json:"finish_reason"`
}

// StreamCompletionResponse is the response body for a streaming chat completion.
type StreamCompletionResponse struct {
	Choices []StreamChoice `json:"choices"`
}

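When stream is true in the request, the server returns messages as SSE (Server-Sent Events). Unlike the original CompletionResponse, each chunk of a streaming response contains a delta object holding only the new, incremental content.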

func (c *Client) CompletionStream(messages []Message, model string) tea.Cmd {
	return func() tea.Msg {
		// ... call the API, then parse the output in a goroutine
		return Stream(ch)
	}
}

For reference, the Cmd and Msg definitions below are bubbletea's own; Stream is GhOst's type:

// Cmd is an IO operation that returns a message when it's complete. If it's
// nil it's considered a no-op. Use it for things like HTTP requests, timers,
// saving and loading from disk, and so on.
//
// Note that there's almost never a reason to use a command to send a message
// to another part of your program. That can almost always be done in the
// update function.
type Cmd func() Msg

// Msg contain data from the result of a IO operation. Msgs trigger the update
// function and, henceforth, the UI.
type Msg interface{}

// Stream is a channel of messages from the LLM stream.
type Stream chan tea.Msg

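(After switching the request parameters to a streaming call) a goroutine reads resp.Body line by line with bufio.NewReader(resp.Body), parses each stream message, and sends it into a channel: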

ch := make(chan tea.Msg)

go func() {
	defer close(ch)
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		bodyBytes, _ := io.ReadAll(resp.Body)
		ch <- ErrorMsg{fmt.Errorf("API request failed with status %d: %s", resp.StatusCode, string(bodyBytes))}
		return
	}

	ch <- StreamStartMsg{}

	reader := bufio.NewReader(resp.Body)
	for {
		line, err := reader.ReadBytes('\n')
		if err != nil {
			if err != io.EOF {
				ch <- ErrorMsg{fmt.Errorf("error reading stream: %w", err)}
			}
			ch <- StreamEndMsg{}
			return
		}

		lineStr := string(line)
		if !strings.HasPrefix(lineStr, "data: ") {
			continue
		}

		data := strings.TrimPrefix(lineStr, "data: ")
		data = strings.TrimSpace(data)

		if data == "[DONE]" {
			break
		}

		var streamResp StreamCompletionResponse
		if err := json.Unmarshal([]byte(data), &streamResp); err != nil {
			continue
		}

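		if len(streamResp.Choices) > 0 {
			choice := streamResp.Choices[0]
			// Forward content before checking finish_reason: some servers send
			// the last text in the same chunk as finish_reason.
			if choice.Delta.Content != "" {
				ch <- StreamContentMsg{Content: choice.Delta.Content}
			}
			if choice.FinishReason == "stop" {
				break // exits the for loop; StreamEndMsg is sent below
			}
		}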
	}
	ch <- StreamEndMsg{}
}()

Adjust what the Update method in tui.go returns accordingly:

// Triggered by the Enter key
if prompt != "" && !m.loading {
	m.messages = append(m.messages, llm.Message{Role: "user", Content: prompt})
	m.textarea.Reset()
	m.viewport.SetContent(m.renderConversation(true))
	m.viewport.GotoBottom()
	return m, m.llmClient.CompletionStream(m.messages, m.llmModel)
}

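CompletionStream returns a command, which the bubbletea runtime executes automatically. The result of running that command is the channel itself (as a Stream message); once the TUI receives it, it starts listening for the StreamContentMsg and other messages that follow.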

In the new flow:

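TUI -> calls client.CompletionStream(), which returns a Cmd -> bubbletea runs the Cmd -> [goroutine starts working]
goroutine -> sends StreamStartMsg -> TUI receives it and gets ready
goroutine -> sends StreamContentMsg -> TUI receives it and appends it to the display
goroutine -> sends StreamContentMsg -> TUI receives it and appends it to the display
...
goroutine -> sends StreamEndMsg -> TUI receives it and finishes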
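bubbletea does not read from a channel by itself: a Cmd returns exactly one Msg. One common way to "listen", similar to the realtime example in the bubbletea repository, is a Cmd that receives a single message and is re-issued by Update after every stream message. The stdlib-only simulation below sketches that loop. `waitForNext`, `run` and `fakeStream` are hypothetical names, `run` is a toy stand-in for the bubbletea runtime, and the article does not show GhOst's actual stream handling in Update.

```go
package main

import "fmt"

// Local stand-ins with bubbletea's shapes for Msg and Cmd.
type Msg interface{}
type Cmd func() Msg

type Stream chan Msg
type StreamStartMsg struct{}
type StreamContentMsg struct{ Content string }
type StreamEndMsg struct{}

type model struct {
	stream  Stream
	loading bool
	reply   string
}

// waitForNext returns a Cmd that blocks for exactly one message from ch.
// A closed channel is reported as StreamEndMsg so the loop always ends.
func waitForNext(ch Stream) Cmd {
	return func() Msg {
		msg, ok := <-ch
		if !ok {
			return StreamEndMsg{}
		}
		return msg
	}
}

// update re-arms waitForNext after every stream message; a nil Cmd stops listening.
func (m model) update(msg Msg) (model, Cmd) {
	switch msg := msg.(type) {
	case Stream: // result of the CompletionStream-style Cmd
		m.stream = msg
		return m, waitForNext(msg)
	case StreamStartMsg:
		m.loading = true
		return m, waitForNext(m.stream)
	case StreamContentMsg:
		m.reply += msg.Content
		return m, waitForNext(m.stream)
	case StreamEndMsg:
		m.loading, m.stream = false, nil
	}
	return m, nil
}

// run is a toy, single-threaded stand-in for the bubbletea runtime.
func run(m model, cmd Cmd) model {
	for cmd != nil {
		m, cmd = m.update(cmd())
	}
	return m
}

// fakeStream plays the role of CompletionStream: it starts a producer
// goroutine and returns the channel as its message.
func fakeStream(parts ...string) Cmd {
	return func() Msg {
		ch := make(Stream)
		go func() {
			defer close(ch)
			ch <- StreamStartMsg{}
			for _, p := range parts {
				ch <- StreamContentMsg{Content: p}
			}
			ch <- StreamEndMsg{}
		}()
		return ch
	}
}

func main() {
	m := run(model{}, fakeStream("Hel", "lo", "!"))
	fmt.Println(m.reply, m.loading) // Hello! false
}
```

Because each Cmd reads only one message, the UI stays responsive between chunks; an alternative is to push messages from the goroutine with `(*tea.Program).Send`.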

With that, asynchronous streaming updates are in place.

0x06 Closing Thoughts

Midway through the Vibe Coding I also came across Crush, an open-source TUI LLM client the Charm team has already released. But to borrow that overquoted Feynman line:

What I cannot create, I do not understand.

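Although Teacher G (Gemini) led me the whole way, I came away with a deeper feel for this kind of agent interaction in the terminal. So far it's only basic terminal interaction; next I plan to tackle tool calling to flesh out the agent.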