Using AI Programming to Enhance Development Efficiency 2

Introduction

Readers may have noticed that, unlike the previous post, this post's title does not use quotation marks.

By early February, I had already used up a large portion of my GitHub Copilot quota. Hoping to save quota, and since Alibaba Cloud had given away some free tokens, I decided to give Claude Code a try. I spent a few days reading the official documentation, focusing on permission management, security, and related topics, then watched a few tutorials and started experimenting. The result was a case of "the higher the expectations, the greater the disappointment": it was not useful at all.

Claude Code itself was a letdown, but the VS Code release notes in mid-February mentioned agent orchestration, which caught my attention because I had already read about agent teams and subagents in the Claude Code documentation. These three concepts refer to much the same thing and differ mainly in the degree of autonomy. Over the past few weeks, I have used this feature many times and found it both cost-effective and efficient, hence this blog post.

Claude Code Isn't Useful

Here is why I concluded in the introduction that Claude Code isn't useful:

First, I'm used to GitHub Copilot showing all modifications after a conversation, so I can review them one by one and accept or reject each. In Claude Code, by contrast, we either approve each of the agent's modifications manually, or let the agent modify everything with no way to see exactly which files changed. Git can fill that gap, but it is cumbersome and ultimately not as good as GitHub Copilot's built-in diff view.

Second, the Anthropic API consumes far too many tokens. Alibaba Cloud gives away 1M tokens for each new model, but with the best model that allowance lasts only two or three conversations before it runs out. Paying for tokens ourselves quickly becomes very expensive, which is why most vendors offer subscriptions (a fixed number of requests per month or day) that are much more cost-effective for ordinary users. The only downside might be vendor lock-in, but since GitHub Copilot provides access to most mainstream models, that doesn't matter to me.

Third, some Claude Code features, such as the status line, do not work properly on Windows 10, because the built-in Windows PowerShell has all sorts of issues.

These issues may not be entirely Claude Code's fault, but together they were enough to put me off using it.

Claude Code Configuration

Although I didn't continue using Claude Code, I did research how to configure it to ensure privacy and security. My ~/.claude/settings.json file looks like this:

{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "env": {
    "ANTHROPIC_BASE_URL": "https://dashscope.aliyuncs.com/apps/anthropic",
    "ANTHROPIC_MODEL": "qwen3-coder",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "qwen3-coder",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "qwen3-coder",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "qwen3-coder",
    "ANTHROPIC_AUTH_TOKEN": "your-api-key",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "CLAUDE_CODE_ENABLE_PROMPT_SUGGESTION": "false",
    "DISABLE_AUTOUPDATER": "1",
    "DISABLE_TELEMETRY": "1",
    "DISABLE_ERROR_REPORTING": "1",
    "DISABLE_BUG_COMMAND": "1",
    "CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY": "1",
    "CLAUDE_CODE_IDE_SKIP_AUTO_INSTALL": "1"
  },
  "language": "Simplified Chinese",
  "promptSuggestionEnabled": false,
  "autoUpdatesChannel": "stable"
}

To use a third-party vendor's API, set ANTHROPIC_BASE_URL to the vendor's API URL and ANTHROPIC_AUTH_TOKEN to your API token, then set ANTHROPIC_MODEL and ANTHROPIC_DEFAULT_<OPUS|SONNET|HAIKU>_MODEL to the desired model name.

The remaining settings protect privacy and security, avoid unnecessary API calls, and disable automatic updates.
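Because the API key sits in plain text in this file, one option is to generate the file from a script instead of editing it by hand. The Python sketch below rebuilds the settings shown above with the standard `json` module and maps all three model tiers to one model in a loop. It is an illustration, not an official tool: the function names are hypothetical, and the usage comment assumes the key lives in an environment variable named `DASHSCOPE_API_KEY`, which is my assumption rather than anything Claude Code requires.

```python
import json

# Model tiers that the settings above remap to a third-party model.
MODEL_TIERS = ("OPUS", "SONNET", "HAIKU")


def build_settings(api_key: str, model: str = "qwen3-coder",
                   base_url: str = "https://dashscope.aliyuncs.com/apps/anthropic") -> dict:
    """Assemble the settings.json content shown above for a single third-party model."""
    env = {
        # Route requests to the third-party, Anthropic-compatible endpoint.
        "ANTHROPIC_BASE_URL": base_url,
        "ANTHROPIC_AUTH_TOKEN": api_key,
        "ANTHROPIC_MODEL": model,
    }
    # Point every tier at the same model so each request uses the third-party model name.
    for tier in MODEL_TIERS:
        env[f"ANTHROPIC_DEFAULT_{tier}_MODEL"] = model
    # Privacy, traffic, and update switches, copied verbatim from the example above.
    env.update({
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
        "CLAUDE_CODE_ENABLE_PROMPT_SUGGESTION": "false",
        "DISABLE_AUTOUPDATER": "1",
        "DISABLE_TELEMETRY": "1",
        "DISABLE_ERROR_REPORTING": "1",
        "DISABLE_BUG_COMMAND": "1",
        "CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY": "1",
        "CLAUDE_CODE_IDE_SKIP_AUTO_INSTALL": "1",
    })
    return {
        "$schema": "https://json.schemastore.org/claude-code-settings.json",
        "env": env,
        "language": "Simplified Chinese",
        "promptSuggestionEnabled": False,
        "autoUpdatesChannel": "stable",
    }


def render_settings(settings: dict) -> str:
    """Serialize with indentation; ensure_ascii=False keeps non-ASCII values readable."""
    return json.dumps(settings, indent=2, ensure_ascii=False)

# Usage (left as a comment so running this file never overwrites a real config):
#   import os
#   from pathlib import Path
#   path = Path.home() / ".claude" / "settings.json"
#   path.write_text(render_settings(build_settings(os.environ["DASHSCOPE_API_KEY"])),
#                   encoding="utf-8")
```

Keeping the tier mapping in a loop makes it harder to forget one of the three `ANTHROPIC_DEFAULT_*_MODEL` variables when switching models.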

Context Window

The reason I used up most of my GitHub Copilot quota in early February was that I had the AI solve some very complex problems. I designed the prompt like this:

You need to iteratively do the following:
- Create a new file, ending with the latest version number (v1, v2, ...), containing the modified code
- Run a local script to test the latest code
- Analyze the code bottlenecks based on the correctness and performance results from the script execution
- Based on the analysis, modify the code to improve performance
- If this round does not pass all tests, proceed to the next iteration. Keep iterating until all tests pass
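The loop this prompt asks the agent to run can be sketched in Python as follows. It only illustrates the control flow: `run_tests` stands in for the local test script, `improve` stands in for the AI's analysis and rewrite, and the `<stem>_v<N>.py` naming and the 80-round cap are my assumptions.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# run_tests(path) -> (all_passed, report); stands in for the local test script.
TestRunner = Callable[[Path], Tuple[bool, str]]
# improve(code, report) -> new_code; stands in for the AI's analysis and rewrite.
Improver = Callable[[str, str], str]


def versioned_path(directory: Path, stem: str, version: int, suffix: str = ".py") -> Path:
    """e.g. versioned_path(d, "solution", 3) -> d / "solution_v3.py"."""
    return directory / f"{stem}_v{version}{suffix}"


def iterate_until_pass(directory: Path, stem: str, code: str,
                       run_tests: TestRunner, improve: Improver,
                       max_rounds: int = 80) -> Tuple[int, Path]:
    """Write each attempt to a new versioned file, test it, and revise until tests pass."""
    for version in range(1, max_rounds + 1):
        path = versioned_path(directory, stem, version)
        path.write_text(code, encoding="utf-8")  # keep every version on disk
        passed, report = run_tests(path)
        if passed:
            return version, path
        code = improve(code, report)  # analyze the report, then revise the code
    raise RuntimeError(f"tests still failing after {max_rounds} rounds")


def demo() -> Tuple[int, List[str]]:
    """Toy run: the 'tests' pass once the code contains the word 'fast'."""
    def run_tests(path: Path) -> Tuple[bool, str]:
        ok = "fast" in path.read_text(encoding="utf-8")
        return ok, "ok" if ok else "too slow"

    def improve(code: str, report: str) -> str:
        return code + "\n# fast"

    with tempfile.TemporaryDirectory() as tmp:
        version, _ = iterate_until_pass(Path(tmp), "solution", "print('hi')", run_tests, improve)
        names = sorted(p.name for p in Path(tmp).iterdir())
    return version, names
```

Writing each attempt to a new file, rather than overwriting one, is what lets the agent (and the user) compare versions afterwards.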

Within fewer than ten rounds, the interface would show "Summarizing conversation history ...". One conversation ran for as many as 80 iterations, and this message appeared many times. The problem was that each time it appeared, the AI seemed to become less intelligent and forgot some very basic information. Even so, thanks to the strong reasoning capabilities of GPT-5.2 and Gemini 3 Pro, Copilot eventually completed all the tasks I assigned, so I didn't think much of the issue at the time.

Later, while reading the Claude Code documentation, I realized that this was caused by the context window limit. Each model supports a different context size. When the total number of tokens in the current conversation approaches this limit, Claude Code (and likewise GitHub Copilot) automatically compresses the existing context so that the conversation can continue. Although this compression claims to retain the important information, it happens automatically and outside our control, and the result is likely unsatisfactory.
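To make the compression behavior concrete, here is a toy Python model of a conversation with a fixed context window. It is not how Claude Code or GitHub Copilot actually summarize; the word-count tokenizer and the keep-the-first-three-words summary are deliberately crude assumptions, chosen to show how details get lost each time the window overflows.

```python
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class Conversation:
    window: int               # context window size, in (toy) tokens
    keep_recent: int = 2      # newest messages kept verbatim when compacting
    messages: List[str] = field(default_factory=list)
    compactions: int = 0

    def total(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.total() > self.window:
            self.compact()

    def compact(self) -> None:
        # Replace older messages with a lossy summary: only the first three words
        # of each survive, so details are silently dropped.
        split = max(len(self.messages) - self.keep_recent, 0)
        old, recent = self.messages[:split], self.messages[split:]
        summary = "SUMMARY: " + " | ".join(" ".join(m.split()[:3]) for m in old)
        self.messages = [summary] + recent
        self.compactions += 1
```

Adding five six-word messages to a 20-token window triggers two compactions, and after the second one nothing about round 2 is left in the history, even though the conversation is still within its budget.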

The Claude Code documentation points out that, despite the high level of automation in AI programming, the context window is still the one precious resource we must manage by hand. We should do our best to avoid triggering automatic context compression (ideally never), so that the AI always remembers everything. The best way to protect the context window is to use subagents.

The agent we talk to directly is called the main agent, and it is this agent's context window that we need to protect. Whenever the main agent creates a subagent, the new subagent starts with an empty context window. The main agent gives the subagent a prompt, the subagent does the work, and then it returns the result to the main agent. In this process, the actual work (reading files, running commands, writing code) consumes the most tokens. By delegating that work to subagents, we can significantly reduce the main agent's token usage and thus protect its context window.
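The saving from delegation can be illustrated with another toy Python model, again counting words as tokens. `do_inline` and `delegate` are hypothetical names; the point is only that a subagent's transcript is discarded, and just the prompt and the final result enter the main agent's context.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Toy tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class Agent:
    context: List[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(count_tokens(m) for m in self.context)


# work(prompt) -> transcript of the actual work; its last entry is the result.
Work = Callable[[str], List[str]]


def do_inline(main: Agent, prompt: str, work: Work) -> str:
    """The main agent does the work itself: the whole transcript lands in its context."""
    main.context.append(prompt)
    main.context.extend(work(prompt))
    return main.context[-1]


def delegate(main: Agent, prompt: str, work: Work) -> str:
    """The main agent hands the work to a subagent and keeps only prompt + result."""
    main.context.append(prompt)
    sub = Agent(context=[prompt])      # the subagent starts with an otherwise empty window
    sub.context.extend(work(prompt))   # file reads, tool output, test logs...
    result = sub.context[-1]
    main.context.append(result)        # only the result flows back; `sub` is then discarded
    return result
```

With a 1,000-word transcript, the main agent's context grows by 1,006 tokens when it works inline but by only 6 when it delegates.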

Agent Orchestration

With context windows understood, the next question is how to use subagents. This technique is called agent orchestration, or an agent team. Let's use feature development as an example to walk through how agent orchestration runs and which roles the agents play.

  • The main agent acts as the conductor (orchestrator). It does no hands-on work itself; it only orchestrates subagents so that they can cooperate and run in parallel.
  • The main agent first launches a planner subagent, which reads the code, designs a technical solution, and writes the corresponding documentation.
  • The main agent then launches a coder subagent, which implements the functionality according to the technical solution document.
  • Whenever a coder finishes, the main agent launches a reviewer subagent, which checks the coder's code against the technical solution document. If the reviewer finds issues, the main agent launches another coder to fix them and then another reviewer, repeating until the feature is correctly implemented.
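The planner, coder, and reviewer workflow above can be sketched as a plain Python loop, where each callable represents launching one fresh subagent. This is my simplification, not Claude Code's or Copilot's implementation; in particular the `max_rounds` cap is an assumption added so the loop cannot run forever.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool
    issues: List[str] = field(default_factory=list)


# Each callable stands for launching one fresh subagent with the given inputs.
Planner = Callable[[str], str]               # requirement -> design document
Coder = Callable[[str, List[str]], str]      # design, open issues -> code
Reviewer = Callable[[str, str], Review]      # design, code -> review verdict


def orchestrate(requirement: str, plan: Planner, code: Coder, review: Reviewer,
                max_rounds: int = 5) -> Tuple[str, int]:
    """Main agent: plan once, then alternate coder and reviewer until approved."""
    design = plan(requirement)                      # planner subagent
    issues: List[str] = []
    for round_no in range(1, max_rounds + 1):
        implementation = code(design, issues)       # new coder subagent
        verdict = review(design, implementation)    # new reviewer subagent
        if verdict.approved:
            return implementation, round_no
        issues = verdict.issues                     # hand the findings to the next coder
    raise RuntimeError(f"still not approved after {max_rounds} rounds")
```

The main agent itself only passes strings between subagents, which is exactly why its own context stays small.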

For complex requirements, if the model is powerful enough, it can orchestrate several sub-tasks at once, launching multiple coder + reviewer pairs in parallel to speed up development.
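Running several coder + reviewer pairs at once can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. The sketch assumes the sub-tasks are independent (for example, they touch different files); the function names and the `(ok, issues)` review format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Coder = Callable[[str, List[str]], str]                  # sub-task, open issues -> code
Reviewer = Callable[[str, str], Tuple[bool, List[str]]]  # sub-task, code -> (ok, issues)


def code_review_loop(task: str, code: Coder, review: Reviewer, max_rounds: int = 5) -> str:
    """One coder + reviewer pair iterating on a single sub-task."""
    issues: List[str] = []
    for _ in range(max_rounds):
        implementation = code(task, issues)
        ok, issues = review(task, implementation)
        if ok:
            return implementation
    raise RuntimeError(f"{task}: still failing review after {max_rounds} rounds")


def run_in_parallel(tasks: List[str], code: Coder, review: Reviewer,
                    workers: int = 4) -> Dict[str, str]:
    """Run independent sub-tasks concurrently; each gets its own coder/reviewer pair."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {task: pool.submit(code_review_loop, task, code, review) for task in tasks}
        # result() blocks until the pair finishes and re-raises any exception it hit.
        return {task: future.result() for task, future in futures.items()}
```

If sub-tasks did share files, running them in parallel like this could produce conflicting edits, so the split into independent pieces matters as much as the parallelism.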

Prompting

As with other LLM techniques, implementing agent orchestration ultimately comes down to writing prompts. The approach recommended by Claude Code and GitHub Copilot is to define multiple custom agents, each playing a different role. Each custom agent is a markdown file whose body is the system prompt, with some configuration possible via front matter.

But I didn't do that, partly because two more powerful models, Sonnet 4.6 and Opus 4.6, had just arrived. In my opinion, as long as the model is powerful enough, a generic subagent combined with the prompt given by the main agent can achieve equally good results. I also worry that the system prompts I would write for custom agents might be poorly written and worse than the existing system prompts (which belong to the model vendor and are not visible to users). So my prompts now look like this:

We need to implement xxx.

You need to do the following:

- Launch a subagent to understand my prompt holistically, read existing documentation, read relevant backend and frontend code, generate technical documentation (placed in the `docs/feat` directory), and formulate an overall orchestration plan.
- Note that I may not have fully covered all places in the existing code that need modification, so you need to read the code comprehensively and identify all necessary changes.
- This subagent should actively think about multiple possibilities.
- If there are unclear situations, you can ask me questions. If you ask me questions, you need to have a subagent re-review the existing technical solution and modify it holistically. Note that you cannot modify it yourself; you need to launch a subagent to modify it.
- Launch a subagent to modify the backend business code.
- Launch a subagent to review the modified code. If there are issues, launch a subagent to continue fixing, then launch a subagent to continue reviewing. Repeat until no issues remain.
- Launch a subagent to modify the backend test code.
- Launch a subagent to review the modified code. If there are issues, launch a subagent to continue fixing, then launch a subagent to continue reviewing. Repeat until no issues remain.
- Launch a subagent to run backend unit tests and integration tests. If there are issues, launch a subagent to fix, then launch a subagent to review the fixed code, then run the test command again.
- Launch a subagent to modify the frontend code.
- Launch a subagent to review the modified code. If there are issues, launch a subagent to fix, then launch a subagent to review the fixed code, then run the test command again.
- Launch a subagent to read relevant backend and frontend code, confirm that the implementation is correct and complete. If there are issues, launch a subagent to fix, then launch a subagent to review the fixed code, then run the test command again.

Note: You are only responsible for orchestrating tasks and subagents. You cannot write code, extensively read existing code, extensively read documentation, or execute test commands (these tasks are all delegated to subagents). Both you and any subagent can only modify code manually; you cannot modify code files by executing commands.

First, with agent orchestration, the code produced in a single conversation has far fewer bugs, so the feature is much closer to complete when the conversation ends. This is mainly thanks to the reviewers' double-checking, though Claude's new models also deserve some credit.

Second, my prompt is still imperative: it spells out the orchestration step by step instead of letting the agent orchestrate on its own. The immediate reason is that I haven't defined custom agents for different roles; the underlying reasons are that (1) I don't find automatic orchestration reliable, and (2) I don't want to waste quota.

Finally, although this prompt works well enough, it has some issues. The main one is that the first subagent both writes the technical documentation and formulates the orchestration plan. These two responsibilities are coupled and should really be split between two subagents. However, the main agent is poor at adjusting the plan on the fly based on what later subagents report, so the plan is effectively fixed at the start, and its concrete instructions come from the technical documentation written by that same subagent; the coupling is therefore hard to avoid. In addition, the generated technical documentation is never reviewed or revised, and fixing that runs into the same limitation.

Other Insights

The key to automating AI programming is enabling the AI to verify its own work, for example by running test scripts. Only then can the AI iterate on its own and ultimately deliver usable code. With that in mind, I divide AI programming into the following stages:

  • "Traditional" programming: not using AI tools at all;
  • "Traditional" AI programming: coding with completions from IDE AI plugins, i.e., "Tab engineers";
  • "Traditional" LLM: chatting with web-based LLMs and copying and pasting the code they provide, essentially still "CV engineers" (Ctrl+C, Ctrl+V);
  • "Traditional" agent: the agent writes the code, and the user verifies it;
  • Current agent: the agent writes the code and verifies the results, and the user only accepts them;
  • Future agent: the high level of automation brought by agent teams. Compared with current agents, they understand user prompts better, so users don't need to write lengthy prompts or worry about the AI not following instructions. Despite the name, I expect most vendors to support it this year.
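What separates the "traditional" agent from the current one is whether the agent can run the check itself. Below is a minimal Python sketch of such a check using the standard `subprocess` module: a test command passes when it exits with code 0 (the convention pytest and most test runners follow), and the tail of its output is what the agent would analyze. The `PYTEST_COMMAND` constant assumes pytest is installed and is only an example.

```python
import subprocess
import sys
from typing import List, Tuple

# Example command; assumes pytest is installed in the current interpreter.
PYTEST_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def verify(command: List[str], timeout: float = 600.0, tail_lines: int = 20) -> Tuple[bool, str]:
    """Run a test command. Pass means exit code 0; the output tail is what the agent analyzes."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    lines = proc.stdout.splitlines() + proc.stderr.splitlines()
    # Keep only the last few lines so a long log does not flood the agent's context.
    return proc.returncode == 0, "\n".join(lines[max(len(lines) - tail_lines, 0):])
```

Calling `verify(PYTEST_COMMAND)` from the project root would run the test suite; trimming the log to its tail is a small nod to the context-window concerns above.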
