Building an AI-Native Engineering Team

AI models are rapidly expanding the range of tasks they can complete, and this has direct implications for engineering.

Introduction

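Frontier systems can already sustain multi-hour reasoning: as of August 2025, METR found that leading models could complete 2 hours and 17 minutes of continuous work with roughly 50% confidence of producing a correct answer.

This capability is still improving quickly, with task length doubling about every seven months. A few years ago, models could handle only about 30 seconds of reasoning, enough for small code suggestions. Today, as models sustain longer reasoning chains, the entire software development lifecycle is a candidate for AI assistance; coding agents can contribute effectively to planning, design, development, testing, code reviews, and deployment.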
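As a rough worked example of the doubling claim, the sketch below extrapolates task length under a constant seven-month doubling period. The function name and the assumption that the trend continues unchanged are illustrative only; neither METR nor this guide promises that it will.

```python
def projected_task_minutes(current_minutes: float, months_ahead: float,
                           doubling_months: float = 7.0) -> float:
    """Extrapolate task length, assuming a constant doubling period."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


# 2 hours 17 minutes, the August 2025 figure cited above.
baseline_minutes = 2 * 60 + 17

# Print the projection at a few points along the assumed trend.
for months in (0, 7, 14, 21):
    hours = projected_task_minutes(baseline_minutes, months) / 60
    print(f"+{months:>2} months: about {hours:.1f} hours")
```

The point of the arithmetic is scale rather than precision: under this assumption, each seven-month step doubles the size of task a team could plausibly delegate.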

This guide uses real examples to show how AI agents contribute to the software development lifecycle, and it gives engineering leaders a set of practices they can start on immediately to build AI-native teams and processes.

AI Coding: From Autocomplete to Agents

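AI coding tools have moved far beyond the earliest autocomplete assistants.

Early tools mainly handled quick tasks, such as suggesting the next line of code or filling in function templates. As the reasoning ability of models improved, developers began interacting with agents through IDE chat interfaces, using them for pair programming and code exploration.

Today's coding agents can generate complete files, scaffold new projects, and turn designs into code. They can also work through multi-step problems such as debugging and refactoring. Agent execution is also shifting from an individual developer's machine to cloud-based, multi-agent environments.

This is changing how developers work: they spend less time generating code alongside an agent inside the IDE and hand more complete workflows over to the agent.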

Capability	What It Enables
Unified context across systems	A single model can read code, configuration, and telemetry and reason consistently across layers that previously required separate tools.
Structured tool execution	Models can now call compilers, test runners, and scanners directly, producing verifiable results rather than static suggestions.
Persistent project memory	Long context windows and techniques such as compaction let models follow a feature from proposal to deployment while remembering earlier design choices and constraints.
Evaluation loops	Model outputs can be tested automatically against benchmarks such as unit tests, latency targets, and style guides, so that improvements rest on measurable quality.

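OpenAI has seen this shift firsthand. Development cycles are accelerating, and work that used to take weeks can now be finished in days. Teams move across domains more easily, onboard to unfamiliar projects faster, and operate with more agility and autonomy within the organization.

Many routine, time-consuming tasks are now fully delegated to Codex, such as documenting new code, surfacing relevant tests, maintaining dependencies, and cleaning up feature flags.

Some core parts of engineering have not changed, however. True ownership of code, especially for new or ambiguous problems, still belongs to engineers, and some challenges remain beyond what current models can do.

With coding agents like Codex, engineers can spend more time on complex and novel challenges, focusing on design, architecture, and system-level reasoning rather than on debugging or rote implementation.

The sections below break down how coding agents change each phase of the SDLC and describe concrete steps a team can take to start working as an AI-native engineering org.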

1. Plan

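Many teams across an organization depend on engineers to determine whether a feature is feasible, how long it will take, and which systems or teams it will involve.

Anyone can draft a specification, but forming an accurate plan usually requires a deep understanding of the codebase and several rounds of iteration with engineering: uncovering requirements, clarifying edge cases, and aligning on what is technically realistic.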

How Coding Agents Help

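AI coding agents can provide code-aware insights immediately during planning and scoping.

For example, teams can build workflows that connect coding agents to issue-tracking systems, so that an agent reads a feature specification, cross-references it against the codebase, and then flags ambiguities, breaks the work into subcomponents, or estimates difficulty.

Coding agents can also trace code paths instantly to show which services a feature will touch. This kind of work used to take hours or days of manual searching through a large codebase.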

What Engineers Do Instead

Agents surface up front the context that used to require meetings to gather, so teams can spend more time on core feature work.

Key implementation details, dependencies, and edge cases are identified early, which means faster decisions and fewer meetings.

Delegate	Review	Own
AI agents can take the first pass at feasibility and architectural analysis. They read a specification, map it to the codebase, identify dependencies, and raise ambiguities or edge cases that need clarification.	Teams review agent findings to validate accuracy and completeness and to confirm that estimates reflect real technical constraints. Story point assignment, effort sizing, and non-obvious risks still require human judgment.	Strategic decisions remain human-led, including prioritization, long-term direction, sequencing, and tradeoffs. A team can ask the agent for options or next steps, but final responsibility for planning and product direction stays with the organization.

Getting Started Checklist

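Identify common processes that require alignment between features and source code. Typical cases include feature scoping and ticket creation.
Start with basic workflows, such as tagging and deduplicating issues or feature requests.
Then consider advanced workflows, such as adding sub-tasks to a ticket based on an initial feature description, or launching an agent run when a ticket reaches a certain stage to fill in a more detailed description.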
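As one possible starting point for the basic deduplication workflow in the checklist, the sketch below flags issue pairs with near-identical titles using Python's standard `difflib`. The issue IDs, titles, and the 0.8 threshold are hypothetical; a real workflow would pull issues from the tracker's API and would probably hand borderline pairs to an agent or a human to confirm.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def find_duplicate_issues(issues: dict[str, str],
                          threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (id, id, similarity) for issue pairs with near-identical titles."""
    ids = sorted(issues)
    pairs = []
    for index, first in enumerate(ids):
        for second in ids[index + 1:]:
            ratio = SequenceMatcher(
                None, normalize(issues[first]), normalize(issues[second])
            ).ratio()
            if ratio >= threshold:
                pairs.append((first, second, round(ratio, 2)))
    return pairs


# Hypothetical issues; in practice these would come from the issue tracker.
sample_issues = {
    "ENG-1": "Login page crashes on submit",
    "ENG-2": "login page  crashes on submit",
    "ENG-3": "Add dark mode to settings",
}

for first, second, ratio in find_duplicate_issues(sample_issues):
    print(f"{first} and {second} look like duplicates (similarity {ratio})")
```

Title similarity is deliberately crude. It is cheap enough to run on every new ticket, and it leaves the judgment about whether two issues really describe the same problem to a reviewer.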

2. Design

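The design phase is often slowed down by foundational setup work.

Teams spend significant time wiring boilerplate, integrating design systems, and refining UI components or flows.

Misalignment between mockups and implementation leads to rework and long feedback cycles, and limited bandwidth delays design validation because teams lack the time to explore alternatives or adapt to changing requirements.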

How Coding Agents Help

AI coding tools can accelerate prototyping significantly: they scaffold boilerplate code, set up project structures, and implement design tokens or style guides immediately.

Engineers can describe desired features or UI layouts in natural language and receive prototype code or component stubs that follow the team's conventions.

These tools can also convert designs directly into code, suggest accessibility improvements, and even analyze user flows or edge cases in the codebase.

This lets teams iterate on multiple prototypes in hours rather than days and build high-fidelity prototypes early, which gives decision-making a clearer basis and lets customer testing happen sooner.

What Engineers Do Instead

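With agents handling routine setup and translation tasks, teams can shift their attention to higher-leverage work.

Engineers focus on refining core logic, establishing scalable architectural patterns, and ensuring that components meet quality and reliability standards.

Designers can spend more time evaluating user flows and exploring alternative concepts. The focus of collaboration moves from implementation overhead to improving the underlying product experience.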

Delegate	Review	Own
Agents handle the initial implementation work, such as scaffolding projects, generating boilerplate code, translating mockups into components, and applying design tokens or style guides.	The team reviews agent output to confirm that components follow design conventions, meet quality and accessibility standards, and integrate correctly with existing systems.	The team owns the overall design system, UX patterns, architectural decisions, and the final direction of the user experience.

Getting Started Checklist

Use a multi-modal coding agent that accepts both text and image input.
Connect design tools to coding agents via MCP.
Use MCP to expose component libraries programmatically and connect them to the coding model.
Build workflows that chain designs -> components -> component implementation.
Use typed languages, such as TypeScript, to define valid props and subcomponents for the agent.

3. Build

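The build phase is where teams feel the most friction, and it is also where coding agents have the most visible impact.

Engineers spend a great deal of time turning specs into code structures, wiring services together, replicating patterns across the codebase, and filling in boilerplate. Even small features can require hours of busy-work.

The larger the system, the more this friction compounds. Large monorepos accumulate patterns, conventions, and historical quirks that slow contributors down.

Engineers sometimes spend as much time rediscovering the "right way" to do something as they spend implementing the feature itself. Constant switching between specs, code search, build errors, test failures, and dependency management adds cognitive load. Interruptions during long-running tasks also break flow and delay delivery further.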

How Coding Agents Help

Coding agents running in the IDE and CLI can take on larger, more complex multi-step implementation tasks, which speeds up the build phase.

Rather than only completing the next function or file, they can produce an end-to-end feature in a single coordinated run: data models, APIs, UI components, tests, and documentation.

With sustained reasoning across the entire codebase, they can handle decisions that used to require engineers to trace code paths by hand.

During long-running tasks, agents can:

Draft a complete feature implementation from a written spec.
Search and modify code across dozens of files while maintaining consistency.
Generate boilerplate that follows conventions, such as error handling, telemetry, security wrappers, or style patterns.
Fix build errors as they appear rather than waiting for human intervention.
Write tests and implementation together within the same workflow.
Produce diff-ready changesets that follow internal guidelines and include PR messages.

In practice, much of the mechanical "build work" moves from engineers to agents. The agent becomes the first-pass implementer; the engineer becomes the reviewer, editor, and source of direction.

What Engineers Do Instead

When agents can reliably execute multi-step build tasks, engineers shift their attention to higher-order work:

Clarifying product behavior, edge cases, and specs before implementation.
Reviewing the architectural implications of AI-generated code rather than doing rote wiring.
Refining business logic and performance-critical paths that require deep domain reasoning.
Designing the patterns, guardrails, and conventions that guide agent-generated code.
Collaborating with PM and design to iterate on feature intent rather than boilerplate.

Instead of putting most of their effort into "translating" a feature spec into code, engineers focus on correctness, coherence, maintainability, and long-term quality. These remain the areas where human context matters most.

Delegate	Review	Own
Agents draft the first implementation pass for well-specified features, including scaffolding, CRUD logic, wiring, refactors, and tests. As long-running reasoning improves, this increasingly covers complete end-to-end builds rather than isolated snippets.	Engineers assess design choices, performance, security, migration risk, and domain alignment, and correct subtle issues the agent may have missed. Their work becomes shaping and refining AI-generated code rather than carrying out mechanical implementation.	Engineers continue to own work that requires deep system intuition: new abstractions, cross-cutting architectural changes, ambiguous product requirements, and long-term maintainability trade-offs. As agents take on longer tasks, engineering moves from line-by-line implementation to iterative oversight.

Example:

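At Cloudwalk, engineers, PMs, designers, and operators use Codex every day to turn specs into working code. Whether they need a script, a new fraud rule, or a full microservice delivered in minutes, Codex cuts the busy work out of the build phase and lets every employee implement ideas very quickly.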

Getting Started Checklist

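Start with well-specified tasks.
Have the agent use a planning tool via MCP, or have it write a PLAN.md file that is committed to the codebase.
Check whether the commands the agent tries to execute succeed.
Iterate on the AGENTS.md file so that it unlocks agentic loops, such as running tests and linters to get feedback.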
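The last two checklist items describe a feedback loop: run the project's checks, see whether each command succeeded, and pass any failures back to the agent. A minimal sketch of that loop is shown below using only the standard library. The function names are hypothetical, and the two commands are stand-ins that merely simulate a passing test run and a failing linter; a real setup would substitute the project's own test and lint commands.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict[str, dict]:
    """Run each check command and record whether it succeeded, plus its output."""
    results = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = {
            "ok": completed.returncode == 0,
            "output": (completed.stdout + completed.stderr).strip(),
        }
    return results


def feedback_for_agent(results: dict[str, dict]) -> str:
    """Summarize failing checks as text that could go into the agent's next turn."""
    failures = [
        f"{name} failed:\n{result['output']}"
        for name, result in results.items()
        if not result["ok"]
    ]
    return "\n\n".join(failures) if failures else "All checks passed."


# Stand-in commands (assumptions): replace with the real test and lint commands.
demo_checks = {
    "tests": [sys.executable, "-c", "print('3 passed')"],
    "lint": [sys.executable, "-c", "import sys; print('unused import'); sys.exit(1)"],
}

print(feedback_for_agent(run_checks(demo_checks)))
```

Relying on exit codes rather than parsing output keeps the loop tool-agnostic: any command that returns non-zero on failure can be added to the dictionary without further changes.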

4. Test

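Developers often struggle to ensure adequate test coverage, because writing and maintaining comprehensive tests takes time, context switching, and a deep understanding of edge cases.

Teams frequently have to make trade-offs between moving fast and writing thorough tests. When deadlines loom, test coverage is often the first thing sacrificed.

Even when tests are written, keeping them updated as code evolves creates ongoing friction. Tests can become brittle, fail for unclear reasons, or need large refactors as the underlying product changes.

High-quality tests let teams ship faster and with more confidence.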

How Coding Agents Help

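AI coding tools can help developers write better tests in several ways.

First, they can read the requirements document and the feature code's logic and suggest test cases.

Models can be very effective at proposing edge cases and failure modes. These details are easy to overlook, especially when a developer is deep in a feature and needs a second opinion.

Models can also help keep tests up to date as code evolves, reducing refactoring friction and preventing stale tests from turning into flaky tests.

By handling the basic implementation details of test writing and proactively surfacing edge cases, coding agents speed up the process of developing tests.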

What Engineers Do Instead

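Writing tests with AI tools does not mean developers can stop thinking about testing.

In fact, as agents lower the barrier to generating code, tests become increasingly important as the source of truth for application functionality.

Because agents can run the test suite and iterate based on its output, defining high-quality tests is often the first step in allowing an agent to build a feature.

Developers focus more on the high-level patterns of test coverage, supplementing and challenging the test cases the model identifies.

With faster test writing, developers can not only ship features sooner but also take on more ambitious features.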

Delegate	Review	Own
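Engineers delegate the initial pass at generating test cases from feature specifications to agents, and they also use the model to take the first pass at writing tests. Having the model generate tests in a separate session, rather than alongside the feature implementation, is often helpful.	Engineers must still thoroughly review model-generated tests to confirm that the model did not take shortcuts or write stubbed tests. They also need to make sure agents can run the tests: that they have appropriate permissions and know the context of the different test suites.	Engineers own the alignment of test coverage with feature specifications and user experience expectations. Adversarial thinking, creativity in mapping edge cases, and attention to the intent of the tests remain key skills.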

Getting Started Checklist

Instruct the model to implement tests as a separate step, and verify that the new tests fail before moving on to feature implementation.
Set test coverage guidelines in the AGENTS.md file.
Give the agent examples of code coverage tools it can call to understand test coverage.
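The first checklist item, confirming that new tests fail before the feature exists, can be automated as a simple gate. The sketch below is one hedged way to do that: `slugify` is a hypothetical unimplemented feature, and the `check_` functions stand in for newly written tests. A real project would run its actual test runner and inspect the result instead of calling functions directly.

```python
from typing import Callable


def slugify(title: str) -> str:
    """Hypothetical feature that has not been implemented yet."""
    raise NotImplementedError("slugify is not implemented yet")


def check_lowercases_and_hyphenates() -> None:
    assert slugify("Hello World") == "hello-world"


def check_strips_surrounding_whitespace() -> None:
    assert slugify("  Hello  ") == "hello"


def all_new_tests_fail(tests: list[Callable[[], None]]) -> bool:
    """Return True only if every new test fails or errors before implementation."""
    if not tests:
        return False
    failed = 0
    for test in tests:
        try:
            test()
        except Exception:  # AssertionError and NotImplementedError both count
            failed += 1
    return failed == len(tests)


new_tests = [check_lowercases_and_hyphenates, check_strips_surrounding_whitespace]
print("Safe to start implementing:", all_new_tests_fail(new_tests))
```

A test that already passes before any implementation exists is either checking nothing or duplicating existing behavior, so this gate is a cheap way to catch stubbed or vacuous tests early.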

5. Review

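On average, developers spend 2-5 hours per week on code reviews.

Teams often swing between two options: investing significant time in a deep review, or doing a "good enough" pass on changes that look small.

When that prioritization is off, bugs reach production, affect users, and create substantial rework.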

How Coding Agents Help

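Coding agents can scale the code review process so that every PR receives a consistent baseline of attention.

Traditional static analysis tools usually rely on pattern matching and rule-based checks, whereas AI reviewers can actually execute parts of the code, understand runtime behavior, and trace logic across files and services.

To be truly effective, however, models must be trained specifically to identify P0 and P1-level bugs and tuned to give concise, high-signal feedback. Overly verbose responses are as easy to ignore as noisy lint warnings.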

What Engineers Do Instead

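OpenAI's experience is that AI code review gives engineers more confidence that they are not shipping major bugs to production.

Code review frequently catches issues that the contributor can fix before pulling in another engineer.

Code review does not necessarily make the pull request process faster, especially when it finds meaningful bugs, but it does reduce defects and outages.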

Delegate vs Review vs Own

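Even with AI code review, engineers remain responsible for confirming that code is ready to ship.

In practice, this means engineers still read and understand the implications of a change.

Engineers can delegate the initial code review to an agent, but they still own the final review and merge process.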

Delegate	Review	Own
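Engineers delegate the initial code review to agents. This may happen several times before a pull request is marked ready for teammate review.	Engineers still review pull requests, but with more emphasis on architectural alignment: whether composable patterns are implemented, whether the right conventions are used, and whether functionality matches requirements.	Engineers are ultimately responsible for the code deployed to production; they must ensure that it runs reliably and meets the intended requirements.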

Example:

Sansan uses Codex to review for race conditions and database relations, issues that humans tend to overlook. Codex can also catch improper hard-coding and even anticipate future scalability concerns.

Getting Started Checklist

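Curate gold-standard PRs completed by engineers, including both code changes and comments, and save them as an evaluation set for measuring different tools.
Choose a model trained specifically for code review. OpenAI has found that generalized models often nitpick and have a low signal-to-noise ratio.
Define how the team will measure whether reviews are high quality. Tracking PR comment reactions is recommended as a low-friction way to mark good and bad reviews.
Start small, but roll out quickly once you are confident in the review results.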
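To make the evaluation-set idea concrete, the sketch below scores a review tool against gold-standard findings using precision (how much of what it reported was real) and recall (how much of what mattered it found). Everything here is an assumption for illustration: the `Finding` shape, the PR labels, and matching findings by exact file and line, which is stricter than most teams would want in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A review comment, identified by where it was left (hypothetical shape)."""
    pr: str
    file: str
    line: int


def score_reviewer(gold: set[Finding], reported: set[Finding]) -> dict[str, float]:
    """Compare a tool's findings against the gold-standard review comments."""
    true_positives = len(gold & reported)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical evaluation data: comments engineers left on curated PRs.
gold_findings = {
    Finding("PR-A", "billing/invoice.py", 42),
    Finding("PR-A", "billing/tax.py", 10),
    Finding("PR-B", "auth/session.py", 88),
    Finding("PR-B", "auth/tokens.py", 5),
}

# Hypothetical output from one candidate review tool on the same PRs.
tool_findings = {
    Finding("PR-A", "billing/invoice.py", 42),
    Finding("PR-B", "auth/session.py", 88),
    Finding("PR-A", "billing/invoice.py", 7),
    Finding("PR-B", "README.md", 1),
}

print(score_reviewer(gold_findings, tool_findings))
```

Low precision corresponds to the nitpicking problem described in the checklist, while low recall means the tool misses the bugs engineers actually flagged. Tracking both makes it harder for a noisy tool to look good simply by commenting on everything.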

6. Document

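Most engineering teams know their documentation is behind, but catching up is expensive.

Critical knowledge often lives with individuals rather than in searchable knowledge bases. Existing docs also go stale quickly, because updating them pulls engineers away from product work.

Even when teams run documentation sprints, the result is usually a one-off effort; as soon as the system evolves, the docs start to decay again.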

How Coding Agents Help

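Coding agents are very good at summarizing functionality by reading codebases.

They can explain how parts of a codebase work, and they can also generate system diagrams in syntaxes such as mermaid.

When developers build features with agents, they can also prompt the model directly to update documentation. With AGENTS.md, instructions for updating documentation can be included automatically in every prompt, which improves consistency.

Because coding agents can be run programmatically through SDKs, they can also be incorporated into release workflows.

For example, a coding agent can review the commits included in a release and summarize the key changes.

Documentation then becomes a built-in part of the delivery pipeline: faster to produce, easier to keep current, and no longer dependent on someone "finding the time".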
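Before handing a release's commits to an agent for summarization, it can help to group them deterministically so the agent starts from a structured outline. The sketch below does only that pre-processing step. It assumes commit subjects use Conventional Commits style prefixes such as `feat:` and `fix:`, which is an assumption about the repository rather than something this guide prescribes, and the commit subjects themselves are made up.

```python
from collections import defaultdict

# Assumed mapping from commit prefix to release-note section.
SECTION_TITLES = {"feat": "Features", "fix": "Fixes", "docs": "Documentation"}


def draft_release_notes(commit_subjects: list[str]) -> str:
    """Group commit subjects into sections as a starting outline for release notes."""
    sections: dict[str, list[str]] = defaultdict(list)
    for subject in commit_subjects:
        prefix, separator, rest = subject.partition(":")
        kind = prefix.split("(")[0].strip().lower()  # drop an optional "(scope)"
        if separator and kind in SECTION_TITLES:
            sections[SECTION_TITLES[kind]].append(rest.strip())
        else:
            sections["Other"].append(subject.strip())

    lines = []
    for title in [*SECTION_TITLES.values(), "Other"]:
        items = sections.get(title, [])
        if items:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)


# Hypothetical commit subjects, e.g. collected from the release's git log.
release_commits = [
    "feat(api): add export endpoint",
    "fix: handle empty payload",
    "bump version",
]

print(draft_release_notes(release_commits))
```

The grouping is intentionally mechanical. The agent's contribution is the part that cannot be scripted this way: reading the underlying diffs and explaining what changed and why it matters to users.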

What Engineers Do Instead

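Engineers move from writing every doc by hand to designing and overseeing the system.

They decide how docs are organized, add the key "why" behind decisions, set clear standards and templates for agents, and review critical or customer-facing pieces.

Their job becomes making sure documentation is well structured, accurate, and wired into the delivery process, rather than doing all the typing themselves.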

Delegate	Review	Own
Fully hand off low-risk, repetitive work to Codex, such as first-pass summaries of files and modules, basic descriptions of inputs and outputs, dependency lists, and short summaries of pull-request changes.	Engineers review and edit important docs drafted by Codex, such as core services overviews, public API and SDK docs, runbooks, and architecture pages, before they are published.	Engineers remain responsible for the overall documentation strategy and structure, the standards and templates the agent follows, and all external-facing or safety-critical documentation, especially content involving legal, regulatory, or brand risk.

Getting Started Checklist

Experiment with documentation generation by prompting a coding agent.
Write documentation guidelines into AGENTS.md.
Identify workflows where documentation can be generated automatically, such as release cycles.
Review generated content for quality, correctness, and focus.

7. Deploy and Maintain

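Understanding application logging is critical to software reliability.

During an incident, software engineers consult logging tools, code deploys, and infrastructure changes to locate the root cause.

This process is often surprisingly manual and requires developers to switch back and forth between systems. In high-pressure situations such as incidents, that costs critical time.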

How Coding Agents Help

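With AI coding tools, you can provide access to logging tools through MCP servers alongside the context of your codebase.

Developers can then prompt the model, within a single workflow, to look at the errors for a given endpoint; the model uses that context to traverse the codebase and find relevant bugs or performance issues.

Coding agents can also use command line tools, so they can look through the git history and identify the specific changes that may have caused the problems seen in the log traces.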
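The correlation step described above, matching errors in the logs to the changes that may have caused them, can be illustrated with a small heuristic: count how many errors fall in a window after each deploy and rank deploys by that count. The commit identifiers, timestamps, and 30-minute window below are all hypothetical, and a real investigation would weigh far more signals than timing alone.

```python
from datetime import datetime, timedelta


def suspect_deploys(error_times: list[datetime],
                    deploys: list[tuple[str, datetime]],
                    window: timedelta = timedelta(minutes=30)) -> list[tuple[str, int]]:
    """Rank deploys by the number of errors logged within `window` after each one."""
    ranked = []
    for commit, deployed_at in deploys:
        hits = sum(
            1 for moment in error_times
            if deployed_at <= moment <= deployed_at + window
        )
        if hits:
            ranked.append((commit, hits))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical deploy history and error timestamps for one endpoint.
day = datetime(2025, 1, 15)
deploy_history = [
    ("a1b2c3", day.replace(hour=9, minute=0)),
    ("d4e5f6", day.replace(hour=11, minute=0)),
]
error_log_times = [
    day.replace(hour=9, minute=10),
    day.replace(hour=11, minute=5),
    day.replace(hour=11, minute=7),
    day.replace(hour=11, minute=20),
]

for commit, count in suspect_deploys(error_log_times, deploy_history):
    print(f"{commit}: {count} errors within 30 minutes of deploy")
```

A ranking like this only narrows the search. The agent, and then the engineer reviewing its diagnosis, still needs to read the suspect change and confirm that it actually explains the errors.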

What Engineers Do Instead

AI can automate the tedious parts of log analysis and incident triage, letting engineers concentrate on higher-level troubleshooting and system improvement.

Rather than manually correlating logs, commits, and infrastructure changes, engineers focus on validating AI-generated root causes, designing resilient fixes, and developing preventative measures.

This shift reduces time spent on reactive firefighting and lets teams put more energy into proactive reliability engineering and architectural improvements.

Delegate	Review	Own
Many operational tasks can be delegated to agents: parsing logs, surfacing anomalous metrics, identifying suspect code changes, and even proposing hotfixes.	Engineers vet and refine AI-generated diagnostics, confirm their accuracy, and approve remediation steps. They ensure that fixes meet reliability, security, and compliance standards.	Critical decisions stay with engineers, especially for novel incidents, sensitive production changes, or situations where model confidence is low. Humans remain responsible for judgment and final sign-off.

Example:

Virgin Atlantic uses Codex to strengthen how its teams deploy and maintain systems.

The Codex VS Code Extension lets engineers investigate logs, trace issues across code and data, and review changes in one place through Azure DevOps MCP and Databricks Managed MCPs.

By unifying operational context inside the IDE, Codex speeds up root cause discovery, reduces manual triage, and helps teams focus on validating fixes and improving system reliability.

Getting Started Checklist

Connect AI tools to logging and deployment systems: Integrate Codex CLI or a similar tool with your MCP servers and log aggregators.
Define access scopes and permissions: Ensure agents can access relevant logs, code repositories, and deployment histories while maintaining security best practices.
Configure prompt templates: Create reusable prompts for common operational queries, such as "Investigate errors for endpoint X" or "Analyze log spikes post-deploy."
Test the workflow: Run simulated incident scenarios to confirm that the AI surfaces the right context, traces code accurately, and proposes actionable diagnostics.
Iterate and improve: Collect feedback from real incidents, tune prompt strategies, and expand agent capabilities as systems and processes evolve.
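The "Configure prompt templates" step can start out very simply. The sketch below keeps reusable operational prompts in one place using Python's standard `string.Template`. The template names, wording, and placeholder fields are hypothetical examples modeled on the two queries quoted in the checklist, not a format that Codex or any other tool requires.

```python
from string import Template

# Hypothetical reusable prompts for common operational queries.
PROMPT_TEMPLATES = {
    "endpoint_errors": Template(
        "Investigate errors for endpoint $endpoint over the last $hours hours. "
        "List the most frequent error messages and the code paths involved."
    ),
    "post_deploy_spike": Template(
        "Analyze log spikes after deploy $deploy_id. "
        "Identify which changes in that deploy are the most likely cause."
    ),
}


def render_prompt(name: str, **values: str) -> str:
    """Fill in a named template; raises KeyError if a placeholder is missing."""
    return PROMPT_TEMPLATES[name].substitute(**values)


print(render_prompt("endpoint_errors", endpoint="/checkout", hours="24"))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a prompt with an unfilled placeholder fails loudly instead of being sent to the agent half-complete during an incident.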

Conclusion

Coding agents are changing the software development lifecycle. They take on the mechanical, multi-step work that used to slow engineering teams down.

With sustained reasoning, unified codebase context, and the ability to execute real tools, these agents can already handle scoping, prototyping, implementation, testing, review, and even operational triage.

Engineers remain firmly in control of architecture, product intent, and quality, but coding agents will increasingly serve as the first-pass implementer and continuous collaborator in every SDLC phase.

This shift does not require a radical overhaul. Small, well-defined workflows compound quickly as coding agents become more capable and reliable.

Teams that start with well-scoped tasks, invest in guardrails, and expand agent responsibility step by step will see clear gains in speed, consistency, and developer focus.

If you are exploring how coding agents can accelerate your organization, or preparing for your first deployment, you can reach out to OpenAI.

OpenAI can help you turn coding agents into real leverage: designing end-to-end workflows across planning, design, build, test, review, and operations, and helping your team adopt production-ready patterns that make AI-native engineering a reality.
