淘系前端团队

VS Code 源码分析 - 多语言实现

作者：发布于：

前言

传统前端 App 多语言最简单的实现可以由一套响应式数据流管理系统来托管多语言文案，切换语言时通过数据流的变化使得界面根据文案重新渲染。但由于 VS Code 架构的复杂性，需要有一套能兼容 Electron 渲染窗口(Chromium)及 Node.js 进程的多语言方案。

VS Code 实现

我们从源码开始来一步一步了解 VS Code 是如何基于语言包插件实现多语言的。

NLS

VS Code 主进程的入口是 src/main.js，我们重点关注第63行

if (locale) {
	nlsConfigurationPromise = lp.getNLSConfiguration(product.commit, userDataPath, metaDataFile, locale);
}

这里使用 lp.getNLSConfiguration 创建了一个 nlsConfigurationPromise 对象，当 Electron 窗口 onReady 事件触发时，会将初始化完的 nlsConfig 设置到 VSCODE_NLS_CONFIG 环境变量中。

其中 nls 是指 Native Language Support，根据变量名可以得知这是一个获取多语言配置的 Promise 对象，其入参为 product.commit、userDataPath、metaDataFile 以及表示当前用户设置语言的 locale。前三个参数非常重要，因为它涉及到多语言实现的核心细节，我们一个一个来解释他们的作用。

product.commit

product product.json 文件，在 VS Code 编译打包后会补充一些字段，其中包括当前版本的代码 commit 号。那么为什么多语言的配置要依赖代码具体版本的 commit 号呢？简单来说这是由于 VS Code 的语言包为官方维护的一个插件 vscode-loc，在每次 VS Code 新版本 release 发布后一同发布到插件市场。为了区分不同版本的语言文案，所以每个 release 版本的 VS Code 版本都对应相同版本号的语言包插件。实际上问题依然存在，为什么语言包要跟随软件版本一起 release ？理论上语言包只是一堆文案，和软件本身分开单独维护，社区可以随时贡献翻译不是更好吗？这里先卖个关子我们后面再说。

userDataPath

userDataPath 很容易理解，这是 VS Code 的用户数据目录，不同操作系统下的路径不一样

 MacOS
~/Library/Application Support/Code

Linux
~/.config

Windows
$(USERPROFILE)/AppData/Roaming

metaDataFile

metaDataFile 是一个名为 nls.metadata.json 的文件。

const metaDataFile = path.join(__dirname, 'nls.metadata.json');

查看 VS Code 源码会发现这个文件并不存在，实际上只有在完整编译打包一遍 VS Code 后才会生成这个文件，这个文件的大致内容长这样

{
    "keys": {
        "vs/code/electron-browser/processExplorer/processExplorerMain": [
			"cpu",
			"memory",
			"pid",
			"name",
			"killProcess",
			"forceKillProcess",
			"copy",
			"copyAll",
			"debug"
		]
    },
    "messages": {
        "vs/code/electron-browser/processExplorer/processExplorerMain": [
			"CPU %",
			"Memory (MB)",
			"pid",
			"Name",
			"Kill Process",
			"Force Kill Process",
			"Copy",
			"Copy All",
			"Debug"
		]
    },
    "bundles": {
        "vs/code/electron-browser/processExplorer/processExplorerMain": [
			"vs/code/electron-browser/processExplorer/processExplorerMain"
		]
    }
}

主要包含了 keys、messages、bundles 三个对象，其中 keys 里是每个源码文件中语言文案的 key 名，messages 则是每个源码文件中语言文案的默认值，bundles 比较奇怪，我们一会再说。
这里很明显可以看出 keys 里的键名对应到了 messages 里的文案，我们把它们合并成一个对象

{
    ”vs/code/electron-browser/processExplorer/processExplorerMain“： {
        "cpu": "CPU %",
        "memory": "Memory (MB)",
        //...
    }
}

如果好奇点开上文中 vscode-loc 插件并且看了语言包文件的话会发现这就是语言包文件的结构。

{
    "vs/code/electron-browser/processExplorer/processExplorerMain": {
        "cpu": "CPU %",
        "memory": "内存 (MB)"
        // ...
    }
}

到这里似乎有了一点头绪，语言包文件的文案内容对应到了 VS Code 源码的具体文件，根据文件相对路径分为一个一个的 namespace ，在编译时分析了所有包含多语言调用的文件并生成了这个 nls.metadata.json 文件。

bundles 里记录的实际为不同模块的入口，它定义在 gulpfile.vscode.js。在打包时会以这些文件为入口，逐个解析 AST ，根据 import 节点的引用一层一层寻找代码中包含 import * as nls from 'vs/nls' 以及 nls.localize 调用的文件并记录下来。之后根据这些文件再逐个解析 AST 记录所有 nls.localize 调用的 key 值以及默认值 message 分别记录到前文中的 keys 以及 messages 中。

vs/nls

我们再来看一下 vs/nls 模块是什么，源码在这里，这个看起来像编译后代码的实际源码在 vscode-loader 项目中。简单看一下代码会发现它是一个 vscode-loader 的插件，vscode-loader 是 VS Code 实现的一个 AMD 规范的异步模块加载器，由于篇幅限制这里不详细叙述它的具体原理。我们只需要了解的是当 import * as nls from 'vs/nls' 时实际加载了一个 nlsPlugin 对象，而 nls.localize 是它的一个属性，粗看它的实现似乎只是负责格式化一下文案及参数原样返回。

this.localize = (data, message, ...args: (string | number | boolean | undefined | null)[]) => localize(this._env, data, message, ...args);

function localize(env: Environment, data, message, ...args: (string | number | boolean | undefined | null)[]) {
    return _format(message, args, env);
}

function _format(message: string, args: (string | number | boolean | undefined | null)[], env: Environment): string {
    let result: string;

    if (args.length === 0) {
        result = message;
    } else {
        result = message.replace(/\{(\d+)\}/g, (match, rest) => {
            let index = rest[0];
            let arg = args[index];
            let result = match;
            if (typeof arg === 'string') {
                result = arg;
            } else if (typeof arg === 'number' || typeof arg === 'boolean' || arg === void 0 || arg === null) {
                result = String(arg);
            }
            return result;
        });
    }
    if (env.isPseudo) {
        // FF3B and FF3D is the Unicode zenkaku representation for [ and ]
        result = '\uFF3B' + result.replace(/[aouei]/g, '$&$&') + '\uFF3D';
    }

    return result;
}

上文我们说 VS Code 的多语言对应到源码具体文件，实际上这个说法还不够准确，多语言精确对应到源码中调用 nls.localize 的具体顺序，考虑这段代码

// vs/path/to/code.ts
import * as nls from 'vs/nls';

const value = nls.localize('key1', 'Message1');
const value2 = nls.localize('key2', 'Message2');

在生成的 nls.matada.json 中应该是

{
    "keys": {
        "vs/path/to/code": [
            "key1",
            "key2"
        ]
    },
    "messages": {
        "vs/path/to/code": [
            "Message1",
            "Message2"
        ]
    }
}

nls.localize 的用法是 nls.localize(key, defaultMesage, [...args])，参数中没有任何代码路径的信息，nls 模块如何知道调用的是具体哪个文件的 key 呢？

以前文中的 vs/code/electron-browser/processExplorer/processExplorerMain 为例，再运行 gulp vscode-linux-x64 --old-space-max-size=10240 来看一下源码完整编译后变成了什么。

// vs/code/electron-browser/processExplorer/processExplorerMain 编译后
var __m = ["exports","require",/*"..."*/"vs/nls!vs/code/electron-browser/processExplorer/processExplorerMain"];

// ...

const tableHead = document.createElement('thead');
    tableHead.innerHTML = `<tr>
    <th scope="col" class="cpu">${nls_1.localize(0, null)}</th>
    <th scope="col" class="memory">${nls_1.localize(1, null)}</th>
    <th scope="col" class="pid">${nls_1.localize(2, null)}</th>
    <th scope="col" class="nameLabel">${nls_1.localize(3, null)}</th>
</tr>`;

对比源代码会发现两处不一样的地方，首先 __m 记录了代码中所有引用的模块名，其中 vs/nls 后面多了一个 ! 以及当前文件相对路径。其次代码中的 nls.localize(key, message) 变成了 nls_1.localize(index, args)，没有了 key 也没有 defaultMessage。前文中所说的多语言精确对应到源码中调用 nls.localize 的具体顺序，这里看就很清楚了，这段编译后的代码中 localize 的参数 0、1、2、即表示调用的顺序，那么理论上在 localize 内部执行时应该从类似 nls.metadata.json 中 messages.vs/code/electron-browser/processExplorer/processExplorerMain 数组中获取字符串。

// 伪代码
const languageBundles = [
    "CPU %",
    "内存 (MB)",
    "PID",
    "名称",
    "结束进程",
    "强制结束进程",
    "复制",
    "全部复制",
    "调试"
];
function localize(index, args) {
    // ...
    return format(languageBundles[index], args);
}

那么 vs/nls 是如何得到 messages.vs/code/electron-browser/processExplorer/processExplorerMain 的呢？
先来快速复习一下 AMD 模块规范

模块通过 define 函数定义在闭包中，格式如下：
define(id?: String, dependencies?: String[], factory: Function|Object);
id 是模块的名字，它是可选的参数。
dependencies 指定了所要依赖的模块列表，它是一个数组，也是可选的参数，每个依赖的模块的输出将作为参数一次传入 factory 中。如果没有指定 dependencies，那么它的默认值是 ["require", "exports", "module"]。
define(function(require, exports, module) {}）
factory 是最后一个参数，它包裹了模块的具体实现，它是一个函数或者对象。如果是函数，那么它的返回值就是模块的输出接口或值。

再来看看编译后的代码如何定义模块的

// __M 是一个根据给定的参数从 __m 中获取模块列表的函数
define(__m[34/*vs/code/electron-browser/processExplorer/processExplorerMain*/], __M([1/*...*/,36/*vs/nls!vs/code/electron-browser/processExplorer/processExplorerMain*/]), function (require, exports, electron_1, strings_1, os_1, product_1, nls_1, browser, platform, contextmenu_1, dom_1, lifecycle_1, diagnostics_1) {

与我们熟知的 AMD 规范不同的是，这里的依赖列表中 vs.nls 后面多了 !vs/code/electron-browser/processExplorer/processExplorerMain，我们知道 vscode-loader 是 VS Code 自己实现的一个模块加载器，简单来看一下 define 的调用链

define

DefineFuc 包装前的 Define 方法

ModuleManager.defineModule 调用 ModuleManager.defineModule 定义模块

ModuleManager._normalizeDependencies 标准化依赖数组

ModuleManager._normalizeDependency 标准化依赖

let bangIndex = dependency.indexOf('!');

if (bangIndex >= 0) {
    let strPluginId = moduleIdResolver.resolveModule(dependency.substr(0, bangIndex));
    let pluginParam = moduleIdResolver.resolveModule(dependency.substr(bangIndex + 1));
    let dependencyId = this._moduleIdProvider.getModuleId(strPluginId + '!' + pluginParam);
    let pluginId = this._moduleIdProvider.getModuleId(strPluginId);
    return new PluginDependency(dependencyId, pluginId, pluginParam);
}

记得前文中说 vs/nls 实际是 vscode-loader 的一个插件吗，这里就是负责处理前文中 vs/nls!path/to/module 的地方，它将 ! 后面的字符作为 nlsPlugin 的参数，这里的 plugin 会作为一个特殊的依赖项(PluginDependency)，当完成标准化模块关系后，defineModule 里会调用 this._resolve 解析模块。我们直接看 _resolve 中对 PluginDependency 的处理。

if (dependency instanceof PluginDependency) {
    let plugin = this._modules2[dependency.pluginId];
    if (plugin && plugin.isComplete()) {
        // 加载插件依赖
        this._loadPluginDependency(plugin.exports, dependency);
        continue;
    }

    // Record dependency for when the plugin gets loaded
    let inversePluginDeps: PluginDependency[] = this._inversePluginDependencies2.get(dependency.pluginId);
    if (!inversePluginDeps) {
        inversePluginDeps = [];
        this._inversePluginDependencies2.set(dependency.pluginId, inversePluginDeps);
    }

    inversePluginDeps.push(dependency);

    this._loadModule(dependency.pluginId);
    continue;
}

重点来看 _loadPluginDependency 的实现，第一个参数即是一个 vs/nls 插件实例，并将模块依赖的资源加载行为委托给了插件。

private _loadPluginDependency(plugin: ILoaderPlugin, pluginDependency: PluginDependency): void {
    if (this._modules2[pluginDependency.id] || this._knownModules2[pluginDependency.id]) {
        // known module
        return;
    }
    this._knownModules2[pluginDependency.id] = true;

    // Delegate the loading of the resource to the plugin
    let load: IPluginLoadCallback = <any>((value: any) => {
        this.defineModule(this._moduleIdProvider.getStrModuleId(pluginDependency.id), [], value, null, null);
    });
    load.error = (err: any) => {
        this._config.onError(this._createLoadError(pluginDependency.id, err));
    };

    // 调用插件 load 方法加载依赖
    plugin.load(pluginDependency.pluginParam, this._createRequire(ModuleIdResolver.ROOT), load, this._config.getOptionsLiteral());
}

而在 vs/nls 插件的 load 方法中，首先会判断 name 是否存在，若不存在则加载一个默认的 localize 方法，若存在则读取插件的配置，调用由插件配置传入的 loadBundle(参数为对应的文件名及语言) 函数获取可用的语言包。
插件配置在 VS Code 启动时获取到语言信息完成语言包初始化后传入(开头的 nlsConfigurationPromise)。当语言包加载完成，再调用 _loadPluginDependency 中传入的 load 方法将其定义为一个模块依赖，同时传入一个 scopedLoadlize 作为 localize 的实现，这就是运行时真正的 localize 方法。

以上就是 VS Code 中多语言的实现方式，我们会发现整个方案非常依赖一个自定义的模块加载器以及代码编译时的行为，但作为可以独立开发并运行的插件进程不可能为了实现多语言强行用 vscode-loader 作为模块加载方案。那么插件又是如何正确的读取语言包显示对应文案的呢？

插件

插件中同样会读取 VSCODE_NLS_CONFIG 环境变量，不同的是插件中没有 vs/nls 模块，而是由一个 vscode-nls 替代，这是 VS Code 为插件独立开发的一个多语言模块，这里的实现相对简单一些。
首先在插件运行前就会自动执行 initializeSettings 函数，简而言之这里会读取 VSCODE_NLS_CONFIG 环境变量并将配置记录下来，在插件中使用 vscode-nls 模块前需要调用 nls.loadMessageBundle 方法，查看 loadMessageBundle 方法会发现它需要一个 file 文件名作为参数，实际上插件代码中并没有传入这个参数。
那么很显然插件编译时代码也经过了修改，具体来说编译时会将 nls.loadMessageBundle() 修改为 nls.loadMessageBundle(__filename)，再同样将 nls.localize(key, message) 修改为 nls.localize(index, args)，具体的逻辑可以查看 vscode-nls-dev 模块，这是 VS Code 为插件编译开发的一个模块，编译时也会分析插件 AST 生成 nls.metatdata.json 以及 nls.header.json 文件来记录插件默认的文案以及相关的多语言信息。

最后

VS Code 的多语言实现涉及到了依赖分析， AST 操作，模块加载器等许多技术细节，针对这部分工作原理我阅读了两三遍源代码，而且由于其实现的特殊性，如果不完整编译(执行 VS Code 打包编译任务，单纯使用 tsc 编译不会修改 nls 调用行为)并且阅读编译后代码的话可能会一直绕进坑里。希望这篇文章能给希望了解 VS Code 源码细节的同学带来帮助。

← 一种高性能的Tree组件实现方案揭秘手淘召唤术| 帮助千万级用户直达手淘的黑科技 →