Taobao Front-End Team

Bigpipe in Practice in the New Seller Center (Part 2)


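Since the previous post, Bigpipe in Practice in the New Seller Center (Part 1), explained the ideas and principles behind Bigpipe, spring has arrived in the blink of an eye. The project itself started out walking into a cold winter wind and is only now warming up. I learned a lot along the way and want to share it here. There is a lot of code ahead, so bring your own compiler.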

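The core problem

Every technology is created or adopted to solve a problem, so before we start, let's look at the problems we need to solve:

Loading the first-screen modules synchronously: the server generates the content of each module in parallel, but the client cannot render anything until the last module's content is ready. The pain point here is synchronization. Because multiple modules have to be synchronized, the browser inevitably waits, and when the browser waits, the user waits.

So we switched to loading modules asynchronously on scroll: the page skeleton is sent first, decorated with a few loading spinners, and the first-screen modules then appear one by one through asynchronous requests. Each module is rendered on the client as soon as it arrives, but there is still a noticeable delay. The pain point here is the requests: every module costs one extra request, and each request takes time.

Perhaps the engineers at Facebook thought along these lines: with a single request, the server processes the first-screen modules in parallel, each piece of generated content is sent straight to the client for rendering, and the user sees content right away. How nice would that be~

In fact, the idea behind Bigpipe was inspired by the pipelining of microprocessors.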

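The technical breakthrough

The main body of the Seller Center is also built from functional modules, so it faces the same problem Facebook did. The core question can be restated as: over a single request, can the server send dynamic content to the client in chunks, to be rendered as each chunk arrives, until all content has been sent and the request ends?

Concept

Key technique: chunked transfer encoding in the HTTP protocol (available since HTTP/1.1).

If an HTTP message (request or response) has a Transfer-Encoding header with the value chunked, the message body consists of an unspecified number of chunks and is terminated by a final chunk of size 0.

This mechanism lets a page be split into multiple content blocks; the server and the browser set up a pipeline and manage the blocks as they pass through its different stages.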
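
To make the framing concrete, here is a minimal sketch of what a chunked body looks like on the wire: each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, and a chunk of size 0 ends the body. The helper name encodeChunked is made up for illustration. Node.js does this framing for you, so the sketch is only meant to show the format.

```javascript
// Illustrative only: frame an array of strings as an HTTP/1.1 chunked body.
// `encodeChunked` is a hypothetical helper, not part of any library.
function encodeChunked(chunks) {
  const CRLF = '\r\n';
  let body = '';
  for (const chunk of chunks) {
    // The chunk size is the byte length (not the character count), in hex.
    const size = Buffer.byteLength(chunk, 'utf8').toString(16);
    body += size + CRLF + chunk + CRLF;
  }
  // A chunk of size 0 marks the end of the body.
  return body + '0' + CRLF + CRLF;
}

// JSON.stringify makes the CRLF separators visible when printed.
console.log(JSON.stringify(encodeChunked(['hello', ' world'])));
// prints "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
```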

Implementation

How chunked transfer is implemented differs from language to language.

The PHP way

<html>
<head>
    <title>php chunked</title>
</head>
<body>

    <?php sleep(1); ?>
    <div id="moduleA"><?php echo 'moduleA' ?></div>
    <?php ob_flush(); flush(); ?>
    
    <?php sleep(3); ?>
    <div id="moduleB"><?php echo 'moduleB' ?></div>
    <?php ob_flush(); flush(); ?>
    
    <?php sleep(2); ?>
    <div id="moduleC"><?php echo 'moduleC' ?></div>
    <?php ob_flush(); flush(); ?>

</body>
</html>

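PHP uses ob_flush and flush to push the page's buffered output to the browser piece by piece. In the browser's network panel the response shows Transfer-Encoding: chunked, and the content is rendered chunk by chunk.

PHP in its usual web setup does not support threads, so the server cannot use multithreading to process the content of several modules in parallel.

PHP does have options for concurrent execution. They are not covered here; interested readers can dig deeper.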

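The Java way

Java also has flush-like functions that implement chunked transfer for simple pages.

Java is multithreaded, which makes it convenient to process the content of each module in parallel.

Thoughts on flush

Yahoo's 34 performance rules suggest flushing right after the head, so the browser can start downloading the CSS/JS referenced in the head early.

We flush content to the browser block by block, and what gets flushed first should be what the user cares about most. Yahoo, for example, used to flush the search box first because it is the core feature.

The size of each flush needs to be chosen sensibly; large content can be split into smaller pieces.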
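
As a rough sketch of the last point, large content can be cut into byte chunks before it is written. The helper name splitForFlush and the 16-byte limit are assumptions for illustration; a sensible real limit depends on the page and the network.

```javascript
// Illustrative only: split a large HTML string into byte chunks of at most
// `maxBytes`, so that each one can be written (flushed) separately.
// `splitForFlush` and the 16-byte limit below are assumptions.
function splitForFlush(html, maxBytes) {
  const buf = Buffer.from(html, 'utf8');
  const chunks = [];
  for (let start = 0; start < buf.length; start += maxBytes) {
    // subarray does not copy; each chunk is a view onto the same bytes.
    chunks.push(buf.subarray(start, start + maxBytes));
  }
  return chunks;
}

const parts = splitForFlush('<div id="moduleA">moduleA</div>', 16);
parts.forEach(function (part) {
  console.log(part.length, JSON.stringify(part.toString()));
});
```

Splitting on bytes rather than characters is safe here because the receiver reassembles the bytes before decoding them, so a multi-byte character that straddles two chunks still arrives intact.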

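Node.js implementation

After comparing the strengths and weaknesses of PHP and Java for implementing Bigpipe, it is easy to feel at home with Node.js.

The asynchronous nature of Node.js makes parallel work easy to handle.

Full control of the View layer is a natural advantage when data has to be processed on the server and rendered on the client.

The HTTP interface in Node.js is designed to support many HTTP features that have traditionally been difficult to use.

Back to Hello World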

var http = require('http');

http.createServer(function (request, response){
  response.writeHead(200, {'Content-Type': 'text/html'});
  response.write('hello');
  response.write(' world ');
  response.write('~ ');
  response.end();
}).listen(8080, "127.0.0.1");

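The response carries the HTTP header Transfer-Encoding: chunked. Amazing!

If we only call response.write and never call response.end, the response is not finished and the browser keeps the request open. Until response.end is called, we can keep flushing content with response.write.

Starting a Node.js implementation of Bigpipe from Hello World is a little exciting.

A more complete version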

layout.html

<!DOCTYPE html>
<html>
<head>
	<!-- css and js tags -->
    <link rel="stylesheet" href="index.css" />
    <script>
    function renderFlushCon(selector, html) {
        document.querySelector(selector).innerHTML = html;
    }
    </script>
</head>
<body>
    <div id="A"></div>
    <div id="B"></div>
    <div id="C"></div>

The head holds the assets we want to load.

The body outputs the page skeleton, with placeholders for modules A, B and C.

var http = require('http');
var fs = require('fs');

http.createServer(function(request, response) {
  response.writeHead(200, { 'Content-Type': 'text/html' });

  // flush layout and assets
  var layoutHtml = fs.readFileSync(__dirname + "/layout.html").toString();
  response.write(layoutHtml);
  
  // fetch data and render
  response.write('<script>renderFlushCon("#A","moduleA");</script>');
  response.write('<script>renderFlushCon("#C","moduleC");</script>');
  response.write('<script>renderFlushCon("#B","moduleB");</script>');
  
  // close body and html tags
  response.write('</body></html>');
  // finish the response
  response.end();
}).listen(8080, "127.0.0.1");

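Page output:

moduleA
moduleB
moduleC

The flushed layout includes the function the browser uses for rendering.

Next comes the core part: fetching data, assembling templates, and flushing executable content to the browser.

The browser renders it (no parallel processing has been introduced yet).

Close the body and html tags.

End the response to complete the request.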
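
The script snippets above are built by plain string concatenation, which breaks as soon as the module HTML contains quotes or a closing script tag. A minimal sketch of a safer builder follows; the helper name pageletScript is hypothetical, and renderFlushCon is the client function from layout.html.

```javascript
// Illustrative only: build the <script> snippet that fills a placeholder.
// `pageletScript` is a hypothetical helper; `renderFlushCon` is the client
// function defined in layout.html above.
function pageletScript(selector, html) {
  // JSON.stringify handles quotes, backslashes and newlines inside the HTML.
  const args = [selector, html].map(function (value) {
    // Escape "<" so that a "</script>" inside the HTML cannot end the tag early.
    return JSON.stringify(value).replace(/</g, '\\u003c');
  });
  return '<script>renderFlushCon(' + args.join(',') + ');</script>';
}

console.log(pageletScript('#A', '<p class="x">moduleA</p>'));
```

The browser's JavaScript parser turns \u003c back into "<" when it evaluates the string literal, so the module HTML reaches renderFlushCon unchanged.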

express implementation

var express = require('express');
var app = express();
var fs = require('fs');

app.get('/', function (req, res) {
  // flush layout and assets
  var layoutHtml = fs.readFileSync(__dirname + "/layout.html").toString();
  res.write(layoutHtml);
  
  // fetch data and render
  res.write('<script>renderFlushCon("#A","moduleA");</script>');
  res.write('<script>renderFlushCon("#C","moduleC");</script>');
  res.write('<script>renderFlushCon("#B","moduleB");</script>');
  
  // close body and html tags
  res.write('</body></html>');
  // finish the response
  res.end();
});

app.listen(3000);

Page output:

moduleA
moduleB
moduleC

express is built on top of Node.js's built-in HTTP module, so the implementation is much the same.

koa implementation

var koa = require('koa');
var app = koa();

app.use(function *() {
    this.body = 'Hello world';
});

app.listen(3000);

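Koa does not support writing the response by calling the underlying res directly. res.write()/res.end() is a minefield, and I had the good fortune of stepping on it.

In koa, the this context wraps the Node.js request and response objects. this.body is a property of koa's response object.

It feels like all that is left in the koa world is generators and this.body. What now? Keep reading the docs~

this.body can be set to a string, a buffer, a stream, an object, or null.

Stream, stream, stream: say it three times and it becomes important.

What streams mean

On streams, I recommend @愈之's article 通通连起来 -- 无处不在的流 (Connect Everything: Streams Are Everywhere). It gave me a new understanding of streams, so next let's start connecting things.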

var koa = require('koa');
var View = require('./view');
var app = module.exports = koa();

app.use(function* () {
  this.type = 'html';
  this.body = new View(this);
});

app.listen(3000);

view.js

var Readable = require('stream').Readable;
var util = require('util');
var co = require('co');
var fs = require('fs');

module.exports = View

util.inherits(View, Readable);

function View(context) {
  Readable.call(this, {});

  // render the view on a different loop
  co.call(this, this.render).catch(context.onerror);
}

View.prototype._read = function () {};

View.prototype.render = function* () {
  // flush layout and assets
  var layoutHtml = fs.readFileSync(__dirname + "/layout.html").toString();
  this.push(layoutHtml);
  
  // fetch data and render
  this.push('<script>renderFlushCon("#A","moduleA");</script>');
  this.push('<script>renderFlushCon("#C","moduleC");</script>');
  this.push('<script>renderFlushCon("#B","moduleB");</script>');
  
  // close body and html tags
  this.push('</body></html>');
  // end the stream
  this.push(null);
};

Page output:

moduleA
moduleB
moduleC

Transfer-Encoding: chunked

A pipe is set up between the server and the browser, and this.push sends content from the server to the browser.
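
To see the push side in isolation, here is a small sketch that uses only the standard library, with no koa or co. The helper name createView is hypothetical. When the body is a stream, koa pipes it to the HTTP response; here a plain data listener plays the consumer.

```javascript
const { Readable } = require('stream');

// Illustrative only: a readable stream fed by push(), using just the
// standard library. `createView` is a hypothetical helper name.
function createView(chunks) {
  // read() is a no-op because data is pushed when it is ready, not on demand.
  const view = new Readable({ read() {} });
  for (const chunk of chunks) {
    view.push(chunk);
  }
  view.push(null); // signals the end of the stream
  return view;
}

// The consumer sees the chunks in the order they were pushed.
const view = createView(['<div id="A"></div>', '<script>/* fill A */</script>']);
view.on('data', function (chunk) { console.log(chunk.toString()); });
view.on('end', function () { console.log('stream ended'); });
```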

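Implementing parallelism

We now have chunked transfer working in both koa and express, and we know that the content of modules A, B and C should be generated on the server in parallel.
At this point, let's look back at the traditional way of rendering a page, where modules A / B / C are rendered synchronously:

With chunked transfer, A / B / C are executed sequentially on the server and sent to the browser chunk by chunk for rendering:

This clearly takes less time. Now replace the sequential execution on the server with parallel execution:

As the figure shows, the benefit of parallelism is obvious. To find a way to execute in parallel, we have to trace the history of asynchronous programming. (History makes one wise, and shows how hard-won the present is.)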
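
The comparison can also be worked through with numbers. The sketch below computes when each module becomes visible under the three strategies. The durations are made-up sample values, and the "sync" case assumes the traditional page generates the modules one after another and sends everything in one piece.

```javascript
// Illustrative only: when does each module become visible under the three
// strategies discussed above? Durations (in ms) are made-up sample values.
const durations = { A: 1000, B: 3000, C: 2000 };

function visibleAt(durations, strategy) {
  const ids = Object.keys(durations);
  const total = ids.reduce(function (sum, id) { return sum + durations[id]; }, 0);
  const result = {};
  let elapsed = 0;
  ids.forEach(function (id) {
    elapsed += durations[id];
    if (strategy === 'sync') {
      result[id] = total;           // nothing is shown until everything is done
    } else if (strategy === 'chunked-serial') {
      result[id] = elapsed;         // shown as soon as its own turn has finished
    } else {
      result[id] = durations[id];   // 'chunked-parallel': all modules start at 0
    }
  });
  return result;
}

console.log(visibleAt(durations, 'sync'));             // { A: 6000, B: 6000, C: 6000 }
console.log(visibleAt(durations, 'chunked-serial'));   // { A: 1000, B: 4000, C: 6000 }
console.log(visibleAt(durations, 'chunked-parallel')); // { A: 1000, B: 3000, C: 2000 }
```

Under these assumptions the page finishes at 6 s in the first two cases and at 3 s in the parallel case, and the first module is visible after 1 s whenever chunks are flushed.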

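The callback way

First, implementing asynchronous programming with deeply nested callbacks is hell.

Second, we choose to go around hell and use a mature module instead.

The async way

async is a veteran of asynchronous flow control.

parallel(tasks, [callback]) runs multiple functions in parallel; each function is started immediately, without waiting for the others to finish. The results array passed to the final callback follows the order in which the tasks were declared, not the order in which they completed.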

var Readable = require('stream').Readable;
var inherits = require('util').inherits;
var co = require('co');
var fs = require('fs');
var async = require('async');


inherits(View, Readable);

function View(context) {
  Readable.call(this, {});

  // render the view on a different loop
  co.call(this, this.render).catch(context.onerror);
}

View.prototype._read = function () {};

View.prototype.render = function* () {
  // flush layout and assets
  var layoutHtml = fs.readFileSync(__dirname + "/layout.html").toString();
  this.push(layoutHtml);

  var context = this;

  async.parallel([
    function(cb) {
      setTimeout(function(){
        context.push('<script>renderFlushCon("#A","moduleA");</script>');
        cb();
      }, 1000);
    },
    function(cb) {
      context.push('<script>renderFlushCon("#C","moduleC");</script>');
      cb();
    },
    function(cb) {
      setTimeout(function(){
        context.push('<script>renderFlushCon("#B","moduleB");</script>');
        cb();
      }, 2000);
    }
  ], function (err, results) {
    // close body and html tags
    context.push('</body></html>');
    // end the stream
    context.push(null);
  });
  
};

module.exports = View;

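Page output:

moduleC
moduleA
moduleB

The modules appear in the order C > A > B, which also shows that Node.js does not block while waiting on asynchronous work.

The layout is flushed first.

async.parallel processes A, B and C in parallel, and each task calls cb() to signal that it has finished.

When all tasks have finished, the final callback runs; it closes the body/html tags and ends the stream.

If any task passes an error to its callback, the final callback is invoked immediately with that error. Tasks that are still running are not cancelled, but the final callback is not called again when they finish, so in the code above the stream would be ended while they are still trying to push. Error handling with this parallel approach therefore needs care.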
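
One way to keep a single failing module from cutting the page short is to wait for every task to settle before closing the stream. The sketch below swaps async.parallel for the built-in Promise.allSettled; the names flushAll, tasks and push are hypothetical, and push stands in for context.push.

```javascript
// Illustrative only: run pagelet tasks in parallel and always close the page,
// even if some tasks fail. Uses built-in Promises instead of the async module.
// `flushAll`, `tasks` and `push` are hypothetical names for this sketch.
function flushAll(tasks, push) {
  const running = tasks.map(function (task) {
    // Each task returns (a Promise for) an HTML/script fragment.
    return Promise.resolve()
      .then(task)
      .then(function (fragment) { push(fragment); });
  });
  // allSettled waits for every task, whether it succeeded or failed.
  return Promise.allSettled(running).then(function (results) {
    push('</body></html>');
    push(null); // end the stream
    // Report how many tasks failed so the caller can log or retry them.
    return results.filter(function (r) { return r.status === 'rejected'; }).length;
  });
}

const sent = [];
flushAll([
  function () { return '<script>/* A */</script>'; },
  function () { return Promise.reject(new Error('B failed')); },
  function () { return '<script>/* C */</script>'; }
], function (chunk) { sent.push(chunk); }).then(function (failed) {
  console.log(failed, sent);
});
```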

async also has an each method that can run asynchronous work in parallel:

each(arr, iterator(item, callback), callback(err))

With a small change:

var options = [
  {id:"A",html:"moduleA",delay:1000},
  {id:"B",html:"moduleB",delay:0},
  {id:"C",html:"moduleC",delay:2000}
];


async.forEach(options, function(item, callback) { 
  setTimeout(function(){
    context.push('<script>renderFlushCon("#'+item.id+'","'+item.html+'");</script>');
    callback();
  }, item.delay);
  
}, function(err) { 
  // close body and html tags
  context.push('</body></html>');
  // end the stream
  context.push(null);
});

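This behaves the same way as parallel: the modules are flushed in the order they complete (B, A, C with the delays above). The difference is that each focuses on the processing of every item, while parallel is more often used when you care about the data each task returns.

Notice that while using async we had already pulled in co. co is also a powerful tool for asynchronous programming, so let's see whether it offers a simpler way.

co

co is a tool that simplifies asynchronous flow control. Can we use the power of generators to reach our goal of parallel execution? The scenario we need is actually simple:

Several task functions run in parallel, and when the last one finishes we are notified so that the follow-up work can run.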

var Readable = require('stream').Readable;
var inherits = require('util').inherits;
var co = require('co');
var fs = require('fs');
// var async = require('async');

inherits(View, Readable);

function View(context) {
  Readable.call(this, {});

  // render the view on a different loop
  co.call(this, this.render).catch(context.onerror);
}

View.prototype._read = function () {};

View.prototype.render = function* () {
  // flush layout and assets
  var layoutHtml = fs.readFileSync(__dirname + "/layout.html").toString();
  this.push(layoutHtml);

  var context = this;
  var options = [
    {id:"A",html:"moduleA",delay:100},
    {id:"B",html:"moduleB",delay:0},
    {id:"C",html:"moduleC",delay:2000}
  ];

  var taskNum = options.length;
  var exec = options.map(function(item){opt(item,function(){
    taskNum --;
    if(taskNum === 0) {
      done();
    } 
  })});

  function opt(item,callback) {
    setTimeout(function(){
      context.push('<script>renderFlushCon("#'+item.id+'","'+item.html+'");</script>');
      callback();
    }, item.delay);
  }

  function done() {
    context.push('</body></html>');
      // end the stream
    context.push(null);
  }

  co(function* () {
     yield exec;
  });  
};

module.exports = View;

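Yielding an array makes co run the items in the array in parallel.

To avoid promises, and because the number of tasks is known in advance, a counter is used to detect when everything has finished. Note that in the code above the tasks are actually started by options.map and the yielded array only holds undefined values, so the counter does the real work. Is there a better way with co alone?

At this point I realized I was not yet fluent with generators and needed to catch up.

co with promises

This approach was kindly contributed by @大果 and is much more elegant to write.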

var options = [
  {id:"A",html:"moduleAA",delay:100},
  {id:"B",html:"moduleBB",delay:0},
  {id:"C",html:"moduleCC",delay:2000}
];

var exec = options.map(function(item){ return opt(item); });

function opt(item) {
  return new Promise(function (resolve, reject) {
  setTimeout(function(){
      context.push('<script>renderFlushCon("#'+item.id+'","'+item.html+'");</script>');
      resolve(item);
    }, item.delay);
  });
}

function done() {
  context.push('</body></html>');
    // end the stream
  context.push(null);
}

co(function* () {
   yield exec;
}).then(function(){
  done();
});

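ES7 async/await

If async/await becomes a standard and gets adopted (it was later standardized in ES2017), the code should become more concise and readable, and the approach easier to follow.

async function flush(Something) {
  // moduleA, moduleB and moduleC are placeholders for the page's modules
  await Promise.all([moduleA.flush(), moduleB.flush(), moduleC.flush()]);
  // close body and html tags
  context.push('</body></html>');
  // end the stream
  context.push(null);
}

This code has never been run or verified. The idea and the code are left here; bring on ES7 ^_^.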
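
For reference, here is a runnable sketch of the same idea under a current Node.js. Everything in it is made up for the demo: fakeModule simulates a module with a timer, flushPage is a hypothetical name, and push stands in for context.push.

```javascript
// Illustrative only: a runnable version of the async/await idea.
// `fakeModule`, `flushPage` and the delays are made up for this demo.
function fakeModule(id, delay) {
  return {
    flush: function (push) {
      return new Promise(function (resolve) {
        setTimeout(function () {
          push('<script>renderFlushCon("#' + id + '","module' + id + '");</script>');
          resolve(id);
        }, delay);
      });
    }
  };
}

async function flushPage(modules, push) {
  // Start every module at once and wait until all of them have flushed.
  await Promise.all(modules.map(function (m) { return m.flush(push); }));
  push('</body></html>');
  push(null); // end the stream
}

const chunks = [];
flushPage(
  [fakeModule('A', 30), fakeModule('B', 0), fakeModule('C', 60)],
  function (chunk) { chunks.push(chunk); }
).then(function () {
  // Modules arrive in completion order (B, A, C), then the closing tags.
  console.log(chunks);
});
```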

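Midway

It is already sunset as I write this. If I stopped here with "to find out what happens next, tune in to the next installment", everything above would be a novel without a protagonist.

Midway is a good thing, a product of front-end/back-end separation. Separation does not mean no contact; it means closer and smoother collaboration. Because responsibilities are clear, front end and back end can sometimes settle a requirement with little more than "you know what I mean" and "got it". Replacing the View layer of Webx MVC with Node.js makes it far easier for the front end to implement Bigpipe.

Midway wraps koa, hides some of the complexity, and exposes only the simplest MVC parts to front-end developers, which removes a large part of the configuration cost.

Some facts

Midway actually supports both the express and koa frameworks. koa should be the mainstream choice now, and after Midway 5.1 the two frameworks will probably no longer both be supported.

Midway has better support for generators.

With midway-render, this.render(xtpl, data) outputs the content to the page directly through this.body.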

function renderView(basePath, viewName, data) {
  var me = this;
  var filepath = path.join(basePath, viewName);
  data = utils.assign({}, me.state, data);
  return new Promise(function(resolve, reject) {
    function callback(err, ret) {
      if (err) {
        return reject(err);
      }
      // assign the assembled result directly to this.body
      me.body = ret;
      resolve(ret);
    }
    render(filepath, data, callback);
  });
}

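MVC

Midway focuses on front-end/back-end separation. The Model layer is really a proxy for the back end's Model, and the data is provided by the back end.

The View layer uses xtpl templates, so the front end and back end share the same templates.

The Controller ties routing and views together; this.render is usually called in the Controller.

Where Bigpipe fits

The point of understanding these facts about Midway is to work out where Bigpipe should plug into it:

The Bigpipe approach has to send content in chunks, so it is also used in the Controller.

Templates are assembled into strings by midway-xtpl and then output in chunks by Bigpipe.

Bigpipe can fetch the data and assemble the content for each module.

The recommendation is to bring Bigpipe into the Controller as a module and use it in place of this.render to output content in chunks.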

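Scenarios

Which scenarios suit Bigpipe, given our existing tools and development model?

Pages like the Seller Center: many modules, long pages, and a first screen that holds the user's core content.

Each module's functionality is relatively independent, and so are its template and data.

For modules below the first screen, loading on scroll is still recommended, to reduce the amount of data sent for the first screen.

The main layout outputs the assets and the scripts Bigpipe needs; most importantly, it has to reserve placeholders for the modules.

The set of first-screen modules can be fixed or determined by calculation.

Besides chunked output, modules should ideally also support asynchronous loading and rendering.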

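Encapsulation

Finally, how the Seller Center uses Bigpipe and how it is packaged. Built around the chunked transfer and parallel execution described above, the current encapsulation looks like this:

Besides assembling the template, Midway's this.render assigns the result directly to this.body, which ends the response right away and makes chunked transfer impossible. So we made a small extension:

A method this.Html, which only assembles the template without outputting it, was added to the midway-render engine:

// only assemble the HTML string; do not write it to the response
app.context.Html = utils.partial(engine.renderViewText, config.path);

renderViewText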

function renderViewText(basePath, viewName, data) {
  var me = this;
  var filepath = path.join(basePath, viewName);
  data = utils.assign({}, me.state, data);

  return new Promise(function(resolve, reject) {
    render(filepath, data, function(err, ret){
      if (err) {
        return reject(err);
      }
      // unlike renderView, me.body = ret is omitted here
      resolve(ret);
    });
  });
}

midway-render/midway-xtpl probably has an extension mechanism for this, but I could not find out how to use it, so I went with this approach.

View.js module

'use strict';
var util = require('util');
var async = require('async');
var Readable = require('stream').Readable;

var midway = require('midway');
var DataProxy = midway.getPlugin('dataproxy');

// default page layout
var defaultLayout = '<!DOCTYPE html><html><head></head><body></body>';

exports.createView = function() {
  function noop() {};

  util.inherits(View, Readable);

  function View(ctx, options) {
    Readable.call(this);

    ctx.type = 'text/html; charset=utf-8';
    ctx.body = this;
    ctx.options = options;
    this.context = ctx;

    this.layout = options.layout || defaultLayout;
    this.pagelets = options.pagelets || [];
    this.mod = options.mod || 'bigpipe';
    this.endCB = options.endCB || noop;
  }

  /**
   *
   * @type {noop}
   * @private
   */
  View.prototype._read = noop;


  /**
   * flush the content
   */
  View.prototype.flush = function* () {
    // flush layout
    yield this.flushLayout();

    // flush pagelets
    yield this.flushPagelets();
  };

  /**
   * flush the main layout
   */
  View.prototype.flushLayout = function* () {
    this.push(this.layout);
  }

  /**
   * flush the content of the pagelets
   */
  View.prototype.flushPagelets = function* () {
    var self = this;
    var pagelets = this.pagelets;

    // run in parallel
    async.each(pagelets, function(pagelet, callback) {
      self.flushSinglePagelet(pagelet, callback);
    }, function(err) {
      self.flushEnd();
    });
  }


  /**
   * flush a single pagelet
   * @param pagelet
   * @param callback
   */
  View.prototype.flushSinglePagelet = function(pagelet, callback) {
    var self = this,
      context = this.context;

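    this.getDataByDataProxy(pagelet, function(data) {
      // formateData is optional; keep the raw data when it is absent
      if (typeof pagelet.formateData === 'function') {
        data = pagelet.formateData(data, pagelet) || data;
      }

      context.Html(pagelet.tpl, data).then(function(html) {
        var selector = '#' + pagelet.id;
        var js = pagelet.js;

        self.arrive(selector, html, js);

        callback();
      }, function(err) {
        // log the template error but still finish the task so the stream can end
        console.error(err);
        callback();
      });
    });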
  }

  /**
   * fetch data from the back end
   * @param pagelet
   * @param callback
   */
  View.prototype.getDataByDataProxy = function(pagelet, callback) {
    var context = this.context;

    if (pagelet.proxy) {
      var proxy = DataProxy.create({
        getData: pagelet.proxy
      });

      proxy.getData()
        .withHeaders(context.request.headers)
        .done(function(data) {
          callback && callback(data);
        })
        .fail(function(err) {
          // log the error and fall back to empty data so the page does not hang
          console.error(err);
          callback && callback({});
        });
    }else {
      callback&&callback({});
    }
  }

  /**
   * close the html tag and end the stream
   */
  View.prototype.flushEnd = function() {
    this.push('</html>');
    this.push(null);
  }



  // Replace the contents of `selector` with `html`.
  // Optionally execute the `js`.
  View.prototype.arrive = function (selector, html, js) {
      this.push(wrapScript(
          'BigPipe(' +
              JSON.stringify(selector) + ', ' +
              JSON.stringify(html) +
              (js ? ', ' + JSON.stringify(js) : '') + ')'
      ))
  }



  function wrapScript(js) {
    var id = 'id_' + Math.random().toString(36).slice(2)

    return '<script id="' + id + '">'
      + js
      + ';remove(\'#' + id + '\');</script>'
  }

  return View;
}

context.Html assembles the content of each pagelet.
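
The scripts pushed by arrive and wrapScript call two client-side functions, BigPipe and remove, which are not shown above. A possible shape for them is sketched below. Both bodies are assumptions, and the extra doc parameter exists only so the functions can be exercised outside a browser; in the page they would be defined in the layout before any pagelet arrives.

```javascript
// Illustrative only: possible client-side counterparts of the calls emitted
// by View.prototype.arrive and wrapScript. The source does not show them, so
// both bodies are assumptions. `doc` defaults to the browser's document.
function BigPipe(selector, html, js, doc) {
  doc = doc || document;
  var el = doc.querySelector(selector);
  if (!el) return false;        // the placeholder is missing: skip quietly
  el.innerHTML = html;
  if (js) {
    // Run the pagelet's init code (a string) after its markup is in place.
    new Function(js)();
  }
  return true;
}

// Remove the helper <script> element once it has run, to keep the DOM clean.
function remove(selector, doc) {
  doc = doc || document;
  var el = doc.querySelector(selector);
  if (el && el.parentNode) el.parentNode.removeChild(el);
}
```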

Calling it from the Controller

var me = this;
var layoutHtml = yield this.Html('p/seller_admin_b/index', data);

yield new View(me, {
  layout: layoutHtml, // the assembled layout template
  pagelets: pageletsConfig,
  mod: 'bigpipe'  // reserved for choosing a mode
}).flush();

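layoutHtml is the assembled main layout template.

Configuration of each pagelet

{
    id: 'seller_info', // unique id of this pagelet
    proxy: 'Seller.Module.Data.seller_info', // data interface configuration
    tpl: 'sellerInfo.xtpl', // template to render
    js: '' // js to execute
}

proxy and tpl are used to fetch the data and assemble the template; this work needs to run in parallel.

js usually initializes the module.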

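Improvements

The ideas and code here are all based on our current scenario and technical background. So far this is an approach and a set of experiments rather than a unified solution, and more scenarios are needed to back it up. Some points that can still be improved:

The code could be rewritten more elegantly with new ES6/ES7 features, and should keep evolving along with Midway upgrades.

Chunked transfer is not handled well by some older browsers, so it is best to also implement an asynchronous module loading scheme, with two routes that are switched according to the user's device.

Handle errors for every module and its content, and set a time limit on the request; when the limit is reached, close the connection instead of leaving the page hanging. Modules that were supposed to arrive by chunked transfer are then loaded asynchronously instead.

The parallel implementation currently uses async.each; the alternatives still need to be compared on performance.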
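
The time-limit idea from the list above could look roughly like this. It is a sketch, not part of the current implementation: withTimeout, the 50 ms budget and the fallback marker are all hypothetical.

```javascript
// Illustrative only: give each pagelet a time budget. `withTimeout` and the
// fallback value are hypothetical; the real limit would depend on the page.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise(function (resolve) {
    timer = setTimeout(function () { resolve(fallback); }, ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(function () {
    clearTimeout(timer);
  });
}

// Usage: a slow pagelet is replaced by a marker that tells the client to
// load that module asynchronously instead.
const slow = new Promise(function (resolve) {
  setTimeout(resolve, 200, '<p>late</p>');
});
withTimeout(slow, 50, '<!-- load-async -->').then(function (html) {
  console.log(html);
});
```

The slow task itself is not cancelled here; its result is simply ignored once the budget has run out, so the stream can be closed on time.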
