Miscellany: A Few Lessons Learned – cococat cafe

1. Asynchronous logging in business logic

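In the early versions of an application, all logs are probably written to external files through tracing, which is perfectly normal. Once you start building out observability, you need to bring in logging middleware, for example to ship important logs to SLS. That logic tends to end up interleaved with our business functions, which leads to a basic rule: logging must never block the business logic itself. (The same applies to any other non-business logic.)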

Let's start with a counterexample.

Suppose we begin with a business function like this:

async fn business() {
    // ...
    do_some_biz();
    let log = generate_important_log();
    tracing::info!("[biz] {log}");
    do_other_biz();
    // ...
}

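There is nothing wrong with this version. Now suppose we need to hook up some fancy logging middleware. To keep things simple, assume logs are delivered through an HTTP endpoint. The most obvious (and worst) approach is to swap out the tracing line directly: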

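async fn business() -> Result<(), reqwest::Error> {
    // ...
    do_some_biz();
    let log = generate_important_log();
    // Anti-pattern: the business logic waits for log delivery here,
    // and `?` even aborts it when delivery fails.
    let _response = reqwest::Client::new()
        .post("https://naive-logger.com/api/push")
        .body(log.to_string())
        .send()
        .await?;

    do_other_biz();
    // ...
    Ok(())
}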

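Clearly, the second piece of business logic, do_other_biz, has to wait for the log delivery to finish. If the log service has problems and the request hangs until it times out, the business logic is blocked for that whole time.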

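The fix is either to separate the logging from the business logic, or to defer all logging work until the business logic has fully completed. In practice, the separation approach is the more common one.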

So it seems we can simply spawn the logging logic? Something like this:

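async fn business() {
    // ...
    do_some_biz();

    let log = generate_important_log();
    // `async move` takes ownership of `log`; a spawned task must be 'static.
    tokio::spawn(async move {
        // The delivery result is ignored, so a failed push is silently lost.
        let _ = reqwest::Client::new()
            .post("https://naive-logger.com/api/push")
            .body(log.to_string())
            .send()
            .await;
    });

    do_other_biz();
    // ...
}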

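On the surface this change solves the blocking problem. In practice it is still unsound: under heavy traffic, if log delivery starts timing out, a large number of tokio tasks pile up, and that can still bring our service down.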

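For scenarios like this, the best solution is that once such a service is introduced, the logging module on our side must switch to a producer-consumer model built on a bounded queue. A ready-made building block is tokio::sync::mpsc::channel, whose message buffer has a fixed capacity; its unbounded counterpart is tokio::sync::mpsc::unbounded_channel. In addition, when the queue is full and a log cannot be enqueued, the error-handling path can fall back to writing it to disk.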
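The bounded producer-consumer idea can be sketched roughly as follows. To stay runnable without extra crates, this sketch uses the standard library's std::sync::mpsc::sync_channel and a worker thread instead of tokio; tokio::sync::mpsc::channel offers a comparable non-blocking try_send. The names LogQueue, start_logger and deliver are hypothetical, and deliver merely stands in for the real delivery call (such as the HTTP push).

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;

/// Producer handle held by business code (hypothetical name).
#[derive(Clone)]
struct LogQueue {
    tx: SyncSender<String>,
}

impl LogQueue {
    /// Never blocks. When the queue is full (or the consumer is gone),
    /// the entry is handed back so the caller can spill it elsewhere.
    fn push(&self, entry: String) -> Result<(), String> {
        match self.tx.try_send(entry) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(e)) | Err(TrySendError::Disconnected(e)) => Err(e),
        }
    }
}

/// Creates a queue holding at most `capacity` entries and starts one consumer.
/// `deliver` is a placeholder for the real delivery logic.
fn start_logger<F>(capacity: usize, mut deliver: F) -> (LogQueue, thread::JoinHandle<()>)
where
    F: FnMut(String) + Send + 'static,
{
    let (tx, rx) = sync_channel::<String>(capacity);
    let worker = thread::spawn(move || {
        // The loop ends once every LogQueue clone has been dropped.
        for entry in rx {
            deliver(entry);
        }
    });
    (LogQueue { tx }, worker)
}

fn main() {
    let (queue, worker) = start_logger(1024, |entry| println!("delivered: {entry}"));

    if let Err(entry) = queue.push("[biz] order created".to_string()) {
        // Queue full: a real service might append to a local file here.
        eprintln!("log queue full, spilled: {entry}");
    }

    drop(queue); // close the channel so the worker can exit
    worker.join().unwrap();
}
```

The point of the design is that push returns immediately in every case, so a slow or dead log service can only fill the queue; it cannot grow memory without bound or stall the business path.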

2. Tune the configuration when using a connection pool in business systems

reqwest is built on top of hyper and uses hyper's connection pool directly. The documentation for reqwest::Client includes this passage:

The Client holds a connection pool internally, so it is advised that you create one and reuse it.
You do not have to wrap the Client in an Rc or Arc to reuse it, because it already uses an Arc internally.

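So the official recommendation is to reuse the client and take full advantage of its built-in connection pool. When a business system calls upstream or downstream APIs, holding one long-lived client lets connections be reused, which reduces load on the other side and improves performance.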

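One caveat: check the server's HTTP keep-alive timeout and its limit on how many requests a single TCP connection may serve. In NGINX, for example, these are keepalive_timeout and keepalive_requests. If the connection pool keeps a connection around longer than either limit allows, the server may close a connection the client still considers reusable, and calls can fail unexpectedly.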
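One way to keep the client on the safe side is to derive the pool's idle timeout from the server's setting and subtract a margin. The sketch below assumes a server configured with keepalive_timeout 75s and a 15-second margin; both numbers and the helper name client_idle_timeout are illustrative assumptions, not recommendations.

```rust
use std::time::Duration;

/// Picks an idle timeout for the client pool that expires before the server
/// closes the connection. `margin` absorbs network and scheduling delays.
/// Hypothetical helper, not part of reqwest.
fn client_idle_timeout(server_keepalive: Duration, margin: Duration) -> Duration {
    // Never underflows: a margin larger than the timeout yields zero.
    server_keepalive.saturating_sub(margin)
}

fn main() {
    // Assumed server setting: `keepalive_timeout 75s;`
    let server_keepalive = Duration::from_secs(75);
    let idle = client_idle_timeout(server_keepalive, Duration::from_secs(15));
    println!("client pool idle timeout: {idle:?}");
    // With reqwest, this value would be passed to
    // `reqwest::Client::builder().pool_idle_timeout(idle)`.
}
```

This only addresses the idle timeout. The per-connection request limit still has to be checked against the server's configuration separately.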

3. Don't select *

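For complex tables, it causes heavy disk IO and wastes bandwidth.
It may prevent indexes from being used fully. See section 24 of my earlier MySQL notes.
It is fragile for the application: once new columns are added, parsing of the SQL results may be affected.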

4. Use a single SQL statement for multiple inserts

This involves some rather ugly SQL string concatenation, but it pays off. Quoting GPT directly here:

Reduced Overhead:
Each SQL statement incurs an overhead for parsing, planning, and network communication between the application and the database. By batching rows in a single INSERT, you reduce this overhead significantly by dealing with it only once.
Fewer Network Round-Trips:
A single batched insert minimizes the number of network round-trips between your application and the database server. Network latency can be a non-trivial component of the total operation time, especially in distributed environments.
Efficient Use of Transactions:
If you're using multiple separate INSERT statements, and each statement is executed in its own transaction, the overhead of starting and committing multiple transactions can be significant. A single INSERT for multiple rows can be executed within a single transaction, reducing the transaction management overhead.
Bulk Insert Optimizations:
Databases like MySQL can perform internal optimizations when handling a batch of inserts, such as minimizing index updates or reducing write operations until all data is inserted. This can lead to further performance benefits.
Locking and Concurrency:
Fewer transactions mean reduced locking and thus better concurrency and reduced potential for contention, especially in tables that are frequently written to.
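The string concatenation mentioned above can be kept fairly tame by generating only the placeholders and binding the actual values through the database driver. Here is a minimal sketch, assuming MySQL-style ? placeholders and table and column names that come from trusted code rather than user input; build_batch_insert and the orders table are hypothetical.

```rust
/// Builds `INSERT INTO t (a, b) VALUES (?, ?), (?, ?)` for `rows` rows.
/// Returns None when there is nothing to insert.
/// `table` and `columns` must be trusted identifiers; only values are bound.
fn build_batch_insert(table: &str, columns: &[&str], rows: usize) -> Option<String> {
    if columns.is_empty() || rows == 0 {
        return None;
    }
    // One "(?, ?, ...)" group per row, one "?" per column.
    let one_row = format!("({})", vec!["?"; columns.len()].join(", "));
    let values = vec![one_row; rows].join(", ");
    Some(format!(
        "INSERT INTO {table} ({}) VALUES {values}",
        columns.join(", ")
    ))
}

fn main() {
    let sql = build_batch_insert("orders", &["user_id", "amount"], 3).unwrap();
    println!("{sql}");
}
```

The values themselves are then bound in row order by whatever driver is in use, so no user data ever ends up inside the SQL text.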

5. Think more about self-recovery

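Once we have implemented failover for an upstream or downstream dependency, how do we get back to the latest correct data after the other side recovers?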
