PHP yield generators for file downloads

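In the early stages of product development, one problem is almost unavoidable: exporting data. A small amount of data is easy to handle, but once it grows past a certain size, how do you export it without affecting the stability of the service? This time we use a PHP generator, that is, an iterator built with yield.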

About yield

Let's start with the basic usage:

function yield_range($start, $end) {
    while ($start <= $end) {
        yield $start;
        $start++;
    }
}

$generator = yield_range(1, 100);
var_dump($generator);
while ($generator->valid()) {
    echo $generator->current().PHP_EOL;
    $generator->next();
}

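The var_dump output shows that $generator is an object of class Generator. According to the PHP manual, it provides the following methods: current(), key(), next(), rewind(), valid(), send(), throw(), and getReturn().

We can print the generator's contents with either while or foreach: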
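To make a few of these methods concrete, here is a minimal sketch. The functions letters() and echo_back() are hypothetical names made up for illustration; only the Generator methods themselves come from PHP.

```php
<?php

// Hypothetical generator: yields explicit keys, then returns a final value.
function letters(): Generator
{
    yield 'a' => 'alpha';
    yield 'b' => 'beta';
    return 'done';
}

$gen = letters();
$pairs = [];
while ($gen->valid()) {
    // key() and current() read the pair produced by the current yield.
    $pairs[$gen->key()] = $gen->current();
    $gen->next();
}
// getReturn() is only usable once the generator has finished.
$result = $gen->getReturn();

// Hypothetical generator: collects two values passed in through send().
function echo_back(): Generator
{
    $received = [];
    while (count($received) < 2) {
        // The value given to send() becomes the result of the yield expression.
        $received[] = yield count($received);
    }
    return $received;
}

$echo = echo_back();
$echo->current();   // run up to the first yield
$echo->send('x');   // resumes the generator, which stores 'x'
$echo->send('y');   // stores 'y'; the loop ends and the generator returns
$sent = $echo->getReturn();
```

In day-to-day code foreach is usually enough; send() and getReturn() matter mostly when the generator is used as a coroutine.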

function yield_range2($start, $end) {
    while ($start <= $end) {
        yield $start;
        $start++;
    }
}

$generator2 = yield_range2(1, 10);
while ($generator2->valid()) {
    echo 'while->current:' . $generator2->current() . PHP_EOL;
    $generator2->next();
}

$generator2 = yield_range2(1, 10);
foreach ($generator2 as $item) {
    echo 'foreach->current:' . $item . PHP_EOL;
}

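So a generator can be consumed much like an ordinary loop. This may be hard to grasp at first, but writing a few examples makes it clear. In short: whatever follows yield is what you receive when you iterate over the generator.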
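A small sketch of that rule follows; mixed_values() is a hypothetical name used only for illustration. Each yielded value, whether a scalar, an array, or a key => value pair, arrives unchanged in the foreach loop.

```php
<?php

// Hypothetical generator that yields several kinds of values.
function mixed_values(): Generator
{
    yield 42;                            // scalar, automatic key 0
    yield 'text';                        // string, automatic key 1
    yield ['id' => 1, 'name' => 'row'];  // whole array, automatic key 2
    yield 'k' => 'v';                    // explicit key
}

$seen = [];
foreach (mixed_values() as $key => $value) {
    $seen[] = [$key, $value];
}
```

Yielding a whole array per step is the form used later in this post, where each step is one batch of database rows.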

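Exporting large amounts of data with a generator

With the generator behavior described above, exporting a large amount of data becomes straightforward. The idea is simple: split the data into small batches, then query and write one batch at a time. Because only one batch is held in memory at any moment, the service stays available and stable.

A simple example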
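The batching idea can be sketched without a database. The helper id_windows() below is a hypothetical name, not part of the original code; it only produces the half-open id ranges that each batch query would cover.

```php
<?php

// Hypothetical helper: yields [start, end) id windows of width $step
// that together cover the range $minId..$maxId.
function id_windows(int $minId, int $maxId, int $step): Generator
{
    for ($start = $minId; $start <= $maxId; $start += $step) {
        yield [$start, $start + $step];
    }
}

// The second argument (false) discards keys so the result is a plain list.
$windows = iterator_to_array(id_windows(1, 8, 3), false);
```

Each window corresponds to a condition of the form id >= start and id < end, so a query never returns more than $step rows.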

<?php

require_once dirname(__DIR__) . '/dbo/mysql.php';

class FetchData
{

    /**
     * @var int Minimum id to query
     */
    private $minId;

    /**
     * @var int Maximum id to query
     */
    private $maxId;

    /**
     * @var int Step size (id range width) of each batch query
     */
    private $step = 3;

    /**
     * Get the starting id of the next query
     * @param int $currentId
     * @return int
     */
    private function getNextId(int $currentId): int
    {
        return $currentId + $this->step;
    }

    /**
     * Set the starting id based on the query condition
     * @param string $map
     * @param string $table
     */
    private function setMinId(string $map, string $table): void
    {
        $pdo = mysql::getConnect('db');
        $sql = "select min(id) id from {$table} where {$map}";
        $res = $pdo->query($sql);
        $this->minId = $res[0]['id'];
    }

    /**
     * Set the ending id based on the query condition
     * @param string $map
     * @param string $table
     */
    private function setMaxId(string $map, string $table): void
    {
        $pdo = mysql::getConnect('db');
        $sql = "select max(id) id from {$table} where {$map}";
        $res = $pdo->query($sql);
        $this->maxId = $res[0]['id'];
    }

    /**
     * Query the data batch by batch and return a generator
     * @param string $field
     * @param string $map
     * @param string $table
     * @return Generator
     */
    public function getList(string $field, string $map, string $table): Generator
    {
        $this->setMinId($map, $table);
        $this->setMaxId($map, $table);
        // No matching rows: min(id) and max(id) are null, so yield nothing
        if ($this->minId === null || $this->maxId === null) {
            return;
        }
        $currentId = (int)$this->minId;
        $pdo = mysql::getConnect('db');
        while ($currentId <= $this->maxId) {
            $nextId = $this->getNextId($currentId);
            // Note: $field, $map and $table are interpolated as-is, so they must come from trusted code
            $sql = "select {$field} from {$table} where {$map} and id >= {$currentId} and id < {$nextId}";
            $list = $pdo->query($sql);
            // Gaps in the id sequence can produce empty batches; skip them
            if (!empty($list)) {
                yield $list;
            }
            $currentId = $nextId;
        }
    }
}


class Download
{
    /**
     * Export the data to a CSV file
     */
    public function downloadCsv(): void
    {
        $dir = '/mnt/hgfs/workspace';
        $filepath = $dir . '/data.csv';
        $fp = fopen($filepath, 'wb+');
        fwrite($fp, "\xEF\xBB\xBF");
        $list = (new FetchData())->getList('id,a,b', 'add_time >= 1577933547 and add_time <= 1578408139', 'table');
        foreach ($list as $item) {
            foreach ($item as $row) {
                fputcsv($fp, [
                    $row['id'],
                    $row['a'],
                    $row['b'],
                ]);
            }
        }
        fclose($fp);
    }
}

(new Download())->downloadCsv();

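The example above moves the data retrieval into a generator, so each iteration handles only one batch of rows and writes it to the file before the next batch is fetched.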
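The same write loop can be tried without a database. In the sketch below, fake_batches() and write_csv() are hypothetical names, the rows are invented sample data, and php://temp stands in for the real output file. It assumes PHP 7.4 or later, because it passes an empty escape character to fputcsv().

```php
<?php

// Hypothetical stand-in for FetchData::getList(): yields rows in batches.
function fake_batches(array $rows, int $size): Generator
{
    foreach (array_chunk($rows, $size) as $chunk) {
        yield $chunk;
    }
}

// Writes every row of every batch to $fp and returns the number of rows written.
function write_csv($fp, iterable $batches): int
{
    $count = 0;
    foreach ($batches as $batch) {
        foreach ($batch as $row) {
            // Explicit separator, enclosure and (empty) escape character.
            fputcsv($fp, [$row['id'], $row['a'], $row['b']], ',', '"', '');
            $count++;
        }
    }
    return $count;
}

// Invented sample rows, shaped like the columns selected in the example.
$rows = [
    ['id' => 1, 'a' => 'x', 'b' => 'y'],
    ['id' => 2, 'a' => 'p', 'b' => 'q'],
    ['id' => 3, 'a' => 'm', 'b' => 'n'],
];

$fp = fopen('php://temp', 'w+b');
$written = write_csv($fp, fake_batches($rows, 2));
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);
```

Because write_csv() accepts any iterable, the real generator from getList() could be passed in place of fake_batches() without changing the write loop.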
