Python

如何在整个子查询上使用group_concat？

发布于 2021-01-29 15:01:53

…无需进行不必要的比较

我想获取一系列行的md5哈希。由于带宽限制，我希望它发生在服务器端。

这有效：

create table some_table (id int auto_increment,
                         col1 varchar(1),
                         col2 int,
                         primary key (id));

insert into some_table (col1, col2)
                values ('a', 1),
                       ('b', 11),
                       ('c', 12),
                       ('d', 25),
                       ('e', 50);

select group_concat(id,col1,col2) from
    (select * from some_table
     where id >= 2 and id < 5
     order by id desc) as some_table
group by 1 = 1;

输出：

+----------------------------+
| group_concat(id,col1,col2) |
+----------------------------+
| 2b11,3c12,4d25             |
+----------------------------+

并带有哈希：

select md5(group_concat(id,col1,col2)) from
    (select * from some_table
     where id >= 2 and id < 5
     order by id desc) as some_table
group by 1 = 1;

输出：

+----------------------------------+
| md5(group_concat(id,col1,col2))  |
+----------------------------------+
| 32c1f1dd34d3ebd33ca7d95f3411888e |
+----------------------------------+

但是我觉得应该有更好的方法。

特别是，我想避免将1与1百万次进行比较，这是我发现将行范围划分为一组所必需的，这是我需要使用的group_concat，而md5在多行中使用是必须的。

有没有一种方法可以group_concat在行范围内使用（或类似方法）而无需进行不必要的比较？

编辑

我想对多行进行哈希处理，以便可以比较不同服务器上产生的哈希值。如果它们不同，则可以得出结论，子查询返回的行存在差异。

关注者

被浏览

1 个回答

面试哥 2021-01-29

为面试而生，有面试问题，就找面试哥。

解决方案就是完全省略group by 1 = 1。我假设这group_concat将要求我为其提供一个组，但是可以将其直接用于子查询，如下所示：

select group_concat(id,col1,col2) from
    (select * from some_table
     where id >= 2 and id < 5
     order by id desc) as some_table;

请注意，需要将null值强制转换为concat-friendly，例如：

insert into some_table (col1, col2)
                values ('a', 1),
                       ('b', 11),
                       ('c', NULL),
                       ('d', 25),
                       ('e', 50);

select group_concat(id, col1, col2) from
    (select id, col1, ifnull(col2, 'NULL') as col2
     from some_table
     where id >= 2 and id < 5
     order by id desc) as some_table;

输出：

+------------------------------+
| group_concat(id, col1, col2) |
+------------------------------+
| 2b11,3cNULL,4d25             |
+------------------------------+

另一个警告：mysql的最大长度group_concat由变量：定义group_concat_max_len。为了散列 n个
表行的串联，我需要：

哈希行，使其以32位表示，而不管其具有多少列
确保group_concat_max_len > (n * 33)（多余的字节用于添加逗号）
散group_concat列哈希行。

最终，我最终使用了客户端语言来检查每列的名称，编号和可空性，然后构建如下查询：

select md5(group_concat(row_fingerprint)) from
    (select concat(id, col1, ifnull(col2, 'null')) as row_fingerprint
     from some_table
     where id >= 2 and id < 5
     order by id desc) as foo;

有关更多详细信息，您可以在此处浏览我的代码（请参见函数：find_diff_intervals）。

知识点

Python

面圈网VIP题库全新上线，海量真题题库资源。 90大类考试，超10万份考试真题开放下载啦

去下载看看