Terms aggregation on nested fields in Elasticsearch
I have the following mapping for a field in Elasticsearch (defined in YAML):
my_analyzer:
    type: custom
    tokenizer: keyword
    filter: lowercase

products_filter:
    type: "nested"
    properties:
        filter_name: {"type" : "string", analyzer: "my_analyzer"}
        filter_value: {"type" : "string" , analyzer: "my_analyzer"}
Each document has many filters, and they look like this:
"products_filter": [
    {
        "filter_name": "Rahmengröße",
        "filter_value": "33,5 cm"
    },
    {
        "filter_name": "color",
        "filter_value": "gelb"
    },
    {
        "filter_name": "Rahmengröße",
        "filter_value": "39,5 cm"
    },
    {
        "filter_name": "Rahmengröße",
        "filter_value": "45,5 cm"
    }
]
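I am trying to get a list of the unique filter names and, for each filter name, a list of its unique filter values.
In other words, I want a structure like this:
Rahmengröße:
    39,5 cm
    45,5 cm
    33,5 cm
color:
    gelb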
To get it, I tried several variations of aggregations, for example:
{
    "aggs": {
        "bla": {
            "terms": {
                "field": "products_filter.filter_name"
            },
            "aggs": {
                "bla2": {
                    "terms": {
                        "field": "products_filter.filter_value"
                    }
                }
            }
        }
    }
}
This request is wrong.
It returns the list of unique filter names, but each of them contains a list built from all filter_values, not only its own:
"bla": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 103,
    "buckets": [
        {
            "key": "color",
            "doc_count": 9,
            "bla2": {
                "doc_count_error_upper_bound": 4,
                "sum_other_doc_count": 366,
                "buckets": [
                    { "key": "100", "doc_count": 5 },
                    { "key": "cm", "doc_count": 5 },
                    { "key": "unisex", "doc_count": 5 },
                    { "key": "11", "doc_count": 4 },
                    { "key": "160", "doc_count": 4 },
                    { "key": "22", "doc_count": 4 },
                    { "key": "a", "doc_count": 4 },
                    { "key": "alu", "doc_count": 4 },
                    { "key": "aluminium", "doc_count": 4 },
                    { "key": "aus", "doc_count": 4 }
                ]
            }
        },
I also tried a reverse nested aggregation, but it did not help.
So I suspect there is a logical error in my approach?
-
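As I said: your problem is that your text is analyzed, and Elasticsearch always aggregates at the token level. To solve this, the field values must be indexed as single tokens. There are two options:
- do not analyze them
- index them with a keyword analyzer plus lowercase (for case-insensitive aggs)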
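To illustrate why token-level aggregation produces keys such as "cm", here is a minimal Python sketch. It is not Elasticsearch code: `analyzed_tokens` is a hypothetical, simplified stand-in for an analyzed field (it only lowercases and splits on whitespace, while the real standard analyzer is more elaborate), and `single_token` stands in for a field indexed as one token.

```python
from collections import Counter

def analyzed_tokens(value):
    # Hypothetical, simplified stand-in for an analyzed field:
    # lowercase the text and split it on whitespace.
    return value.lower().split()

def single_token(value):
    # Stand-in for a field indexed as a single token: the whole value is one term.
    return [value]

def terms_counts(values, tokenize):
    # Count in how many values each term occurs, which is roughly
    # what a terms aggregation reports as doc_count.
    counts = Counter()
    for value in values:
        counts.update(set(tokenize(value)))
    return dict(counts)

values = ["33,5 cm", "39,5 cm", "45,5 cm"]
print(terms_counts(values, analyzed_tokens))
print(terms_counts(values, single_token))
```

In the analyzed variant, "cm" becomes a bucket of its own, which is the kind of key that shows up in the question's output. In the single-token variant, each full value stays one bucket.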
So this would be the setup. It adds sub-fields to your fields (raw and keyword) so that they can be used for aggregations, and it creates a custom keyword analyzer that applies the lowercase filter and removes accents (ö => o and ß => ss):
PUT /test
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "asciifolding", "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "data": {
            "properties": {
                "products_filter": {
                    "type": "nested",
                    "properties": {
                        "filter_name": {
                            "type": "string",
                            "analyzer": "standard",
                            "fields": {
                                "raw": { "type": "string", "index": "not_analyzed" },
                                "keyword": { "type": "string", "analyzer": "my_analyzer_keyword" }
                            }
                        },
                        "filter_value": {
                            "type": "string",
                            "analyzer": "standard",
                            "fields": {
                                "raw": { "type": "string", "index": "not_analyzed" },
                                "keyword": { "type": "string", "analyzer": "my_analyzer_keyword" }
                            }
                        }
                    }
                }
            }
        }
    }
}
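As a rough illustration of what the keyword tokenizer combined with the asciifolding and lowercase filters does to a value, here is a small Python approximation. The function name `keyword_normalize` is made up for this sketch, and the folding only covers combining accents plus an explicit ß rule; the real asciifolding filter handles many more characters.

```python
import unicodedata

def keyword_normalize(value):
    # Keep the whole value as one token (like the keyword tokenizer),
    # then approximate the asciifolding and lowercase filters.
    folded = value.replace("ß", "ss")  # explicit rule: NFKD leaves ß unchanged
    decomposed = unicodedata.normalize("NFKD", folded)
    # Drop combining marks, so "ö" (o + diaeresis) becomes "o".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

print(keyword_normalize("Rahmengröße"))
print(keyword_normalize("33,5 cm"))
```

The value is never split, so "33,5 cm" stays a single term, while "Rahmengröße" and "rahmengrösse" would end up in the same bucket.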
The test document you gave us:
PUT /test/data/1
{
    "products_filter": [
        { "filter_name": "Rahmengröße", "filter_value": "33,5 cm" },
        { "filter_name": "color", "filter_value": "gelb" },
        { "filter_name": "Rahmengröße", "filter_value": "39,5 cm" },
        { "filter_name": "Rahmengröße", "filter_value": "45,5 cm" }
    ]
}
This would be the query to aggregate using the raw fields:
GET /test/_search
{
    "size": 0,
    "aggs": {
        "Nesting": {
            "nested": {
                "path": "products_filter"
            },
            "aggs": {
                "raw_names": {
                    "terms": {
                        "field": "products_filter.filter_name.raw",
                        "size": 0
                    },
                    "aggs": {
                        "raw_values": {
                            "terms": {
                                "field": "products_filter.filter_value.raw",
                                "size": 0
                            }
                        }
                    }
                }
            }
        }
    }
}
It does bring the expected result (buckets with the filter names and sub-buckets with their values):
{
    "took": 1,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "failed": 0 },
    "hits": { "total": 1, "max_score": 0, "hits": [] },
    "aggregations": {
        "Nesting": {
            "doc_count": 4,
            "raw_names": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "Rahmengröße",
                        "doc_count": 3,
                        "raw_values": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                { "key": "33,5 cm", "doc_count": 1 },
                                { "key": "39,5 cm", "doc_count": 1 },
                                { "key": "45,5 cm", "doc_count": 1 }
                            ]
                        }
                    },
                    {
                        "key": "color",
                        "doc_count": 1,
                        "raw_values": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                { "key": "gelb", "doc_count": 1 }
                            ]
                        }
                    }
                ]
            }
        }
    }
}
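The grouping that the nested aggregation performs can be sketched in plain Python. This is only an in-memory emulation under simplifying assumptions, not how Elasticsearch is implemented; `nested_terms` and `flattened_terms` are hypothetical helper names. The second function shows what happens when names and values are treated as two independent arrays per document, so the link between a name and its value is lost.

```python
from collections import defaultdict

def nested_terms(documents):
    # Visit each nested object on its own, so a value is only ever
    # counted under the name stored in the same object.
    buckets = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        for entry in doc["products_filter"]:
            buckets[entry["filter_name"]][entry["filter_value"]] += 1
    return {name: dict(values) for name, values in buckets.items()}

def flattened_terms(documents):
    # Treat names and values as two independent sets per document:
    # every name gets paired with every value of that document.
    buckets = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        names = {e["filter_name"] for e in doc["products_filter"]}
        values = {e["filter_value"] for e in doc["products_filter"]}
        for name in names:
            for value in values:
                buckets[name][value] += 1
    return {name: dict(values) for name, values in buckets.items()}

doc = {
    "products_filter": [
        {"filter_name": "Rahmengröße", "filter_value": "33,5 cm"},
        {"filter_name": "color", "filter_value": "gelb"},
        {"filter_name": "Rahmengröße", "filter_value": "39,5 cm"},
        {"filter_name": "Rahmengröße", "filter_value": "45,5 cm"},
    ]
}

print(nested_terms([doc]))
print(flattened_terms([doc]))
```

In the nested variant "gelb" appears only under "color". In the flattened variant it also appears under "Rahmengröße", which mirrors the unwanted mixing described in the question.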
Alternatively, you can use the keyword sub-field with the keyword analyzer (and some normalization) to get a more general, case-insensitive result:
GET /test/_search
{
    "size": 0,
    "aggs": {
        "Nesting": {
            "nested": {
                "path": "products_filter"
            },
            "aggs": {
                "keyword_names": {
                    "terms": {
                        "field": "products_filter.filter_name.keyword",
                        "size": 0
                    },
                    "aggs": {
                        "keyword_values": {
                            "terms": {
                                "field": "products_filter.filter_value.keyword",
                                "size": 0
                            }
                        }
                    }
                }
            }
        }
    }
}
That is the result:
{
    "took": 1,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "failed": 0 },
    "hits": { "total": 1, "max_score": 0, "hits": [] },
    "aggregations": {
        "Nesting": {
            "doc_count": 4,
            "keyword_names": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "rahmengrosse",
                        "doc_count": 3,
                        "keyword_values": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                { "key": "33,5 cm", "doc_count": 1 },
                                { "key": "39,5 cm", "doc_count": 1 },
                                { "key": "45,5 cm", "doc_count": 1 }
                            ]
                        }
                    },
                    {
                        "key": "color",
                        "doc_count": 1,
                        "keyword_values": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                { "key": "gelb", "doc_count": 1 }
                            ]
                        }
                    }
                ]
            }
        }
    }
}
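If the goal is the plain "name: list of values" structure from the question, the aggregation response can be reduced on the client side. The following Python sketch assumes the response has already been parsed into a dict and uses the aggregation names from the queries above ("Nesting", "keyword_names", "keyword_values"); `buckets_to_mapping` is a hypothetical helper, and the sample response is trimmed to the parts it reads.

```python
def buckets_to_mapping(response, names_agg="keyword_names", values_agg="keyword_values"):
    # Walk the "Nesting" aggregation and collect the value keys per name key.
    names = response["aggregations"]["Nesting"][names_agg]["buckets"]
    return {
        name["key"]: [value["key"] for value in name[values_agg]["buckets"]]
        for name in names
    }

# Trimmed copy of the response shown above (only the fields that are read).
response = {
    "aggregations": {
        "Nesting": {
            "doc_count": 4,
            "keyword_names": {
                "buckets": [
                    {
                        "key": "rahmengrosse",
                        "doc_count": 3,
                        "keyword_values": {
                            "buckets": [
                                {"key": "33,5 cm", "doc_count": 1},
                                {"key": "39,5 cm", "doc_count": 1},
                                {"key": "45,5 cm", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "color",
                        "doc_count": 1,
                        "keyword_values": {
                            "buckets": [{"key": "gelb", "doc_count": 1}]
                        },
                    },
                ]
            },
        }
    }
}

print(buckets_to_mapping(response))
```

For the raw variant, the same helper could be called with names_agg="raw_names" and values_agg="raw_values".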