说明

apache calcite 在 v1.36.0 中,默认 ES 是不支持 like 的,同时 contains 必须要求是 date 类型。

我们有时候希望实现一个类似于 like 或者说 regex 的实现,这种常见需求就会变得比较麻烦。

这里演示一下一种可行的实现方式,作为参考。

需要修改源码。

ES 数据准备

数据结构

  [plaintext]
1
2
3
$ curl -X GET "localhost:9200/booksmapping" {"booksmapping":{"aliases":{},"mappings":{"properties":{"author":{"type":"keyword"},"description":{"type":"text"},"id":{"type":"keyword"},"price":{"type":"float"},"publish_time":{"type":"date","format":"yyyy-MM-dd"},"title":{"type":"text","analyzer":"standard"}}},"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"1","provided_name":"booksmapping","creation_date":"1706773773494","number_of_replicas":"1","uuid":"gxDMFnVxR4CrCevzCTEuXQ","version":{"created":"7150099"}}}}}d

数据本身

  [plaintext]
1
2
3
$ curl -X GET "localhost:9200/booksmapping/_search" {"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":5,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"booksmapping","_type":"_doc","_id":"1","_score":1.0,"_source":{ "id" : "1", "title" : "Java编程思想", "author" : "Bruce Eckel", "price": 70.20, "publish_time" : "2007-10-01", "description" : "Java学习必读经典,殿堂级著作,赢得了全球程序员的广泛赞誉" }},{"_index":"booksmapping","_type":"_doc","_id":"2","_score":1.0,"_source":{ "id" : "2", "title" : "Java程序性能优化", "author" : "葛一鸣", "price": 46.50, "publish_time" : "2012-08-01", "description" : "让你的Java程序更快,更稳定。深入剖析软件层面,代码层面,JVM虚拟机层面的优化方法" }},{"_index":"booksmapping","_type":"_doc","_id":"3","_score":1.0,"_source":{ "id" : "3", "title" : "Python科学计算", "author" : "张惹愚", "price": 81.40, "publish_time" : "2016-05-01", "description" : "零基础学Python,光盘中作者独家整合开发winPython环境, 涵盖了Python各个扩展库" }},{"_index":"booksmapping","_type":"_doc","_id":"4","_score":1.0,"_source":{ "id" : "4", "title" : "Python基础教程", "author" : "Helant", "price": 54.50, "publish_time" : "2014-03-01", "description" : "经典Python入 门教程,层次鲜明,结构严谨,内容翔实" }},{"_index":"booksmapping","_type":"_doc","_id":"5","_score":1.0,"_source":{ "id" : "5", "title" : "JavaScript高级程序设计", "author" : "Nicholas C. Zakas", "price": 66.40, "publish_time" : "2012-10-01", "description" : "JavaScript经典名著" }}]}}

测试验证

maven 依赖

  [xml]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <parent> <groupId>org.example</groupId> <artifactId>calcite-learn</artifactId> <version>1.0-SNAPSHOT</version> </parent> <artifactId>calcite-learn-es-define</artifactId> <properties> <maven.compiler.source>8</maven.compiler.source> <maven.compiler.target>8</maven.compiler.target> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <calcite.version>1.36.0</calcite.version> </properties> <dependencies> <dependency> <groupId>org.apache.calcite</groupId> <artifactId>calcite-core</artifactId> <version>${calcite.version}</version> </dependency> <!-- https://mvnrepository.com/artifact/org.apache.calcite/calcite-elasticsearch --> <!-- <dependency>--> <!-- <groupId>org.apache.calcite</groupId>--> <!-- <artifactId>calcite-elasticsearch</artifactId>--> <!-- <version>${calcite.version}</version>--> <!-- </dependency>--> <!-- https://mvnrepository.com/artifact/org.apache.calcite/calcite-avatica --> <dependency> <groupId>org.apache.calcite</groupId> <artifactId>calcite-avatica</artifactId> <version>1.6.0</version> </dependency> <dependency> <groupId>com.google.protobuf</groupId> <artifactId>protobuf-java</artifactId> <version>3.17.3</version> <!-- 使用最新的 protobuf 版本 --> </dependency> <dependency> <groupId>org.elasticsearch.client</groupId> <artifactId>elasticsearch-rest-high-level-client</artifactId> <version>7.15.0</version> <!-- 请根据你的 Elasticsearch 版本选择相应版本 --> </dependency> <dependency> <groupId>com.alibaba</groupId> <artifactId>fastjson</artifactId> <version>1.2.78</version> <!-- 根据需要替换成最新版本 --> </dependency> </dependencies> </project>

这里故意把 calcite-elasticsearch 的代码注释掉了。

我们可以把这个模块下的代码全部复制出来,拷贝到自定义的模块中。

拷贝代码

对应的代码文件其实不多:

  [plaintext]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
ElasticsearchAggregate.java ElasticsearchConstants.java ElasticsearchEnumerators.java ElasticsearchFilter.java ElasticsearchJson.java ElasticsearchMapping.java ElasticsearchMethod.java ElasticsearchProject.java ElasticsearchRel.java ElasticsearchRules.java ElasticsearchToEnumerableConverterRule.java ElasticsearchTransport.java ElasticsearchVersion.java MapProjectionFieldVisitor.java package-info.java PredicateAnalyzer.java QueryBuilders.java Scrolling.java

测试代码

这里主要演示一下 SELECT title, id FROM booksmapping WHERE title like '.*' 的可行性。具体的依赖其他文章中,此处不再赘述。

  [java]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
package com.github.houbb.es.define; import com.fasterxml.jackson.databind.ObjectMapper; import org.apache.calcite.adapter.elasticsearch.ElasticsearchSchema; import org.apache.calcite.jdbc.CalciteConnection; import org.apache.calcite.schema.SchemaPlus; import org.apache.calcite.schema.impl.ViewTable; import org.apache.http.HttpHost; import org.elasticsearch.client.RestClient; import java.sql.*; import java.util.Arrays; import java.util.Collections; import java.util.Map; import java.util.Properties; /** * https://www.tabnine.com/code/java/methods/org.apache.calcite.adapter.elasticsearch.ElasticsearchSchema/%3Cinit%3E * * 自动根据 mapping 生成 viewSql */ public class CalciteElasticsearchViewAutoByMappingLikeExample { public static void main(String[] args) throws Exception { RestClient restClient = null; try { // 1.构建ElasticsearchSchema对象,在Calcite中,不同数据源对应不同Schema,比如:CsvSchema、DruidSchema、ElasticsearchSchema等 restClient = RestClient.builder(new HttpHost("172.24.20.97", 9200)).build(); // 指定索引库 ElasticsearchSchema elasticsearchSchema = new ElasticsearchSchema(restClient, new ObjectMapper(), null); // 2.构建Connection // 2.1 设置连接参数 Properties info = new Properties(); // 不区分sql大小写 info.setProperty("caseSensitive", "false"); // 2.2 获取标准的JDBC Connection Connection connection = DriverManager.getConnection("jdbc:calcite:", info); // 2.3 获取Calcite封装的Connection CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class); // 3.构建RootSchema,在Calcite中,RootSchema是所有数据源schema的parent,多个不同数据源schema可以挂在同一个RootSchema下 // 以实现查询不同数据源的目的 SchemaPlus root = calciteConnection.getRootSchema(); // 4.将不同数据源schema挂载到RootSchema,这里添加ElasticsearchSchema root.add("elastic", elasticsearchSchema); // 添加一个视图(如何自动实现?) final String viewSql = buildViewSql("booksmapping"); root.add("booksmapping", ViewTable.viewMacro(root, viewSql, Collections.singletonList("elastic"), Arrays.asList("elastic", "view"), false)); // 重新执行查询 // 5.执行SQL查询,通过SQL方式访问object对象实例 //condition // https://stackoverflow.com/questions/51860219/how-to-use-apache-calcite-like-regex showAll(calciteConnection); //https://github.com/apache/calcite/pull/1530 showLikeAll(calciteConnection); showLikeRegex(calciteConnection); } finally { restClient.close(); } } //SELECT cast(_MAP['author'] AS VARCHAR) AS "author", // cast(_MAP['price'] AS FLOAT) AS "price",cast(_MAP['publish_time'] AS VARCHAR) AS "publish_time", // cast(_MAP['description'] AS VARCHAR) AS "description",cast(_MAP['id'] AS VARCHAR) AS "id", // cast(_MAP['title'] AS VARCHAR) AS "title" FROM "elastic"."booksmapping" private static void showLikeAll(CalciteConnection calciteConnection) { try { System.out.println("LIKE ALL------------------------- "); String sql = "SELECT title, id FROM booksmapping WHERE title like '.*'"; Statement statement = calciteConnection.createStatement(); ResultSet resultSet = statement.executeQuery(sql); // 6.遍历打印查询结果集 System.out.println(ResultSetUtil.resultString(resultSet)); } catch (SQLException e) { throw new RuntimeException(e); } } private static void showLikeRegex(CalciteConnection calciteConnection) { try { System.out.println("LIKE Regex------------------------- "); String sql = "SELECT title, id, author FROM booksmapping WHERE author like 'Bruce.*'"; Statement statement = calciteConnection.createStatement(); ResultSet resultSet = statement.executeQuery(sql); // 6.遍历打印查询结果集 System.out.println(ResultSetUtil.resultString(resultSet)); } catch (SQLException e) { throw new RuntimeException(e); } } private static void showAll(CalciteConnection calciteConnection) { try { System.out.println("ALLLLLL------------------------- "); String sql = "SELECT * FROM booksmapping"; Statement statement = calciteConnection.createStatement(); ResultSet resultSet = statement.executeQuery(sql); // 6.遍历打印查询结果集 System.out.println(ResultSetUtil.resultString(resultSet)); } catch (SQLException e) { throw new RuntimeException(e); } } public static String buildViewSql(String indexName) { // 1. 获取所有的字段及其类别 Map<String, EsTypeDto> fieldTypeMap = EsGetMappingByClientExample.getEsMapping(indexName); //2. 循环构建结果 String sqlFormat = "SELECT %s FROM " + "\"elastic\".\"%s\""; //2.1 构建查询字段的映射 StringBuilder fieldBuilder = new StringBuilder(); for(String fieldName : fieldTypeMap.keySet()) { EsTypeDto esTypeDto = fieldTypeMap.get(fieldName); String fieldSqlType = convertToSQLType(esTypeDto.getType().toLowerCase()); fieldBuilder.append("cast(_MAP['") .append(fieldName).append("'] AS ") .append(fieldSqlType) .append(") AS \"") .append(fieldName).append("\","); } // 删除最后一个, fieldBuilder.deleteCharAt(fieldBuilder.length()-1); String sql = String.format(sqlFormat, fieldBuilder.toString(), indexName); System.out.println("--------------SQL: " + sql); return sql; } /** * ES 的类别转换为 SQL 中的建表类别 * @param esType ES 类别 * @return SQL 类别 */ public static String convertToSQLType(String esType) { switch (esType) { case "text": return "VARCHAR"; case "keyword": return "VARCHAR"; // Assuming keyword is similar to text case "float": return "FLOAT"; case "double": return "DOUBLE"; case "integer": return "INT"; case "long": return "BIGINT"; //TODO 这里先从 TIMESTAMP=>varchar 试一下 case "date": return "VARCHAR"; case "boolean": return "BOOLEAN"; case "object": return "JSON"; // This can vary depending on SQL database case "nested": return "JSON"; // Nested objects can be represented as JSON case "geo_point": return "GEOGRAPHY"; // SQL spatial data type case "ip": return "VARCHAR"; // Assuming IP address is stored as string case "binary": return "BLOB"; // Binary large object case "completion": return "VARCHAR"; // Assuming completion is stored as string case "token_count": return "INT"; // Assuming token count is stored as integer default: return "VARCHAR"; // Default to VARCHAR if type is unknown } } }

测试效果

  [plaintext]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
--------------SQL: SELECT cast(_MAP['author'] AS VARCHAR) AS "author",cast(_MAP['price'] AS FLOAT) AS "price",cast(_MAP['publish_time'] AS VARCHAR) AS "publish_time",cast(_MAP['description'] AS VARCHAR) AS "description",cast(_MAP['id'] AS VARCHAR) AS "id",cast(_MAP['title'] AS VARCHAR) AS "title" FROM "elastic"."booksmapping" ALLLLLL------------------------- 二月 29, 2024 11:15:48 上午 org.elasticsearch.client.RestClient logResponse 警告: request [POST http://172.24.20.97:9200/booksmapping/_search?scroll=1m] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] 二月 29, 2024 11:15:48 上午 org.elasticsearch.client.RestClient logResponse 警告: request [POST http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] 二月 29, 2024 11:15:48 上午 org.elasticsearch.client.RestClient logResponse 警告: request [DELETE http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] Bruce Eckel, 70.2, 2007-10-01, Java学习必读经典,殿堂级著作,赢得了全球程序员的广泛赞誉, 1, Java编程思想 葛一鸣, 46.5, 2012-08-01, 让你的Java程序更快,更稳定。深入剖析软件层面,代码层面,JVM虚拟机层面的优化方法, 2, Java程序性能优化 张惹愚, 81.4, 2016-05-01, 零基础学Python,光盘中作者独家整合开发winPython环境,涵盖了Python各个扩展库, 3, Python科学计算 Helant, 54.5, 2014-03-01, 经典Python入门教程,层次鲜明,结构严谨,内容翔实, 4, Python基础教程 Nicholas C. Zakas, 66.4, 2012-10-01, JavaScript经典名著, 5, JavaScript高级程序设计 LIKE ALL------------------------- 二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse 警告: request [POST http://172.24.20.97:9200/booksmapping/_search?scroll=1m] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] 二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse 警告: request [POST http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] 二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse 警告: request [DELETE http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] Java编程思想, 1 Java程序性能优化, 2 Python科学计算, 3 Python基础教程, 4 JavaScript高级程序设计, 5 LIKE Regex------------------------- 二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse 警告: request [POST http://172.24.20.97:9200/booksmapping/_search?scroll=1m] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] 二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse 警告: request [POST http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] 二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse 警告: request [DELETE http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."] Java编程思想, 1, Bruce Eckel

源码要如何修改呢?

默认 ES 是不支持 LIKE 关键词的。

需要修改的有两个类:

QueryBuilders

PredicateAnalyzer

PredicateAnalyzer

  [java]
1
2
3
case LIKE: return QueryExpression.create(pair.getKey()).like(pair.getValue()); // throw new UnsupportedOperationException("LIKE not yet supported v-define!!!!");

把原来报错的 UnsupportedOperationException 改成 like。

QueryBuilders

like 策略的 writeJson 默认也是不支持的。

我们修改一下 RegexpQueryBuilder#writeJson 方法,默认这里也是空的。

  [java]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/** * A Query that does fuzzy matching for a specific value. */ static class RegexpQueryBuilder extends QueryBuilder { @SuppressWarnings("unused") private final String fieldName; @SuppressWarnings("unused") private final String value; RegexpQueryBuilder(final String fieldName, final String value) { this.fieldName = fieldName; this.value = value; } @Override void writeJson(final JsonGenerator generator) throws IOException { generator.writeStartObject(); generator.writeFieldName("regexp"); generator.writeStartObject(); generator.writeFieldName(fieldName); writeObject(generator, value); generator.writeEndObject(); generator.writeEndObject(); } }

小结

整体来说不是很难,但是需要同时理解 ES 和 calcite。

官方这个地方为什么默认不支持呢?

这个写了邮件问暂时也没有回复。

参考资料

https://calcite.apache.org/docs/elasticsearch_adapter.html