说明

apache calcite 在 v1.36.0 中,默认 ES 是不支持 like 的,同时 contains 必须要求是 date 类型。

我们有时候希望实现一个类似于 like 或者说 regex 的实现,这种常见需求就会变得比较麻烦。

这里演示一下一种可行的实现方式,作为参考。

需要修改源码。

ES 数据准备

数据结构

$ curl -X GET "localhost:9200/booksmapping"

{"booksmapping":{"aliases":{},"mappings":{"properties":{"author":{"type":"keyword"},"description":{"type":"text"},"id":{"type":"keyword"},"price":{"type":"float"},"publish_time":{"type":"date","format":"yyyy-MM-dd"},"title":{"type":"text","analyzer":"standard"}}},"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"1","provided_name":"booksmapping","creation_date":"1706773773494","number_of_replicas":"1","uuid":"gxDMFnVxR4CrCevzCTEuXQ","version":{"created":"7150099"}}}}}d

数据本身

$ curl -X GET "localhost:9200/booksmapping/_search"

{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":5,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"booksmapping","_type":"_doc","_id":"1","_score":1.0,"_source":{ "id" : "1", "title" : "Java编程思想", "author" : "Bruce Eckel", "price": 70.20, "publish_time" : "2007-10-01", "description" : "Java学习必读经典,殿堂级著作,赢得了全球程序员的广泛赞誉" }},{"_index":"booksmapping","_type":"_doc","_id":"2","_score":1.0,"_source":{ "id" : "2", "title" : "Java程序性能优化", "author" : "葛一鸣", "price": 46.50, "publish_time" : "2012-08-01", "description" : "让你的Java程序更快,更稳定。深入剖析软件层面,代码层面,JVM虚拟机层面的优化方法" }},{"_index":"booksmapping","_type":"_doc","_id":"3","_score":1.0,"_source":{ "id" : "3", "title" : "Python科学计算", "author" : "张惹愚", "price": 81.40, "publish_time" : "2016-05-01", "description" : "零基础学Python,光盘中作者独家整合开发winPython环境, 涵盖了Python各个扩展库" }},{"_index":"booksmapping","_type":"_doc","_id":"4","_score":1.0,"_source":{ "id" : "4", "title" : "Python基础教程", "author" : "Helant", "price": 54.50, "publish_time" : "2014-03-01", "description" : "经典Python入 门教程,层次鲜明,结构严谨,内容翔实" }},{"_index":"booksmapping","_type":"_doc","_id":"5","_score":1.0,"_source":{ "id" : "5", "title" : "JavaScript高级程序设计", "author" : "Nicholas C. Zakas", "price": 66.40, "publish_time" : "2012-10-01", "description" : "JavaScript经典名著" }}]}}

测试验证

maven 依赖

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.example</groupId>
        <artifactId>calcite-learn</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>

    <artifactId>calcite-learn-es-define</artifactId>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

        <calcite.version>1.36.0</calcite.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-core</artifactId>
            <version>${calcite.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.calcite/calcite-elasticsearch -->
<!--        <dependency>-->
<!--            <groupId>org.apache.calcite</groupId>-->
<!--            <artifactId>calcite-elasticsearch</artifactId>-->
<!--            <version>${calcite.version}</version>-->
<!--        </dependency>-->

        <!-- https://mvnrepository.com/artifact/org.apache.calcite/calcite-avatica -->
        <dependency>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-avatica</artifactId>
            <version>1.6.0</version>
        </dependency>

        <dependency>
            <groupId>com.google.protobuf</groupId>
            <artifactId>protobuf-java</artifactId>
            <version>3.17.3</version> <!-- 使用最新的 protobuf 版本 -->
        </dependency>

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.15.0</version> <!-- 请根据你的 Elasticsearch 版本选择相应版本 -->
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.78</version> <!-- 根据需要替换成最新版本 -->
        </dependency>

    </dependencies>


</project>

这里故意把 calcite-elasticsearch 的代码注释掉了。

我们可以把这个模块下的代码全部复制出来,拷贝到自定义的模块中。

拷贝代码

对应的代码文件其实不多:

    ElasticsearchAggregate.java
    ElasticsearchConstants.java
    ElasticsearchEnumerators.java
    ElasticsearchFilter.java
    ElasticsearchJson.java
    ElasticsearchMapping.java
    ElasticsearchMethod.java
    ElasticsearchProject.java
    ElasticsearchRel.java
    ElasticsearchRules.java
    ElasticsearchToEnumerableConverterRule.java
    ElasticsearchTransport.java
    ElasticsearchVersion.java
    MapProjectionFieldVisitor.java
    package-info.java
    PredicateAnalyzer.java
    QueryBuilders.java
    Scrolling.java

测试代码

这里主要演示一下 SELECT title, id FROM booksmapping WHERE title like '.*' 的可行性。具体的依赖其他文章中,此处不再赘述。

package com.github.houbb.es.define;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.calcite.adapter.elasticsearch.ElasticsearchSchema;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.ViewTable;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import java.sql.*;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;


/**
 * https://www.tabnine.com/code/java/methods/org.apache.calcite.adapter.elasticsearch.ElasticsearchSchema/%3Cinit%3E
 *
 * 自动根据 mapping 生成 viewSql
 */
public class CalciteElasticsearchViewAutoByMappingLikeExample {

    public static void main(String[] args) throws Exception {
        RestClient restClient = null;
        try {
            // 1.构建ElasticsearchSchema对象,在Calcite中,不同数据源对应不同Schema,比如:CsvSchema、DruidSchema、ElasticsearchSchema等
            restClient = RestClient.builder(new HttpHost("172.24.20.97", 9200)).build();
            // 指定索引库
            ElasticsearchSchema elasticsearchSchema = new ElasticsearchSchema(restClient, new ObjectMapper(), null);

            // 2.构建Connection
            // 2.1 设置连接参数
            Properties info = new Properties();
            // 不区分sql大小写
            info.setProperty("caseSensitive", "false");

            // 2.2 获取标准的JDBC Connection
            Connection connection = DriverManager.getConnection("jdbc:calcite:", info);
            // 2.3 获取Calcite封装的Connection
            CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);

            // 3.构建RootSchema,在Calcite中,RootSchema是所有数据源schema的parent,多个不同数据源schema可以挂在同一个RootSchema下
            // 以实现查询不同数据源的目的
            SchemaPlus root = calciteConnection.getRootSchema();

            // 4.将不同数据源schema挂载到RootSchema,这里添加ElasticsearchSchema
            root.add("elastic", elasticsearchSchema);

            // 添加一个视图(如何自动实现?)
            final String viewSql = buildViewSql("booksmapping");
            root.add("booksmapping",
                    ViewTable.viewMacro(root, viewSql,
                            Collections.singletonList("elastic"),
                            Arrays.asList("elastic", "view"), false));

            // 重新执行查询
            // 5.执行SQL查询,通过SQL方式访问object对象实例
            //condition
            // https://stackoverflow.com/questions/51860219/how-to-use-apache-calcite-like-regex
            showAll(calciteConnection);

            //https://github.com/apache/calcite/pull/1530
            showLikeAll(calciteConnection);
            showLikeRegex(calciteConnection);
        } finally {
            restClient.close();
        }
    }


    //SELECT cast(_MAP['author'] AS VARCHAR) AS "author",
    // cast(_MAP['price'] AS FLOAT) AS "price",cast(_MAP['publish_time'] AS VARCHAR) AS "publish_time",
    // cast(_MAP['description'] AS VARCHAR) AS "description",cast(_MAP['id'] AS VARCHAR) AS "id",
    // cast(_MAP['title'] AS VARCHAR) AS "title" FROM "elastic"."booksmapping"
    private static void showLikeAll(CalciteConnection calciteConnection) {
        try {
            System.out.println("LIKE ALL------------------------- ");
            String sql = "SELECT title, id FROM booksmapping WHERE title like '.*'";
            Statement statement = calciteConnection.createStatement();
            ResultSet resultSet = statement.executeQuery(sql);
            // 6.遍历打印查询结果集
            System.out.println(ResultSetUtil.resultString(resultSet));
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    private static void showLikeRegex(CalciteConnection calciteConnection) {
        try {
            System.out.println("LIKE Regex------------------------- ");
            String sql = "SELECT title, id, author FROM booksmapping WHERE author like 'Bruce.*'";
            Statement statement = calciteConnection.createStatement();
            ResultSet resultSet = statement.executeQuery(sql);
            // 6.遍历打印查询结果集
            System.out.println(ResultSetUtil.resultString(resultSet));
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    private static void showAll(CalciteConnection calciteConnection) {
        try {
            System.out.println("ALLLLLL------------------------- ");
            String sql = "SELECT * FROM booksmapping";
            Statement statement = calciteConnection.createStatement();
            ResultSet resultSet = statement.executeQuery(sql);
            // 6.遍历打印查询结果集
            System.out.println(ResultSetUtil.resultString(resultSet));
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public static String buildViewSql(String indexName) {
        // 1. 获取所有的字段及其类别
        Map<String, EsTypeDto> fieldTypeMap = EsGetMappingByClientExample.getEsMapping(indexName);

        //2. 循环构建结果
        String sqlFormat = "SELECT %s FROM " + "\"elastic\".\"%s\"";

        //2.1 构建查询字段的映射
        StringBuilder fieldBuilder = new StringBuilder();
        for(String fieldName : fieldTypeMap.keySet()) {
            EsTypeDto esTypeDto = fieldTypeMap.get(fieldName);

            String fieldSqlType = convertToSQLType(esTypeDto.getType().toLowerCase());

            fieldBuilder.append("cast(_MAP['")
                    .append(fieldName).append("'] AS ")
                    .append(fieldSqlType)
                    .append(") AS \"")
                    .append(fieldName).append("\",");
        }
        // 删除最后一个,
        fieldBuilder.deleteCharAt(fieldBuilder.length()-1);

        String sql =  String.format(sqlFormat, fieldBuilder.toString(), indexName);
        System.out.println("--------------SQL: " + sql);
        return sql;
    }

    /**
     * ES 的类别转换为 SQL 中的建表类别
     * @param esType ES 类别
     * @return SQL 类别
     */
    public static String convertToSQLType(String esType) {
        switch (esType) {
            case "text":
                return "VARCHAR";
            case "keyword":
                return "VARCHAR"; // Assuming keyword is similar to text
            case "float":
                return "FLOAT";
            case "double":
                return "DOUBLE";
            case "integer":
                return "INT";
            case "long":
                return "BIGINT";
            //TODO 这里先从    TIMESTAMP=>varchar 试一下
            case "date":
                return "VARCHAR";
            case "boolean":
                return "BOOLEAN";
            case "object":
                return "JSON"; // This can vary depending on SQL database
            case "nested":
                return "JSON"; // Nested objects can be represented as JSON
            case "geo_point":
                return "GEOGRAPHY"; // SQL spatial data type
            case "ip":
                return "VARCHAR"; // Assuming IP address is stored as string
            case "binary":
                return "BLOB"; // Binary large object
            case "completion":
                return "VARCHAR"; // Assuming completion is stored as string
            case "token_count":
                return "INT"; // Assuming token count is stored as integer
            default:
                return "VARCHAR"; // Default to VARCHAR if type is unknown
        }
    }

}

测试效果

--------------SQL: SELECT cast(_MAP['author'] AS VARCHAR) AS "author",cast(_MAP['price'] AS FLOAT) AS "price",cast(_MAP['publish_time'] AS VARCHAR) AS "publish_time",cast(_MAP['description'] AS VARCHAR) AS "description",cast(_MAP['id'] AS VARCHAR) AS "id",cast(_MAP['title'] AS VARCHAR) AS "title" FROM "elastic"."booksmapping"
ALLLLLL------------------------- 
二月 29, 2024 11:15:48 上午 org.elasticsearch.client.RestClient logResponse
警告: request [POST http://172.24.20.97:9200/booksmapping/_search?scroll=1m] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
二月 29, 2024 11:15:48 上午 org.elasticsearch.client.RestClient logResponse
警告: request [POST http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
二月 29, 2024 11:15:48 上午 org.elasticsearch.client.RestClient logResponse
警告: request [DELETE http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
Bruce Eckel, 70.2, 2007-10-01, Java学习必读经典,殿堂级著作,赢得了全球程序员的广泛赞誉, 1, Java编程思想
葛一鸣, 46.5, 2012-08-01, 让你的Java程序更快,更稳定。深入剖析软件层面,代码层面,JVM虚拟机层面的优化方法, 2, Java程序性能优化
张惹愚, 81.4, 2016-05-01, 零基础学Python,光盘中作者独家整合开发winPython环境,涵盖了Python各个扩展库, 3, Python科学计算
Helant, 54.5, 2014-03-01, 经典Python入门教程,层次鲜明,结构严谨,内容翔实, 4, Python基础教程
Nicholas C. Zakas, 66.4, 2012-10-01, JavaScript经典名著, 5, JavaScript高级程序设计

LIKE ALL------------------------- 
二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse
警告: request [POST http://172.24.20.97:9200/booksmapping/_search?scroll=1m] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse
警告: request [POST http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse
警告: request [DELETE http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
Java编程思想, 1
Java程序性能优化, 2
Python科学计算, 3
Python基础教程, 4
JavaScript高级程序设计, 5

LIKE Regex------------------------- 
二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse
警告: request [POST http://172.24.20.97:9200/booksmapping/_search?scroll=1m] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse
警告: request [POST http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
二月 29, 2024 11:16:06 上午 org.elasticsearch.client.RestClient logResponse
警告: request [DELETE http://172.24.20.97:9200/_search/scroll] returned 1 warnings: [299 Elasticsearch-7.15.0-79d65f6e357953a5b3cbcc5e2c7c21073d89aa29 "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.15/security-minimal-setup.html to enable security."]
Java编程思想, 1, Bruce Eckel

源码要如何修改呢?

默认 ES 是不支持 LIKE 关键词的。

需要修改的有两个类:

QueryBuilders

PredicateAnalyzer

PredicateAnalyzer

case LIKE:
        return QueryExpression.create(pair.getKey()).like(pair.getValue());
//        throw new UnsupportedOperationException("LIKE not yet supported v-define!!!!");

把原来报错的 UnsupportedOperationException 改成 like。

QueryBuilders

like 策略的 writeJson 默认也是不支持的。

我们修改一下 RegexpQueryBuilder#writeJson 方法,默认这里也是空的。

/**
   * A Query that does fuzzy matching for a specific value.
   */
  static class RegexpQueryBuilder extends QueryBuilder {
    @SuppressWarnings("unused")
    private final String fieldName;
    @SuppressWarnings("unused")
    private final String value;

    RegexpQueryBuilder(final String fieldName, final String value) {
      this.fieldName = fieldName;
      this.value = value;
    }

    @Override void writeJson(final JsonGenerator generator) throws IOException {
      generator.writeStartObject();
      generator.writeFieldName("regexp");
      generator.writeStartObject();
      generator.writeFieldName(fieldName);
      writeObject(generator, value);
      generator.writeEndObject();
      generator.writeEndObject();
    }
  }

小结

整体来说不是很难,但是需要同时理解 ES 和 calcite。

官方这个地方为什么默认不支持呢?

这个写了邮件问暂时也没有回复。

参考资料

https://calcite.apache.org/docs/elasticsearch_adapter.html