Querying Excel Content with SQL Using Calcite
1. Introduction
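In earlier articles we mentioned that Calcite provides data-source adapters for CSV and JSON files. In essence, each file is parsed into a table, the containing folder acts as the schema, and the resulting schema is registered under the RootSchema (the RootSchema is the parent of all data-source schemas, and several schemas from different data sources can hang under the same RootSchema). Calcite then parses the SQL, runs the query, and returns the result.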
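In practice, however, data files are usually stored and passed around as Excel workbooks, and unfortunately Calcite itself has no Excel adapter. We can still model one on calcite-file and build our own calcite-file-excel, which is also a good way to get familiar with how Calcite works.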
2. Implementation Approach
Excel has the concept of sheets, so one workbook can be parsed into a schema and each sheet into a table. The steps are:
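- Implement SchemaFactory and override its create method: the schema factory, used to create the schema
- Extend AbstractSchema: the schema description class, used to parse the Excel file and create the tables (one per sheet)
- Extend AbstractTable and implement ScannableTable: the table description class, which provides the field information and the data (parsed from the sheet)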
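The mapping described above can be sketched without Calcite or POI. This is only an illustration under stated assumptions: WorkbookMappingSketch, SheetTable, and toSchema are hypothetical names, a workbook is represented by a plain map from sheet name to raw rows, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WorkbookMappingSketch {

    /** Hypothetical stand-in for a table: header names plus data rows. */
    static final class SheetTable {
        final List<String> columns;
        final List<Object[]> rows;

        SheetTable(List<String> columns, List<Object[]> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** One workbook becomes one "schema": a map from sheet name to table. */
    static Map<String, SheetTable> toSchema(Map<String, List<Object[]>> workbook) {
        Map<String, SheetTable> tables = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object[]>> sheet : workbook.entrySet()) {
            List<Object[]> raw = sheet.getValue();
            if (raw.isEmpty()) {
                // An empty sheet still becomes a table, just without columns or rows.
                tables.put(sheet.getKey(), new SheetTable(
                        Collections.<String>emptyList(), Collections.<Object[]>emptyList()));
                continue;
            }
            // Row 0 is the header; everything after it is data.
            List<String> columns = new ArrayList<>();
            for (Object header : raw.get(0)) {
                columns.add(String.valueOf(header));
            }
            List<Object[]> rows = new ArrayList<>(raw.subList(1, raw.size()));
            tables.put(sheet.getKey(), new SheetTable(columns, rows));
        }
        return tables;
    }

    public static void main(String[] args) {
        // Made-up sample data for a single sheet.
        List<Object[]> userInfo = new ArrayList<>();
        userInfo.add(new Object[] {"id", "role_id"});
        userInfo.add(new Object[] {1, 10});

        Map<String, List<Object[]>> workbook = new LinkedHashMap<>();
        workbook.put("user_info", userInfo);

        Map<String, SheetTable> schema = toSchema(workbook);
        System.out.println(schema.keySet());
        System.out.println(schema.get("user_info").columns);
    }
}
```

In the real adapter the three Calcite types take over these roles: the factory builds the schema, the schema builds the name-to-table map, and each table exposes its row type and rows.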
3. Excel Sample
The sample Excel file has two sheets, user_info and role_info.
OK, everything is ready.
4. Maven
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.37.0</version>
</dependency>
5. Core Code
5.1 SchemaFactory
package com.ldx.calcite.excel;

import com.google.common.collect.Lists;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaFactory;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.StringUtils;

import java.io.File;
import java.util.List;
import java.util.Map;

/**
 * schema factory
 */
public class ExcelSchemaFactory implements SchemaFactory {
    public final static ExcelSchemaFactory INSTANCE = new ExcelSchemaFactory();

    private ExcelSchemaFactory() {}

    @Override
    public Schema create(SchemaPlus parentSchema, String name, Map<String, Object> operand) {
        final Object filePath = operand.get("filePath");

        if (ObjectUtils.isEmpty(filePath)) {
            throw new NullPointerException("can not find excel file");
        }

        return this.create(filePath.toString());
    }

    public Schema create(String excelFilePath) {
        if (StringUtils.isBlank(excelFilePath)) {
            throw new NullPointerException("can not find excel file");
        }

        return this.create(new File(excelFilePath));
    }

    public Schema create(File excelFile) {
        if (ObjectUtils.isEmpty(excelFile) || !excelFile.exists()) {
            throw new NullPointerException("can not find excel file");
        }

        if (!excelFile.isFile() || !isExcelFile(excelFile)) {
            throw new RuntimeException("can not find excel file: " + excelFile.getAbsolutePath());
        }

        return new ExcelSchema(excelFile);
    }

    protected List<String> supportedFileSuffix() {
        return Lists.newArrayList("xls", "xlsx");
    }

    private boolean isExcelFile(File excelFile) {
        if (ObjectUtils.isEmpty(excelFile)) {
            return false;
        }

        final String name = excelFile.getName();
        return StringUtils.endsWithAny(name, this.supportedFileSuffix().toArray(new String[0]));
    }
}
The factory provides several overloaded create methods for convenience. Each of them ends by handing the Excel file to ExcelSchema, which creates the schema object.
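One detail worth noting: StringUtils.endsWithAny is case-sensitive and matches a bare suffix, so a file named REPORT.XLSX would be rejected while a name such as notxls would pass. Below is a minimal sketch of a stricter check, assuming that case-insensitive matching anchored on the dot is what is wanted. ExcelFileNames and hasExcelExtension are hypothetical names and not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ExcelFileNames {
    private static final List<String> SUFFIXES = Arrays.asList("xls", "xlsx");

    /** Returns true when the name ends in ".xls" or ".xlsx", ignoring case. */
    static boolean hasExcelExtension(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        // No dot, or a trailing dot, means there is no extension to inspect.
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUFFIXES.contains(extension);
    }

    public static void main(String[] args) {
        System.out.println(hasExcelExtension("test.xlsx"));
        System.out.println(hasExcelExtension("REPORT.XLS"));
        System.out.println(hasExcelExtension("notxls"));
    }
}
```

Whether this stricter behaviour is desirable depends on how the files are named in practice; the original check is sufficient for the test.xlsx sample used later.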
5.2 Schema
package com.ldx.calcite.excel;

import com.google.common.collect.Maps;
import org.apache.calcite.schema.Table;
import org.apache.calcite.schema.impl.AbstractSchema;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

import java.io.File;
import java.util.Iterator;
import java.util.Map;

/**
 * schema
 */
public class ExcelSchema extends AbstractSchema {
    private final File excelFile;

    private Map<String, Table> tableMap;

    public ExcelSchema(File excelFile) {
        this.excelFile = excelFile;
    }

    @Override
    protected Map<String, Table> getTableMap() {
        // Build the table map lazily on first access and reuse it afterwards
        if (ObjectUtils.isEmpty(tableMap)) {
            tableMap = createTableMap();
        }

        return tableMap;
    }

    private Map<String, Table> createTableMap() {
        // Guava's Maps is used here rather than the TestNG helper of the same name
        final Map<String, Table> result = Maps.newHashMap();

        try (Workbook workbook = WorkbookFactory.create(excelFile)) {
            final Iterator<Sheet> sheetIterator = workbook.sheetIterator();

            while (sheetIterator.hasNext()) {
                final Sheet sheet = sheetIterator.next();
                final ExcelScannableTable excelScannableTable = new ExcelScannableTable(sheet, null);
                // Each sheet is registered as a table under the sheet name
                result.put(sheet.getSheetName(), excelScannableTable);
            }
        } catch (Exception e) {
            // Surface read failures instead of silently returning an empty schema
            throw new IllegalStateException("failed to read excel file: " + excelFile.getAbsolutePath(), e);
        }

        return result;
    }
}
The schema class reads the Excel file, iterates over its sheets, and wraps each sheet in an ExcelScannableTable that is stored under the sheet name.
5.3 Table
package com.ldx.calcite.excel;

import com.google.common.collect.Lists;
import com.ldx.calcite.excel.enums.JavaFileTypeEnum;
import org.apache.calcite.DataContext;
import org.apache.calcite.adapter.java.JavaTypeFactory;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rel.type.RelProtoDataType;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.calcite.util.Pair;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.checkerframework.checker.nullness.qual.Nullable;

import java.util.List;

/**
 * table
 */
public class ExcelScannableTable extends AbstractTable implements ScannableTable {
    private final RelProtoDataType protoRowType;

    private final Sheet sheet;

    private RelDataType rowType;

    private List<JavaFileTypeEnum> fieldTypes;

    private List<Object[]> rowDataList;

    public ExcelScannableTable(Sheet sheet, RelProtoDataType protoRowType) {
        this.protoRowType = protoRowType;
        this.sheet = sheet;
    }

    @Override
    public Enumerable<@Nullable Object[]> scan(DataContext root) {
        JavaTypeFactory typeFactory = root.getTypeFactory();
        final List<JavaFileTypeEnum> fieldTypes = this.getFieldTypes(typeFactory);

        // Parse the sheet once and reuse the cached rows on later scans
        if (rowDataList == null) {
            rowDataList = readExcelData(sheet, fieldTypes);
        }

        return Linq4j.asEnumerable(rowDataList);
    }

    @Override
    public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        if (ObjectUtils.isNotEmpty(protoRowType)) {
            return protoRowType.apply(typeFactory);
        }

        if (ObjectUtils.isEmpty(rowType)) {
            rowType = deduceRowType((JavaTypeFactory) typeFactory, sheet, null);
        }

        return rowType;
    }

    public List<JavaFileTypeEnum> getFieldTypes(RelDataTypeFactory typeFactory) {
        if (fieldTypes == null) {
            fieldTypes = Lists.newArrayList();
            deduceRowType((JavaTypeFactory) typeFactory, sheet, fieldTypes);
        }

        return fieldTypes;
    }

    private List<Object[]> readExcelData(Sheet sheet, List<JavaFileTypeEnum> fieldTypes) {
        List<Object[]> rowDataList = Lists.newArrayList();

        // Row 0 is the header, so data starts at row 1
        for (int rowIndex = 1; rowIndex <= sheet.getLastRowNum(); rowIndex++) {
            Row row = sheet.getRow(rowIndex);

            // getRow returns null for rows that were never created; skip them
            if (row == null) {
                continue;
            }

            Object[] rowData = new Object[fieldTypes.size()];
            // Never read past the declared columns, even if the row has extra cells
            final int cellCount = Math.min(row.getLastCellNum(), fieldTypes.size());

            for (int i = 0; i < cellCount; i++) {
                final JavaFileTypeEnum javaFileTypeEnum = fieldTypes.get(i);
                Cell cell = row.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
                final Object cellValue = javaFileTypeEnum.getCellValue(cell);
                rowData[i] = cellValue;
            }

            rowDataList.add(rowData);
        }

        return rowDataList;
    }

    public static RelDataType deduceRowType(JavaTypeFactory typeFactory, Sheet sheet, List<JavaFileTypeEnum> fieldTypes) {
        final List<String> names = Lists.newArrayList();
        final List<RelDataType> types = Lists.newArrayList();

        if (sheet != null) {
            Row headerRow = sheet.getRow(0);

            if (headerRow != null) {
                for (int i = 0; i < headerRow.getLastCellNum(); i++) {
                    Cell cell = headerRow.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
                    // Header cells follow the "name:type" rule; the type part is optional
                    String[] columnInfo = cell
                            .getStringCellValue()
                            .split(":");
                    String columnName = columnInfo[0].trim();
                    String columnType = null;

                    if (columnInfo.length == 2) {
                        columnType = columnInfo[1].trim();
                    }

                    final JavaFileTypeEnum javaFileType = JavaFileTypeEnum
                            .of(columnType)
                            .orElse(JavaFileTypeEnum.UNKNOWN);
                    final RelDataType sqlType = typeFactory.createSqlType(javaFileType.getSqlTypeName());
                    names.add(columnName);
                    types.add(sqlType);

                    if (fieldTypes != null) {
                        fieldTypes.add(javaFileType);
                    }
                }
            }
        }

        // Fall back to a single VARCHAR column when the sheet has no header
        if (names.isEmpty()) {
            names.add("line");
            types.add(typeFactory.createSqlType(SqlTypeName.VARCHAR));
        }

        return typeFactory.createStructType(Pair.zip(names, types));
    }
}
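The table class has two key methods, plus one helper:
- scan: scans the table content. Here the data rows of the sheet are parsed, cached, and then handed to Calcite.
- getRowType: returns the field information. By default the first row (row[0]) is treated as the header and parsed into fields. The field rule is the same as for CSV, name:string: the part before the colon is the field key and the part after it is the field type. If no type is given, the field is parsed as UNKNOWN; JavaFileTypeEnum then infers the type from the cell, and Calcite also performs its own inference when it processes the result.
- deduceRowType: deduces the field types. It relies on the JavaFileTypeEnum enum, which manages the Java type, the SQL type, and the cell-value conversion function for each column type.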
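The header rule can be illustrated on plain strings. This is a minimal sketch rather than the adapter's code: HeaderCellParser and parse are hypothetical names, and the list of known types simply mirrors the type names declared in JavaFileTypeEnum.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class HeaderCellParser {
    // Type names mirrored from JavaFileTypeEnum; anything else falls back to "unknown".
    private static final List<String> KNOWN_TYPES = Arrays.asList(
            "string", "boolean", "byte", "char", "short", "int",
            "long", "float", "double", "date", "timestamp", "time");

    /** Returns {columnName, typeName} for a header cell such as "birthday:date". */
    static String[] parse(String headerCell) {
        String[] parts = headerCell.split(":");
        String name = parts[0].trim();
        String type = "unknown";
        // Only an exact "name:type" pair declares a type, as in deduceRowType.
        if (parts.length == 2) {
            String declared = parts[1].trim().toLowerCase(Locale.ROOT);
            if (KNOWN_TYPES.contains(declared)) {
                type = declared;
            }
        }
        return new String[] {name, type};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("birthday:date")));
        System.out.println(Arrays.toString(parse("name")));
    }
}
```

A header with no colon, an unrecognised type, or more than one colon all end up as unknown, which matches the fallback to UNKNOWN described above.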
5.4 JavaFileTypeEnum
package com.ldx.calcite.excel.enums;

import lombok.Getter;
import lombok.extern.slf4j.Slf4j;
import org.apache.calcite.avatica.util.DateTimeUtils;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.time.FastDateFormat;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.Optional;
import java.util.TimeZone;
import java.util.function.Function;

/**
 * type converter
 */
@Slf4j
@Getter
public enum JavaFileTypeEnum {
    STRING("string", SqlTypeName.VARCHAR, Cell::getStringCellValue),
    BOOLEAN("boolean", SqlTypeName.BOOLEAN, Cell::getBooleanCellValue),
    // Numeric cells are narrowed to the Java type that matches the SQL type
    BYTE("byte", SqlTypeName.TINYINT, cell -> (byte) cell.getNumericCellValue()),
    CHAR("char", SqlTypeName.CHAR, Cell::getStringCellValue),
    SHORT("short", SqlTypeName.SMALLINT, cell -> (short) cell.getNumericCellValue()),
    INT("int", SqlTypeName.INTEGER, cell -> (Double.valueOf(cell.getNumericCellValue()).intValue())),
    LONG("long", SqlTypeName.BIGINT, cell -> (Double.valueOf(cell.getNumericCellValue()).longValue())),
    FLOAT("float", SqlTypeName.REAL, cell -> (float) cell.getNumericCellValue()),
    DOUBLE("double", SqlTypeName.DOUBLE, Cell::getNumericCellValue),
    DATE("date", SqlTypeName.DATE, getValueWithDate()),
    TIMESTAMP("timestamp", SqlTypeName.TIMESTAMP, getValueWithTimestamp()),
    TIME("time", SqlTypeName.TIME, getValueWithTime()),
    UNKNOWN("unknown", SqlTypeName.UNKNOWN, getValueWithUnknown()),;

    // cell type
    private final String typeName;

    // sql type
    private final SqlTypeName sqlTypeName;

    // value convert func
    private final Function<Cell, Object> cellValueFunc;

    private static final FastDateFormat TIME_FORMAT_DATE;

    private static final FastDateFormat TIME_FORMAT_TIME;

    private static final FastDateFormat TIME_FORMAT_TIMESTAMP;

    static {
        final TimeZone gmt = TimeZone.getTimeZone("GMT");
        TIME_FORMAT_DATE = FastDateFormat.getInstance("yyyy-MM-dd", gmt);
        TIME_FORMAT_TIME = FastDateFormat.getInstance("HH:mm:ss", gmt);
        TIME_FORMAT_TIMESTAMP = FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss", gmt);
    }

    JavaFileTypeEnum(String typeName, SqlTypeName sqlTypeName, Function<Cell, Object> cellValueFunc) {
        this.typeName = typeName;
        this.sqlTypeName = sqlTypeName;
        this.cellValueFunc = cellValueFunc;
    }

    public static Optional<JavaFileTypeEnum> of(String typeName) {
        return Arrays
                .stream(values())
                .filter(type -> StringUtils.equalsIgnoreCase(typeName, type.getTypeName()))
                .findFirst();
    }

    public static SqlTypeName findSqlTypeName(String typeName) {
        final Optional<JavaFileTypeEnum> javaFileTypeOptional = of(typeName);

        if (javaFileTypeOptional.isPresent()) {
            return javaFileTypeOptional
                    .get()
                    .getSqlTypeName();
        }

        return SqlTypeName.UNKNOWN;
    }

    public Object getCellValue(Cell cell) {
        return cellValueFunc.apply(cell);
    }

    public static Function<Cell, Object> getValueWithUnknown() {
        return cell -> {
            if (ObjectUtils.isEmpty(cell)) {
                return null;
            }

            switch (cell.getCellType()) {
                case STRING:
                    return cell.getStringCellValue();
                case NUMERIC:
                    if (DateUtil.isCellDateFormatted(cell)) {
                        // Date-formatted numeric cell: return the date object
                        return cell.getDateCellValue();
                    }
                    else {
                        // Otherwise return the numeric value
                        return cell.getNumericCellValue();
                    }
                case BOOLEAN:
                    return cell.getBooleanCellValue();
                case FORMULA:
                    // Formula cell: read the cached result, trying numeric first and then string
                    try {
                        return cell.getNumericCellValue();
                    }
                    catch (Exception e) {
                        try {
                            return cell.getStringCellValue();
                        }
                        catch (Exception ex) {
                            log.error("parse unknown data error, cellRowIndex:{}, cellColumnIndex:{}", cell.getRowIndex(), cell.getColumnIndex(), e);
                            return null;
                        }
                    }
                case BLANK:
                    return "";
                default:
                    return null;
            }
        };
    }

    public static Function<Cell, Object> getValueWithDate() {
        return cell -> {
            Date date = cell.getDateCellValue();

            if (ObjectUtils.isEmpty(date)) {
                return null;
            }

            try {
                final String formated = new SimpleDateFormat("yyyy-MM-dd").format(date);
                Date newDate = TIME_FORMAT_DATE.parse(formated);
                // DATE: number of days since the epoch, as an int
                return (int) (newDate.getTime() / DateTimeUtils.MILLIS_PER_DAY);
            }
            catch (ParseException e) {
                log.error("parse date error, date:{}", date, e);
            }

            return null;
        };
    }

    public static Function<Cell, Object> getValueWithTimestamp() {
        return cell -> {
            Date date = cell.getDateCellValue();

            if (ObjectUtils.isEmpty(date)) {
                return null;
            }

            try {
                final String formated = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(date);
                Date newDate = TIME_FORMAT_TIMESTAMP.parse(formated);
                // TIMESTAMP: milliseconds since the epoch, kept as a long (an int cast would overflow)
                return newDate.getTime();
            }
            catch (ParseException e) {
                log.error("parse timestamp error, date:{}", date, e);
            }

            return null;
        };
    }

    public static Function<Cell, Object> getValueWithTime() {
        return cell -> {
            Date date = cell.getDateCellValue();

            if (ObjectUtils.isEmpty(date)) {
                return null;
            }

            try {
                final String formated = new SimpleDateFormat("HH:mm:ss").format(date);
                Date newDate = TIME_FORMAT_TIME.parse(formated);
                // TIME: milliseconds since midnight, which always fits in an int
                return (int) newDate.getTime();
            }
            catch (ParseException e) {
                log.error("parse time error, date:{}", date, e);
            }

            return null;
        };
    }
}
This enum manages the Java type, the SQL type, and the cell-value conversion function together, so the type mapping and the conversion applied when extracting cell content live in one place (it relies on Java 8's Function interface).
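The pattern of pairing each enum constant with a conversion function can be shown in isolation. The sketch below is a simplified illustration that converts plain strings instead of POI cells; TypeEnumSketch, ColumnType, and convert are hypothetical names.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.Function;

public class TypeEnumSketch {

    /** Each constant carries its declared type name and its own converter. */
    enum ColumnType {
        STRING("string", raw -> raw),
        INT("int", raw -> Integer.valueOf(raw.trim())),
        BOOLEAN("boolean", raw -> Boolean.valueOf(raw.trim()));

        private final String typeName;
        private final Function<String, Object> converter;

        ColumnType(String typeName, Function<String, Object> converter) {
            this.typeName = typeName;
            this.converter = converter;
        }

        /** Case-insensitive lookup by declared type name; empty when nothing matches. */
        static Optional<ColumnType> of(String typeName) {
            return Arrays.stream(values())
                    .filter(type -> type.typeName.equalsIgnoreCase(typeName))
                    .findFirst();
        }

        Object convert(String raw) {
            return converter.apply(raw);
        }
    }

    public static void main(String[] args) {
        ColumnType type = ColumnType.of("INT").orElse(ColumnType.STRING);
        System.out.println(type.convert(" 42 "));
    }
}
```

Adding a new column type then means adding one constant, with no switch statement to update elsewhere.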
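Note: the date conversion has to be written this way, that is, using the GMT time zone (copied from calcite-file); otherwise the date and time values in the output are always off by the time-zone offset...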
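The reason GMT matters becomes clearer from the values Calcite works with internally: as far as I understand it, a DATE is an int count of days since the epoch, a TIME is an int count of milliseconds since midnight, and a TIMESTAMP is a long count of milliseconds since the epoch, all without a time zone. The sketch below reproduces the same idea with java.time instead of the format-then-parse round trip: take the wall-clock fields in the local zone and reinterpret them as GMT. CalciteTemporalSketch and its methods are hypothetical names, and the zone is passed in explicitly only to keep the example deterministic.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.Date;

public class CalciteTemporalSketch {

    /** Wall-clock date-time of the instant in the given zone, truncated to seconds. */
    private static LocalDateTime wallClock(Date date, ZoneId zone) {
        return date.toInstant().atZone(zone).toLocalDateTime().truncatedTo(ChronoUnit.SECONDS);
    }

    /** DATE: days since 1970-01-01. */
    static int toEpochDays(Date date, ZoneId zone) {
        return (int) wallClock(date, zone).toLocalDate().toEpochDay();
    }

    /** TIME: milliseconds since midnight. */
    static int toMillisOfDay(Date date, ZoneId zone) {
        return wallClock(date, zone).toLocalTime().toSecondOfDay() * 1000;
    }

    /** TIMESTAMP: the wall-clock fields reinterpreted as GMT, in epoch milliseconds. */
    static long toEpochMillis(Date date, ZoneId zone) {
        return wallClock(date, zone).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai");
        // A cell holding 2003-01-01 08:30:00 as seen in the local zone.
        Date cell = Date.from(LocalDateTime.of(2003, 1, 1, 8, 30, 0).atZone(zone).toInstant());
        System.out.println(toEpochDays(cell, zone));
        System.out.println(toMillisOfDay(cell, zone));
        System.out.println(toEpochMillis(cell, zone));
    }
}
```

Using the raw Date.getTime() instead would shift every value by the local offset, which is the time difference the note above refers to. The timestamp value is also far larger than Integer.MAX_VALUE, so it has to stay a long.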
6. Test Query
package com.ldx.calcite;

import com.ldx.calcite.excel.ExcelSchemaFactory;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.apache.calcite.config.CalciteConnectionProperty;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.util.Sources;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

@Slf4j
public class CalciteExcelTest {
    private static Connection connection;

    private static SchemaPlus rootSchema;

    private static CalciteConnection calciteConnection;

    @BeforeAll
    @SneakyThrows
    public static void beforeAll() {
        Properties info = new Properties();
        // Make SQL identifiers case-insensitive
        info.setProperty(CalciteConnectionProperty.CASE_SENSITIVE.camelName(), "false");
        // Create the Calcite connection
        connection = DriverManager.getConnection("jdbc:calcite:", info);
        calciteConnection = connection.unwrap(CalciteConnection.class);
        // Get the RootSchema. In Calcite the RootSchema is the parent of all data-source schemas,
        // and several schemas from different data sources can hang under the same RootSchema
        rootSchema = calciteConnection.getRootSchema();
    }

    @Test
    @SneakyThrows
    public void test_execute_query() {
        final Schema schema = ExcelSchemaFactory.INSTANCE.create(resourcePath("file/test.xlsx"));
        rootSchema.add("test", schema);
        // Set the default schema
        calciteConnection.setSchema("test");
        final Statement statement = calciteConnection.createStatement();
        ResultSet resultSet = statement.executeQuery("SELECT * FROM user_info");
        printResultSet(resultSet);
        System.out.println("=========");
        ResultSet resultSet2 = statement.executeQuery("SELECT * FROM test.user_info where id > 110 and birthday > '2003-01-01'");
        printResultSet(resultSet2);
        System.out.println("=========");
        ResultSet resultSet3 = statement.executeQuery("SELECT * FROM test.user_info ui inner join test.role_info ri on ui.role_id = ri.id");
        printResultSet(resultSet3);
    }

    @AfterAll
    @SneakyThrows
    public static void closeResource() {
        connection.close();
    }

    private static String resourcePath(String path) {
        final URL url = CalciteExcelTest.class.getResource("/" + path);
        return Sources.of(url).file().getAbsolutePath();
    }

    public static void printResultSet(ResultSet resultSet) throws SQLException {
        // Get the ResultSet metadata
        ResultSetMetaData metaData = resultSet.getMetaData();
        // Get the number of columns
        int columnCount = metaData.getColumnCount();
        log.info("Number of columns: {}", columnCount);

        // Iterate over the ResultSet and print each row
        while (resultSet.next()) {
            final Map<String, String> item = new HashMap<>();

            // Collect every column of the current row
            for (int i = 1; i <= columnCount; i++) {
                String columnName = metaData.getColumnName(i);
                String columnValue = resultSet.getString(i);
                item.put(columnName, columnValue);
            }

            log.info(item.toString());
        }
    }
}
The test results are as follows: