
Huawei Leads with 17 Papers Selected for Top International Database Conference ICDE 2024

cici Tue, May 21 2024 07:30 PM EST

The international database conference ICDE 2024 recently took place in Utrecht, the Netherlands. Seventeen Huawei papers in the data field were selected, including work on Huawei Cloud's GaussDB and GeminiDB, making Huawei the company with the most accepted papers worldwide. Nikolaos Ntarmos, Director of the Database Lab at Huawei's Edinburgh Research Institute, delivered a speech titled "Huawei Cloud GaussDB, a Better Way to Database," presenting the technical and commercial achievements of Huawei Cloud GaussDB to academic institutions and representatives from around the world.

ICDE, short for "IEEE International Conference on Data Engineering," is one of the top international academic conferences in the field of databases, alongside SIGMOD and VLDB, and carries significant academic influence.

ICDE showcases cutting-edge database research from major research institutions and technology companies. At ICDE 2024, the 40th IEEE International Conference on Data Engineering, 17 papers from Huawei were accepted, the result of collaboration between Huawei's research teams and external teams and organizations. Below are summaries of some of the Huawei papers selected for this conference.

GaussML: An End-to-End In-database Machine Learning System

The paper "GaussML: An End-to-End In-database Machine Learning System" was jointly completed by Tsinghua University, Huawei, and ETH Zurich. By enhancing the performance of machine learning algorithm training and inference within databases, it effectively meets the demands for real-time analysis. The paper received high praise from the conference review committee for introducing a novel machine learning engine.

GaussML, a native in-database machine learning framework, treats machine learning training as an execution operator. By leveraging the database's parallel and distributed capabilities, it delivers more than 10 times the training and inference performance of similar products in the industry. Key capabilities include:

  1. GaussML introduces a native machine learning engine architecture within the database, integrating algorithm training and inference into the SQL execution process. This lets training and inference make full use of the database's optimizer, load management, concurrency processing, and distributed parallel capabilities.

  2. GaussML also builds in-database AutoML capabilities, enabling adaptive parameter adjustment and model correction as the load changes. Its end-to-end automatic model tuning reduces the cost of tuning model parameters for users and makes models easier to use within the database.

  3. By embedding training and inference natively within the database and tuning them automatically, GaussML forms a complete in-database machine learning engine that supports intelligent real-time analysis for customer businesses. The framework lowers the cost of machine learning for data scientists by providing SQL-like interfaces and supporting common machine learning algorithms, meeting the needs of the majority of customers.
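
As a rough illustration of what treating training as an execution operator can mean, the Python sketch below chains a scan, a training operator, and an inference operator in the pull-based style common in query executors. It is not GaussML code: the class names (`Scan`, `TrainLinear`, `Predict`), the single-feature schema, and the use of plain stochastic gradient descent are all assumptions made for this example.

```python
from typing import Iterator, List, Tuple

Row = Tuple[float, float]  # (feature, label); hypothetical single-feature schema


class Scan:
    """Leaf operator: yields rows from an in-memory table."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def __iter__(self) -> Iterator[Row]:
        return iter(self.rows)


class TrainLinear:
    """Training operator: pulls rows from its child and fits y = w*x + b by SGD."""

    def __init__(self, child: Scan, lr: float = 0.05, epochs: int = 300) -> None:
        self.child = child
        self.lr = lr
        self.epochs = epochs

    def execute(self) -> Tuple[float, float]:
        w, b = 0.0, 0.0
        for _ in range(self.epochs):
            for x, y in self.child:  # re-scan the child once per epoch
                err = (w * x + b) - y
                w -= self.lr * err * x
                b -= self.lr * err
        return w, b


class Predict:
    """Inference operator: emits (feature, prediction) for each input row."""

    def __init__(self, child: Scan, model: Tuple[float, float]) -> None:
        self.child = child
        self.model = model

    def __iter__(self) -> Iterator[Row]:
        w, b = self.model
        for x, _ in self.child:  # the label column is ignored at inference time
            yield x, w * x + b


# Usage: train on a toy table, then run inference inside the same "plan".
table = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = TrainLinear(Scan(table)).execute()
predictions = list(Predict(Scan([(4.0, 0.0)]), model))
```

Because the training step consumes rows through the same iterator interface as any other operator, a real engine could in principle schedule and parallelize it alongside ordinary query operators, which is the benefit the paper summary describes.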

In conclusion, the paper introduces a novel machine learning engine that shows clear performance advantages on multiple public datasets, a further step towards intelligent databases.

GaussDB-Global: A Geographically Distributed Database System

The paper "GaussDB-Global: A Geographically Distributed Database System" is a research achievement of Huawei's technical team. It proposes a distributed transaction processing method based on high-precision clock synchronization and uses it to build the globally distributed database system GaussDB-Global. The main contributions of the paper include:

  1. By adopting a decentralized approach with synchronized clocks, the system removes the performance bottleneck of a centralized transaction manager. It can transition seamlessly from centralized to decentralized distributed transaction management, giving globally deployed clusters a more flexible and convenient deployment method.

  2. To address remote reads of sharded data and long-distance log transmission, the system supports reads on asynchronous replicas with strong consistency, adjustable freshness guarantees, and dynamic load balancing. Experiments on cross-regional clusters show that, compared with the centralized baseline, this method provides up to 14 times higher read-only performance and over 50% higher throughput on the standard TPC-C benchmark.
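
To make the idea of adjustable freshness with load balancing more concrete, here is a minimal Python sketch of a read router. It is only an illustration under stated assumptions, not the protocol from the paper: the `Replica` fields, the `choose_replica` function, and the latency-times-load cost are hypothetical, and all timestamps are assumed to come from synchronized clocks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    applied_ts: float  # commit timestamp (seconds) up to which the log is replayed
    rtt_ms: float      # round-trip latency from the client
    load: float        # fraction of capacity in use, 0.0 to 1.0


def choose_replica(
    replicas: List[Replica], now: float, max_staleness_s: float
) -> Optional[Replica]:
    """Pick the cheapest replica whose data is fresh enough for this read.

    Returns None when no replica meets the freshness bound, in which case a
    caller would fall back to the primary (assumed behavior).
    """
    fresh = [r for r in replicas if now - r.applied_ts <= max_staleness_s]
    if not fresh:
        return None
    # Hypothetical cost: latency inflated by current load.
    return min(fresh, key=lambda r: r.rtt_ms * (1.0 + r.load))


# Usage: a read that tolerates up to one second of staleness.
replicas = [
    Replica("local-busy", applied_ts=99.5, rtt_ms=5.0, load=0.9),
    Replica("nearby-idle", applied_ts=99.9, rtt_ms=8.0, load=0.1),
    Replica("lagging", applied_ts=90.0, rtt_ms=1.0, load=0.0),
]
target = choose_replica(replicas, now=100.0, max_staleness_s=1.0)
```

Tightening `max_staleness_s` trades read locality for freshness: with a very small bound no asynchronous replica qualifies and the read goes to the primary.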

QCFE: An Efficient Feature Engineering for Query Cost Estimation

The paper "QCFE: An Efficient Feature Engineering for Query Cost Estimation" was jointly completed by Harbin Institute of Technology and Huawei. It introduces an efficient feature engineering method (QCFE) to address feature engineering problems in existing query cost estimation, achieving a significantly better trade-off between time and accuracy. The main contributions of the paper include:

  1. Introducing the concept of Feature Snapshot to integrate the effects of ignored variables, such as database knobs, hardware, etc., to enhance the accuracy of the query cost model.

  2. Designing a differential propagation feature reduction method to prune ineffective features, further improving model training and inference efficiency.

  3. Introducing a simplified SQL template design to reduce the time needed to compute feature snapshots. In extensive tests on benchmarks including TPC-H, job-light, and Sysbench, QCFE demonstrated a better time-accuracy trade-off than existing methods.
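
The first two ideas can be pictured with a small Python sketch. It is a simplified stand-in, not the method from the paper: environment values are appended to the query features as a "snapshot", and features are then pruned by a finite-difference sensitivity score. The function names, the toy cost model, and the threshold are all assumptions for illustration.

```python
from typing import Callable, List, Sequence


def with_snapshot(query_features: Sequence[float], snapshot: Sequence[float]) -> List[float]:
    """Append environment features (e.g. knob or hardware values) to query features."""
    return list(query_features) + list(snapshot)


def sensitivity(
    model: Callable[[Sequence[float]], float],
    samples: Sequence[Sequence[float]],
    eps: float = 1e-3,
) -> List[float]:
    """Average absolute change in predicted cost per unit change in each feature."""
    n = len(samples[0])
    scores = [0.0] * n
    for x in samples:
        base = model(x)
        for j in range(n):
            bumped = list(x)
            bumped[j] += eps
            scores[j] += abs(model(bumped) - base) / eps
    return [s / len(samples) for s in scores]


def keep_features(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of features whose sensitivity reaches the threshold."""
    return [j for j, s in enumerate(scores) if s >= threshold]


def toy_cost_model(x: Sequence[float]) -> float:
    # Hypothetical model: the second feature has no effect on the cost.
    return 3.0 * x[0] + 0.0 * x[1] + 0.5 * x[2]


# Usage: two queries, each extended with a one-value environment snapshot.
samples = [with_snapshot([1.0, 2.0], [4.0]), with_snapshot([2.0, 0.5], [8.0])]
kept = keep_features(sensitivity(toy_cost_model, samples), threshold=0.1)
```

Dropping low-sensitivity features shrinks the input to the cost model, which is the kind of training and inference saving the paper summary attributes to feature reduction.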

In summary, the innovation of this study lies in an effective feature engineering method that significantly improves both the speed and the accuracy of query cost estimation.

TRAP: Tailored Robustness Assessment for Index Advisors via Adversarial Perturbation

The paper "TRAP: Tailored Robustness Assessment for Index Advisors via Adversarial Perturbation" is a collaborative research effort by Xiamen University, Tsinghua University, and Huawei. It introduces TRAP, a workload generation framework based on adversarial perturbation, to address the problem of evaluating the robustness of existing index advisors.

The TRAP framework generates effective adversarial workloads for assessing the robustness of index advisors, and the paper reports a clear advantage for TRAP in this kind of evaluation. The research findings include:

Firstly, effectively generated adversarial workloads allow the robustness of index advisors to be assessed accurately: these workloads stay close to the original workload, yet they expose performance vulnerabilities caused by workload drift.

Secondly, for designing more robust learning-based index advisors, adopting fine-grained state representations and candidate pruning strategies can enhance performance.

Thirdly, for designing more robust heuristic-based index advisors, considering index interactions and the use of multi-column indexes during the index selection process is crucial.

In conclusion, these findings offer useful guidance for the design and evaluation of index advisors and underline the importance of evaluating index advisors in practical applications.

Temporal-Frequency Masked Autoencoders for Time Series Anomaly Detection

The paper "Temporal-Frequency Masked Autoencoders for Time Series Anomaly Detection" aims to help time-series databases reduce losses by detecting anomalies proactively. It designs TFMAE, a lightweight deep learning anomaly detection algorithm based on a time-frequency masked autoencoder, which performs well on multiple public datasets. The conference review committee praised the paper for introducing a new paradigm for time-series anomaly detection, and ICDE 2024 accepted it directly without modifications.

The paper is the first to use time-frequency masking for time-series anomaly detection. The research focuses on the following three points:

Firstly, it proposes an anomaly detection criterion based on comparing the time-domain-masked and frequency-domain-masked views of a time series. This criterion replaces the traditional reconstruction error when setting detection thresholds and is unaffected by distribution shifts.

Secondly, it introduces a window-based time-domain masking strategy and an amplitude-based frequency-domain masking strategy to eliminate potential outlier observations and patterns in sequences, which makes TFMAE resistant to anomaly bias.

Thirdly, experiments on five real-world datasets and two synthetic datasets show that TFMAE improves detection performance and speed.
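
The two masking strategies can be sketched in plain Python as follows. This is an illustrative approximation only: the article does not say which windows or frequencies are masked, so the sketch assumes that the windows with the highest spread and the frequency components with the lowest amplitude are selected. The function names and parameters are hypothetical, and the autoencoder itself is omitted.

```python
import cmath
from statistics import pstdev
from typing import List


def window_mask(series: List[float], window: int, n_masked: int) -> List[bool]:
    """Mask every point in the n_masked non-overlapping windows with the largest spread."""
    spread = {s: pstdev(series[s:s + window]) for s in range(0, len(series), window)}
    worst = sorted(spread, key=spread.get, reverse=True)[:n_masked]
    mask = [False] * len(series)
    for start in worst:
        for i in range(start, min(start + window, len(series))):
            mask[i] = True
    return mask


def dft(series: List[float]) -> List[complex]:
    """Naive discrete Fourier transform, O(n^2), to keep the sketch dependency-free."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]


def amplitude_mask(series: List[float], n_masked: int) -> List[bool]:
    """Mask the n_masked frequency components with the smallest amplitude."""
    amps = [abs(c) for c in dft(series)]
    order = sorted(range(len(amps)), key=amps.__getitem__)  # ascending amplitude
    mask = [False] * len(amps)
    for k in order[:n_masked]:
        mask[k] = True
    return mask


# Usage: a periodic signal with one injected spike at index 9.
signal = [0.0, 1.0, 0.0, -1.0] * 4
spiky = list(signal)
spiky[9] = 10.0
time_mask = window_mask(spiky, window=4, n_masked=1)
freq_mask = amplitude_mask(signal, n_masked=14)
```

In this toy example the window containing the spike is the one that gets masked in the time domain, while the two dominant frequency components of the clean periodic signal are the only ones left unmasked in the frequency domain.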

In conclusion, "Temporal-Frequency Masked Autoencoders for Time Series Anomaly Detection" is the first paper to apply time-frequency masking to time-series anomaly detection, with practical implications for industries such as healthcare, manufacturing, and finance.

At this conference, Huawei's selected research topics include AI4DB, time-series databases, query optimization, and in-database machine learning training and inference. These achievements are the result of Huawei's long-term exploration and practice in cutting-edge database technologies, as well as its collaboration with top global academic institutions on challenging problems in the database field. By closely integrating industry, academia, research, and application, Huawei brings cutting-edge research into its products, builds a healthy database industry ecosystem, and provides customers with innovative and competitive database products and services.

In the future, Huawei will continue to innovate and deepen its efforts in the database field, leading the industry to new heights.