Database of Things 2 (Machbase Database)

DBMS For Processing IoT Data


The Problems With Handling Large Amounts Of Data In Real Time

As previously mentioned, more devices and more sensors mean you’ll start to generate incredible amounts of data. To compound the problem even further, continued storage of data for analysis and historical record means you’ll only ever have more data to deal with.
Typical database management systems (DBMS) are not designed to manage petabytes of information within a single system. Also, while conventional Big Data platforms are optimized for batch processing, distributed storage, and massive data retrieval, they do not necessarily make it possible to analyze large amounts of data in real time.
Since the utility of time series data is based on continuing to collect, store, index, and retrieve it without data loss, your DBMS must be scalable, and able to handle large amounts of sensor data in real time.
Specialized IoT DBMS make time series data management much easier by generating search indexes and processing statistical data for visualization very quickly.

Query Language And Interface

IoT sensor data can be either structured or semi-structured. While SQL is the most common query language for structured data, there is no standard query language for semi-structured data. NoSQL query languages emerged with the introduction of Big Data systems, but these heterogeneous, product-specific query languages (e.g. MongoDB's) are not widely used or supported. As a result, SQL-on-Hadoop products such as Spark and Impala have become popular and have encouraged wider use of SQL. DBMS that support SQL typically also support traditional interfaces such as ODBC and JDBC.
Operational historian products use a JSON-based query interface over HTTP, exposed as a REST API. Since REST APIs are very easy to use, nearly every environment has adopted them, and REST has become the most popular interface to support.
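To make the idea of a JSON-based query over HTTP concrete, here is a minimal Python sketch. It only builds the request and does not send it. The host name, the /query path, and the JSON field names are hypothetical and are not taken from any real historian product or from Machbase.

```python
import json
from urllib.request import Request

def build_query_request(base_url, sensor_id, start, end):
    """Build (but do not send) an HTTP POST that carries a JSON query body."""
    # Hypothetical query shape: one sensor, one time range, one aggregate.
    body = {"sensor": sensor_id, "start": start, "end": end, "aggregate": "avg"}
    return Request(
        url=f"{base_url}/query",  # hypothetical endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request(
    "http://historian.example", "temp-01",
    "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Any client that can send HTTP and parse JSON can use an interface of this kind, which is a large part of why REST is so widely supported.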

Transaction Processing

In a distributed data system that needs to process more than 100 billion records per second in real time, it’s difficult to perform transaction processing. Transaction processing is usually characterized by ACID guarantees (atomicity, consistency, isolation, durability), often enforced with concurrency control techniques such as two-phase locking.
Traditional relational DBMS that perform ACID-based transaction processing are a poor fit for time series data, since a record is typically never updated between the time it is inserted and the time it is deleted. On Big Data platforms, this data is typically processed with eventual consistency techniques, which follow from the trade-offs described by the CAP theorem.
In order to process massive amounts of IoT data in real time, you need a DBMS with more efficient data processing techniques that reflect the characteristics of time series data, rather than those used in traditional ACID-based transactions.

Statistical Processing For Time Series Data

Time-series data requires time-based statistical processing in order for it to be used for visualization or statistical analysis.
It can be difficult to generate statistical data in an environment where a large amount of data is input in real time. In fact, even general sum, count, and average functions in time series statistical processing require a special sampling function. Since it is difficult to process time series statistics with a traditional relational database, stream databases are being studied as a possible alternative.
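As an illustration of time-based statistical processing, the following Python sketch groups readings into fixed time windows and computes count, sum, and average per window. It assumes integer timestamps in seconds, and the function name bucket_stats is made up for this example. It shows the general idea only, not how any particular DBMS implements it.

```python
from collections import defaultdict

def bucket_stats(points, bucket_seconds):
    """Group (timestamp, value) pairs into fixed windows; return count/sum/avg per window."""
    acc = defaultdict(lambda: [0, 0.0])  # window start -> [count, sum]
    for ts, value in points:
        start = ts - (ts % bucket_seconds)  # align the timestamp to its window start
        acc[start][0] += 1
        acc[start][1] += value
    return {
        start: {"count": count, "sum": total, "avg": total / count}
        for start, (count, total) in sorted(acc.items())
    }

# Five readings spread over three one-minute windows.
readings = [(0, 20.0), (30, 22.0), (61, 21.0), (119, 23.0), (120, 25.0)]
for start, stats in bucket_stats(readings, 60).items():
    print(start, stats)
```

This version recomputes everything from the raw readings on every call, which is exactly what becomes expensive when data arrives continuously at high volume.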

Machbase DBMS Optimized For IoT Data

A DBMS for IoT data should be able to:
  • Process large amounts of data in real time,
  • Support a convenient and efficient query language,
  • Process transactions efficiently,
  • Process time series data statistics.

Machbase is the only available solution that meets all of these requirements.

Real-Time Processing Of Large Data Volumes

  • With its distributed data storage and query structure, Machbase can input and index 200 million records from a single device. As equipment is added, performance improves, and more than 10 million sensor records can be processed per second.
  • It has a dedicated API for high-speed data input and an index structure designed for high-speed index generation.
  • As time series data grows over time, users can add equipment to the cluster for more performance and storage space.

 

Efficient Query Language and Interface

  • Machbase supports an optimized SQL dialect for data processing. NoSQL products are also starting to offer SQL again.
  • An inverted index and related syntax are provided for retrieving semi-structured data efficiently, making that data easy to search and process.
  • It also provides a REST API as well as the standard SQL interfaces, ODBC and JDBC.
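For readers unfamiliar with inverted indexes, here is a small Python sketch of the general technique applied to semi-structured log lines. The class name, the tokenization rule, and the sample records are assumptions made for this example. It does not describe Machbase's actual index structure or search syntax.

```python
import re
from collections import defaultdict

class InvertedIndex:
    """Map each token to the set of record ids that contain it."""

    def __init__(self):
        self.records = []
        self.postings = defaultdict(set)  # token -> {record ids}

    def add(self, text):
        record_id = len(self.records)
        self.records.append(text)
        # Simple tokenizer: lowercase runs of letters and digits.
        for token in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[token].add(record_id)
        return record_id

    def search(self, *keywords):
        """Return the records that contain every keyword (AND search)."""
        if not keywords:
            return []
        sets = [self.postings.get(k.lower(), set()) for k in keywords]
        return [self.records[i] for i in sorted(set.intersection(*sets))]

index = InvertedIndex()
index.add("sensor=temp-01 status=OK value=21.5")
index.add("sensor=temp-02 status=ERROR code=E42")
index.add("sensor=pump-07 status=ERROR code=E17")
print(index.search("error"))
print(index.search("error", "pump"))
```

A keyword lookup touches only the posting sets for the requested tokens instead of scanning every record, which is what makes this structure attractive for searching large volumes of semi-structured data.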

 

Efficient Transaction Processing

  • Machbase uses a transaction technique optimized for time series data.
  • It does not support updates, but inserts and deletes are possible. Even when a node fails and is restarted, a recovery process runs and the consistency of data and indexes is maintained.
  • Machbase Enterprise Edition addresses data loss caused by node failures with a distributed data storage technique.
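The insert-and-delete-only model described above can be pictured with a toy in-memory store in Python. This is a sketch under stated assumptions: the class and method names are hypothetical, deletion is modeled as dropping everything older than a cutoff time, and nothing here reflects Machbase's internals or its recovery mechanism.

```python
from bisect import bisect_left, insort

class AppendOnlySeries:
    """Toy time series store: rows can be inserted or aged out, never updated."""

    def __init__(self):
        self._rows = []  # immutable (ts, sensor, value) tuples, kept sorted by ts

    def insert(self, ts, sensor, value):
        insort(self._rows, (ts, sensor, value))

    def delete_before(self, cutoff_ts):
        """Drop all rows with ts < cutoff_ts and return how many were removed."""
        idx = bisect_left(self._rows, (cutoff_ts,))
        del self._rows[:idx]
        return idx

    def scan(self, start_ts, end_ts):
        """Return rows with start_ts <= ts < end_ts."""
        lo = bisect_left(self._rows, (start_ts,))
        hi = bisect_left(self._rows, (end_ts,))
        return self._rows[lo:hi]

store = AppendOnlySeries()
store.insert(10, "temp-01", 21.0)
store.insert(5, "temp-01", 20.5)
store.insert(20, "pump-07", 3.2)
print(store.scan(0, 15))
print(store.delete_before(10))
print(store.scan(0, 100))
```

Because existing rows never change, readers do not need to be protected from concurrent in-place modification, which is one reason this model can avoid much of the locking overhead of general-purpose transactions.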

 

Time Series Statistics Processing

  • Automatic statistics for time series sensor data: statistics are generated automatically for each sensor identifier and time unit (second, minute, hour).
  • It supports extended conditional clauses in queries, optimized for time series data.
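One way to picture automatic per-sensor statistics is a rollup that is updated as each reading arrives, so that a later query reads a precomputed summary instead of scanning raw data. The Python sketch below assumes integer timestamps in seconds. The class name and the chosen statistics are hypothetical and are not a description of how Machbase implements this feature.

```python
# Window width in seconds for each time unit named in the list above.
UNITS = {"second": 1, "minute": 60, "hour": 3600}

class RollupStats:
    """Maintain count/sum/min/max per (unit, sensor, window) at insert time."""

    def __init__(self):
        self._stats = {}  # (unit, sensor_id, window_start) -> [count, sum, min, max]

    def insert(self, sensor_id, ts, value):
        for unit, width in UNITS.items():
            key = (unit, sensor_id, ts - ts % width)
            entry = self._stats.get(key)
            if entry is None:
                self._stats[key] = [1, value, value, value]
            else:
                entry[0] += 1
                entry[1] += value
                entry[2] = min(entry[2], value)
                entry[3] = max(entry[3], value)

    def get(self, unit, sensor_id, ts):
        """Return the summary for the window of the given unit that contains ts."""
        width = UNITS[unit]
        count, total, lo, hi = self._stats[(unit, sensor_id, ts - ts % width)]
        return {"count": count, "avg": total / count, "min": lo, "max": hi}

rollup = RollupStats()
rollup.insert("temp-01", 0, 20.0)
rollup.insert("temp-01", 30, 24.0)
rollup.insert("temp-01", 90, 22.0)
rollup.insert("pump-07", 30, 5.0)
print(rollup.get("minute", "temp-01", 45))
print(rollup.get("hour", "temp-01", 0))
```

The trade-off is a small amount of extra work on every insert in exchange for lookups whose cost does not depend on how many raw readings fall inside the window.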

Machbase was implemented with all of the functional and performance requirements for processing time series data in mind, which makes it suitable for processing IoT data.




Do you want to try Machbase?
 
Contact the Machbase team with your questions!
