Robert Mustarde
Mar 10 2017

Galactic Exchange Announces DataEnchilada™ – Industry’s First Big Data Ingestion Tool Powered by Artificial Intelligence

Pioneer in Big Data simplification adds Deep Learning algorithms to ClusterGX™ for automated data ingestion, classification and stream anomaly detection

Galactic Exchange today announced DataEnchilada™, the next phase in its vision of building a Zero-Experience-Needed (ZEN) Architecture for successfully deploying big data pipelines. DataEnchilada™ is an AI-powered tool designed specifically to simplify and automate the ingestion and classification of data into its flagship enterprise big data platform, ClusterGX™.

ClusterGX™ makes it easy to deploy Spark/Hadoop clusters on-premise, on AWS or as a hybrid, without any previous big data experience. Once the cluster is deployed, the process of filling the data lake begins. This requires identifying the relevant data sources and ingesting their data, both historical records and ongoing real-time streams.

“As we have engaged in conversations about our vision for simplification, the complexity surrounding data ingestion has been a recurring theme,” said Rob Mustarde, CEO of Galactic Exchange. “Often people know where the data is – but getting it into the cluster and keeping it fresh is just plain difficult, and projects can fail as a result.”

DataEnchilada™ simplifies building data pipelines by automating the process of connecting to and ingesting data from common sources. Using an intuitive wizard-driven UI, common on-premise and cloud data sources such as Oracle, MySQL, Twitter, LinkedIn and Mixpanel can easily be configured for both historical and real-time data ingestion into a ClusterGX™ data lake.


Deep Learning Powers Auto Data Classification

DataEnchilada™ automatically ingests data into a Kafka cluster running inside ClusterGX™. This is where DataEnchilada™'s embedded deep learning takes over. Instead of requiring manually configured ingestion parameters, DataEnchilada™ automatically classifies the incoming data into discrete Kafka topics, which are retained in long-term persistent storage and also made available for real-time analysis.

“Now you can deploy Spark/Kafka clusters anywhere you want them in minutes, easily ingest and auto-classify your data using deep learning algorithms and launch applications with a single click from the embedded AppHub™,” continued Mustarde. “We are essentially facilitating end-to-end data pipelines that can be created 10 to 100x faster than was previously possible, by people with limited to zero big data experience, and all without writing a single line of code.”


Data Watching Data: Artificial Intelligence Delivers Real-Time Data Anomaly Detection

In addition to auto-classifying data with deep learning, DataEnchilada™ uses artificial intelligence to profile each Kafka topic it creates and quickly learn that stream's expected pattern of activity. If a stream then departs from its expected activity profile, a sub-topic is created to capture the anomaly and alerts are sent to the ClusterGX™ administrator. Stream anomalies can arise for a variety of reasons, including fraudulent activity, malware, viruses, system overloads or simple data errors.
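DataEnchilada™'s actual deep learning models are not described here, so the following is only a rough illustration of the general idea of "learn an expected activity profile, then flag departures from it." It swaps in a much simpler statistical technique: the sketch tracks the running mean and standard deviation of one topic's per-interval message count (Welford's online algorithm) and flags any interval whose count lies more than a threshold number of standard deviations from the learned mean. The class and function names, the 3-sigma threshold and the 10-interval warmup are all hypothetical choices for this example, not part of the product.

```python
import math
from dataclasses import dataclass


@dataclass
class ActivityProfile:
    """Hypothetical per-topic profile: running mean/variance of message counts.

    Uses Welford's online algorithm so the profile can be updated one
    interval at a time without storing the full history.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; zero until at least two observations exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def is_anomalous(profile: ActivityProfile, count: float,
                 threshold: float = 3.0, warmup: int = 10) -> bool:
    """Flag a count more than `threshold` standard deviations from the mean."""
    if profile.n < warmup:
        return False  # still learning the expected pattern
    sd = profile.std()
    if sd == 0.0:
        return count != profile.mean  # perfectly flat history: any change is a departure
    return abs(count - profile.mean) / sd > threshold


def monitor(counts, threshold: float = 3.0, warmup: int = 10):
    """Return the indices of intervals whose message count departs from the profile."""
    profile = ActivityProfile()
    anomalies = []
    for i, c in enumerate(counts):
        if is_anomalous(profile, c, threshold, warmup):
            anomalies.append(i)  # a real system might route this to a sub-topic and alert
        else:
            profile.update(c)  # only normal intervals refine the profile
    return anomalies


if __name__ == "__main__":
    # Hypothetical per-minute message counts for one topic, with a spike at index 12.
    counts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 450, 101]
    print(monitor(counts))  # → [12]
```

Anomalous intervals are deliberately kept out of the running profile so that a sudden burst does not immediately become the new baseline. A production system would also have to cope with gradual drift, daily or weekly seasonality and per-topic tuning, which is where richer learned models would come in.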

ClusterGX™ applies the same deep learning approach to monitoring the Docker containers running across the cluster. Every application and process in ClusterGX™ is deployed within its own Docker container, and deep learning is used to build an activity profile for each container. Some applications periodically reach out to the internet to fetch updates or other data, and despite the best security measures, applications and services can always be compromised through such activity. With deep learning, any change in expected container activity, potentially caused by malicious behavior, can be identified quickly; alerts are raised and the affected containers are automatically placed into quarantine.

“Pretty much without exception, the traditional Hadoop and Hadoop-as-a-Service vendors focus on delivering a vertical stack of open source big data software tools and deliver service around that,” said Robin Bloor, Chief Analyst at The Bloor Group. “By contrast, Galactic Exchange is delivering a data pipeline solution – a horizontal integration of cluster deployment, data ingestion and application launch – all orchestrated in a way to abstract a huge amount of the typical complexity associated with big data projects. Galactic is using machine learning and artificial intelligence to turn Big Data clusters designed for the Fortune 1000 into Smart Data clusters designed for pretty much anyone.”


Availability

ClusterGX™ Standard Edition, for deployment on-premise or on Amazon AWS, is available for FREE today with unlimited cluster scaling. Standard Edition includes one free DataEnchilada™ data source, but for a limited period all available data sources will be free. After the free period ends, additional data sources can be added by upgrading to the Premium or Enterprise Edition. Please visit the Galactic Exchange website at www.galacticexchange.io or contact us for more details.

For more information, please contact:

Rob Mustarde:

rob@galacticexchange.io / Tel: (+1) 650 353 6940

About Galactic Exchange:
Galactic Exchange, Inc. develops software to enable the industry’s easiest and fastest possible deployment and management of powerful container-enabled compute clusters for Big Data. The ClusterGX™ platform can be deployed in minutes on-premise or in the cloud, with zero required big data or clustering experience. Free software can be downloaded today from the Galactic Exchange website by registering at www.galacticexchange.io.