We have published a first alpha version of our machine learning tool, akka-lift-ml, on GitHub under the Apache 2.0 License. The tool picks up where a data scientist's work ends and handles a large portion of the tasks involved in operating ML systems, work that is often referred to as data engineering. akka-lift-ml is written in Scala and builds on a local Spark instance for the training results. The training itself can run on any Spark cluster reachable over the network.

For the tool to execute machine learning tasks successfully, the data must already be fully cleaned. This can be done in near real time using Spark Streaming or Akka Streams (fast data processing). Once the data has been stored in the correct format, for example as CSV on HDFS or S3, training can begin immediately. During operation, the entire tool is controlled via REST interfaces or actors, and it is delivered as a Docker container. For example, new training runs can be started and new optimal parameters searched for via HTTP POST, and the results of past training runs can be retrieved via HTTP GET. Should the microservice crash, the last trained model is automatically reloaded from a source such as S3 or HDFS.

Currently, the tool supports only the ALS (Alternating Least Squares) algorithm for collaborative filtering, which is widely used in recommender systems. Additional algorithms, such as linear regression, are planned.

Feature requests, suggestions, and proposed improvements are welcome via GitHub.

Further information and a quick start guide can be found in the project description or in the wiki on GitHub:

https://github.com/innFactory/akka-lift-ml
