Category Archives: cassandra

Retrieving records from Apache Cassandra using NodeJS v0.10.35 and ExpressJS v4.0 via a REST interface

The adventures in Cassandra continue, and following on from the last post I’m going to show how to set up a REST interface to retrieve records from Apache Cassandra via Nodejs,

I’ll only set up the GET for a list of items and an individual item, the PUT, POST and DELETE actions I’ll leave to you.

NodeJS is a server based runtime environment for Javascript which uses Googles V8 javascript engine. It has proven to be an extremely fast way of pushing content and has been around for about 5 years now.

It is single threaded, sacrificing session and context specific overheads to maximise throughput.

As with Cassandra it can cope with a great number of requests a second (~ 500rps on 150 concurrent requests), obviously scaling up for processor, and as it is stateless, can be scaled up into round robin farms to cope at massive sizes. (see the Paypal and eBay metrics for more stat-candy).

To achieve a full-javascript stack, node is often used to drive javascript test-runners and compilation programs at the developer side, then used again, alongside a web server, such as ExpressJS to serve server side content back to a javascript application.

NodeJS runs on a variety of flavours of server  and desktop platform, but I’m going to skip the install of nodejs and try to keep this walkthrough as platform agnostic as possible. A common feature of all platforms is the NPM (Node Package Manager), and this is what we’ll be using to install the ExpressJS and Cassandra libraries we will need to serve the content.

Firstly, I’ll create a folder to run this application from, then I’ll open up a command prompt and browse to the folder. Then I install express by entering the following.

npm install express

The folder should now have a node_modules folder, which will contain the express server code and its dependencies.

We will also need to add  a package.json file so that node can verify the app on running.

{
  "name": "nd.neilhighley.com",
  "version": "1.0.0",
  "private": true,
  "scripts": {
    "start": "node ./bin/www"
  },
  "dependencies": {
    "express": "~4.11.1"
  }
}

Next, I create my application folder structure as follows;

>bin
>www
>public
>>javascripts
>routes
>views

Now I add a few javascript files to the following folders;

>www
www.js

adding the following code for the webserver

#!/usr/bin/env node

var app = require('../app');
var http = require('http');

var port = normalizePort(process.env.PORT || '8080');
app.set('port', port);
var server = http.createServer(app);
server.listen(port);
server.on('error', onError);
server.on('listening', onListening);

function normalizePort(val) {
 var port = parseInt(val, 10);

 if (isNaN(port)) {
 // named pipe
 return val;
 }

 if (port >= 0) {
 // port number
 return port;
 }

 return false;
}

function onError(error) {
 if (error.syscall !== 'listen') {
 throw error;
 }

 var bind = typeof port === 'string'
 ? 'Pipe ' + port
 : 'Port ' + port

 // handle specific listen errors with friendly messages
 switch (error.code) {
 case 'EACCES':
 console.error(bind + ' requires elevated privileges');
 process.exit(1);
 break;
 case 'EADDRINUSE':
 console.error(bind + ' is already in use');
 process.exit(1);
 break;
 default:
 throw error;
 }
}

function onListening() {
 var addr = server.address();
 var bind = typeof addr === 'string'
 ? 'pipe ' + addr
 : 'port ' + addr.port;
}

The code above simply sets up the webserver to listen on port 8080,  and sets our application main file as app.js in the folder root.

Before delving into app.js, I need to set up the routes, this will trap any calls to a particular Uri, and pass them to the appropriate codebase.  I’m going to name this after the route I will be using, this is not a requirement, but will help you in future! 😉

>routes
somedata.js
var express = require('express');
var router = express.Router();

/* GET data. */
router.get('/somedata/', function(req, res, next) {
 var jsonToSend={Message:"Here is my resource"};
 res.json(jsonToSend);
});

module.exports = router;

Now, I can set up my app.js file in the folder root to contain the main application logic.

var express = require('express');
var path = require('path');
var routes = require('./routes/somedata');
var app = express();

// view engine setup
//app.set('views', path.join(__dirname, 'views'));
//app.set('view engine', 'jade');

app.use(express.static(path.join(__dirname, 'public')));
app.use('/', routes);

// catch 404 and forward to error handler
app.use(function(req, res, next) {
 var err = new Error('Not Found');
 err.status = 404;
 next(err);
});

app.use(function(err, req, res, next) {
 res.status(err.status || 500);
 res.render('error', {
 message: err.message,
 error: err
 });
});

module.exports = app;

Lets do a quick test by starting the application

npm start

Then browsing to

http://localhost:8080/somedata

If all goes well, we will receive a JSON file as a response.

Now I can alter the routes file to mirror a typical REST interface.

Replace the previous single GET function above with the following

/* GET data. */
router.get('/somedata/', function(req, res, next) {
 var jsonToSend={Message:"Here is my resource"};
 res.json(jsonToSend);
});

router.get('/somedata/:item_id', function(req, res, next) {
 var jsonToSend={Message:"Here is my resource of item "+req.params.item_id};
 res.json(jsonToSend);
});

Confirm the above by calling the following urls and checking the content;

http://localhost:8080/somedata/
http://localhost:8080/somedata/12

The second url should return with the message which has the item_id in it.

Now I have confirmed the framework running, I’ll connect to cassandra and retrieve the records. To do that we just need to replace the jsonToSend with the cassandra response.

So, install the cassandra node client, as follows, in the root of your application.

npm install cassandra-driver

Then to update the package.json, add the cassandra-driver, as installed, so package.json now looks like;

{
 "name": "nd.neilhighley.com",
 "version": "1.0.0",
 "private": true,
 "scripts": {
 "start": "node ./bin/www"
 },
 "dependencies": {
 "express": "~4.11.1",
 "cassandra-driver":"~1.0.2"
 }
}

I’ll use the same keyspace as the last blog post, so have a look there for details on setting up cassandra.

casswcfexample

Update the somedata.json file to the following

var express = require('express');
var router = express.Router();

var cassandra = require('cassandra-driver');
var async = require('async');

var client = new cassandra.Client({contactPoints: ['127.0.0.1'], keyspace: 'casswcfexample'});

function GetRecordsFromDatabase(callback) {
 client.execute("select tid,description,title from exampletable", function (err, result) {
 if (!err){
 if ( result.rows.length > 0 ) {
 var records = result.rows[0];
 callback(1,result.rows);
 } else {
 callback(1,{});
 }
 }else{
 callback(0,{});
 }

 });
 }

function GetRecords(res){
 var callback=function(status,recs){
 if(status!=1){
 res.json({Error:"Error"});
 }else{
 var jsonToSend={Results:recs};
 res.json(jsonToSend);
 }
 };

 GetRecordsFromDatabase(callback);
}

/* GET data. */
router.get('/somedata/', function(req, res, next) {
 GetRecords(res);
});

router.get('/somedata/:item_id', function(req, res, next) {
 var jsonToSend={Message:"Here is my resource from "+req.params.item_id};
 res.json(jsonToSend);
});

module.exports = router;

Now when we call the following url

http://localhost:8080/somedata/

We should receive the following (or similar).

{"Results":[{"tid":1,"description":"description","title":"first title"},{"tid":2,"description":"another description","title":"second title"}]}

If you get an error, it may mean that async and long may need to be installed in your application root also. Install them with the npm.

We can format the JSON returned by altering  the GetRecords function.

The GetRecord function will be almost identical to the GetRecords function, just passing in the Id and changing the CQL.

function GetRecordFromDatabase(passed_item_id,callback) {
 client.execute("select tid,description,title from exampletable where tid="+passed_item_id, function (err, result) {
 if (!err){
 if ( result.rows.length > 0 ) {
 var record = result.rows[0];
 callback(1,record);
 } else {
 callback(1,{});
 }
 }else{
 callback(0,{});
 }

 });
 }

function GetRecord(r_item_id,res){
 var callback=function(status,recs){
 if(status!=1){
 res.json({Error:"Error"});
 }else{
 var jsonToSend={Results:recs};
 res.json(jsonToSend);
 }
 };

 GetRecordFromDatabase(r_item_id,callback);
}

There are a few things that can be done to the above code, including;

– Place all CRUD operations in standard library
– Have middleware fulfill requests to enable client side error catching and cleaner implementation

Hope you have enjoyed the example above. Apologies for any typos, etc. 🙂

Until next time.

 

Connecting Apache Cassandra v2.011 to Windows Communication Foundation

Cassandra is the current de facto standard in flat, scalable, cloud based database applications and as such usually only exists in AWS or linux clusters.
Fortunately, you can install Cassandra on windows in a single node configuration. For more info on the configurations available in a distributed install, see Planet Cassandra.
Previously I have worked on SOLR, which struck me as a similar construct to the node structure of Cassandra, specifically with the lack of master slave that you would normally have found in traditional clusters.

By avoiding a master slave format, Cassandra allows an elastic structure to cover the database records, akin to a RAID striping,  and does a great job when you have geo-spaced clusters allowing for quick responses from the database at a global scale.

Cassandra has several connectors,  covering all the major server side languages, including Javascript(nodejs), Java, Python, Ruby, C#.

Here I will show you how to connect a C# WCF service to Cassandra to retrieve records.

Firstly, install Cassandra, taking special care with prerequisites.
Go into the cassandra query language shell(CQL, and it probably has a python icon on it)..

Create a new keyspace, which allows you to add tables.

create keyspace casswcfexample with replication={'class':'SimpleStrategy', 'replication_factor':1};

Add a new table to your keyspace.

use casswcfexample;
create table exampletable(tid int primary key, title varchar, description varchar);

You can press enter for a new line in CQL, and it will continue, and won’t submit the statement until you add a semicolon

Add data to your keyspace’s exampletable

insert into exampletable(tid, title, description) values(1,'first title', 'description');
insert into exampletable(tid, title, description) values(2,'second title', 'another description');

Verify the data in your keyspace:table

select * from exampletable;

OK, now we have a running cassandra instance with data in it, let’s hook it up to a new, empty, WCF project.

 

As a rule, I always have my data layer as a separate project, to allow me to reuse it.
Create a new Solution, and add a WCF Application, leave it named as WCFService1.
Create a new Class library project and , call it CassLibrary.

In the CassLibrary project, import the DataStax nuget package for the cassandra connector.

Create an object to hold our result rows

namespace CassLibrary
{
 public class DTOExampleTable
 {
 public int Id { get; set; }
 public string Title { get; set; }
 public string Description { get; set; }
 }
}

Create an Extension to convert row to object

namespace CassLibrary
{
 public static class DataExtensions
 {
 public static IEnumerable<DTOExampleTable> ToExampleTables(this RowSet rows)
 {
 return rows.Select(r => new DTOExampleTable()
 {
     Id = int.Parse(r["tid"].ToString()),
     Title = r["title"].ToString(),
     Description = r["description"].ToString()
 });
 } 

 }
}

Create a Simple Data class to hold the session connection and the retrieval of our data

namespace CassLibrary
{
 public class Data
 {
 private string _ks;
 private string _cp;

 public Data(string contactPoint, string keyspace)
 {
 _cp = contactPoint;
 _ks = keyspace;
 }

 private ISession GetSession()
 {
 Cluster cluster = Cluster.Builder().AddContactPoint(_cp).Build();
 ISession session = cluster.Connect(_ks);
 return session;
 }

 public IEnumerable<DTOExampleTable> GetExampleRows()
 {
 var s = GetSession();
 RowSet result = s.Execute("select * from exampletable");
 return result.ToExampleTables();
 }

 }
}

This is just a simple way of showing the data access. The cassandra plugin has a representation of LINQ also, but I’ll not go into it here.

Now, go back to the WCFService1 project, and add a new method to the Service Contract interface.

[ServiceContract]
 public interface IService1
 {
 [OperationContract]
 IEnumerable<DTOExampleTable> GetExampleData();
 }

And in the actual service, add the corresponding method. We’ll add some test data, then verify the service is set up correctly first.

 public class Service1 : IService1
 {
 public IEnumerable<DTOExampleTable> GetExampleData()
 {
 var dto = new List<DTOExampleTable>();

 dto = new List<DTOExampleTable>()
 {
 new DTOExampleTable()
 {
 Id = 999,
 Title = "Just a test for WCF",
 Description = "WCF Desc test"
 },
 new DTOExampleTable()
 {
 Id = 99,
 Title = "Just another test for WCF",
 Description = "WCF Desc test 2"
 },

 };

 return dto;

 } 
 }

Right click the WCF project and start debugging

If the WCFTestClient doesn’t fire up, go to the Visual studio command prompt and type in

WCFTestClient.exe

Then attach your service URL to the client.

Once attached, select the GetExampleData Method in your service and invoke. You should see the two test items we added above.

Now that we know the service is fine, remove the test items from the service method, replacing them with the Data class in your CassLibrary.

 public class Service1 : IService1
 {
 public IEnumerable<DTOExampleTable> GetExampleData()
 {
 return new Data("127.0.0.1", "casswcfexample").GetExampleRows();
 } 
 }

Rebuild your service, and re-invoke it through WCFTestClient. You should see your Cassandra data.

The Cassandra client has DBContext and LINQ built in, so the example above should in no way be utilised in a real-life programming situation.

If i get a few hits on this page, I’ll post a more complete example, showing off the actual speed of Cassandra (a large cluster can serve hundreds of thousand of records a second). Cassandras only real competitor is MongoDB, which seemingly tops out at 20 nodes in a cluster.
e.g.

Its this throughput and stability in scaling that enabled Cassandra to be used in the Large Hadron Collider to store data as it came into one of the detectors.

Planet Cassandra : Companies Using Cassandra