Connecting to HBase from Erlang using Thrift

The key was to piece together steps from the following two pages:

The Thrift API and the Hbase.thrift file can be found here:
http://wiki.apache.org/hadoop/Hbase/ThriftApi

Download the latest thrift*.tar.gz from http://thrift.apache.org/download/

sudo apt-get install libboost-dev
tar -zxvf thrift*.tar.gz
cd thrift*
./configure
make
cd compiler/cpp
./thrift -gen erl Hbase.thrift

Take all the files in the gen-erl directory and copy them to your application’s /src directory.
Copy the Thrift Erlang client files from thrift*/lib/erl to your application, or copy/symlink them into a directory listed in $ERL_LIBS.

You can connect using either approach:

{ok, TFactory} = thrift_socket_transport:new_transport_factory("localhost", 9090, []).
{ok, PFactory} = thrift_binary_protocol:new_protocol_factory(TFactory, []).
{ok, Protocol} = PFactory().
{ok, C0} = thrift_client:new(Protocol, hbase_thrift).

Or by using the utility module (I still need to investigate the difference):

{ok, C0} = thrift_client_util:new("localhost", 9090, hbase_thrift, []).

Basic CRUD commands

% Load records into the shell
rr(hbase_types).

% Get a list of tables
{C1, Tables} = thrift_client:call(C0, getTableNames, []).

% Create a table
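% Results are matched against _ rather than a named variable such as _Result:
% once bound in the shell, a named variable would have to match every later result.
{C2, _} = thrift_client:call(C1, createTable, ["test", [#columnDescriptor{name="test_col:"}]]).

% Insert a column value
% TODO: Investigate the attributes dictionary's purpose
{C3, _} = thrift_client:call(C2, mutateRow, ["test", "key1", [#mutation{isDelete=false, column="test_col:", value="wooo"}], dict:new()]).

% Delete the column value (the mutation names the column to delete)
{C4, _} = thrift_client:call(C3, mutateRow, ["test", "key1", [#mutation{isDelete=true, column="test_col:"}], dict:new()]).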

% Get data
% TODO: Investigate the attributes dictionary's purpose
thrift_client:call(C4, getRow, ["test", "key1", dict:new()]).
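
Every call above returns a new client value that has to be passed to the next call. A minimal sketch of a helper that does this threading for a list of calls is shown below. The module name client_threading and the function run/3 are hypothetical, and the call function is passed in as an argument (assumed to behave like thrift_client:call/3 and return {NewClient, Result}) so the sketch can be tried without a running HBase.

```erlang
-module(client_threading).
-export([run/3]).

%% Run a list of {Function, Args} calls in order, feeding the client
%% returned by each call into the next one.
%% CallFun is assumed to behave like thrift_client:call/3,
%% i.e. CallFun(Client, Function, Args) -> {NewClient, Result}.
run(CallFun, Client0, Calls) ->
    {ClientN, RevResults} =
        lists:foldl(
          fun({Function, Args}, {Client, Acc}) ->
                  {NextClient, Result} = CallFun(Client, Function, Args),
                  {NextClient, [Result | Acc]}
          end,
          {Client0, []},
          Calls),
    %% Results were accumulated in reverse order.
    {ClientN, lists:reverse(RevResults)}.
```

With the Thrift client this could be used as client_threading:run(fun thrift_client:call/3, C0, [{getTableNames, []}, {getRow, ["test", "key1", dict:new()]}]), which returns the final client together with the list of results.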

TODO: Research how to use connection pooling with thrift.

TODO: Document connecting to Cassandra using Thrift. All the hard work has already been done by Roberto at https://github.com/ostinelli/erlcassa

Pro JavaScript for Web Apps – crossroads

Apress has an awesome book that covers KnockoutJS: Pro JavaScript for Web Apps.

You will get stuck on Chapter 4 when using the latest versions of crossroads, hasher and signals, because setting the context of hasher to be crossroads doesn’t seem to work anymore:

     hasher.initialized.add(crossroads.parse, crossroads);
     hasher.changed.add(crossroads.parse, crossroads);
     hasher.init();

     crossroads.addRoute("select/{item}", function (item) {
           viewModel.selectedItem(item);
     });

The code works with the versions of the libraries included in the book’s downloadable source code, but if you pull the latest versions of all 3 libraries from git, you need to change the code to match the example in the Hasher GitHub readme:

crossroads.addRoute("select/{item}", function (item) {
  viewModel.selectedItem(item);
});
function handleChanges(newHash, oldHash) {
  crossroads.parse(newHash);
}
hasher.changed.add(handleChanges);
hasher.initialized.add(handleChanges);
hasher.init();

Also, the crossroads routes must be set up before calling hasher.init(). Otherwise the hasher initialized callback will have no routes to execute.
Apart from that, the book is great! A bunch of people in the office want to borrow it once I am done with it :)

Installing Erlang R15B from source in Ubuntu Oneiric

Download and extract Erlang source code:

wget http://www.erlang.org/download/otp_src_R15B.tar.gz
tar xfvz otp_src_R15B.tar.gz
cd otp_src_R15B

Install the C compiler, make, git and the other tools needed to compile just about anything C-based in Ubuntu:

sudo apt-get install build-essential git-core libwxgtk2.8-dev libgl1-mesa-dev libglu1-mesa-dev libpng3 wx-common default-jre default-jdk fop

Install the Erlang build dependencies. This is a shortcut that saves you from working out which dependencies are needed to build Erlang from source:

sudo apt-get build-dep erlang

Build and install Erlang:

./configure
make
make docs
sudo make install
sudo make install-docs

Sharding in ChicagoBoss

ChicagoBoss provides “vertical sharding” out of the box: each model can be stored in a different database or db_adapter.

Sample boss.config where the wiki and author models are stored in MySQL and all other models use the mock db_adapter:

[{boss, [
    {applications, [cb_tutorial]},
    {db_host, "localhost"},
    {db_port, 1978},
    {db_adapter, mock},
    {log_dir, "log"},
    {server, mochiweb},
    {port, 8001},
    {session_adapter, mock},
    {session_key, "_boss_session"},
    {session_exp_time, 525600},
    {db_shards, [
        [
            {db_host, "localhost"},
            {db_adapter, mysql},
            {db_port, 3306},
            {db_username, "root"},
            {db_password, "password"},
            {db_database, "wiki"},
            {db_shard_id, first_shard},
            {db_shard_models, [wiki, author]}
        ]
    ]}
]}].

Note: the db_shard_id tuple is required for the mysql db_adapter because of the way the mysql db_adapter included with CB creates connection pools. Think of it as a connection pool name; it appears as DBIdentifier in the source code.
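
To make the mapping from model to adapter concrete, here is a minimal sketch of how such a configuration could be interpreted. It is not ChicagoBoss's own routing code: the module shard_lookup and its functions are hypothetical, and the configuration is a trimmed copy of the boss section above.

```erlang
-module(shard_lookup).
-export([adapter_for/2, example_config/0]).

%% A trimmed-down version of the {boss, [...]} settings shown above.
example_config() ->
    [{db_adapter, mock},
     {db_shards, [
         [{db_adapter, mysql},
          {db_shard_id, first_shard},
          {db_shard_models, [wiki, author]}]
     ]}].

%% Return the adapter of the first shard that lists Model in its
%% db_shard_models, or the top-level db_adapter when no shard claims it.
adapter_for(Model, BossConfig) ->
    Shards = proplists:get_value(db_shards, BossConfig, []),
    Matching = [S || S <- Shards,
                     lists:member(Model,
                                  proplists:get_value(db_shard_models, S, []))],
    case Matching of
        [Shard | _] -> proplists:get_value(db_adapter, Shard);
        []          -> proplists:get_value(db_adapter, BossConfig)
    end.
```

Under these assumptions, shard_lookup:adapter_for(wiki, shard_lookup:example_config()) gives mysql, while a model that no shard lists falls back to mock.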

Sharding examples are endless: you can persist models of click-stream data to Riak, store content of pages in PostgreSQL, store some logs in Archive type of MySQL database and so on, all based on one configuration file.

Fun with ChicagoBoss Models

I decided to create some examples for CB Model API documentation.

Let’s say we want to create a basic 1:Many entity relationship between a blog post and comments in our hypothetical blog software. 1 blog post can have 0 or many comments. Create a new empty ChicagoBoss application by running:

git clone git://github.com/evanmiller/ChicagoBoss.git
cd ChicagoBoss
make
make app PROJECT=blogy
cd ../blogy

Create a model for a blog post. Put this into src/model/post.erl

-module(post, [Id, PostTitle, PostText]).
-compile(export_all).
-has({comments, many}).

Each blog post can have many comments, so CB requires that you add -has({comments, many}). to the module attributes. Note that the first element of the tuple (its tag) is the plural of the associated model name: the comment model is referenced as comments.

A comment must belong to a post. We add a simple check on the length of PostId to validation_tests/0. Put this into src/model/comment.erl

-module(comment, [Id, PostId, CommentAuthor, CommentText]).
-compile(export_all).
-belongs_to(post).

validation_tests() -> [{fun() -> length(PostId) > 0 end, "Comment must have a post."}].

Now try the models out in the ChicagoBoss shell:

(wildbill@f15)1> P1 = post:new(id, "Awesome first post", "ftw").
{post,id,"Awesome first post","ftw"}
(wildbill@f15)2> {ok, P1Saved} = P1:save().
{ok,{post,"post-1","Awesome first post","ftw"}}
(wildbill@f15)3> P1Saved:id().
"post-1"
(wildbill@f15)4> C1 = comment:new(id, P1Saved:id(), "Anonymous", "Comment text").
{comment,id,"post-1","Anonymous","Comment text"}

At this point, the shell has variable C1 representing a new comment that is associated with our first blog post.

(wildbill@f15)5> C1:belongs_to().
[{post,{post,"post-1","Awesome first post","ftw"}}]
(wildbill@f15)6> {ok, C1Saved} = C1:save().
{ok,{comment,"comment-2","post-1","Anonymous", "Comment text"}}
(wildbill@f15)7> C1Saved:belongs_to_names().
[post]

Running ChicagoBoss unit tests in your application

If you get the following error after you run make test in the source directory of your ChicagoBoss web application:

=INFO REPORT==== 8-Jan-2012::21:49:44 ===
Starting Boss in production mode....

=INFO REPORT==== 8-Jan-2012::21:49:44 ===
Starting master services on nonode@nohost
{"init terminating in do_boot",{{badmatch,{error,{"no such file or directory","wiki.app"}}},[{boss_web_test,bootstrap_test_env,2,[{file,"src/boss/boss_web_test.erl"},{line,16}]},{boss_web_test,run_tests,1,[{file,"src/boss/boss_web_test.erl"},{line,41}]},{init,start_it,1,[]},{init,start_em,1,[]}]}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
make: *** [test] Error 1

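That means you need to run make before running make test: the test bootstrap looks for the compiled wiki.app file, which does not exist until the application has been built.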

Installing Erlang on Fedora

The following has been tested on Fedora 13 and 15:

yum install bison bison-devel ncurses ncurses-devel zlib zlib-devel openssl openssl-devel gnutls-devel gcc gcc-c++ wxBase.i686 wxGTK.i686 wxGTK-devel.i686 unixODBC.i686 unixODBC-devel.i686 fop
wget http://www.erlang.org/download/otp_src_R15B.tar.gz
tar xfvz otp_src_R15B.tar.gz
cd otp_src_R15B
./configure --with-ssl
make
su -c 'make install'
make docs
su -c 'make install-docs'

ChicagoBoss example application: wiki

Updated to use ChicagoBoss 0.8 on May 10, 2013.

Code is available on GitHub.

Download and build the latest ChicagoBoss source code:

git clone https://github.com/evanmiller/ChicagoBoss.git
cd ChicagoBoss
make

Make sure that ChicagoBoss builds successfully. Otherwise, you might need to download and install the latest version of Erlang and all its dependencies. To ask questions about ChicagoBoss, join the growing ChicagoBoss community on Google Groups.

Create an empty example wiki application by executing the following commands inside ChicagoBoss directory:

make app PROJECT=example_wiki
cd ../example_wiki/

Hello World
Let’s start with the “Hello World” smoke test and build from there. All shell commands are assumed to be executed from within the example_wiki directory.

Create example_wiki_pages_controller.erl file in src/controller directory

-module(example_wiki_pages_controller, [Req]).
-compile(export_all).

%% @doc show a "Hello World" message
index('GET', []) ->
	{output, "Hello World"}.

Start ChicagoBoss in development mode:

./init-dev.sh

and navigate to URL: http://localhost:8001/pages/index to see Hello World rendered in your browser.
The naming convention for ChicagoBoss controllers is: [OTP_application_name]_[controller_name]_controller.erl inside of src/controller directory.

Data Model
In this example, all the wiki data will be stored in a MySQL database.
Modify db_host, db_port, db_adapter and add db_username, db_password, db_database in boss.config to result in:

[{boss, [
    {applications, [example_wiki]},
    {db_host, "localhost"},
    {db_port, 3306},
    {db_adapter, mysql},
    {db_username, "root"},
    {db_password, "password"},
    {db_database, "example_wiki"},
    {log_dir, "log"},
    {server, mochiweb},
    {port, 8001},
    {session_adapter, mock},
    {session_key, "_boss_session"},
    {session_exp_time, 525600},
    {path, "../ChicagoBoss"},
    {vm_cookie, "my_secret_cookie"}, % Optional, defaults to abc123
    {websocket, true}
]},

{ tinymq, [
%% max_age- Maximum age of messages in the [message queue], in
%%   seconds. Defaults to 60.
    % {max_age, 60}
]},

{lager, [
    {handlers, [
      {lager_console_backend, info},
      {lager_file_backend, [
        {"log/error.log", error, 10485760, "$D0", 5},
        {"log/console.log", info, 10485760, "$D0", 5}
      ]}
    ]}
  ]},

{ example_wiki, [
    {base_url, "/"},
    {path, "../example_wiki"}
]}
].

Naturally, db_* tuple parameters will vary depending on your specific deployment of MySQL.
For more information on setting up database connectivity, see README_DATABASE.

Note: The statement in README_DATABASE that the "id" field should be a serial integer means that the id column in that table is SERIAL, which is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.

-- Table structure for table pages
CREATE TABLE IF NOT EXISTS `pages` (
  `id` bigint(11) unsigned NOT NULL AUTO_INCREMENT,
  `page_title` varchar(32) NOT NULL,
  `page_text` varchar(1024) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=3 ;

To create a model for the pages table, create page.erl file in src/model directory.
Place the following into src/model/page.erl:

-module(page, [Id, PageTitle, PageText]).
-compile(export_all).

validation_tests() ->
	[{fun() -> length(PageTitle) > 0 end, "Page Title cannot be empty."},
	 {fun() -> length(PageTitle) =< 32 end, "Page Title cannot be more than 32 characters long."}].
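
To show how a list of {Fun, Message} pairs like this can be turned into an ok or {error, Messages} result, here is a minimal plain-Erlang sketch. It is only an illustration of the idea, not ChicagoBoss's implementation; the module validation_sketch and its functions are hypothetical, and the title is passed as an argument because the sketch does not use a parameterized module.

```erlang
-module(validation_sketch).
-export([validate/1, page_tests/1]).

%% Run each {Fun, Message} pair and collect the messages of the tests
%% whose fun did not return true.
validate(Tests) ->
    case [Msg || {Fun, Msg} <- Tests, Fun() =/= true] of
        []     -> ok;
        Errors -> {error, Errors}
    end.

%% The same two checks as the page model, with the title as an argument.
page_tests(PageTitle) ->
    [{fun() -> length(PageTitle) > 0 end,
      "Page Title cannot be empty."},
     {fun() -> length(PageTitle) =< 32 end,
      "Page Title cannot be more than 32 characters long."}].
```

For example, validation_sketch:validate(validation_sketch:page_tests("")) returns {error, ["Page Title cannot be empty."]}, which has the same shape as the {error, ErrorList} value that a controller matches on after calling save().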

Visit http://localhost:8001/doc/page to verify that the page model is available.
Let’s test adding content to the MySQL database from the development console. After you start the server with ./init-dev.sh you find yourself in an Erlang shell. Enter the following to create a new row in the pages table:

FirstPage = page:new(id, "First Page", "Page Content").
FirstPage:save().

“First Page” corresponds to the PageTitle parameter of the page.erl module, while “Page Content” corresponds to the PageText parameter of the page.erl module.
To load the saved page into a new variable, we use the boss_db API, for example:

[FoundPage] = boss_db:find(page, [{page_title, 'equals', "First Page"}], 1).

The URLs in the wiki are going to be:

  • /pages/index displays the list of all wiki pages
  • /pages/view/page-id displays a wiki page
  • /pages/create lets you add a page
  • /pages/edit/page-id lets you edit a page

This is a basic application example, and all the business logic will reside inside of one controller.

Let’s go back and modify the pages controller, index function, to return a list of pages in our wiki.

Edit src/controller/example_wiki_pages_controller.erl:

-module(example_wiki_pages_controller, [Req]).
-compile(export_all).

%% @doc show a list of all wiki pages out there
index('GET', []) ->
	Pages = boss_db:find(page, []),
	{ok, [{pages, Pages}]}.

The [Req] parameter in the module declaration above creates a Req variable, which is a simple_bridge request object with all the functions of its API, documented at https://github.com/nitrogen/simple_bridge.

When a controller’s function returns a tuple in the {ok, []} format, ChicagoBoss pushes this data into a view which is developed using ErlyDTL.

Views are created as src/view/[controller_name]/[function_name].html files.

Inside the src/view directory, create a pages directory and add the following index.html file:

<html>
<head>
<title>{% block title %}Wiki pages{% endblock %}</title>
</head>
<body>
{% block body %}
<h1>My Wiki Pages:</h1>
<ul>
{% if pages %}
  {% for page in pages %}
    <li><a href="view/{{page.id}}">{{ page.page_title }}</a></li>
  {% endfor %}
{% else %}
  <li>No Wiki Pages</li>
{% endif %}
</ul>
{% endblock %}
</body>
</html>

After you save index.html and browse to http://localhost:8001/pages, you will get an HTML page with a list of page titles or “No Wiki Pages”, depending on whether you manually added pages to the MySQL pages table.

When you browse to http://localhost:8001/pages/index or http://localhost:8001/pages, a GET request gets sent to the server by the browser and ChicagoBoss routes the request to index function inside the example_wiki_pages_controller.

The line Pages = boss_db:find(page, []) selects all rows from the pages table and assigns them to the Pages variable. The controller’s function then returns the results packed inside a tuple to the index.html view.

The returned data of controller’s functions can be different tuples:

  • {output, <<"Raw Output">>} will bypass views and send back raw data as is to the browser
  • {json, [{pages, Pages}]} will bypass views and send back JSON-encoded data. This functionality makes it extremely easy to build REST-based APIs
  • {ok, [{pages, Pages}, {stuff, EvenMoreContent}]} will send the tuples inside of the list to the view
  • More examples of return values

Just for kicks, change the return result of example_wiki_pages_controller:index/2 to be {json, [{pages, Pages}]} instead of {ok, [{pages, Pages}]} and load the URL in your browser: http://localhost:8001/pages to see JSON representation of all pages inside of the MySQL database.

We need to be able to create, view and edit wiki pages using the browser. Let’s edit src/controller/example_wiki_pages_controller.erl to allow this functionality.

When we want to see a wiki page, the page’s id is appended to the /pages/view/ URL. The view function looks up the page in the database; if the page is found, the page_text() function is invoked on the object to get the page’s content. When new pages are created, it is possible to link from one page to another by adding [page-id] markup to the content. The hackish/ugly/smelly code inside the view function replaces each [page-id] marker with an href to that page and sends the data to the view.html view.

%% @doc display a specific wiki page
view('GET', [Id]) ->
	case boss_db:find(Id) of
		{error, _Reason} -> {redirect, [{action, "create"}]}; %% TODO: Redirect to error page
		undefined -> {redirect, [{action, "create"}]}; % The requested page doesn't exist yet, so redirect the client to the create page where its content can be added
		ExistingWikiPage -> 
			% Replace all [page-id] with links
			% TODO: There has to be a better way
			% Backslashes are doubled because the patterns sit inside Erlang string literals
			StartHrefs = re:replace(ExistingWikiPage:page_text(), "\\[\\w*-*[0-9]*", "<a href='/pages/view/&'>&", [global, {return, list}]),
			ClosedHrefs = re:replace(StartHrefs, "\\]", "</a>", [global, {return, list}]),
			CleanedUp = re:replace(ClosedHrefs, <<"\\[">>, "", [global, {return, list}]),
			{ok, [{page, ExistingWikiPage}, {cleaned, CleanedUp}]}
	end.
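
As one possible answer to the "better way" TODO, the three replacements can be collapsed into a single re:replace/4 call that captures the id between the brackets and refers back to it with \1. This is a sketch, not the code used in the example application: the module wiki_links and the function linkify/1 are hypothetical, and the pattern is slightly more permissive than the original because it accepts any run of word characters and hyphens between the brackets.

```erlang
-module(wiki_links).
-export([linkify/1]).

%% Turn every [page-id] marker in Text into a link to /pages/view/page-id.
%% The capture group holds the id without the brackets, and \\1 in the
%% replacement string refers back to it.
linkify(Text) ->
    re:replace(Text,
               "\\[([\\w-]+)\\]",
               "<a href='/pages/view/\\1'>\\1</a>",
               [global, {return, list}]).
```

Inside the view function this would be called as wiki_links:linkify(ExistingWikiPage:page_text()), and the result passed to the template in place of CleanedUp.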

Accordingly, src/view/pages/view.html needs to be created to display the wiki page content:

{% extends "pages/index.html" %}
{% block title %}{{ page.page_title }}{% endblock %}
{% block body %}
<h1>{{ page.page_title }}</h1>
<div>{{ cleaned }}</div>
<ul>
<li><a href="{% url action="edit" %}/{{ page.id }}">Edit</a></li>
</ul>
{% endblock %}

When creating links in view files, keep in mind that links are relative to two different directories.

Note about relative links in view files

HTML references in view files, such as img and script, are relative to the /priv directory:

<img src="/static/pics/something.png" />
<!-- The server will send something.png from <project>/priv/static/pics/something.png -->

ErlyDTL markup references, those inside {% %} tags, such as {% extends %} are relative to the /src/view directory:

{% extends "pages/index.html" %}
<!-- ErlyDTL will try to load the ErlyDTL artifact in <project>/src/view/pages/index.html  -->

When a wiki visitor wants to create a new page, they visit the /pages/create URL. The controller has the following function

%% @doc Handles rendering the new wiki page view which is empty by default
create('GET', []) -> ok;

which is a place holder that does nothing right now, and just makes ChicagoBoss send back the src/view/pages/create.html view data:

{% extends "pages/index.html" %}
{% block title %}A new Wiki Page{% endblock %}
{% block body %}
{% if errors %}
<ul>
  {% for error in errors %}
  <li>{{ error }}</li>
  {% endfor %}
</ul>
{% endif %}
<form method="post">
<h1>Create a new wiki page</h1>
<div>
<p>Title:</p><textarea name="page_title">{% if new_page %}{{ new_page.page_title | escape }}{% endif %}</textarea>
</div>
<div><p>Text:</p>
<textarea name="page_text">{% if new_page %}{{ new_page.page_text | escape }}{% endif %}</textarea>
</div>
<input type="submit"/>
</form>
{% endblock %}

The create.html view template contains a form that POSTs data to the /pages/create URL. That means our controller needs to receive the POST content, have the model validate the data, and insert the new wiki page as a new row in the pages table. If the data validates and is inserted, we redirect the browser to the controller’s view() function with the new page id as a parameter. If there is an error, the create.html page gets rendered again, this time with an error message and with the title and text textareas populated with the previous values.

%% @doc Handles POST data of form submission from new wiki page
create('POST', []) ->
	Title = Req:post_param("page_title"),
	Text = Req:post_param("page_text"),
	NewWikiPage = page:new(id, Title, Text),
	case NewWikiPage:save() of
		{ok, SavedWikiPage} -> 	{redirect, [{action, "view"}, {id, SavedWikiPage:id()}]}; 
		{error, ErrorList} -> {ok, [{errors, ErrorList}, {new_page, NewWikiPage}]}
	end.

The view.html template contains a link for editing a wiki page. The URL to edit a wiki page is in the /pages/edit/page-id format, which invokes the edit('GET', [Id]) function in the controller. It looks up the page in the database and passes its data to the src/view/pages/edit.html template:

%% @doc Fetch the existing wiki and show the edit page
edit('GET', [Id]) ->
	ExistingWikiPage = boss_db:find(Id),
	{ok, [{page, ExistingWikiPage}]};  

The src/view/pages/edit.html contains a form that allows the visitor to update a wiki page.

{% extends "pages/index.html" %}
{% block title %}Edit wiki page{% endblock %}
{% block body %}
<h1>Edit your wiki page:</h1>
<p>To link between pages, place the destination page numerical id inside of square brackets.</p>
<p><i>Example:</i> <b>Please visit [page-number] to view more information.</b></p>
<form method="post" action="{% url action="edit" %}">
<input type="hidden" name="page_id" value="{% if page %}{{ page.id }}{% endif %}" />
<p>Title:</p><input type="text" name="page_title" value="{% if page %}{{ page.page_title }}{% endif %}" />
<p>Text:</p><textarea name="page_text">{% if page %}{{ page.page_text }}{% endif %}</textarea>
<div>
<input type="submit" value="Save"/>
</div>
</form>
{% if errors %}
<ul>
  {% for error in errors %}
  <li>{{ error }}</li>
  {% endfor %}
</ul>
{% endif %}
{% endblock %}

The example_wiki_pages_controller.erl controller handles POST data submission from the /pages/edit/page-id URL.

%% @doc Updates the wiki page from the Edit view's POST information	
edit('POST', []) ->
	Id = Req:post_param("page_id"),
	Title = Req:post_param("page_title"),
	Text = Req:post_param("page_text"),
	ExistingWikiPage = boss_db:find(Id),
	UpdatedWikiPage = ExistingWikiPage:set( [{page_text, Text}, {page_title, Title}] ),	
	case UpdatedWikiPage:save() of
		{ok, _SavedWiki} -> 	{redirect, [{action, "view"}, {id, Id}]}; % Redirect to the updated page
		{error, ErrorList} -> {ok, [{errors, ErrorList}, {page, UpdatedWikiPage}]}
	end.

Example_wiki can be extended with the following functionality:

  • Admins only area to delete wiki pages to demonstrate sessions and cookies
  • Upload images functionality to demonstrate multipart/form-data functionality
  • Live stream of latest changes to demonstrate publish/subscribe functionality of BossNews and BossMQ
  • Unit testing to demonstrate boss_web_test