Introduction to Transformers for NLP

Chapter 1 introduces natural language generation and natural language understanding.

Chapters 3 and 4 show how tokenizers work and demonstrate sentiment analysis with BERT via the finiteautomata/bertweet-base-sentiment-analysis model

Lots of goodies in chapter 5:

  • Walk-through of setting up Gradio on huggingface.co.
  • An example of a chatbot using microsoft/DialoGPT-medium.
  • Abstractive text summarization using google/pegasus-xsum.
  • Zero-shot learning: taking a pretrained model from Hugging Face that was trained on a certain dataset and using it for inference on examples it has never seen before.
  • T5 = Text-to-Text Transfer Transformer.

Chapter 6 is about fine-tuning the pre-trained bert-base-cased model on IMDb reviews to classify them as positive or negative.

Source code for the book is at https://github.com/Apress/intro-transformers-nlp

SQL Server tuning

See a list of all the indexes in the database

select * from sys.dm_db_index_physical_stats(null,null,null,null,null)

Index fragmentation

SELECT DB_NAME() AS DatabaseName,
       OBJECT_NAME(i.object_id) AS TableName,
       i.index_id,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON ips.object_id = i.object_id AND ips.index_id = i.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;

Show filegroups and db files

select * from sys.filegroups
select * from sys.database_files

Additional commands

SET SHOWPLAN_XML OFF
GO
SET STATISTICS PROFILE OFF
SET STATISTICS IO ON
-- clear query plans
dbcc freeproccache;
-- clear buffer cache
dbcc dropcleanbuffers;

Examples using AdventureWorks2016

use AdventureWorks2016;

select pp.name
from Production.Product pp
join Sales.SalesOrderDetail ss
on pp.ProductID = ss.ProductID
join Sales.SalesOrderHeader oh
on ss.SalesOrderID = oh.SalesOrderID;

select check_clause from INFORMATION_SCHEMA.CHECK_CONSTRAINTS
where CONSTRAINT_NAME = 'CK_Employee_SickLeaveHours';

SET STATISTICS PROFILE ON

SELECT  sql_text.text, last_execution_time, creation_time
FROM    sys.dm_exec_query_stats AS stats
        CROSS APPLY sys.dm_exec_sql_text(stats.sql_handle) AS sql_text
order by last_execution_time DESC

SELECT * FROM sys.dm_exec_query_transformation_stats

EXEC sys.sp_updatestats @resample = 'NO' -- @resample is char(8); 'resample' reuses each statistic's previous sampling rate

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION

	SELECT TOP 100 * FROM AdventureWorks2016CTP3.Person.Person; -- WITH (TABLOCKX)

	SELECT resource_type, request_mode, COUNT(*) AS lock_count
	FROM sys.dm_tran_locks
	WHERE request_session_id = @@SPID
	GROUP BY resource_type, request_mode;

ROLLBACK

DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName')

SELECT * FROM sys.stats
WHERE OBJECT_ID = OBJECT_ID('Person.Person')
ORDER BY stats_id;

DBCC SHOW_STATISTICS ('Person.Person', 'PK_Person_BusinessEntityID')

DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName')

SELECT * FROM Person.Person WHERE LastName = 'Alonso';

SELECT * FROM Person.Person WHERE LastName = 'Acca';

See the query plans

SELECT databases.name,
       dm_exec_sql_text.text AS TSQL_Text,
       dm_exec_query_stats.creation_time,
       dm_exec_query_stats.execution_count,
       dm_exec_query_stats.total_worker_time AS total_cpu_time,
       dm_exec_query_stats.total_elapsed_time,
       dm_exec_query_stats.total_logical_reads,
       dm_exec_query_stats.total_physical_reads,
       dm_exec_query_plan.query_plan
FROM sys.dm_exec_query_stats 
CROSS APPLY sys.dm_exec_sql_text(dm_exec_query_stats.plan_handle)
CROSS APPLY sys.dm_exec_query_plan(dm_exec_query_stats.plan_handle)
INNER JOIN sys.databases
ON dm_exec_sql_text.dbid = databases.database_id
WHERE dm_exec_sql_text.text LIKE '%t_item_master%';

Practical Object-Oriented Design in Ruby

Object-oriented design (OOD) requires that you shift from thinking of the world as a collection of predefined procedures to modeling the world as a series of messages that pass between objects.

Object-oriented design is about managing dependencies. It is a set of coding techniques that arrange dependencies such that objects can tolerate change.

If lack of a feature will force you out of business today, it doesn’t matter how much it will cost to deal with the code tomorrow; you must do the best you can in the time you have. Making this kind of design compromise is like borrowing time from the future and is known as taking on technical debt. This is a loan that will eventually need to be repaid, quite likely with interest.

Data Structures and Algorithms with JavaScript

Chapter 2. Arrays
Explanation of arrays and their manipulation. Array .sort() works lexicographically by default. The reduce(), map(), and filter() functions.
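A quick sketch of the Chapter 2 points (my own example, not the book's code): .sort() compares elements as strings unless you pass a comparator, and map/filter/reduce chain naturally.

```javascript
// Default .sort() compares elements as strings (lexicographic order),
// so multi-digit numbers sort "wrong" without a comparator.
const nums = [10, 1, 5, 25, 3];
console.log(nums.slice().sort());                // [1, 10, 25, 3, 5] - lexicographic
console.log(nums.slice().sort((a, b) => a - b)); // [1, 3, 5, 10, 25] - numeric

// map/filter/reduce: square each number, keep values over 10, sum them.
const total = nums
  .map(n => n * n)                 // [100, 1, 25, 625, 9]
  .filter(n => n > 10)             // [100, 25, 625]
  .reduce((sum, n) => sum + n, 0); // 750
console.log(total); // 750
```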

Chapter 3. Lists
Enumerating a list. Nothing really interesting.

Chapter 4. Stacks
A last-in first-out (LIFO) list.
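The LIFO behavior is easy to sketch with an array-backed class (illustrative names, not the book's implementation):

```javascript
// A minimal stack backed by an array: push adds to the top, pop removes from it.
class Stack {
  constructor() { this.items = []; }
  push(item) { this.items.push(item); }
  pop() { return this.items.pop(); } // removes and returns the top element
  peek() { return this.items[this.items.length - 1]; }
  isEmpty() { return this.items.length === 0; }
}

const s = new Stack();
s.push("a");
s.push("b");
s.push("c");
console.log(s.pop());  // "c" - last in, first out
console.log(s.peek()); // "b"
```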

Chapter 5. Queues
A first-in, first-out (FIFO) list. Includes a radix sort built with queues.
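A sketch of the queue-based radix sort idea (my own version, assuming non-negative integers): distribute numbers into ten bucket queues by each digit, then collect them back in bucket order.

```javascript
// A minimal FIFO queue: enqueue at the back, dequeue from the front.
class Queue {
  constructor() { this.items = []; }
  enqueue(item) { this.items.push(item); }
  dequeue() { return this.items.shift(); }
  isEmpty() { return this.items.length === 0; }
}

// Radix sort using ten bucket queues: one pass per digit position
// (ones, tens, ...), least significant digit first.
function radixSort(nums) {
  const maxDigits = Math.max(...nums).toString().length;
  let result = nums.slice();
  for (let d = 0; d < maxDigits; d++) {
    const buckets = Array.from({ length: 10 }, () => new Queue());
    for (const n of result) {
      const digit = Math.floor(n / 10 ** d) % 10; // d-th digit of n
      buckets[digit].enqueue(n);
    }
    result = [];
    for (const b of buckets) {
      while (!b.isEmpty()) result.push(b.dequeue());
    }
  }
  return result;
}

console.log(radixSort([91, 46, 85, 15, 92, 35, 31, 22]));
// [15, 22, 31, 35, 46, 85, 91, 92]
```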

Chapter 10. Binary trees
A tree is made up of a set of nodes connected by edges. Binary trees restrict the number of child nodes to no more than two. Binary search trees (BSTs) store lesser values in left nodes and greater values in right nodes, which provides for efficient searches. Finding the minimum, maximum, and a specific value. Removing nodes from a BST.
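A minimal BST sketch covering insert, min, max, and find (node removal omitted; names are illustrative, not the book's code):

```javascript
// A binary search tree node: lesser values go left, greater (or equal) go right.
class Node {
  constructor(data) { this.data = data; this.left = null; this.right = null; }
}

class BST {
  constructor() { this.root = null; }

  insert(data) {
    const node = new Node(data);
    if (!this.root) { this.root = node; return; }
    let current = this.root;
    for (;;) {
      if (data < current.data) {
        if (!current.left) { current.left = node; return; }
        current = current.left;
      } else {
        if (!current.right) { current.right = node; return; }
        current = current.right;
      }
    }
  }

  // Minimum is the leftmost node; maximum is the rightmost.
  min() { let n = this.root; while (n.left) n = n.left; return n.data; }
  max() { let n = this.root; while (n.right) n = n.right; return n.data; }

  // Each comparison discards one subtree, which is why BST search is efficient.
  find(data) {
    let n = this.root;
    while (n) {
      if (data === n.data) return true;
      n = data < n.data ? n.left : n.right;
    }
    return false;
  }
}

const bst = new BST();
[23, 45, 16, 37, 3, 99, 22].forEach(v => bst.insert(v));
console.log(bst.min());     // 3
console.log(bst.max());     // 99
console.log(bst.find(37));  // true
console.log(bst.find(100)); // false
```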

Chapter 11. Graphs and Graph Algorithms
Introduction to vertices and edges. Adjacency list vs. adjacency matrix for storing the graph structure.
Visiting every vertex using depth-first and breadth-first search. Finding the shortest path using both algorithms. Topological sorting of the graph.
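The adjacency-list representation and BFS shortest path can be sketched like this (my own example, not the book's code; for unweighted graphs the first time BFS reaches a vertex is via a shortest path):

```javascript
// An undirected graph stored as an adjacency list (vertex -> array of neighbors).
class Graph {
  constructor() { this.adj = new Map(); }
  addVertex(v) { if (!this.adj.has(v)) this.adj.set(v, []); }
  addEdge(u, v) {
    this.addVertex(u); this.addVertex(v);
    this.adj.get(u).push(v);
    this.adj.get(v).push(u);
  }

  // Breadth-first search explores vertices in order of increasing distance,
  // recording each vertex's predecessor so the path can be rebuilt.
  shortestPath(start, goal) {
    const prev = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const v = queue.shift();
      if (v === goal) break;
      for (const w of this.adj.get(v)) {
        if (!prev.has(w)) { prev.set(w, v); queue.push(w); }
      }
    }
    if (!prev.has(goal)) return null; // goal unreachable
    const path = [];
    for (let v = goal; v !== null; v = prev.get(v)) path.unshift(v);
    return path;
  }
}

const g = new Graph();
[["A","B"],["A","C"],["B","D"],["C","D"],["D","E"]].forEach(([u, v]) => g.addEdge(u, v));
console.log(g.shortestPath("A", "E")); // ["A", "B", "D", "E"]
```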

Connecting to HBase from Erlang using Thrift

The key was to piece together steps from the following two pages:

The Thrift API and the Hbase.thrift file can be found here:
http://wiki.apache.org/hadoop/Hbase/ThriftApi

Download the latest thrift*.tar.gz from http://thrift.apache.org/download/

sudo apt-get install libboost-dev
tar -zxvf thrift*.tar.gz
cd thrift*
./configure
make
cd compiler/cpp
./thrift -gen erl Hbase.thrift

Take all the files in the gen-erl directory and copy them to your application’s /src.
Copy the Thrift Erlang client files from thrift*/lib/erl to your application, or copy/symlink them to $ERL_LIB.

Can connect using either approach:

{ok, TFactory} = thrift_socket_transport:new_transport_factory("localhost", 9090, []).
{ok, PFactory} = thrift_binary_protocol:new_protocol_factory(TFactory, []).
{ok, Protocol} = PFactory().
{ok, C0} = thrift_client:new(Protocol, hbase_thrift).

Or by using the utility (need to investigate the difference):

{ok, C0} = thrift_client_util:new("localhost", 9090, hbase_thrift, []).

Basic CRUD commands

% Load records into the shell
rr(hbase_types).

% Get a list of tables
{C1, Tables} = thrift_client:call(C0, getTableNames, []).

% Create a table
{C2, _Result} = thrift_client:call(C1, createTable, ["test", [#columnDescriptor{name="test_col:"}]]).

% Insert a column value
% TODO: Investigate the attributes dictionary's purpose
{C3, _Result} = thrift_client:call(C2, mutateRow, ["test", "key1", [#mutation{isDelete=false,column="test_col:", value="wooo"}], dict:new()]).

% Delete
{C4, _Result} = thrift_client:call(C3, mutateRow, ["test", "key1", [#mutation{isDelete=true}], dict:new()]).

% Get data
% TODO: Investigate the attributes dictionary's purpose
thrift_client:call(C4, getRow, ["test", "key1", dict:new()]).

TODO: Research how to use connection pooling with thrift.

TODO: Document connecting to Cassandra using thrift, but all the hard work has already been done by Roberto at https://github.com/ostinelli/erlcassa

Pro JavaScript for Web Apps – crossroads

Apress has an awesome book that covers KnockoutJS: Pro JavaScript for Web Apps.

You will get stuck on Chapter 4 when using the latest versions of crossroads, hasher, and signals, because setting the context of hasher to be crossroads no longer seems to work:

     hasher.initialized.add(crossroads.parse, crossroads);
     hasher.changed.add(crossroads.parse, crossroads);
     hasher.init();

     crossroads.addRoute("select/{item}", function (item) {
           viewModel.selectedItem(item);
     });

The code works with the versions of the libraries included in the book’s downloadable source code, but if you pull the latest of all three libraries you need to change the code per the example given in the Hasher GitHub readme:

crossroads.addRoute("select/{item}", function (item) {
  viewModel.selectedItem(item);
});
function handleChanges(newHash, oldHash) {
  crossroads.parse(newHash);
}
hasher.changed.add(handleChanges);
hasher.initialized.add(handleChanges);
hasher.init();

Also, the crossroads routes must be set up before hasher.init() is called. Otherwise, the hasher initialized callback will have no routes to execute.
Apart from that, the book is great! A bunch of people in the office want to borrow it once I am done with it. :)

Installing Erlang R15B from source in Ubuntu Oneiric

Download and extract Erlang source code:

wget http://www.erlang.org/download/otp_src_R15B.tar.gz
tar xfvz otp_src_R15B.tar.gz

Install a C compiler, make, git, and the other tools needed to compile just about anything C-based in Ubuntu:

sudo apt-get install build-essential git-core libwxgtk2.8-dev libgl1-mesa-dev libglu1-mesa-dev libpng3 wx-common default-jre default-jdk fop

Install the Erlang build dependencies; this is a shortcut that saves you from wondering which dependencies are needed to build Erlang from source:

sudo apt-get build-dep erlang

Build and install Erlang:

./configure
make
make docs
sudo make install
sudo make install-docs