2012-11-01

Getting parent Document and Window from random DOM element.

2012-10-27

Internet Explorer Browser Helper Object (BHO) - the easy way

2012-10-02

Backbone.js events chart

2012-08-13

Markdown with blogspot.com and blogger.com

JavaScript prototype explained AGAIN

2011-12-27

RDF Triples from SQL


Hey SQL database, I want my Social Network now, please!

Without much ado, here are the answers:

Oracle, MS SQL Server users: UNPIVOT

PostgreSQL users: unnest()

MySQL users: it looks like you are in the cold and, frankly, I don't feel sorry for you. You knew what you signed up for from the start when you allowed your teenage nephew - PHP scripter and 'database-architect' - to make infrastructure choices for you. You can play with a stack of 'union select', but expecting that your entire data base now is likely one gargantuan table, I would expect 'union select' to be a huge performance hit. Look at http://triplify.org/Overview ... maybe...

But, let's backtrack a bit...


Why Make Traditional SQL Database Speak RDF triples to you?

So you have a NON-tree of data... Let's call it "Fee Schedule And Friends."  You want to pass it to web-based UI for viewing and alterations. 

Some of the fees are 'variable' (stated as decimal or percentage rate, to be applied against the total amount), others are 'fixed' (stated in USD per chargeable unit). You have a large pile of possible fees, but not all of your clients have all of the fees connected. For ease of presentation and management, the fees belong to groups like "Commonly Used", "Debit-related". "Credit-Related" etc. Groups may overlap.

You want your control interface to manage existent fees and, when needed, add more to customer from the pool of possible but applicable fees. Along with the list of fees you also pass the rendering hints for the fee types. Variable fees get '%' as a label and need to be drawn after the form field. Fixed get the currency symbol and it's accompanied by the rule of where to draw the symbol. Some fees have labels like "Per Month" that you need to attach dynamically to form fields.

Enough complexity? How do you pass that hive of data to the UI? Make it AJAX data out in separate, specially-formatted-data calls? Does your backend do multiple calls to DB to get that data out? You can't possibly pull this data in a single, cohesive, manageable result set as the data is multidimensional.

Yet, if you make your regular, garden-variety SQL server return an avalanche of related RDF-style triples, you are set. One call to DB -> one pass through serialization on web server -> one AJAX call from browser, yet browser gets the complete multidimensional hive. You can massage it in JavaScript and render it any way you want.

With the UNPIVOT SQL approach, you don't need to go all in on RDF and commission a DB overhaul. Throw together some stored procs and you are rolling RDF like a pro.


SQL UNPIVOT explained

Recap: What we like is a pile of RDF triples. RDF triple is a set of 3 data points ("Subject A has Property B which has a Value of C") you would get from an intersection of row and column in a traditional database table.

If you database is normalized to 4N form, chances are with some of you data you can do straight SELECTs and get your RDF triples, as the Predicate (relationship description) will be in the record itself, not in column name.  Example of 4N form table:

/* -- Fees table
ID | PropertyID | Value
13 | 'FeeType_Instance_ID123' | $12.99
56 | 'FeeType_Instance_ID123' | $17.89
*/
SELECT * FROM fees
-- and you get your RDF triples.

However, most of your data is still likely in tabular format (Property name is the column name). There the column name needs to be injected into the returned records.

/* -- Fees table
ID | SaleTransactionFee
13 | $12.99
56 | $17.89
*/

Here, on SELECT, "SaleTransactionFee" label (or it's 'universal' placeholder) needs to go into result set, to get something like:

ID | Predicate | Value
13 | 'Fees.FeeType.SaleTransactionFee' | $12.99
56 | 'Fees.FeeType.SaleTransactionFee' | $17.89

And that's where UNPIVOT comes in:

-- MS SQL Server syntax
select ID, Predicate, Value
from (
    select
        'Fees.FeeID.' + ID as ID,
        SaleTransactionFee as [Fees.FeeType.SaleTransactionFee]
    from fees
    -- insert your 'where' condition here
    ) as original
unpivot (
        Value FOR Predicate IN ([Fees.FeeType.SaleTransactionFee])
    ) as triples

Obviously, when you have more than one column in that table, you just add it to the unpivot:

select ID, Predicate, Value
from (
    select
        'Fees.FeeID.' + ID as ID
        , SaleTransactionFee as [Fees.FeeType.SaleTransactionFee]
        , RefundTransactionFee as [Fees.FeeType.RefundTransactionFee]
    from fees
    -- insert your 'where' condition here
) as original
unpivot (
    Value FOR Predicate IN (
        [Fees.FeeType.SaleTransactionFee]
        , [Fees.FeeType.RefundTransactionFee]
    )
) as triples

And get something like this:

ID | Predicate | Value
13 | 'Fees.FeeType.SaleTransactionFee' | $12.99
56 | 'Fees.FeeType.SaleTransactionFee' | $17.89
14 | 'Fees.FeeType.RefundTransactionFee' | $3.59

Presto! Infrastructure-less SQL-DB-table-to-RDF-triple generator. (When you heard me say "infrastructure-less," did you suddenly start smelling 'savings'? :) )

Roll this in into a stored procedure and you have yourself an RDF triple-spewing "getter" for the table. Group several of these into union-select and you have single-call getter for a number of related tables.


Hold the music!

One caveat comes to mind. All values must be cast into same SQL type. What this means is that you will need to cast everything to (n)varchar(max) - SQL's universal container for data. This may sound bad, but there is silver lining, as you start to cast numbers to strings early, guarding yourself from JavaScript's floating point calculation errors. It's probably most reliable to keep all your decimal data in strings and parse it in the browser only when needed.


The best for last.

Next post on RDF will be "Consuming rRDFj in the browser." 

2011-12-25

An RDF by any other name is just a normalized data set


You Did Great When You Started Reading This Because...

you will learn about a Golden Data Unit today. :) When it comes to passing around a pile of much disorganized data (would 'multidimensional hierarchy' sound more exciting?), there is nothing better than a pile of RDF triples. ... Well, I lied. There is something better than RDF triples. It's RDF triples expressed as JSON.


( If you wonder "What's RDF triple?" check out this fun 2-minute (starter) presentation on RDF triples and networks they form. )

The obvious thing you would expect to read by now is "Hey! Use 'such and such' spec for RDF to JSON serialization and you are golden!" right? Yeah... that's where the state of things now is just muddy. This post is an attempt to wash away some mud.


"Make Me Care About RDF triples, I Dare You!"

I don't need to make you care about them. You already do. You just call the triple sets by other names: Semantic Web, Social Networks, Neural Networks, Linked Data, Database Normalization etc.

You know, it's like that time when you drew these stunningly pleasing spirals, rectangles, other "pretty" shapes and congratulated yourself for being so ingenious and artistically inclined, but then learned that it was just a crude application of the Golden Ratio.

RDF triple is a Golden Data Unit for your data sets - that perfect data collection building block you always sensed existed but did not quite know how to describe. Whenever you reduce a data-set to chunks like these: "A has a property B that has a value of C" you describe an RDF triple (Subject, Predicate, Object).


"Nice, nice! What's that 'RDF as JSON' thing about?"

RDF can be expressed in XML, Turtle etc. Do you like consuming your data in XML, XPath, verbose DOM-inspired APIs? Itching to learn SPARQL, or some other new form of quasi English? 

My personal view on consuming RDF data through XML format can be described this this simile.It's crude, but gets the point across clearly:
"I am sure you can probably find a way to brush your teeth by pushing the brush through the rear end. But why would you want to?"

"So. What flavor of JSON format for my RDF triples?"

Through this link (also mentioned above), you already saw the "established" choices for RDF>JSON serialization formats.  Some of the formats exemplified smell very much like my simile above. Others are markedly close to "yes! I would love to use that!", but missed a turn somewhere on the way to perfection. I don't see an "iPhone app" (simple, beautiful, functional enough for majority of people) on that list of format choices. So, yes, this is a proposal for a new, yet another RDF as JSON format...


"The Golden Data Unit"

This is the promised Golden Data Unit:

{ "Subject" : { "Predicate" : Object }}

or

{ "ID" : { "PropertyName" : Value }}

ID is a string and can be classic RDF IRI, Universally Unique ID, some "readable" GUID or just plain GUID. Yet, it does not have to be "globally unique." It MUST be unique to the data set (Since this is a Hash (Dictionary, JavaScript Object), by definition,  there cannot be another object with same ID in the Hash.) Anything that you deem to be unique in the scope of the data set can be the 'ID.' Examples: "00000000-0000-0000-000000000001", "OurCorp_DB74_TableOne_KeyColumnName_ID12345",  "US_SSN_123-45-6789"

PropertyName is whatever String you want. Can be complex, with namespace prefixes, but any string will do. Examples: "foaf:friend", "friend", "Property#123"

Value is the only radical part of the proposal. IT IS THE VALUE (Object) in the triple. It is not a hive of type definitions nested in hardcoded properties.

Value, by default, is assumed to be a LIST OF VALUES all matching this same Predicate. (It would not work otherwise since Hash can have only one entry for a given Predicate string, so multi-valued predicates must be coerced into a key-to-list.) However, single-entry lists can be expressed just as one value without Array wrapping around it.

Units of value could be whatever the JavaScript types that can be passed through JSON. However, RDF format does not dive into value format. I would expect the values to be strings, ints, bools. Nothing stops you from having an Array of objects (Array of Arrays or Array of Hashes) in application-specific format attached as Value. The format deeper that Predicate-Value pair is just not prescribed by this RDF spec.

When parsing Values, see if Value is an Array first. This spec does not prescribe if update to value will add to list or replaces the list. It's up to user. No Single, Any, Choice, Sequence collection type tags are offered. Infer from Predicate's nature instead.

Many RDF software vendors will see something of "theirs" in this because this is so elementary. There is some JSON-LD in here (lists as values). This also looks similar to RDF/JSON (when you visit that page, you need to block away (with the palm of your hand) the ugliness of what their "O" stands for.)

There is no prescribed Types system, but you are welcome to use "Type" as *suggested* Predicate name for value containing a universal ID of the type. The type with that ID can, of course have another entry in the same data set and may tell more about this type through its own predicates. It can be externally-universal IRI. It also does not have to exist / be defined anywhere.

Example:

data : {
    'Human_USA_ID000-00-1111':{
        'Name': 'Joe Shmoe'
        , 'Friend': ['Human_USA_ID000-00-9999','Dog_UUID000000000011']
    }
    'Human_USA_ID000-00-9999':{
        'Type':'Type_ID01'
        , 'Name': 'Sue Shmoe'
        , 'Friend': 'Human_USA_ID000-00-1111'
    }
    'Type_ID01':{
        'Label':'Human Female'
        , 'Description':'The prettier one. That one that is always right.'
    }
}

Because the type system in this proposal is so lax and the namespacing is optional, if we would call this new format "RDF" we could have the legion of the keepers of RDF guarding our doors with pitchforks. To avoid the pitchforks let's call it "Reduced RDF as JSON" or "rRDFj" for short.


Long Live rRDFj!

Much like many other new, shiny projects, rRDFj was born out of perceived problem meeting lack of palatable solution. We have a need to push piles of loosely-coupled data from server to (internal and external) control interfaces. Although RDF is shiny and cool, the baggage (infrastructure, learning) was just too much. We chose to squeeze the best parts out of it and can it for easy consumption.

Blurbs like this one: "Merchant 10111, with company name of 'Demo Merchant' will incur the following fees whenever they process transactions through us (as bank): (a) ACH Debit $1.00 per tran + 0.50% of total, (b) ACH Credit $2.15 per tran. Other fee points compatible with this type of merchant, but not set yet are (c) Inactivity fee with default price point of $20, (d) Managed Customer Service fee with default price point of $2.00 per instance." are now flung across our system with ease.

At the generation end we have simple SQL-based table-to-triple conversion procs (based on UNPIVOT). On consumption end we have JavaScript-based parser-helper just about 110 lines long, that basically wraps JavaScript's Array.map() and Object.keys() into friendly chainable callables. With rRDFj generating and consuming RDF triples had become inexcusably easy. I'll show you how in the next post.