Factsage Database Normalization

lasopanature

First Normal form is the first step of normalization. In this tutorial we will have an example to explain how you can update you table to follow the First Normal Form or 1NF. This is the beginning of Database Normalization process.

Have you reached the point of having seen the term “normalization” used database literature, but you are still unsure as to just what you have to do to get a normalized database? If so, then you’re not alone. We will be showing, in this article, how Normal Forms keep data integrity.

Some people will argue that it is all right to “denormalize” a schema for efficiency reasons. If we denormalize the schema, then we are either not enforcing some of the user-required constraints or we have to write addition code to enforce the constraint. If we choose to write additional code, we have to duplicate the functionality in every application, present and future. Normalization reduces programming effort; rules are enforced in one place, one way, one time.

Let’s start in a simple way with some general heuristics about normalization. Heuristics are exact and often sound a little funny when you say them. One of my favorites is “if you cannot think of it as a percentage, think of it as a ratio!” (From a mathematical view point, a ratio and percentage are the same. But percentages have a base while the two things in a ratio are seen as equal in some sense).

Mother Celko’s Thirteen Normalization Heuristics. Does the table model either a set of one and only one kind of entity or one and only one relationship. This is what I call disallowing a “Automobiles, Squids and Lady GaGa” table. Following this rule will prevent ‘Multi-valued dependencies’ (MVD) problems and it is the basis for the other heuristics.

Does the entity table make sense? Can you express the idea of the table in a simple collective or plural noun?

To be is to be something in particular; to be everything in general or nothing in particular is to be nothing at all (this is known as the Law of Identity in Greek logic). This is why EAV does not work – it is everything and anything. Do you have all the attributes that describe the thing in the table?

The most important leg on a three-legged stool is the leg that is missing. Are all the columns scalar? Or is a column serving more than one purpose? Did you actually put hat size and shoe size in one column?

Or store a CSV list in it?. Do not store computed values, such as (unitprice.

orderqty). You can compute these things in VIEWs or computed columns. Does the relationship table make sense?

Can you express the idea of the table in a simple sentence, or even better, a name for the relationship? The relationship is “marriage” and not “personpersonlegalthing”. Did you check to see if the relationship is 1:1, 1:m or n:m? Does the relationship have attributes of its own? A marriage has a date and a license number that does not belong to either of the people involved.

This is why we don’t mind tables that model 1:1 relationships. Does the entity or relationship have a natural key? If it does, then you absolutely have to model it as the PRIMARY KEY or a UNIQUE constraint. Is there a standard industry identifier for it? Let someone else do all that work for you.

If you have a lot of NULL-able columns, the table is probably not normalized. The NULLs could be non-related entities or relationships. Do the NULLs have one and only one meaning in each column?. If you have to change more than one row to update, insert or delete a simple fact, then the table is not normalized. Did you confuse attributes, entities and values? Would you split the Personnel table into “MalePersonnel” and “FemalePersonnel” by splitting out the sex code? No, sex is an attribute and not an entity.

Would you have a column for each shoe size? No, a shoe size is a value and not an attribute. The Writing on the Wall Many years ago, there was a magazine named Database Programming & Design which published a poster on Normalization by Marc Rettig as a subscription renewal premium. It was so popular that it is still around after all this time. It follows one example, the Daisy Hill Puppy Farm database (this is the birth place of Snoopy in the Peanuts comic strip). You really ought to have a copy hanging in your cube.

It is that good. It is called. You can now find it here: The Relational Model The Relational Model and Normal Forms of the Relational Model were first defined by Dr.

Codd then extended by other writers after him. He borrowed the term “normalized relations” from the US and Soviet political jargon back when the USSR still existed. Relations are a branch of mathematics which deal with mappings among sets defined by predicate calculus from formal logic. Just like an algebraic equation there are many forms of the same relational statement but the “normal forms” of relations are certain formally defined desirable constructions. The goal of Normal Forms is to avoid certain data anomalies that can occur in un-normalized tables. Data anomalies are easier to explain with examples.

Codd defined the relational model it added some additional requirements to the visualization of the relation as a table:. If a row in a table can exist without a certain attribute being known, then this is considered a NULL state for that attribute and the DBMS must use a special symbol for that state. We have that in SQL, but NULLs have a data type to signal the SQL engine how to handle the physical storage.

No two rows are identical. Furthermore it must be possible to define a subset of the known attributes of a relation which have no identical values. This subset of attributes is known as a “key”. SQL was built on old file systems, and the language can allow redundant duplicates. But that is a sign of really bad programming. One key in a relation is the “primary key”. This key cannot have any NULL attributes, of course.

This initial definition was later modified to regard all keys as “equally key-like”, but the notion was set in the early literature and it got into SQL. The actual motive was from sequential files; a sequential tape file has to be sorted on a sort key to work.

ISAM files on disk also had to have a sort key. This is why T-SQL has clustered indexes and and clustering as the default for the PRIMARY KEY. The new technology mimics to old technology, until it learns better and has its own voice. The order of rows and columns are irrelevant and do not contain any information. This leads us to the requirement that any atomic fact in the database (a collection of such tables) can be found by a table name, column names, and a primary key value. Again, SQL assumes that it can go to the ANSI/ISO Standard Schema Information meta-data tables and get a default list of columns to fill in the column lists in various statements. It is a shorthand.

Anyone tackling relational databases who is familiar only with file-based data will assumes that there is a physical ordering because that is how all of the flies he has ever worked with are stored — physically contiguous tracks on a disk. Columnar databases are not contiguous storage. Neither is a hashed database. The first thing you have to do to learn SQL is throw out a physical model mindset and think in abstractions. Codd added quite a few other requirements (initially a total of 12 rules and eventually 333) but those listed here will suffice for our discussion. Anomalies If you do not normalize a table, you get data anomalies. These are situations where data integrity cannot be maintained without extra code, NULLs and convoluted programming.

Let’s create a simple example. The table holds information about college students, their majors and their departmental advisers. VALUES ( NULL, 'Architecture', 'Wright' )' SQL will prevent this row from getting into the table the ways I declared this example. It is bad practice to declare an IDENTITY as the key and all other columns are NULL-able.

Since the table’s IDENTITY property is not a column nor an attribute of any possible data model, it cannot ever be used as a key. It is the count of the physical insertion attempts. You can use it to mimic the record numbers that are so familiar from using sequential files, but they have no place in the relational model. First, Second, Third, and Boyce-Codd Normal Forms Let’s talk about going from a file system to an RDBMS. Let’s start with a file to maintain data about class schedules at an imaginary school. We keep the coursenbr, sectionnbr, timeperiod, roomnbr, professorname, studentname, studentmajor, and coursegrade. Assume we started with a Pascal file, declared like this, and wanted to move it to a relational database.

I picked Pascal because it is easy to read and understand, unlike currently popular languages. END; To convert this sequential file of records into a table, we first have to “flatten” it out, making sure that each row has full information for all the columns. Since Pascal, or any other language, does not have NULLs, the columns will be NOT NULL. The first thought is to simply rewrite every field over as a column, a line-for-line translation, as I said earlier. It does not really work because a column is nothing like a field; did you notice that “Students” is itself an array of records whose size is determined by an external variable? This would have been done with an OCCURS clause in COBOL and other programming language have similar features.

But let’s do anyway. Sellername = BS. Sellername; This will add false rows, such as (‘Smith’, ‘Jones’, ‘Home Bank’). These three tables should have been implemented as VIEWs and not as base tables Other Normal Forms and non-NF Redundancies There are more normalizations, but they involve temporal consideration and other things. There are also redundancies that involve more than one table. The most common non-normal form redundancy is a summary column in a referenced table.

In English, imagine the usual skeleton of an Orders table and its referencing order details table. Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa. He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards. He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.

Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction. The Two Issues — Normalization reduces programming effort; rules are enforced in one place, one way, one time. 1) Thus we find that coders run 99.44% of development shops.

Such coders aren’t really dense, and generally understand both the RM and SQL. They don’t oppose RDBMS as the repository of the constraints of record out of ignorance. Rather, they’re only worried about continued employment as coders. So, the notion of reduced effort is seen as reduced income. Or, as one former colleague said to a prospect, 'We prefer to manage transactions in our application code'.

So long as development is run by coders, so too will RDBMS stored data be treated as dumb-as-a-sack-of-hair bytes. Just as it was by today’s coders’ Grand Daddies pushing out COBOL.

2) It’s worth mentioning that SQL is a bad image of the RM due to Chamberlin, not Codd who was banned by Armonk from defining the language for the RM. A developer’s viewpoint In answer to Robert young, I’ve been around a long time and I don’t recall ever seeing developers like the ones he describes, who favor one DB over another because of wanting to write more code. There’s always far more code to write than can be completed in a lifetime as it is, without trying to drum up more work. I am always looking for the easiest, most efficient (and stable) way to write code so that I can get more done and make my customer happy. In our company, the chief database architect is trying to shove EAV down everyones’ throats.

Most have buckled under, but not my team. We have fought it for all we’re worth. The code required to work with EAV is a nightmare – but the architect maintains that we will fail because we are not using EAV.

I guess it’s clear that we are in that.56% group that is not run by coders. Igor: Abi Normal @Eric P The LoC centric behavior describes coders, whose job it is to write C/java/COBOL/etc., not database developers. These are the folks who aim to keep the end of coding at some point well past retirement; 'far more code' is the mantra. If DRI is your friend, then java is your enemy, so to speak. Building Organic Normal Form databases reduces, sometimes considerably, the amount of client-side code needed (i.e., all that C/java), and thus is a threat to them. I only cited one instance; I’ve experienced many.

Anyone who’s an EAV zealot reads the wrong magazines. You have my sympathy. Normalization is Essential Great article – unfortunately our java developers insist on keeping logic and constraints in the java application, generating keys and maintaining tables as slot buckets with an ORM object relational mapper based on HIbernate. Of course their database is impossible to decipher, queries using HQL rather than Sql are a disaster and the simplest of queries requires massive table scans and spikes in CPU usage. This feels like flat file land – did I mention that one sql developer has the productivity of 5 java coders? The only true agility comes from using practical theory not homespun agile coder mythology.

Overview of databases The Fact and FactSage databases are the largest set of evaluated and optimized thermodynamic databases for inorganic systems in the world. The solution databases (for solutions of oxides, salts, metals, etc.) have all been developed by evaluation and 'optimization' of data from the primary literature. Based on proper thermodynamic models for every phase, all available thermodynamic and phase equilibrium data for a system are evaluated simultaneously in order to obtain one set of model equations for the Gibbs energies of all phases as functions of temperature and composition.

In such an 'optimization', all data are rendered self-consistent, discrepancies in the data can often be resolved, and the data can be properly interpolated and extrapolated. In particular, properties of multicomponent solutions can usually be estimated with good accuracy from the optimized model parameters of their binary and ternary sub-systems.

The resulting database of model parameters can be used for calculating phase equilibria and thermodynamic properties using the FactSage Gibbs energy minimization software. FactSage accesses both 'solution' databases and 'pure compound' databases. The former contain the optimized parameters for solution phases. The latter contain the properties of stoichiometric compounds, either obtained from optimizations or taken from standard compilations.

FactPS - FACT pure substances database - contains data for 4869 compounds (pure substances). contains data from standard compilations as well as most of the data for those compounds which have been evaluated / optimized to be thermodynamically consistent with the FACT FToxid, FTsalt, FThall. Solution databases (see below). FactPS database further information: 1., 2. SGPS - SGTE pure substances database - contains data for 3842 compounds (pure substances) compiled by the SGTE groups.

in general these data are not necessarily thermodynamically consistent with the SGTE solution databases. SGPS database further information: 1. SGnobl - SGTE noble metal database (formerly FSnobl) - The database contains evaluated thermodynamic parameters for 223 binary and 130 ternary alloys of Ag, Au, Ir, Os, Pd, Pt, Rh, Ru alloyed amongst themselves and also in alloys with the metals Al, As, B, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cu, Dy, Fe, Ge, Hf, In, Mg, Mo, Nb, Ni, Pb, Re, Sb, Si, Sn, Ta, Tc, Te, Ti, Tl, V, W, Zn, Zr.

Noble metals and their alloys have a wide variety of applications, and calculations of relevant phase equilibria in a particular case are important e.g. For optimizing suitable alloy compositions or predicting reaction products in chemically aggressive environments.

FSnobl information: 1., 2., 3., 4. SGnucl - SGTE nuclear database - includes the elements O, U, Zr, Fe, Cr, Ni, Ar, H. Also included are systems formed among the six oxides UO2, ZrO2, FeO, Fe2O3, Cr2O3, NiO. has been generated by Thermodata (Grenoble) as part of a much bigger database effort to cover many thermochemical aspects related to the field of nuclear reactors. The SGnucl database is specially made for the investigation of in-vessel chemical reactions. It is an SGTE approved database and has been converted for use with FactSage by GTT Technologies. S Gnucl information: 1., 2., 3., 4.

SGTE - SGTE( 2017) alloy database The SGTE(2017) database represents a significant update and revision of the previous SGTE(2014) alloy database. The 79 elements included in the database are, Ag, Al, Am, As, Au, B, Ba, Be, Bi, C, Ca, Cd, Ce, Co, Cr, Cs, Cu, Dy, Er, Eu, Fe, Ga, Gd, Ge, H, Hf, Hg, Ho, In, Ir, K, La, Li, Lu, Mg, Mn, Mo, N, Na, Nb, Nd, Ni, Np, O, Os, P, Pa, Pb, Pd, Pr, Pt, Pu, Rb, Re, Rh, Ru, S, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Tc, Te, Th, Ti, Tl, Tm, U, V, W, Y, Yb, Zn, Zr From among these elements, there are some 603 completely assessed binary alloy systems, of which 15 are newly assessed systems and 28 others have been revised or amended on the basis of newly published experimental information. The database also includes about 141 ternary and 20 higher-order systems for which assessed parameters are available for phases of practical relevance. The systems now incorporate 318 different solution phases and 1227 compound phases (mainly stoichiometric intermetallics). SGTE information: 1., 2., 3. SGsold - SGTE solders database - This is a new database from SGTE for alloy systems involving the following elements which are components of lead-containing and lead-free solders: Ag, Au, Bi, Cu, In, Ni, Pb, Pd, Sb, Sb, Zn.

SGsold information: 1., 2., 3. BINS - SGTE( 2017) free alloy database - This is the SGTE free binary alloy database which comprises some 109 of the BINARY systems taken from the SGTE(2017) alloy database. BINARY information: 1., 2., 3. SpMCBN - The Spencer Group database for carbide, nitride, boride and silicide systems - The Spencer Group Non-Oxide Refractories Database, SpMCBN, contains assessed thermodynamic parameters for binary and ternary alloys of high-temperature materials containing carbon, nitrogen, boron, and silicon. The alloys include Me1-Me2-C, Me1-Me2-N, Me1-Me2-B, Me1-Me2-Si, Me-C-N, Me-C-B, Me-C-Si, Me-N-B, Me-N-Si and Me-B-Si systems.

includes the elements B, C, N, Si with Al, Ca, Co, Cr, Fe, Hf, Mg, Mn, Mo, Nb, Ni, Re, Sc, Ta, Tc, Ti, V, W, Y, Zr. While no data for oxide systems have been included in the database, a number of the elements listed above are important in refractory oxide materials.

Reactions of the carbide, nitride, boride and silicide systems with such refractory oxides and with oxygen-containing gas atmospheres can be calculated using FactSage by selecting the SpMCBN database together with appropriate combinations of the FToxid, FactPS and SGPS databases for the materials in question. TDnucl and TDmeph Thermodata nuclear database - TDnucl is a database for in- and ex-vessel applications containing 18+2 elements: O, U, Zr, Ag, In, B, C, Fe, Cr, Ni, Ba, La, Sr, Ru, Al, Ca, Mg, Si + Ar, H and including the 15 oxide system: UO 2, ZrO 2, In 2O 3, B 2O 3, FeO, Fe 2O 3, Cr 2O 3, NiO, BaO, La 2O 3, OSr, Al 2O 3, CaO, MgO, SiO 2. TDmeph is a similar database containing 14+2 elements: O, U, Pu, Zn, Fe, Si, C, Ba, Ce, Cs, La, Mo, Ru, Sr + Ar, H. covers the entire field from metal to oxide domains. It allows the user to calculate the thermochemical equilibrium state at any step of a severe accident and to use the results of the thermodynamic approach for improving the predictions of thermo-hydraulic or other accidents codes. has been generated by Thermodata (Grenoble) and converted for use with FactSage by GTT Technologies.

Applications are numerous and include:.