JMA201-01
Fundamentals of Application Development

DEPARTMENT SCHEDULE SYLLABUS RESOURCES
Database Terminology

Data
Raw values which an organization has a need to store and retrieve.

Information
Data that has been transformed (sorted and interpreted) into something meaningful. For example – each of your names in a list might be data, but a count of the values in that list provides me information – how many students are in this class?

Database
A related collection of tables of different types, stored with a minimum of redundancy. A database is a system that stores data with a minimum of redundancy while maintaining data integrity. Redundant (duplicate) data is wasteful – unique data is valuable.

Entity
Something (person, place, thing, or event) which we are tracking or representing within a database. Students, classes, purchases are examples of entities an organization might wish to track.

Table, relation
A table is a two-dimensional representation of instances (occurrences) of an entity. Each different entity we track is placed into its own table.

Record, row, tuple, entity instance
An occurrence of an entity in a table. Simply put, if a table tracks students, each row represents one specific student. Each record is self sufficient - no “see above”, as we commonly do in Excel spreadsheets. Represented horizontally in a table.

Attributes, field, column
Values attributed to a record. We describe an entity occurrence by describing its characteristics. For example, when I provide information about a student, I list the values that describe you – first name, last name, age, major, gender, etc. Each row in a table should share common attributes. The basic rule of table design is that for a given entity (table) be sure to only include attributes that are associated with that entity. (Note that the attribute means the name of the characteristic, not the value of that characteristic!) These characteristics are called attributes and are represented by the columns in the table.

Domain
The domain of a column/attribute/field is the range of allowable values. (What would be the domain for a Gender column?)

Relation
Simply put, a table in a relational database.

Primary Key
An unique identifier for an entity. We typically assign records a value that is unique. We can then look up the rest of that record’s values by using the primary key. Social Security numbers, account numbers, your student ID number, ISBN, UPC codes – these are all examples of primary keys.

Relationship
A table usually has one or more columns whose values are unique: No two rows can have that same value or combination of values within that column.  We can also place the value into another table to link the record to the first table. The value becomes a foreign key (FK) in the second table. A relationship can be derived by linking the values between the two tables; this is the basis of a relational database.

SQL
Structured Query Language - A language used to ask the database for information. Even when manipulating tables visually, the program is writing SQL for you behind the scenes. Example: Select * from users where lastname=’dupont’ order by age

Database Management System, Database server, Database engine
The system that manages the data and processes queries. A DBMS enforces business rules and constraints to ensure data integrity. DBMSs can be categorized by the number of concurrent users they can support. Access is a desktop level DBMS (although through programming it can actually support about 10 users at once!) while SQL Server can support 1000-5000 users at once!

Database Client or Front-End
The system that presents database structures running on the database engine visually. Databases are optimized for data storage and retrieval, not user interaction - having users working on the database takes resources away from the database's processes. Most databases encourage users to access the data contained within the database from a front-end client running on a separate computer.

Datatype
Type of information a particular value is. We choose the smallest datatype that will still hold the biggest value for a particular column.

Stuff to Think About Before You Dive In

There are a few scenarios you may find yourself facing. One, your company has a dedicated database administrator. (Database administration on large databases can be a complex job - in come companies, this is a full time position.) If your company has a dedicated database admin, you will probably not be responsible for creating the structure - just for understanding the structure, and being able to create joins, add data in the proper form, and so on.

The second scenario you might face is working in a company with no database designer. In this case, you must build the database structure. This can be more challenging, because you have to design the tables and relationships, and hope that they work well. I'd like to add a word of warning here: finalize the database design before you take the site live.

What does this mean? Well, you are probably building this database from the ground up. Chances are, you will build a few tables, and then build some web pages. At some point, you will begin entering test data to make sure everything is working right. (You would clear out the test data before taking the site live.) However, web design is never "done". There are always new areas to add, modifications to make. At some point, you will be asked to make the site available to the public, while you are still developing. This introduces one of the biggest potential nightmares you will face when doing a DB enabled website: migrating data. Lets say you have a table that stores user information. You have collected 2000 usernames. Then, it is determined that you need to add a few more fields to the database table. This in itself is easy; however, what do you do with the existing records? Perhaps you added a new field, the zip+4 field. You add it to both the table and the form; from here on out, it will be a required field. Users who enter this information after the modification will enter their data without problems - but what do you do with the 2000 existing users whose information you don't have? What do you do to your business logic, which says that the field can't be empty, but there are 2000 records with that field empty? Do you rewrite your business process to make it optional? Do you attempt to contact the original users to get the extra required information?

In this example, you see the problems of migrating data - taking data that was entered in one structure and reshaping it to fit into another. There are many different potential problems you will encounter. The best way to avoid them is to get exact definitions of that information needs to be tracked. Explain to the client that modifications after they sign off on the final design will be expensive and time consuming. And don't start taking meaningful live data until you are sure everything works! Remember - it may only be a change on paper, but a simple change of a paragraph in the design document could lead to months of work redesigning the database, migrating existing data, and changing the code to rebuild the information flow.

Next week we will look at building a sketch of our structures called an ERD (Entity Relationship Diagram).

Your First Table

A database exists to store information, retrieve it, and filter it as needed. We already discussed data types - formats for information used in both programming and databases. Remember that database datatypes are not necessarily the same as your programming language datatypes.

The basic structure of a database is a table. A table contains numerous records in the same format, theoretically about the same thing - perhaps users and addresses, or product details.

A table contains rows and fields. The fields are usually represented as columns - each column has a name (sometimes called the field name) and an expected data type. It can also have rules - for example, "you must always have a value in this field" (as opposed to no value, or NULL), or "there cannot be duplicate values in the field".

Rows are sometimes called records. A row represents a complete piece of information about something, formatted in the structure dictated by the field names and data type. Here is a basic example of how one might look:

 

Field

Field

Field

 

PID

Name

Password

  Row-> 

1

Bob

Bigbob

  Row-> 

2

Jane

sweety

  Row-> 

3

Ignar

boo

Some things to remember as we begin to build tables: A table is sort of an abstraction of something. We describe the characteristics that define what we are tracking. If we are tracking cars, we describe the car by it’s model, make, year, color, engine size, etc. Each record in this table should have values for each of these characteristics. If we find one record with a characteristic that doesn’t apply to the rest of the records, we don’t just throw it in there – we will find a different place to put it in a bit.

Excel Vs a Database
Many people think Excel and Databases are similar – they both have grids, or collections of rows and columns. A common Excel spreadsheet might look like this:

Name

Department

Advisor

Course1

Course2

Phil Smith 

JMA

DuPont

JavaScript

Database Design

Joe Smeal 

''

Shepherd

French 1

ASP Programming

Jane Doe

''

DuPont

History 1

Database Design

Ed Jones

''

DuPont

 

Web Design

 In this structure, we see many problems. First of all, in the department column, we see the dittoes – the “see above”. The records are not self-sufficient. Joe Smeal’s data does not make sense unless placed in context – unless displayed in the same order, under Phil Smith. That means we cannot just retrieve information about Joe – when we want his information, we need to retrieve more that what is needed, and the reader needs to do extra work to process and associate these to records to extrapolate the knowledge that Joe is in the JMA department. The records are not self-sufficient.

Look at the course column and advisor column. Here, we see redundancy – Database Design and DuPont are repeated values, consuming storage on the disk. In addition to being inefficient, it leads to problems and work for the database. Let’s say I decide to change my last name. I have to change my name in 3 records. If I change my name for the Phil Smith record, but not for the Jane Doe record, the database seems to indicate that there are 2 different advisors. We call this type of problem an update anomaly. Due to a faulty design, we need to needlessly update multiple rows when changing one fact

Another problem – the values are not atomic. Atomicity simply means a column holds only one value. In this case, first name and last name are individual values. We may sometimes want to arrange names in alphabetical order by last name, yet at other times print a mailing list with the first name first. The computer can’t do this because it doesn’t understand that the name column is actually 2 different values combined together – it doesn’t understand that the space separates 2 names. However, if we break them apart, the computer can easily put them back together later!

Yet another problem - the redundancy of the Course 1 and Course 2 columns. You always want to represent data as rows, not as columns. With this design, we are only able to enter two classes worth of information. To enter a third class, we'd need to add another column, which would require to stop our program, change the database, save, and rewrite the VB program to see the new column. New rows are always immediately accessible; new columns require reprogramming.

This is an introductory lecture only; normally, we'd go through 3 steps of of a process called normalization. However, for now, we'll just go through the first stage (called First Normal Form, or 1NF). This process states that our table has no duplicative columns, no dittoes, and every column is named, and there is a primary key. My results on this table might look like this:

Name

Department

Advisor

Course1

Phil Smith 

JMA

DuPont

JavaScript

Joe Smeal 

JMA

Shepherd

French 1

Jane Doe

JMA

DuPont

History 1

Ed Jones

JMA

DuPont

Web Design

Phil Smith 

JMA

DuPont

Database Design

Joe Smeal

JMA

DuPont

ASP Programming

Jane Doe

JMA

DuPont

Database Design

This table is still not ideal - there is redundancy. However, it's enough to start our VB programming.