I know, you are asking yourself, ‘Why do I need to know about data, it’s just boring numbers.’ Well, yes and no. Data may be represented as numbers, but it’s more than that. Data are the building blocks of organisations, computer systems and the internet.
Think of data as the way that all real world concepts and things are coded so that they can be used in the digital world. So it is not just numbers: pictures, sound and video files, web content, measurements used for process control, and Customer records are all data.
Data are important
Data is* often misunderstood and under-appreciated as a key component of any IT, software or business system. Unless you are specifically interested in the ‘numbers’ you may be forgiven for thinking that technology is only about the User Interface, the physical devices, and the moving pictures, graphics, sounds & textual content on your smart phone, tablet, or website, for example. However, none of this technology could function without data. In fact, in a pure sense, the applications, pictures and all the content on the internet are data, the coded and digitized building blocks of all computer systems. In the early days of computing, there was a clear distinction between the physical devices and the coded instructions (computer programs). Data was ‘fed’ into computers on some physical media and the results or output presented in some human or machine-readable form. Now, it is less clear where the computer, the input and output devices, and the data start and end.
(*Ed. should be, ‘data are’ as it is a plural, but this can sound a bit odd, so apologies in advance for any poor grammar!)
Data vs. Information
Let’s get this tricky one out of the way first. In my simple world, data are the things that we know about the things that are important to us! An example is needed I think. One thing of interest to a lot of people is the Car that they drive, as individual owners, but also the manufacturer and distributor, the organisations who get involved in providing spare parts, servicing, registration, tax and insurance etc. Let’s assume that somewhere there is a record of Cars with the properties of interest, such as Make, Model, Year of Manufacture & Colour. These are the attributes associated with all Cars. The data for a specific Car, such as Ford/Mondeo/1997/Silver should represent an accurate view of the world – in this case a physical entity – at a point in time, sometimes called the single version of the truth.
Information, on the other hand, depends on context and on an interpretation of the data. An underwriter or actuary in an insurance company, for example, may want to know whether blue cars are a better risk than other colours. This question may result in a query of the data for all cars that the company insures as well as the Claims history, another set of records (table) in the organisation’s database. This query will produce some information that the requestor can use to make decisions, or maybe to re-formulate the question. For example, does it matter how old the car is, or the gender of the driver or claimant?
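To make the distinction concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table names, column names and every sample row (other than the Ford Mondeo mentioned above) are invented for illustration, and "claims per car" is only one possible stand-in for "risk". The rows are the data; the figure per colour that the query returns is the information.

```python
import sqlite3

# Hypothetical sample data: (car_id, make, model, year, colour)
CARS = [
    (1, "Ford", "Mondeo", 1997, "Silver"),
    (2, "Ford", "Fiesta", 2004, "Blue"),
    (3, "Vauxhall", "Astra", 2001, "Blue"),
    (4, "Rover", "200", 1998, "Red"),
]
# Hypothetical claims history: (claim_id, car_id, amount)
CLAIMS = [(1, 1, 500.0), (2, 1, 250.0), (3, 3, 1200.0), (4, 4, 300.0)]


def claims_per_car_by_colour():
    """Return the average number of claims per insured car, by colour."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE car (car_id INTEGER PRIMARY KEY, make TEXT, "
        "model TEXT, year INTEGER, colour TEXT)"
    )
    conn.execute(
        "CREATE TABLE claim (claim_id INTEGER PRIMARY KEY, "
        "car_id INTEGER REFERENCES car(car_id), amount REAL)"
    )
    conn.executemany("INSERT INTO car VALUES (?, ?, ?, ?, ?)", CARS)
    conn.executemany("INSERT INTO claim VALUES (?, ?, ?)", CLAIMS)
    # LEFT JOIN keeps cars that have never had a claim (zero claims).
    rows = conn.execute(
        """
        SELECT car.colour,
               COUNT(DISTINCT car.car_id) AS cars,
               COUNT(claim.claim_id)      AS claims
        FROM car
        LEFT JOIN claim ON claim.car_id = car.car_id
        GROUP BY car.colour
        ORDER BY car.colour
        """
    ).fetchall()
    conn.close()
    return {colour: claims / cars for colour, cars, claims in rows}


if __name__ == "__main__":
    print(claims_per_car_by_colour())
```

With this made-up sample, blue cars average 0.5 claims each, which might prompt the follow-up questions about age or driver. Four rows prove nothing, of course; the point is only the step from stored records to an answer in context.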
There are many techniques that are used to understand and represent data (the disciplines of data analysis and data modelling), as precursors to building a database solution. A database is useful shorthand for the design of the data (i.e. how it is structured) and the physical realization of the design in some data storage technology. It is beyond the scope of this introductory module to consider any of these approaches or technologies in any detail. However, there are a number of generic terms and guidelines that are solution-agnostic. I will finish this section with a simple worked example.
- A single item of information, or datum (also called a field or an attribute), will typically contain a description, a type and a value, e.g. Date_of_birth, ShortDate, 16/06/65
- Data items may be coded; for example, a colour may be represented by a Pantone or ‘RGB’ scheme. All the systems that need to ‘read’ and interpret the data need to understand the coding rules: mismatches in the mapping or transformation can cause problems when data are shared or ‘flow’ between a sender and a receiver, whether internally within an organisation’s systems or between two parties.
- Depending on the data type and the relationship between fields there may be business rules to ensure the accuracy or integrity of the data. For example: date validation, limits on the size of financial or text fields, age restrictions, and checks for missing or invalid data.
- Data representing a single coherent entity or ‘thing’ is called a record, such as the Ford Mondeo mentioned earlier.
- A set of records is called a table or a dataset.
- Each record will normally have a unique set of attributes or a coded ‘key’ so that it can be identified. A Person who may be a Car owner, driver or insured party might be identified by their driving licence number or an arbitrary driver or Customer number.
- Multiple tables are called a database, or a data mart or a data warehouse…
- Relationships between data groups within a larger dataset, such as the earlier example of a Car and any insurance Claims (zero, 1 or many) are sometimes split-out or ‘normalised’ into separate tables.
- It is important that data are secure and protected from unauthorized access, corruption, whether accidental or deliberate, or degradation (becoming out-of-date or the storage media physically degrading). Data encryption, access permissions, data back-ups and Data Protection guidelines are just some of the measures used to maintain the accuracy and privacy of the data. This is a large and complex subject in its own right.
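Several of the terms above (field, record, table, key and business rule) can be sketched in a few lines of Python. This is an illustration only: the registration number used as the key, the 20-character limit and the 1886 lower bound on the year are all assumptions made for the example, not rules from any real system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Car:
    """One record: each attribute below is a field with a name and a type."""
    registration: str  # hypothetical unique key for the record
    make: str
    model: str
    year: int
    colour: str


def validate_car(car, today=None):
    """Apply simple business rules; an empty list means the record passes."""
    today = today or date.today()
    errors = []
    if not car.registration.strip():
        errors.append("registration is missing")
    # Assumed rule: no cars before 1886 and none from the future.
    if not (1886 <= car.year <= today.year):
        errors.append("year of manufacture out of range")
    # Assumed rule: an arbitrary size limit on a text field.
    if len(car.colour) > 20:
        errors.append("colour is longer than 20 characters")
    return errors


def build_table(records):
    """A table as a set of records indexed by key; duplicate keys are rejected."""
    table = {}
    for car in records:
        if car.registration in table:
            raise ValueError(f"duplicate key: {car.registration}")
        table[car.registration] = car
    return table


if __name__ == "__main__":
    mondeo = Car("R123 ABC", "Ford", "Mondeo", 1997, "Silver")
    print(validate_car(mondeo))            # no rule violations expected
    print(build_table([mondeo])["R123 ABC"])
```

A real database would enforce the key and many of the rules itself, through constraints, rather than in application code; the sketch only shows what the terms refer to.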
This diagram represents a simple logical view of some data for a possible bank account application. Data structures, if well designed, can provide a relatively stable model in a fast-moving world, and might not need to change if, for example, the bank were merged or renamed, or the account was accessed via a member of staff in the bank’s office, or by the account holder from an ATM, web or mobile phone app.
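The point about stability can also be sketched in code. The Account and Transaction structures below are hypothetical (their names and fields are assumptions for illustration, not a transcription of the diagram). The way an account is accessed is recorded as just another value, so adding a new channel does not change the structure of the data.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Transaction:
    amount: Decimal  # positive for a deposit, negative for a withdrawal
    channel: str     # e.g. "branch", "ATM", "web", "mobile": a value, not a structure


@dataclass
class Account:
    account_number: str  # hypothetical key
    holder_name: str
    transactions: list = field(default_factory=list)

    def post(self, amount, channel):
        """Record a transaction, whichever channel it arrived through."""
        self.transactions.append(Transaction(Decimal(amount), channel))

    def balance(self):
        """Sum all transaction amounts, starting from zero."""
        return sum((t.amount for t in self.transactions), Decimal("0"))


if __name__ == "__main__":
    account = Account("12345678", "A. Holder")
    account.post("100.00", "branch")
    account.post("-30.50", "ATM")
    account.post("5", "mobile")
    print(account.balance())
```

Note that nothing in the structures names the bank itself, so a merger or rebranding would leave them untouched as well.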
Data is in the blue group of elements in the top left-hand corner of the Table of IT Elements, as it often forms a critical part of the ‘what’ question in Projects, such as: ‘What business changes do we want?’, ‘What statutory reporting is required?’, and ‘What information do I need to share with Customers, suppliers and partners?’
Elsewhere in the table are Management Information, Big Data, and The Cloud, all important concepts in understanding why Data is so important to organisations. I mean any organisation that has customers, handles money, is responsible for sensitive information, is competitive (in its market), has to report to some regulatory authority, and a whole raft of other strategic and operational business drivers.
Data are required for the day-to-day operation of almost every organisation. Data also help to answer questions for decision-making and strategic planning (Business Intelligence), for internal Management Information, and for external regulatory and statutory reporting.
Big Data is a relatively new term for large and complex datasets, which require cutting-edge techniques and tools to analyse. Traditional relational databases and query technologies struggle to cope with the enormous amounts of volatile data from diverse areas such as medical & scientific research, Customer online buying behaviour, and currency market fluctuations.
The Cloud – or more correctly Cloud Computing – is not exclusively a data-related thing, but a general movement of technology away from local wholly-owned systems to remote managed services & resources (including data storage) that could be anywhere in the world, metaphorically ‘in the clouds’. Anyone who uses online social media or picture sharing websites is accessing the cloud, and more significantly this is where private and public organisations are moving a lot of their business applications (software) and data.
As always, thank you for your feedback and comments on this topic. Please Ask the IT Chemist if you have any questions or want to know more.
(c) 2015 Antony Lawrence CBA Ltd.
(This module has been adapted and extended from an earlier post Data ‘101’ Primer)