
Data Exploration

It is often the case that some effort must be made to focus your attention on the pertinent aspects of your data before true analysis can begin. This is almost universally true for large data sets, especially those whose data was not gathered in a controlled or focused manner. But it is often true even of small data sets gathered with very rigid and specific techniques. Even very small data sets have myriad subsets, each of which might be especially pertinent to a given study.

Narrowing your focus so that you can thoroughly analyze data is problematic because you may lose important perspectives in doing so. But it remains an important task. The question is how to go about it. In general, two methodologies are available, which can be described as automatic and manual. More commonly, these are called data mining and data exploration, respectively. Though these terms are not especially well-defined in the general literature, they are commonly used, and I attempt to clarify them here to help you understand VisiCube's usefulness as well as my own philosophy.

Data Mining
Data mining, as well as its cousin data prospecting, is a term which is abused in everyday usage … sometimes being used synonymously with "data analysis", just sounding more interesting. However, technically speaking, they are not the same.

Data mining is a methodology typically brought to bear on large data sets, even entire databases, to discover interesting aspects of that data set which should be further analyzed. Though usually guided by human-specified parameters, the mechanisms are automated algorithms that may include aspects of artificial intelligence and machine learning. Such automation is necessary to make the task of mining feasible when the size of the data set is very large. But the methodology can be used on data sets of any size.

Running with the mining analogy a bit, the data set is a vast site containing deposits, and some sort of automated mechanism is needed to locate the desired deposit within that site. Once that deposit has been located, the more manual process of extracting the ore can begin.

Data prospecting adds a preliminary step. In some literature, that step is locating the site within a vast land. In other literature, it is determining the type of deposit located in the site. In either case, this activity precedes the actual extraction.

Stated simply, data mining is a methodology using automated techniques to bring focus to the pertinent parts of a data set. Having no such automated mechanisms, VisiCube is not a data mining (or data prospecting) tool.
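
To make the idea of automated focusing concrete, here is a minimal sketch in Python. It is not taken from any particular data mining product. It simply scans every pair of numeric columns and keeps the pairs whose correlation exceeds a human-specified threshold. The function names, the threshold parameter and the sample data are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (spread_x * spread_y)


def rank_column_pairs(columns, min_abs_r=0.8):
    """Hypothetical 'mining' pass: return (name_a, name_b, r) for every
    pair of columns with |r| >= min_abs_r, strongest first."""
    hits = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= min_abs_r:
            hits.append((name_a, name_b, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Made-up sample data, for illustration only.
data = {
    "height": [1, 2, 3, 4, 5],
    "weight": [2, 4, 6, 8, 10],
    "noise": [3, 1, 4, 1, 5],
}
strong = rank_column_pairs(data)
```

The person supplies only the threshold; the scan itself decides which pairs deserve a closer look. The scan also shows the limitation of automation: it can only ever find what its design looks for, which here is linear correlation.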

Data Exploration
Data exploration, on the other hand, is a methodology in which manual techniques are utilized to find one's way through a data set and bring important aspects of that data into focus for further analysis. Though such a methodology can be applied to data sets of any size or type, its manual nature makes it more reasonable for smaller data sets, especially those in which the data has been carefully gathered and constructed.

Of course, a major advantage of a manual approach is that the mechanisms utilized do not, by design, prevent you from exploring particular aspects of your data. The automated methods of data mining are forever limited by their particular design.

As with data mining, there are no specifications as to how these methodologies are to be implemented. But the analogy to actual exploration is very enlightening. An exploration is an activity in which any of a great number of paths and techniques might be utilized … and it may take place over a very long period of time. The key to managing such an exploration is to be organized. Keeping records about the exploration, recording your thoughts and ideas along the way, and organizing your findings are all important. This is a complex undertaking, though possibly very rewarding.

VisiCube
VisiCube is a data exploration tool. This is in addition to its capabilities in data analysis. And it is something that really makes VisiCube stand out from its competitors.

VisiCube is designed to enable you to record any point in your exploration with a single click of the mouse. This paradigm allows you to operate in either of two modes:

  • You may generate a record of your exploration by capturing the state of your analysis at any point along that path … effectively recording the steps that lead to your discoveries.
  • You may record markers, cairns if you will, along that path … annotated as desired … to facilitate easy return to the exact spot (and state) from which you can then pursue another path.
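
The two modes above can be sketched as a small data structure. This is not VisiCube's actual implementation, which is not described here; it is a minimal Python illustration, and every name in it (Snapshot, ExplorationLog, capture, mark, return_to) is hypothetical. The analysis state is assumed to be a plain dictionary.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    state: dict      # copy of the analysis state at the moment of capture
    note: str = ""   # optional annotation


@dataclass
class ExplorationLog:
    trail: list = field(default_factory=list)    # ordered record of steps
    markers: dict = field(default_factory=dict)  # named cairns

    def capture(self, state, note=""):
        """Mode 1: append the current state to the record of the path."""
        self.trail.append(Snapshot(deepcopy(state), note))

    def mark(self, name, state, note=""):
        """Mode 2: leave a named, annotated marker at the current state."""
        self.markers[name] = Snapshot(deepcopy(state), note)

    def return_to(self, name):
        """Return a fresh copy of the state stored at a marker."""
        return deepcopy(self.markers[name].state)


# Made-up analysis state, for illustration only.
log = ExplorationLog()
state = {"filter": "region == 'west'", "axes": ["price", "volume"]}
log.capture(state, "first look")
log.mark("west-overview", state, "branch point")

state["axes"].append("margin")          # keep exploring along one path
log.capture(state, "added margin")

restored = log.return_to("west-overview")  # back to the cairn, ready for another path
```

Copying the state on every capture is the design choice that matters here: later changes to the live analysis cannot alter what was recorded, so a marker always returns you to the exact spot where it was placed.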

There are many data analysis tools, but few include support for true data exploration. And none that I know of do it as completely or as naturally as VisiCube.

THE DATAMOLOGY COMPANY Home of VisiCube, The Data Microscope