What is the main difference between structured and unstructured data and which approach would you choose to analyse unstructured data?

Question added by Daniel Marx , Software Developer , Check24
Date Posted: 2014/03/04
by Maalik Muhamed , Deputy Mill Manager , AZANIA GROUP OF COMPANYS

Structured data

Structured data is information, usually text files, displayed in titled columns and rows, which can easily be ordered and processed by data mining tools.

Unstructured data

Unstructured data is a generic label for describing any corporate information that is not in a database.  Unstructured data can be textual or non-textual.  Textual unstructured data is generated in media like email messages, PowerPoint presentations, and Word documents.

by Daniel Marx , Software Developer , Check24

The fundamental difference between structured data and unstructured data is that structured data is organized in a highly mechanized and manageable way.  Structured data is ready for seamless integration into a database or a well-structured file format such as XML.  Unstructured data, by contrast, is raw and unorganized.  Digging through unstructured data can be cumbersome and costly.  Email is a good example of unstructured data. It's indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured.  Other examples of unstructured data include books, documents, medical records, and social media posts.
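To make the email example concrete, here is a minimal sketch using Python's standard `email` module. The message itself and the names `raw`, `structured` and `body` are made up for illustration. The header fields come out as clean key/value pairs, while the body is just free text that still needs further analysis.

```python
from email import message_from_string

# A made-up message, used only for illustration.
raw = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report
Date: Tue, 04 Mar 2014 10:15:00 +0000

Hi Bob, the numbers look good, but shipping was slow again.
"""

msg = message_from_string(raw)

# Structured part: fixed, named fields that map directly onto table columns.
structured = {
    "sender": msg["From"],
    "recipient": msg["To"],
    "subject": msg["Subject"],
    "date": msg["Date"],
}

# Unstructured part: free text with no predefined schema.
body = msg.get_payload()

print(structured)
print(body)
```

The `structured` dictionary could be inserted into a database row as-is, whereas `body` would first have to go through some kind of text analysis.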

 

Analyse unstructured data

  1. Classify unstructured data. Most corporate data environments are pretty chaotic. Word documents, email, PDFs, spreadsheets, and other data files are scattered across the enterprise. The good news is that most unstructured data is also clear text. As such, this data can be read, indexed, compressed, and stored fairly easily. Classifying unstructured data is the first step to being able to identify unstructured data sources before eventually parsing and using data visualization tools.
  2. Set enforceable storage policies. Most data has a shelf life. New data is frequently accessed during its first 90 days of life and usage tends to taper off after that. Because of these usage trends, data should be regularly examined for dates and most recent usage, and then discarded or archived based on data retirement policies enforced by the IT organization.
  3. Evaluate your BI infrastructure and adjust as needed. Before organizations begin analyzing unstructured data, it’s helpful to evaluate the current business intelligence (BI) infrastructure that’s in place and how it all fits together. It’s not always easy to create structured definitions of data that’s stored within non-traditional data sources. As such, the data management team should identify the steps that are needed to integrate unstructured data into a structured BI environment.
  4. Don’t overlook metadata. Making effective use of unstructured data requires an approach to organizing and cataloging content. In order to use the content, it’s helpful to know what that content is. Some systems automatically capture process-related metadata, or attributes such as creation date, author, title, etc. However, applying metadata to actual content such as content summaries, companies or people mentioned, or topic keywords can be considerably more useful.
  5. Apply unstructured data analysis. BI tools can’t analyze unstructured data directly. However, specialized data analysis technology can be used to analyze unstructured data as well as to produce a data model that BI tools can work with. Unstructured data analysis can start by using a natural language engine to measure keyword density. This approach, along with the use of metadata, can help data scientists and decision makers get at the heart of what key stakeholders are looking for using data discovery tools and techniques (e.g. positive or negative comments about a company in social media comments).

See more at: http://spotfire.tibco.com/blog/?p=15273#sthash.PxRtmU3v.dpuf
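As a rough illustration of the keyword density idea from step 5, here is a small Python sketch that uses only the standard library. It is not a real natural language engine: the function name `keyword_density`, the tiny `STOPWORDS` set and the sample comment are all assumptions made for this example. It simply counts how often each non-trivial word occurs relative to the total.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, assumed for this example only.
STOPWORDS = {"the", "a", "an", "and", "but", "was", "is", "of", "to", "in"}

def keyword_density(text, top_n=3):
    """Return the top_n (word, share of remaining words) pairs for a text."""
    # Lowercase, split into words, and drop the stop words.
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]

# A made-up social media comment.
comment = "Delivery was slow. Slow delivery again, but support was great."
for word, density in keyword_density(comment, top_n=2):
    print(f"{word}: {density:.2f}")
```

The resulting (keyword, density) pairs are structured rows, so they can be stored next to metadata such as author or creation date and handed to a BI tool.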

by Deleted user

The distinction between the two is not always made clear.

Structured data can provide the what, where and when, but not the how or the why.