Start networking and exchanging professional insights

Register now or log in to join your professional community.

Follow

The complexity of some data sets leads people to add extra dimensions as Veracity and Variability. In what ways do these complicate things further?

user-image
Question added by Mohamed Fetoh , Openstack Operations Engineer , Orange Business Services
Date Posted: 2015/03/23
Deleted user
by Deleted user

Well, adding some veracity to an already-complex dataset will not only increase the amount of data to be analyzed, but it could introduce values which skew distributions or make predictions which are based on artificial values. Making a dataset easier to work with does not guarantee that it is going to output useful information. It might be necessary to tweek your data mining scripts or algorithms to accommodate changes to the accuracy of the content

 

Adding variability can complicate matters to an even greater extreme. For instance, imagine if you had a dataset of plaintext-English, but wanted to incorporate Arabic content to analyze similarities or differences between the two sets. Well, not only would you have to find someone of translating this content, but you might have to make sure the translated Arabic is encoded via the same standards (such as ASCII or UTF-8). Furthermore, you would have to account for basic differences in syntax, such as the structure of sentences, which would rely heavily on the bias of the human-translator. Such problems can make data-sets useless if not done through a proven method. 

More Questions Like This