What are the best practices used by the data input clerk to ensure the quality of data?

Question added by Deleted user
Date Posted: 2013/06/13
by Lubna Al-Sharif, Medical Laboratory Technician, Nablus Specialized Hospital

== Data quality is the perception or assessment of data's fitness to serve its purpose in a given context, or the totality of features and characteristics of data that bear on its ability to satisfy a given purpose; in short, the sum of the degrees of excellence of the factors related to the data.
== So, the eight aspects of data quality that make data appropriate for a specific use are:
- Accuracy
- Completeness
- Update status
- Relevance
- Consistency across data sources
- Reliability
- Appropriate presentation
- Accessibility
== Data quality is more than accuracy and reliability. High levels of data quality are achieved when information is valid for the use to which it is applied and when decision makers have confidence in and rely upon the data.
In order to increase and maintain data quality, you have to implement these steps organization-wide:
1- Data profiling: the statistical analysis and assessment of the data available in an existing data source (e.g. a database or a file), collecting statistics and information about that data.
= This is useful for improving the ability to search the data easily, for checking whether the data conforms to particular standards or patterns, and for exploring relationships that exist between value collections both within and across data sets.
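For illustration, here is a minimal profiling sketch in Python with pandas; the file and column names ("customers.csv", "zip") are assumptions made for the example, not part of the original answer.

```python
# Minimal data-profiling sketch (pandas). File and column names are assumed.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Collect basic availability and cardinality statistics per column."""
    return pd.DataFrame({
        "non_null_pct": df.notna().mean() * 100,   # how much of each column is populated
        "distinct_values": df.nunique(),           # how many different values occur
    })

df = pd.read_csv("customers.csv")                  # hypothetical source file
print(profile(df))

# Pattern conformance check, e.g. the share of ZIP codes that are exactly 5 digits
zip_ok = df["zip"].astype(str).str.fullmatch(r"\d{5}").mean()
print(f"ZIP codes matching the 5-digit pattern: {zip_ok:.0%}")
```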
2- Data standardization: the process of reviewing and documenting the names, meaning, and characteristics of data elements so that all users of the data have a common, shared understanding of them.
A business rules engine that ensures the data conforms to quality rules can be used for this.
= Standardization helps you make the source data internally consistent, so that each data type has the same kind of content and format; it means re-formatting data to create a consistent presentation with fixed and discrete columns, according to your company's requirements, and using the data content and its placement within the record context to determine the meaning of each data element.
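As a small illustration (not a specific rules engine), the sketch below applies a few assumed standardization rules to a record; the field names and target formats are examples only.

```python
# Standardization sketch: bring assumed fields into one fixed, documented format.
import re
from datetime import datetime

def standardize(record: dict) -> dict:
    """Apply assumed rules so every record shares the same presentation."""
    out = dict(record)
    # Names: collapse extra whitespace and use consistent title case
    out["name"] = " ".join(record["name"].split()).title()
    # Phone numbers: keep digits only, present as NNN-NNN-NNNN when 10 digits remain
    digits = re.sub(r"\D", "", record["phone"])
    out["phone"] = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}" if len(digits) == 10 else digits
    # Dates: accept a few common input formats, store as ISO 8601 (YYYY-MM-DD)
    for fmt in ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y"):
        try:
            out["joined"] = datetime.strptime(record["joined"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    return out

print(standardize({"name": "  lubna  al-sharif ", "phone": "(970) 555-0123", "joined": "06/13/2013"}))
```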
3- Geocoding: the process of assigning locations to addresses so that they can be placed as points on a map, similar to putting pins on a paper map, and analyzed together with other spatial data.
The process assigns geographic coordinates to the original data, hence the name geocoding; it is also called address-matching.
= This method is used for name and address data. It corrects data to US and worldwide postal standards and finds the associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses or ZIP codes (postal codes).
= The Google Maps API provides a geocoder class for geocoding and reverse geocoding dynamically from user input. A geocoder is a piece of software or a (web) service that helps in this process.
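As a hedged example, the sketch below uses the open-source geopy library's Nominatim client rather than the Google Maps API mentioned above; the address, application name, and coordinates are illustrative only.

```python
# Geocoding and reverse geocoding sketch using geopy (illustrative values only).
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="data-quality-demo")   # hypothetical application name
location = geolocator.geocode("1600 Pennsylvania Ave NW, Washington, DC 20500")
if location is not None:
    print(location.latitude, location.longitude)          # coordinates to plot on a map

# Reverse geocoding: turn coordinates back into the nearest address
print(geolocator.reverse((38.8977, -77.0365)))
```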
4- Matching or linking: a way to compare data so that similar, but slightly different, records can be aligned.
Matching may use "fuzzy logic" to find duplicates in the data.
= It often recognizes that 'Bob' and 'Robert' may be the same individual.
It might be able to manage 'house-holding', or finding links between husband and wife at the same address, for example.
Finally, it can often build a 'best of breed' record, taking the best components from multiple data sources and building a single super-record.
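For a rough feel of fuzzy matching, here is a sketch using Python's standard difflib together with a tiny, assumed nickname table; a real matching engine would add phonetic keys, address comparison, and house-holding rules.

```python
# Fuzzy-matching sketch with the standard library; the nickname table is assumed.
from difflib import SequenceMatcher

NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def canonical(name: str) -> str:
    """Lower-case, trim, and expand known nicknames before comparing."""
    name = name.strip().lower()
    return NICKNAMES.get(name, name)

def similarity(a: str, b: str) -> float:
    """Fuzzy similarity between two names, 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()

print(similarity("Bob", "Robert"))            # 1.0 once 'Bob' is expanded to 'Robert'
print(similarity("Jon Smith", "John Smith"))  # high ratio: likely the same person
print(similarity("Jon Smith", "Mary Jones"))  # low ratio: probably different people
```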
5- Monitoring - keeping track of data quality over time and reporting variations in the quality of data.
Software can also auto-correct the variations based on pre-defined business rules.
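A minimal monitoring sketch follows; the metric (share of missing e-mail addresses), the 2% threshold, and the file name are assumptions standing in for pre-defined business rules.

```python
# Monitoring sketch: recompute one quality metric per load and flag variations.
import pandas as pd

MAX_NULL_PCT = 2.0   # assumed business rule: at most 2% of e-mail addresses may be missing

def check_quality(df: pd.DataFrame, run_date: str) -> None:
    """Recompute the metric for one load and report whether it breaks the rule."""
    null_pct = df["email"].isna().mean() * 100
    status = "OK" if null_pct <= MAX_NULL_PCT else "ALERT"
    print(f"{run_date}: {null_pct:.1f}% missing e-mails -> {status}")

check_quality(pd.read_csv("load_2013_06_13.csv"), "2013-06-13")   # hypothetical daily load
```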
6- Batch and Real time - Once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean.
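One way to read this step is that the same cleansing rule should be shared by the one-off batch job and the live application; the sketch below illustrates that idea with an assumed e-mail rule and field names.

```python
# One cleansing rule reused in both the batch path and the real-time path.
import re

def clean_email(value: str) -> str | None:
    """Shared cleansing rule: lower-case and validate an e-mail address."""
    value = value.strip().lower()
    return value if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) else None

# Batch path: cleanse values already captured in the database or a file
raw = ["  Lubna@Example.COM ", "not-an-email", "tarique@example.com"]
print([clean_email(v) for v in raw])

# Real-time path: the enterprise application calls the same rule at entry time
def on_new_contact(form_data: dict) -> dict:
    form_data["email"] = clean_email(form_data["email"])
    return form_data

print(on_new_contact({"email": "New.User@Example.com "}))
```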
== Finally, the most obvious practice is to keep improving your methods and tools for capturing the data.
The most commonly overlooked step is to use advanced data quality technologies, and to use them at multiple steps in the customer identity cycle.
== As a result, advanced data quality processing should be run against the database weekly to ensure the identity and value of each customer is properly consolidated at any given instant.
However, the same process should also be applied to each direct mail campaign deployment to refresh the postal address and ensure the promotion is reaching the consumer at his current and most responsive address.
== Otherwise, you can find data quality software to support the process, such as Talend Open Studio for Data Quality, Informatica Data Quality, and the Trillium Software System (for business and IT collaboration).
GOOD LUCK

by Tarique Faris, Sales Manager, MetroPCS

Knowledge of the source data quality avoids surprises during the ETL process.
Unexpected conditions often require rework.
For example, if null conditions aren't properly anticipated for a required field, the ETL code will need to be adjusted when the error is encountered.
The adjustments may also require rework in upstream or downstream programs as well as a repeat of the testing for any completed programs.
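As an illustration of anticipating null conditions up front, here is a small sketch; the field names and the reject-versus-default choices are assumptions, and a real job would implement them in the team's ETL tool.

```python
# Sketch of handling expected null conditions inside an ETL transform step.
def transform(row: dict) -> dict:
    """Anticipate null conditions instead of letting them surface mid-run."""
    # Required field: reject the row explicitly rather than failing later downstream
    if row.get("customer_id") is None:
        raise ValueError(f"required field customer_id is null in row: {row}")
    # Optional field: substitute a documented default to keep downstream programs consistent
    row["middle_name"] = row.get("middle_name") or ""
    return row

rows = [{"customer_id": 101, "middle_name": None},
        {"customer_id": None, "middle_name": "A"}]
for r in rows:
    try:
        print(transform(r))
    except ValueError as exc:
        print("Rejected:", exc)
```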
