Thank you for the invitation. Good question!
Be aware of the problem and how it occurred. Take disciplinary action for sharing data with unknown persons or outsiders. Password-protect files where necessary, and protect files using the techniques discussed. Managers can be careful, but data can be stolen anyway; the challenges are before us, and we must address them with the implementation discussed.
A data breach is the intentional or inadvertent exposure of confidential information to unauthorized parties. In the digital era, data has become one of the most critical components of an enterprise. Data leakage poses serious threats to organizations, including significant reputational damage and financial losses. As the volume of data is growing exponentially and data breaches are happening more frequently than ever before, detecting and preventing data loss has become one of the most pressing security concerns for enterprises. Despite a plethora of research efforts on safeguarding sensitive information from being leaked, it remains an active research problem.
Interested readers can learn to recognise enterprise data leak threats, recent data leak incidents, various state-of-the-art prevention and detection techniques, new challenges, and promising solutions and exciting opportunities. (Ref: WIREs Data Mining Knowl Discov, 7:e. doi:./widm.)
Data leakage can be caused by internal and external information breaches, either intentionally (e.g., data theft by intruders or sabotage by insider attackers) or inadvertently (e.g., accidental disclosure of sensitive information by employees and partners). A study from Intel Security5 showed that internal employees account for a significant share of corporate data leakage, and half of these leaks are accidental. Motivations for insider attacks vary, including corporate espionage, grievances with the employer, or financial reward. Accidental leaks mainly result from unintentional activities due to poor business processes, such as failure to apply appropriate preventative technologies and security policies, or from employee oversight.
The purpose of data leak prevention and detection (DLPD) systems is to identify, monitor, and prevent unintentional or deliberate exposure of sensitive information in enterprise environments. Various technical approaches are used in DLPD, targeting different causes of data leaks.6 For example, several pioneering works7, 8 proposed modeling normal database access behaviors in order to identify intruders and detect potential data breaches in relational databases. Basic security measures such as enforcing data use policies can safeguard sensitive information in storage. Traffic inspection is a commonly used approach to block sensitive data from being moved out of the local network.9
It is challenging for companies to protect data against information leakage in the era of big data. As data become one of the most critical components of an enterprise, managing and analyzing large amounts of data provides an enormous competitive advantage for corporations (e.g., business intelligence or personalized business service delivery). However, it also puts sensitive and valuable enterprise data at risk of loss or theft, posing significant security challenges to enterprises. The need to store, process, and analyze more and more data, together with the heavy use of modern communication channels in enterprises, results in an increase in possible data leakage vectors, including cloud file sharing, email, web pages, instant messaging, FTP (file transfer protocol), removable media/storage, database/file system vulnerabilities, cameras, laptop theft, lost or stolen backups, and social networks.
ENTERPRISE DATA LEAK THREATS
The literature presents different taxonomies of data leak threats.6 In this section, we use them to classify and describe major data leak threats. Then we review several enterprise data breach incidents and discuss lessons learned from them.
One approach to classifying data leak threats is based on their cause: sensitive information is leaked either intentionally or inadvertently. Another approach is based on which party caused the leakage: insider or outsider threats. As shown in Figure 1, intentional leaks occur due to either external parties or malicious insiders. External data breaches are normally caused by hacker break-ins, malware, viruses, and social engineering. For example, an adversary may exploit a system backdoor or misconfigured access controls to bypass a server's authentication mechanism and gain access to sensitive information. Social engineering (e.g., phishing) attacks against enterprises have become increasingly sophisticated, fooling employees and individuals into handing over valuable company data to cyber criminals. Internal data leakage can be caused by either deliberate actions (e.g., espionage for financial reward or employee grievances) or inadvertent mistakes (e.g., accidental data sharing by employees or transmitting confidential data without proper encryption). Hauer proposed comprehensive criteria for characterizing data leakage incidents and analyzed data breaches reported in recent years. The results reveal that a large proportion of the data breaches were caused by insiders, highlighting that technological as well as nontechnological measures are both important in preventing data breaches.
Data leaks can also be characterized by other attributes, such as industry sector or type of occurrence. As reported by the Identity Theft Resource Center (ITRC) in Figure 2, the total number of major data breach incidents tracked by ITRC has kept increasing over the past 5 years. Figure 2(a) shows a stacked histogram of data breach incidents by industry sector; business and medical/healthcare leaks account for the majority of the leaks. Data breaches by type of occurrence are illustrated in Figure 2(b), where the 'Other' category includes email/internet exposure, employee error, and so forth. Although different cybersecurity reports may reach different results because they use non-identical datasets, all of these reports, including ITRC's statistics, confirm the trend that insider threats have emerged as a leading cause of enterprise data leaks, with a large share of breaches perpetrated from inside a company.
Detection of internal data leak incidents is extremely challenging, because internal breaches often involve users who have legitimate access to facilities and data. Their actions may not leave evidence due to their knowledge of the organization, possibly including how to bypass detection. With more and more covert channels and steganography tools available, malicious insiders can make data breaches particularly difficult to detect. For example, malicious employees may bypass all enterprise security policies by concealing sensitive information in normal documents and sending them out via encrypted or covert channels. In the big data era, insiders are exposed to increasing amounts of sensitive data, posing huge security challenges to organizations. To prevent unintentional or inadvertent data leakage, in addition to technological means, it is very important to increase user security awareness in the workplace.
Many high-profile data breach incidents have cost organizations hundreds of millions of dollars. For example, the Yahoo and Target data breaches are among the biggest in history. Yahoo announced two huge data breaches: in the first incident, hackers compromised as many as million user accounts; later, in December, Yahoo discovered another major cyber attack, believed to be separate from the first, in which more than 1 billion user accounts were compromised. After the data breaches, Verizon paid $ million less than the originally planned sale price to acquire Yahoo. Between November and December, cyber criminals breached the data security of Target Corporation, one of the nation's largest retailers. Target later announced that personal information, including the names, addresses, phone numbers, email addresses, and financial information of up to million customers, was stolen during the breach.
We divide technological means employed in DLPD into two categories: content‐based analysis and context‐based analysis.
• Content‐based (i.e., sensitive data scanning) approaches9 inspect data content to prevent unwanted information exposure in its different states (i.e., at rest, in use, and in transit). Although content scanning can effectively protect against accidental data loss, it is likely to be bypassed by internal or external attackers, e.g., through data obfuscation.
• Instead of trying to identify the presence of sensitive content, context‐based approaches7, 8 mainly perform contextual analysis of the meta information associated with the monitored data or of the context surrounding the data. Some DLPD solutions are hybrid approaches that analyze both content and context.
Since the main objective of DLPD is to identify sensitive content, content‐based methods normally achieve higher detection accuracy than pure context‐based analysis, and thus the majority of research efforts in this field focus on content analysis to detect sensitive data. As shown in Figure 4, data scanning can be deployed at different points to protect data in different states. Scanning data at rest stored on servers enables enterprises to identify potential data leak risks within the organization. Monitoring data in use can prevent improper handling of sensitive data and stop it from entering the enterprise network, e.g., by blocking traffic when an attempt to transfer sensitive data is detected. Monitoring network data streams in transit prevents confidential data from being transmitted within and out of the corporate network.
Content‐based DLPD searches for known sensitive information residing on laptops, servers, or cloud storage, or flowing in outbound network traffic. It relies largely on data fingerprinting, lexical content analysis (e.g., rule‐based matching and regular expressions), or statistical analysis of the monitored data. In data fingerprinting, signatures (or keywords) of known sensitive content are extracted and compared with the content being monitored in order to detect data leaks; signatures can be either digests or hash values of a set of data. Shapira et al. proposed a fingerprinting method that extracts fingerprints from the core confidential content while ignoring nonrelevant (nonconfidential) parts of a document, to improve robustness against rephrasing of confidential content. Lexical analysis is used to find sensitive information that follows simple patterns. For example, regular expressions can be used for detecting structured data, including social security numbers, credit card numbers, medical terms, and geographical information in documents. Snort, an open source network IDS, allows users to configure customized signatures and regular expression rules; sniffed packets are then compared against these signatures and rules to detect data leak attempts.
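To make the lexical-analysis idea concrete, here is a minimal Python sketch that scans text with simplified regular expressions for two kinds of structured data. The pattern names and rules are illustrative assumptions; production DLPD rules (e.g., Snort signatures) are far stricter and typically validate card numbers with a Luhn checksum.

```python
import re

# Simplified, illustrative patterns -- real DLPD rules are much stricter
# (e.g., they validate candidate card numbers with the Luhn checksum).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def scan(text):
    """Return the names of the sensitive-data patterns found in the text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

In a real deployment, such rules would run over reassembled network payloads or over files at rest, not raw strings.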
Statistical analysis mainly involves analyzing the frequency of shingles/n‐grams, which are typically fixed‐size sequences of contiguous bytes within a document. Another line of research includes the item weighting schemes and similarity measures in statistical analysis, where item weighting assigns different importance scores to items (i.e., n‐grams), rather than treating them equally.
Collection intersection is a commonly used statistical analysis method for detecting the presence of sensitive data. Two collections of shingles are compared, and a similarity score is computed between the content sequences being monitored and the sensitive data sequences that are not allowed to leave enterprise networks. For instance, the 3‐gram shingles of the string abcdefgh include six elements {abc, bcd, cde, def, efg, fgh}, where a sliding window is used in shingling the string. Given a content collection Cc and a sensitive data collection Cs, a detection algorithm computes the intersection rate Irate ∈ [0,1], defined as the sum of the occurrence frequencies of all items appearing in the collection intersection Cs ∧ Cc, normalized by min(|Cs|, |Cc|). Figure 5 illustrates an example of calculating the similarity score of two 3‐gram collections, where the sum of occurrence frequencies of items in Cs ∧ Cc is 7, min(|Cs|, |Cc|) = 10, and thus the Irate is 0.7.
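The shingling and intersection-rate computation described above can be sketched in a few lines of Python. Reading Cs ∧ Cc as a multiset intersection (taking the minimum per-item count) is an assumption, as are the function names; this is a sketch of the general technique, not the exact algorithm of any cited system.

```python
from collections import Counter

def shingles(text, n=3):
    """Slide an n-character window over the text to build a multiset of n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def intersection_rate(sensitive, content, n=3):
    """Irate: sum of occurrence frequencies of shared shingles,
    normalized by the size of the smaller collection."""
    cs, cc = shingles(sensitive, n), shingles(content, n)
    shared = sum(min(cs[g], cc[g]) for g in cs.keys() & cc.keys())
    return shared / min(sum(cs.values()), sum(cc.values()))
```

An Irate near 1 indicates the monitored content contains most of the sensitive collection's shingles; an Irate of 0 indicates no shared n-grams.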
Recently, machine learning‐based solutions have emerged to enable organizations to detect the increasing amounts of confidential data that require protection. For example, Symantec utilizes vector machine learning (VML) technology to detect sensitive information in unstructured data; through training, this approach can continuously improve the accuracy and reliability of finding sensitive information. Hart et al. presented machine learning‐based text classification algorithms to automatically distinguish sensitive from nonsensitive enterprise documents. Alneyadi et al. used statistical analysis techniques to detect the semantics of confidential data in evolved data that appears fuzzy in nature or has other variations.
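As a concrete, if toy, illustration of machine learning‐based text classification for DLPD, the sketch below trains a bag-of-words Naive Bayes model to label documents as sensitive or nonsensitive. It is a generic stand-in for the cited approaches, not Symantec's VML or Hart et al.'s algorithm; the class labels, training data, and Laplace smoothing are assumptions.

```python
import math
from collections import Counter

class NaiveBayesDLP:
    """Toy bag-of-words Naive Bayes classifier for document sensitivity."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)              # documents per class
        self.word_counts = {y: Counter() for y in self.class_counts}
        self.vocab = set()
        for doc, y in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[y].update(words)
            self.vocab.update(words)
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        def log_score(y):
            s = math.log(self.class_counts[y] / self.n_docs)
            for w in doc.lower().split():
                # Laplace smoothing so unseen words do not zero out the score
                p = (self.word_counts[y][w] + 1) / (self.totals[y] + len(self.vocab))
                s += math.log(p)
            return s
        return max(self.class_counts, key=log_score)
```

Real systems would use far larger training corpora and richer features, but the training/prediction loop has the same shape.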
There have been a number of studies on profiling users' normal behaviors to identify intruders or insiders.7, 8 Instead of detecting the presence of sensitive data, Mathew et al. proposed modeling normal users' data access patterns and raising an alarm when a user deviates from the normal profile, in order to mitigate insider threats in database systems. Bertino et al.7, 8 proposed detecting anomalous access patterns in relational databases at a finer granularity by mining database traces stored in log files. Their method is able to detect role intruders in database systems, i.e., individuals holding a specific role who behave differently from the normal behaviour of that role. Senator et al. presented a set of algorithms and methods to detect malicious insider activities, and demonstrated the feasibility of detecting the weak signals characteristic of insider threats on organizations' information systems. Costante et al. addressed the problem of identifying and reacting to insider threats in data leak detection by monitoring user activities and detecting anomalous behaviour. They presented a hybrid framework that combines signature‐based and anomaly‐based solutions: the anomaly‐based component learns a model of normal user behaviour to detect unknown and insider attacks, and the signature‐based component automatically creates anomaly signatures (e.g., patterns of malicious activities) from alerts to prevent the execution of similar activities in the future. Gyrus prevents malware from carrying out malicious activities, such as manipulating a host machine to send sensitive data to outside parties, by capturing the semantics of user intent and ensuring that the system's behaviour matches the user's intent. Maloof et al. designed a system to monitor insider behaviour and activity, in order to detect malicious insiders who operate within their privileges but engage in activity that is outside the scope of their legitimate assignments.
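The behavior-profiling idea can be reduced to a deliberately minimal statistical baseline: model a user's normal daily database access volume, then raise an alarm on large deviations. The z-score threshold and the single feature used here are assumptions; the cited systems build far richer models over many audit sources.

```python
import statistics

def build_profile(daily_access_counts):
    """Summarize a user's normal behavior as mean/stdev of daily DB accesses."""
    return statistics.mean(daily_access_counts), statistics.stdev(daily_access_counts)

def is_anomalous(todays_count, profile, z_threshold=3.0):
    """Alarm when today's volume deviates more than z_threshold stdevs from normal."""
    mean, stdev = profile
    # Floor the stdev at 1.0 so a near-constant history does not trigger
    # alarms on trivial fluctuations.
    return abs(todays_count - mean) > z_threshold * max(stdev, 1.0)
```

A single-feature baseline like this is prone to the false positives discussed later; real deployments combine many such signals.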
Watermarking is used to prevent and detect data leaks by marking data of interest entering and leaving a network. The presence of a watermark in an outbound document indicates a potential data leak. It can also be used for forensic (i.e., postmortem) analysis, such as identifying the leaker after an incident. In addition, trap‐based defenses are useful against insider threats, as they can entice and trick users into revealing their malicious intentions. For example, Spitzner et al. proposed utilizing honeypots for early detection of malicious insider threats. Their method implants honeytokens with perceived value in the network; these honeytokens may then direct the insider to more advanced honeypots and help discern whether the insider's intention was malicious. Papadimitriou et al. studied the problem of identifying guilty agents after a data leak occurs. They proposed data allocation strategies for efficiently assessing the likelihood that an agent is responsible for a leak. In addition, they considered the option of adding 'fake' content to the distributed data to identify a leaker.
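A honeytoken check can be as simple as planting decoy values that no legitimate workflow ever touches and alerting whenever one appears in a query result. The decoy values below are fabricated for illustration; real deployments would generate and track them per user or per dataset.

```python
# Decoy records planted in the database; legitimate workflows never touch them.
HONEYTOKENS = {
    "jane.doe.decoy@example.com",   # fabricated example value
    "4929-1111-2222-3333",          # fabricated example value
}

def access_is_suspicious(accessed_values):
    """Flag any query result that contains a planted honeytoken."""
    return any(v in HONEYTOKENS for v in accessed_values)
```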
Signature‐based detection is the most fundamental technique used in DLPD. In many instances, fingerprint databases are created by applying standard hash functions to documents that need protection. This approach is easy to implement and has better coverage, as it is able to detect the whole confidential content. However, data fingerprinting with conventional hashing can be easily bypassed and may yield false negatives when the sensitive data is altered or modified.6 In addition, it may incur high computation costs when processing large content, because it requires extensive data indexing and comparison between sensitive and normal data.
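A minimal fingerprinting sketch, assuming fixed-size chunking and SHA-256 (both illustrative choices): chunks of protected documents are hashed into a set, and outbound content is flagged when any of its chunk hashes reappears. Note that a single-character edit changes every affected hash, which is exactly the false-negative weakness noted above.

```python
import hashlib

def fingerprint(document, chunk_size=64):
    """Hash fixed-size chunks of a protected document into a fingerprint set."""
    return {
        hashlib.sha256(document[i:i + chunk_size].encode()).hexdigest()
        for i in range(0, len(document), chunk_size)
    }

def contains_leak(outbound, fingerprint_db, chunk_size=64):
    """Report whether any chunk of the outbound content matches the database."""
    return bool(fingerprint(outbound, chunk_size) & fingerprint_db)
```

Because chunk boundaries here are position-based, this toy version only catches copies aligned to chunk offsets; real systems use rolling or content-defined chunking to tolerate shifts.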
Many DLPD systems use regular expressions to perform exact and partial string matching. Regular expression‐based comparison supports wildcards and can thus capture transformed data leaks to some extent. The problem with DLPD systems that rely on regular expression analysis is that they offer limited data protection and yield high false positive rates. Thus, they are only suitable for detecting data leaks with predictable patterns.
For unstructured textual data, collection intersection is typically used to detect sensitive information. Since collection intersection preserves local features, it can tolerate a small amount of modification to the sensitive data, e.g., inserted tags, character substitutions, and lightly reformatted content. However, it suffers from high computation (i.e., it is time consuming) and storage costs. Basic n‐gram‐based detection may also generate undesirable false alarms, since the comparison ignores ordering. Shu et al. proposed an alignment‐based solution that measures the order of n‐grams in the collection intersection, achieving higher detection accuracy than conventional string matching. To overcome these issues, advanced content analysis methods such as machine learning‐based approaches have been proposed; machine learning algorithms are also used in context‐based DLPD. In the era of big data, the most severe problem of content‐based DLPD approaches is scalability, i.e., they are not able to process massive content data in time.
Behaviour analysis for understanding user intention is important for mitigating the insider attack problem, and insider threat detection has attracted significant attention in recent years. A plethora of behaviour models as well as audit sources are available in the literature. However, existing behaviour analysis‐based approaches are prone to errors because of the temporal dynamics of context information, leading to high false positive rates and low detection rates. Watermarking is vulnerable to malicious removal or distortion and may involve modifying the original data, which limits its practical application in DLPD. The honeypot approach has an inherent drawback: the insider may never use or interact with the honeypots.
Although existing DLPD techniques are effective at preventing accidental and plain‐text leaks, they are fundamentally unable to identify encrypted or obfuscated information leaks.
Challenges
While the rise of big data yields tremendous opportunities for enterprises, data leak risk inevitably arises because of the ever‐growing data volumes within corporate systems. For the same reason, data breach incidents will become more damaging to enterprises. In many cases, sensitive data are shared among various stakeholders, e.g., business partners and customers. Cloud file sharing and external collaboration with other companies, which are becoming more common for today's enterprises, make the data leakage issue even worse. On the other hand, as the workforce becomes mobile, employees working outside the organization's premises raise the potential for data leaks. In addition, in big data environments, the incentives behind cyber attacks aimed at stealing confidential enterprise data increase dramatically, with bigger payoffs and more recognition from a single attack. These factors pose a greater challenge for detecting unauthorized use, access, and disclosure of confidential enterprise data. Here, we list several technical challenges for data leak detection in the era of big data.
Exposing customer data outside the organization puts the organization at risk. It is important to take precautionary measures to prevent data from leaking out. If an employee is caught leaking data on purpose, disciplinary action must be taken against him. When such things happen, it is the responsibility of a manager to go through all the security measures and make sure that all is well.
In an organization, it is of utmost priority to maintain sensitivity and secrecy in handling business information. It must be ensured that critical data and business secrets do not spill out. A leak is also a breach of confidentiality from the client's perspective, and it needs to be reviewed very seriously by leadership.
In the event that critical data is spilled out, we shall require both short-term and long-term measures.
Short Term:
1. Immediate assessment of the spillage point and of the probable damage in terms of reputation and revenue.
2. Ensure that the damage is contained.
3. Remove the vulnerability at the source of the leak. However, in this process we need to be sensitive and careful not to harm team spirit and team cohesion.
4. Ensure that honest team members are not victimized.
5. Revisit the role of each team member and consider role changes for each one. This process can also be refreshing for the team, as it revitalises the team with fresh energy.
Long term:
1. Revisit data handling and accessibility.
2. Put in place a system that sets authorization procedures based on criticality and data sensitivity.
3. Revisit overall team communication, delegation, and feedback systems/procedures, because most of the time this is the core issue behind such spillage.
If the organization's activity is known to everyone, and the customer's data contains only his account age analysis, which is not so important I think ... then neglecting the issue may be better than exposing it.
Thanks for the invitation,
1- I will carefully investigate the situation, asking the legal dept. to indicate:
- How did this actually happen?
- Who is/are responsible?
- Their recommendations for punishing the responsible people, and convey these recommendations to the HR people.
2- Ask the IT people to review the data security system to avoid this in the future.
Thanks
I support my colleague Celeste's answer
Regards
First, I will quickly investigate why the data exposure happened. I will ensure the consistency of customer file protection so that it does not happen again to other customers. The exposure has already happened, so all I can do is prevent a recurrence.
Give more attention to the reasons and problems: why and how it happened, and resolve these at the root. Finally, communicate with the person whose data was exposed, explain the issue, and confirm that this will not happen again.
I will simply get proof, try to recall the data, and surely fire that person, then pursue legal action to prevent such a thing from happening again.