By James Ruotolo, senior manager of SAS fraud prevention solutions
A recent report from the Coalition Against Insurance Fraud shows that insurers are increasingly turning to predictive analytics and fraud prevention technology: 95% of respondents report using anti-fraud technology, up from 88% in 2012. However, insurance representatives highlighted 'integration' and 'poor data quality' as the biggest difficulties in implementing anti-fraud technology.
Lots of data, but what should you actually analyze?
Insurers generate large volumes of data every day, yet one of the biggest challenges they face in implementing analytics is getting access to the right sources of information. Efforts are hampered by multiple claims systems segregated by line of business, a patchwork of in-house developed systems, and third-party systems that hold critical data such as billing information. Useful information is also locked away in unstructured data, such as claims notes. On top of that, new sources of information keep appearing, such as social media and telematics devices.
Consolidating this data can be complex, but robust Data Quality and Data Integration tools go a long way in this process. Attention to the quality and integration of data is fundamental to producing a successful model.
Some fraud detection systems do not take data quality into account. As a result, good customers get annoyed by false positives, opportunities are missed, and alerts are misdirected. The quality of a fraud analytics solution depends directly on the quality of the input data.
Four key steps in preparing data for fraud analysis:
1. Integration. Even though the data is generated in different places, such as police departments, hospitals, financial institutions and other companies, it needs to be integrated before fraud analysis can begin. During this step, it is critical to document integration efforts and ensure they are repeatable and auditable. This will be essential when the Fraud Analysis Score is put into production. (A sketch of an auditable consolidation step follows this list.)
2. Missing or erroneous data. Does your system contain individuals with invalid CPF or RG numbers? Is there a claims file with no phone number? Ignored, such errors can drag down the results of fraud analysis. Data Quality tools can help you identify, repair and replace data that is missing or erroneous in your system. This phase is also the right time to standardize formats for common fields such as addresses. (A check-digit validation sketch follows this list.)
3. Resolve identities. Once data is aggregated from multiple systems, it is important to identify whether the same individuals, companies and other organizations appear in more than one place. One system might identify a person by name and ID number, while another uses name and date of birth. Simple matching techniques already available in most organizations can link these records, but the best results come from Advanced Analytics technologies that estimate the probability of a match. (A toy match-scoring sketch follows this list.)
4. Process unstructured text. More than 80% of insurance data is stored in text format, and some of the best information about a claim is captured in the loss description or claim note fields. But working with text is not simple. Abbreviations, acronyms, industry jargon and misspellings are common, and they need to be handled by text solutions with a vocabulary designed specifically for insurance data. During text analysis, additional model variables can be created, which is a powerful way to expand the reach of fraud analysis without having to include external data sources. Machine Learning and Natural Language Processing techniques should be used to find and create useful variables for fraud analysis models. (A small feature-extraction sketch follows this list.)
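For the integration step, here is a minimal sketch of an auditable consolidation pass in Python with pandas. The file names, the claim_id join key and the source systems are illustrative assumptions, not a prescribed layout; the point is tagging provenance and logging every step so the run is repeatable and auditable.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("fraud_etl")

def load(path: str, source: str) -> pd.DataFrame:
    """Read one extract and tag every row with its source system."""
    df = pd.read_csv(path, dtype=str)
    df["source_system"] = source  # provenance column for auditability
    log.info("loaded %s: %d rows", source, len(df))
    return df

# Claims extracts from two line-of-business systems (hypothetical files).
claims = pd.concat(
    [load("claims_auto.csv", "auto"), load("claims_property.csv", "property")],
    ignore_index=True,
)

# Billing lives in a third-party system; join it on the shared claim key.
# validate= makes pandas fail loudly if the join is not truly one-to-one.
billing = load("billing_extract.csv", "billing")
merged = claims.merge(
    billing.drop(columns="source_system"),
    on="claim_id", how="left", validate="one_to_one",
)
log.info("consolidated table: %d rows, %d columns", *merged.shape)

# Persist a snapshot so the exact input to scoring can be reproduced later.
merged.to_csv("claims_consolidated.csv", index=False)
```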
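For missing or erroneous data, CPF numbers make a concrete example because they carry two check digits that can be verified mechanically. Below is a self-contained sketch of that standard check-digit rule; how failures are routed (repair, enrichment, manual review) is up to your Data Quality process.

```python
import re

def cpf_is_valid(cpf: str) -> bool:
    """Validate a Brazilian CPF number using its two check digits."""
    digits = re.sub(r"\D", "", cpf or "")  # keep digits only
    if len(digits) != 11 or digits == digits[0] * 11:
        return False  # wrong length, or the well-known repeated-digit fakes
    for k in (9, 10):  # positions of the first and second check digits
        weights = range(k + 1, 1, -1)  # 10..2 for the first, 11..2 for the second
        total = sum(int(d) * w for d, w in zip(digits[:k], weights))
        if (total * 10) % 11 % 10 != int(digits[k]):
            return False
    return True

print(cpf_is_valid("529.982.247-25"))  # True: check digits are consistent
print(cpf_is_valid("529.982.247-99"))  # False: flag for repair or review
```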
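For identity resolution, the sketch below illustrates the idea of a match probability using only the standard library: a weighted blend of fuzzy name similarity and exact agreement on date of birth and document number. The weights and the 0.7 threshold are illustrative assumptions; a real system would calibrate them on labeled record pairs.

```python
from difflib import SequenceMatcher

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Score how likely two party records describe the same individual."""
    name_sim = SequenceMatcher(
        None, rec_a["name"].lower(), rec_b["name"].lower()
    ).ratio()
    # Exact-match signals count only when the field is actually present.
    dob_match = 1.0 if rec_a.get("dob") and rec_a["dob"] == rec_b.get("dob") else 0.0
    doc_match = 1.0 if rec_a.get("cpf") and rec_a["cpf"] == rec_b.get("cpf") else 0.0
    # Illustrative weights; tune them on pairs with known outcomes.
    return 0.5 * name_sim + 0.3 * dob_match + 0.2 * doc_match

a = {"name": "Maria S. Oliveira", "dob": "1984-03-12", "cpf": "52998224725"}
b = {"name": "Maria Silva Oliveira", "dob": "1984-03-12", "cpf": None}
if match_score(a, b) >= 0.7:
    print("likely the same individual")
```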
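For unstructured text, this last sketch shows the flavor of turning a claim note into flat model variables: expand shorthand first, then derive binary indicators. The abbreviation map and indicator phrases are tiny illustrative stand-ins for the real insurance vocabulary that dedicated text analytics tools would provide.

```python
import re

# Illustrative expansions for common claims-note shorthand.
ABBREVIATIONS = {"clmt": "claimant", "atty": "attorney",
                 "veh": "vehicle", "ins": "insured"}

# Phrases whose presence becomes a binary model variable (assumed indicators).
INDICATOR_TERMS = ["attorney", "prior claim", "no police report", "cash payment"]

def note_features(note: str) -> dict:
    """Turn a free-text claim note into flat variables for a fraud model."""
    text = note.lower()
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{abbr}\b", full, text)  # whole-word expansion
    features = {f"has_{t.replace(' ', '_')}": int(t in text)
                for t in INDICATOR_TERMS}
    features["note_length"] = len(text.split())
    return features

print(note_features("Clmt retained atty; veh towed, no police report filed."))
# {'has_attorney': 1, 'has_prior_claim': 0, 'has_no_police_report': 1,
#  'has_cash_payment': 0, 'note_length': 9}
```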
Effective data management is essential for any Fraud Analytics implementation. The investment made in the data cleaning process will result in better fraud detection rates.