A phone number or ZIP code casually disclosed to a store clerk can end up far from where it was first shared, especially if it is used for data mining. Data mining is the use of computer-based analytic tools that apply statistical techniques to sift through large collections of data in search of patterns. Often, data records containing personal identifiers are compiled from many sources and transferred to third parties for analysis.
The information collected and stored in large databases can be used to detect suspicious spending patterns or to uncover improper spending of federal relief funds. Often, the analysis detects overall trends or patterns that reveal unusual activity and other specific parameters of interest. While some data mining techniques are used to support national security, others are in place to help combat financial fraud.
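To make the idea of statistically detecting an unusual spending pattern concrete, here is a minimal sketch. It is not drawn from any agency's actual tooling: the function name, the sample purchase amounts, and the choice of a median-based "modified z-score" with a cutoff of 3.5 are all assumptions made for illustration.

```python
from statistics import median


def flag_unusual_amounts(amounts, threshold=3.5):
    """Return (index, amount) pairs whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so one extreme value does not hide itself by
    inflating the spread. The 3.5 cutoff is a common rule of thumb,
    assumed here for illustration.
    """
    center = median(amounts)
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        # No measurable spread, so nothing can be ranked as unusual.
        return []
    flagged = []
    for i, amount in enumerate(amounts):
        score = 0.6745 * abs(amount - center) / mad
        if score > threshold:
            flagged.append((i, amount))
    return flagged


# Hypothetical purchase amounts for one account.
purchases = [42.0, 38.5, 51.2, 47.9, 40.3, 44.1, 39.8, 2500.0]
print(flag_unusual_amounts(purchases))  # [(7, 2500.0)]
```

A flag like this only marks a transaction as statistically unusual. It says nothing about why the amount differs, which is left to an analyst.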
Federal agencies
The Federal Agency Data Mining Reporting Act requires federal agencies to report periodically to Congress on their data mining activities. For instance, two bureaus of the Department of the Treasury regularly engage in data mining: the Internal Revenue Service (IRS) and the Financial Crimes Enforcement Network (FinCEN). The IRS mines financial data to predict which individual tax returns have the greatest potential for fraud and which corporations are most likely to make improper use of tax shelters. FinCEN focuses its data mining on money laundering and other financial crimes.
Both agencies use similar data mining technologies, including a database used to review Bank Secrecy Act (BSA) forms and information in aggregate. However, because BSA reports, such as Suspicious Activity Reports and Currency Transaction Reports, do not on their own reveal potential underlying criminal activity, FinCEN, for instance, may also query other law enforcement databases for further data on suspicious trends or patterns that indicate anomalous or illicit activity.
Data mining limitations
While data mining can reveal helpful patterns and trends, it has inherent limitations. For example, data mining cannot identify the underlying cause of the identified patterns and trends. The user must determine the significance of the data collected and must be able to draw relevant and accurate inferences.
A significant drawback to using commercial data is the possibility that the data contain errors or are of poor quality; they may be duplicative, for example, or dated. The accuracy, timeliness, and completeness of both the data and the analysis are important. Drawing erroneous or adverse inferences about any individual can quickly become problematic. According to the Treasury's data mining report, FinCEN uses checks and balances in its data mining and analysis to ensure that the data are used only by authorized agencies and for statutorily authorized purposes.
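The two quality problems named above, duplicative and dated records, can be screened for before any analysis begins. The following is a rough sketch under stated assumptions: the record fields (`name`, `zip_code`, `last_updated`), the function name, and the one-year staleness cutoff are hypothetical and do not describe any real commercial or government dataset.

```python
from datetime import date


def audit_records(records, as_of, max_age_days=365):
    """Return (duplicates, stale): indexes of repeated and outdated records.

    A record counts as a duplicate when an earlier record has the same
    normalized name and ZIP code, and as stale when it was last updated
    more than max_age_days before as_of. Both rules are assumptions.
    """
    seen = set()
    duplicates, stale = [], []
    for i, rec in enumerate(records):
        key = (rec["name"].strip().lower(), rec["zip_code"])
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
        if (as_of - rec["last_updated"]).days > max_age_days:
            stale.append(i)
    return duplicates, stale


# Hypothetical records compiled from more than one source.
records = [
    {"name": "Pat Lee", "zip_code": "30309", "last_updated": date(2010, 3, 1)},
    {"name": "pat lee ", "zip_code": "30309", "last_updated": date(2010, 4, 15)},
    {"name": "Sam Roy", "zip_code": "30303", "last_updated": date(2006, 7, 9)},
]
duplicates, stale = audit_records(records, as_of=date(2010, 6, 1))
print(duplicates, stale)  # [1] [2]
```

Matching on name and ZIP code alone is deliberately crude. Two different people can share both, so a match here is a prompt for review, not proof of duplication.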
Interpreting the data
Large aggregated collections of information are valuable intelligence resources. It is important to understand how and why access to such information is valuable. Sophisticated information retrieval techniques such as data mining allow users to search extremely large collections of data for trends and patterns and to zero in on particular transactions of interest. The information collected can also help law enforcement agencies identify emerging financial criminal trends. However, it is prudent to keep in mind that the initial data gathered often serve only as lead information, and it may not be until further analytical and investigative steps are taken that the information can ultimately help catch financial wrongdoers.
By Ana Cavazos-Wright, senior payments risk analyst in the Retail Payments Risk Forum at the Atlanta Fed