Medicare is persistently targeted by fraudulent insurance claims. These illegal activities often go undetected, allowing career criminals and unscrupulous healthcare providers to exploit weaknesses in the system. The National Health Care Anti-Fraud Association estimates that fraud cost more than $100 billion last year, and the true figure is likely much higher.
Traditionally, Medicare fraud detection has relied on a limited number of auditors or investigators manually examining thousands of claims for specific patterns of suspicious activity. There is only so much time, and there aren’t enough investigators to track down the many Medicare fraud schemes.
Leveraging big data, such as patient records and provider payments, is often considered the best way to build effective machine learning models for fraud detection. However, handling imbalanced big data and high dimensionality (data with a very large number of features, which makes computation difficult) remains a major challenge in the field of Medicare fraud detection.
A new study from Florida Atlantic University’s College of Engineering and Computer Science addresses this challenge by pinpointing fraud in the “vast ocean” of big Medicare data. Because identifying fraud is the first step in stopping it, this new technology has the potential to save the Medicare system significant resources.
For this study, researchers systematically tested two large imbalanced Medicare datasets: Part B and Part D. Part B covers medical services such as doctor visits, outpatient care, and other services not covered by hospitalization. Part D, on the other hand, relates to Medicare’s prescription drug benefit and covers drug costs. Both datasets were labeled using the List of Excluded Individuals/Entities (LEIE), provided by the U.S. Office of Inspector General.
Researchers investigated the impact of random undersampling (RUS), a simple yet powerful data sampling technique, and their novel ensemble supervised feature selection technique. RUS works by randomly removing samples from the majority class until a certain balance between the minority and majority classes is reached.
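RUS can be sketched in a few lines. The following toy illustration assumes a 1:1 target balance (one common choice; the article does not state the exact class ratios the study tested):

```python
import random

def random_undersample(majority, minority, seed=42):
    """Randomly discard majority-class samples until the two classes
    are the same size (a 1:1 balance, assumed here)."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # keep a random subset
    return kept + list(minority)

# Toy data: 1,000 legitimate claims vs. 10 fraudulent ones.
majority = [("claim", 0)] * 1000
minority = [("claim", 1)] * 10

balanced = random_undersample(majority, minority)
print(len(balanced))  # 20 rows: 10 from each class
```

Because samples are dropped at random rather than by any rule, RUS is cheap to apply even on very large datasets, which is what makes it attractive for big Medicare data.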
The experimental design investigated a variety of scenarios, from using each technique alone to using them in combination. After analyzing each scenario individually, the researchers selected the best-performing method from each and then compared results across all scenarios.
The results of the study, published in the Journal of Big Data, demonstrate that intelligent data reduction techniques can improve the classification of large, imbalanced Medicare data. Applying both techniques (RUS and supervised feature selection) together outperformed models that used all available features and data. The findings show that the best performance comes from feature selection followed by RUS, or RUS followed by feature selection.
The researchers found that the method with the greatest amount of data reduction yielded the best performance in classifying both datasets: performing feature selection first and then applying RUS. Reducing the number of features results in a more explainable model and significantly improves performance compared with using all features.
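The best-performing order reported (feature selection, then RUS) can be sketched as a two-step pipeline. This is a minimal illustration with toy data; the column indices, the 1:1 balance, and the data itself are assumptions, not the study’s actual configuration:

```python
import random

def feature_select_then_rus(X, y, keep_cols, seed=42):
    """Step 1: keep only the selected feature columns.
    Step 2: randomly undersample the majority (non-fraud) class
    down to the size of the minority class."""
    X = [[row[c] for c in keep_cols] for row in X]
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    kept = pos + rng.sample(neg, len(pos))
    return [X[i] for i in kept], [y[i] for i in kept]

# Toy claims: 4 features each, 100 legitimate vs. 5 fraudulent.
X = [[i, i * 2, i % 3, i % 5] for i in range(105)]
y = [0] * 100 + [1] * 5

Xr, yr = feature_select_then_rus(X, y, keep_cols=[0, 2])
print(len(Xr), len(Xr[0]))  # 10 rows, 2 features each
```

Doing feature selection before RUS means the ranking step sees all available data, while the classifier is then trained on the smallest possible, most balanced subset.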
“The performance of a classifier or algorithm can depend on multiple factors. Two that make classification especially difficult are high dimensionality and class imbalance. Class imbalance in labeled data occurs when the overwhelming majority of instances in a dataset carry one particular label. Classifiers optimized for metrics such as accuracy are hampered by this imbalance, since mislabeling fraudulent activity as non-fraudulent can still boost the overall score.”
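The pitfall described above is easy to demonstrate: on a 99:1 imbalanced dataset, a classifier that labels every claim as non-fraudulent scores 99% accuracy while catching zero fraud. A small self-contained check:

```python
# 0 = legitimate claim, 1 = fraudulent claim.
labels = [0] * 990 + [1] * 10
predictions = [0] * 1000  # a "classifier" that always predicts non-fraud

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / 10

print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- every fraudulent claim is missed
```

This is why work on imbalanced data typically evaluates models with metrics sensitive to the minority class rather than raw accuracy.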
Dr. Taghi Khoshgoftaar, Senior Author, Motorola Professor, FAU Department of Electrical Engineering and Computer Science
For feature selection, the researchers used a supervised method built on feature-ranking lists. These lists were combined, through an innovative approach, into a single final ranking, and features were selected based on their position in that ranking. A model using all features of the dataset was also built as a benchmark.
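One common way to combine several ranking lists into a consensus ranking is to order features by their average position across the lists; the article does not spell out the study’s exact aggregation rule, and the feature names below are hypothetical, so treat this only as a sketch of the general idea:

```python
def aggregate_rankings(ranking_lists, k):
    """Merge several feature-ranking lists (best feature first) into one
    consensus ranking by mean position, then keep the top k features."""
    positions = {}
    for ranking in ranking_lists:
        for pos, feature in enumerate(ranking):
            positions.setdefault(feature, []).append(pos)
    mean_pos = {f: sum(p) / len(p) for f, p in positions.items()}
    consensus = sorted(mean_pos, key=mean_pos.get)
    return consensus[:k]

# Hypothetical rankings produced by three different rankers:
r1 = ["npi_claims", "avg_payment", "service_count", "provider_type"]
r2 = ["avg_payment", "npi_claims", "provider_type", "service_count"]
r3 = ["npi_claims", "provider_type", "avg_payment", "service_count"]

top = aggregate_rankings([r1, r2, r3], k=2)
print(top)  # ['npi_claims', 'avg_payment']
```

Aggregating several rankers makes the final selection less sensitive to the quirks of any single ranking method.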
“Our systematic approach provides a deeper understanding of the interplay between feature selection and model robustness within the context of multiple learning algorithms,” said first author John T. Hancock, a Ph.D. student in FAU’s Department of Electrical Engineering and Computer Science. “Building a model with fewer features makes it easier to reason about how the model performs classification.”
For both the Medicare Part B and Part D datasets, the researchers conducted experiments in five scenarios to thoroughly explore the possible ways to apply or omit the RUS and feature-selection data reduction techniques. In both datasets, the data reduction techniques improved classification results.
“Given the significant economic impact of Medicare fraud, this important finding not only provides a computational advantage but also significantly increases the effectiveness of fraud detection systems,” said Dr. Stella Batalama, dean of FAU’s College of Engineering and Computer Science. “If these methods are properly applied to detect and deter Medicare insurance fraud, they have the potential to reduce the costs associated with fraud and significantly improve the standard of health care services.”
Co-authors of the study are Dr. Huanjing Wang, professor of computer science at Western Kentucky University, and Qianxin Liang, a Ph.D. student in FAU’s Department of Electrical Engineering and Computer Science.
Source:
Florida Atlantic University
Journal reference:
Hancock, J. T., et al. (2024). Data reduction techniques for highly imbalanced Medicare big data. Journal of Big Data. doi.org/10.1186/s40537-023-00869-3.