This QA first appeared in Data Science Briefings, the DataMiningApps newsletter. Want to submit your own question? Just tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we'll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.
You asked: How can analytics be used for Telematics? What are the challenges and opportunities?
The goal of Telematics is to integrate the use of telecommunications and informatics to monitor real driving behavior. A black box device is installed in the car for this purpose. Examples of Telematics data that are recorded are:
- distance driven;
- time of day;
- number of trips per day;
- harsh or smooth braking;
- aggressive acceleration or deceleration;
- cornering and parking skills.
Analyzing this data allows for a better risk assessment and personalized insurance premiums based on actual driving characteristics.
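To make the step from raw recordings to risk assessment more concrete, here is a minimal Python sketch that rolls trip-level records up into one feature row per driver. The `Trip` fields, the feature names and the 22:00 to 05:59 definition of night driving are all assumptions made for illustration; a real black box device will expose its own schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trip:
    """One recorded trip; all field names are hypothetical."""
    driver_id: str
    day: str                 # calendar day of the trip, e.g. "2017-03-01"
    start_hour: int          # 0-23, local time at trip start
    distance_km: float
    harsh_brake_events: int


def driver_features(trips):
    """Aggregate trip-level records into one feature dict per driver."""
    by_driver = defaultdict(list)
    for trip in trips:
        by_driver[trip.driver_id].append(trip)

    features = {}
    for driver_id, driver_trips in by_driver.items():
        total_km = sum(t.distance_km for t in driver_trips)
        active_days = len({t.day for t in driver_trips})
        # Assumption: "night" means a trip starting between 22:00 and 05:59.
        night_km = sum(t.distance_km for t in driver_trips
                       if t.start_hour >= 22 or t.start_hour < 6)
        harsh = sum(t.harsh_brake_events for t in driver_trips)
        features[driver_id] = {
            "total_km": total_km,
            "trips_per_active_day": len(driver_trips) / active_days,
            "night_km_share": night_km / total_km if total_km else 0.0,
            "harsh_brakes_per_100km": 100 * harsh / total_km if total_km else 0.0,
        }
    return features


# Made-up trips, purely to show the shape of the output.
trips = [
    Trip("A", "2017-03-01", 8, 12.0, 0),
    Trip("A", "2017-03-01", 18, 8.0, 1),
    Trip("A", "2017-03-02", 23, 20.0, 3),
    Trip("B", "2017-03-01", 9, 50.0, 0),
]
features = driver_features(trips)
print(features["A"])
```

Rates such as harsh brakes per 100 km are used here instead of raw counts so that drivers with very different mileage remain comparable; that is a design choice of this sketch, not something prescribed by the text.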
Predictive analytics such as linear regression, logistic regression and decision trees can be used to estimate the frequency and severity of claims based on self-reported risk data (e.g., demographic data, vehicle data) combined with Telematics data. The performance of these models can be evaluated using metrics such as the Pearson correlation coefficient and the R-squared (both for regression), and the area under the ROC curve and top-decile lift (both for classification). Depending on the size of the data set, these metrics can be calculated using a training/test sample split (more than 1000 observations) or cross-validation (fewer than 1000 observations). Model interpretability is also an important evaluation criterion. Hence, it is important to verify, e.g., the signs of the regression coefficients and the splits and leaf nodes of the decision tree.
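The two classification metrics mentioned above can be sketched in a few lines of plain Python. The function names, the example labels and scores, and the handling of exactly 1000 observations are assumptions made for illustration; in practice a statistics library would normally be used instead.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one claim and one non-claim")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_decile_lift(labels, scores):
    """Claim rate in the 10% highest-scored policies divided by the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


def validation_scheme(n_observations, threshold=1000):
    """Rule of thumb from the text; exactly 1000 is treated as 'small' here."""
    return "train/test split" if n_observations > threshold else "cross-validation"


# Made-up example: 20 policies, 4 of which filed a claim (label 1).
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
          0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]

model_auc = auc(labels, scores)
model_lift = top_decile_lift(labels, scores)
print(model_auc, model_lift, validation_scheme(len(labels)))
```

The pairwise AUC is quadratic in the number of observations, which is fine for a sketch but too slow for a large portfolio; a rank-based formulation gives the same value more efficiently.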
Descriptive analytics such as hierarchical clustering methods or k-means clustering can be used to cluster drivers based on driving style. The resulting clusters can then be evaluated in terms of their separation and interpretability. An interesting follow-up step would be to build a decision tree to characterize each of the resulting clusters so as to better understand their identifying characteristics.
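As an illustration of the clustering step, the following is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The two features, the example values and the choice to seed the centroids with the first k points are assumptions; in practice the features would usually be standardized first, and a library implementation with better initialization would be preferable.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each driver to the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no driver changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Made-up drivers: (harsh brakes per 100 km, share of km driven at night).
drivers = [(1.0, 0.05), (9.0, 0.40), (1.5, 0.10),
           (0.8, 0.02), (8.5, 0.35), (10.0, 0.50)]

assignment, centroids = kmeans(drivers, k=2)
for cluster_id, centroid in enumerate(centroids):
    size = assignment.count(cluster_id)
    print(cluster_id, size, [round(v, 2) for v in centroid])
```

Printing the centroids gives a first, rough profile of each cluster. The decision tree suggested in the text would go one step further by using the cluster labels in `assignment` as the target variable.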
Post-processing of the analytical models involves various activities, such as:
- interpretation and validation of the analytical models by the business experts;
- sensitivity analysis;
- representing the models in a user-friendly way, so that they can be easily interpreted and used;
- model monitoring using backtesting procedures.
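One simple form that the monitoring step could take is a comparison of predicted and observed claim counts per period. The sketch below is an assumption about what such a backtest might look like: the `backtest` function, the 20% tolerance and the quarterly figures are all invented for illustration.

```python
def backtest(predicted, observed, tolerance=0.20):
    """Compare predicted and observed claim counts per period.

    Returns one (period, ratio, flagged) tuple per period, where ratio is
    observed / predicted and flagged marks deviations beyond the tolerance.
    """
    report = []
    for period in sorted(predicted):
        ratio = observed[period] / predicted[period]
        report.append((period, ratio, abs(ratio - 1.0) > tolerance))
    return report


# Made-up quarterly claim counts; the tolerance is an illustrative choice.
predicted = {"2016-Q1": 100.0, "2016-Q2": 110.0, "2016-Q3": 105.0}
observed = {"2016-Q1": 96, "2016-Q2": 121, "2016-Q3": 140}

report = backtest(predicted, observed)
for period, ratio, flagged in report:
    print(period, round(ratio, 2), "REVIEW" if flagged else "ok")
```

A flagged period does not by itself mean the model is wrong; it is a prompt for the business experts to check whether the portfolio, the data feed or driver behavior has shifted.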
Various opportunities and challenges remain. An interesting opportunity would be to combine the Telematics data with external data such as road maps, weather data and traffic data. This would make it possible to investigate complex interactions between driving behavior and the various data elements gathered. An important challenge concerns legal regulations and privacy, which are changing on a continuous basis (e.g., the GDPR). The EU Gender Directive has banned price differentiation based on gender since 2012. Note, however, that empirical analysis has shown that gender no longer plays a role in models incorporating Telematics data (Verbelen et al., 2017). Another key challenge when analyzing Telematics data is the discontinuous stream of sensor data, which poses various analytical modeling challenges.
Verbelen R., Antonio K., Claeskens G., Unraveling the predictive power of telematics data in car insurance pricing, Working Paper KBI_1624, Faculty of Economics and Business, KU Leuven, 2017.