By Michael Cochrum
Suppose you or your significant other go to the local haberdashery looking for a new suit. Instead of the assistant taking waist, hip, neck and chest measurements, they simply ask for your weight. How likely is it that the correct fit would be selected based only on that information? Not very. While weight is a measurement of size, it is not very helpful in measuring fit. Financial institutions can be guilty of the same misconception, and susceptible to the same error, when it comes to our Key Performance Indicators (KPIs). Using the wrong measurement can make all the difference between making sound business decisions and making mistakes that are only discovered after it is much too late.
One key measure used in financial institutions is the delinquency rate: the percentage of account balances with payments that are sixty days or more past due. Because delinquency can be a precursor to default, loan portfolios with relatively high delinquency ratios are thought to be of low quality, and those with low delinquency ratios of high quality. However, delinquency can also be an indicator of stress on borrowers; as unemployment rises, for example, loan delinquency tends to rise. Delinquency can likewise indicate that a financial institution is not managing risk well. Two financial institutions with relatively similar risk distributions can have widely different delinquency ratios because of the way they engage borrowers under stress. But should delinquency be used to evaluate the performance of lending employees? Is it an appropriate way to measure how well an underwriter or collector is doing their job?
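To make the definition concrete, here is a minimal Python sketch of how a delinquency ratio might be computed. The loan records, field names, and the sixty-day threshold are illustrative assumptions for this sketch, not any real institution's data or reporting standard:

```python
# Hypothetical loan records: each has an outstanding balance and days past due.
loans = [
    {"balance": 12000.0, "days_past_due": 0},
    {"balance": 8000.0,  "days_past_due": 75},
    {"balance": 15000.0, "days_past_due": 30},
    {"balance": 5000.0,  "days_past_due": 120},
]

# Delinquency ratio: share of total balances that are 60+ days past due.
delinquent = sum(l["balance"] for l in loans if l["days_past_due"] >= 60)
total = sum(l["balance"] for l in loans)
delinquency_ratio = delinquent / total

print(f"Delinquency ratio: {delinquency_ratio:.1%}")  # 13,000 / 40,000 = 32.5%
```

Note that the ratio is weighted by balance, not by account count, so a single large delinquent loan can move it substantially.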
I write about the fallacy of bad measurements in my book, The Data-Driven Credit Union. I tell my own story of being measured as a collector based on the number of delinquent accounts I was able to resolve in a thirty-day period relative to my counterparts. For months, I was ranked the worst collector in our local office and had to endure several meetings of criticism. It was demoralizing. Then, one day, someone from our headquarters office visited our local office with a whole different set of measurements. In her assessment, I was not only the most effective collector in the local office, but the most effective throughout the entire company. That is quite a swing, from the worst to the best in a matter of minutes. The difference was that the new measurements went beyond a simple quantitative comparison of my resolution counts against those of my in-office counterparts; they considered qualitative factors related to the probability of collecting each account in the first place. In other words, the corporate office was estimating the probability of resolving accounts like the ones I was assigned, and then evaluating my success relative to that probability, taking the quality of the borrower into account from the start.
At a recent conference I was attending, a lender spoke up in a breakout session and said their institution provides incentives to underwriters based on the delinquency ratio of each underwriter's portfolio. I found this interesting, because it immediately recalled my own career experience. I would imagine that the underwriters at this financial institution have questioned the fairness of this measurement, and I would suggest that the strategy could create a disincentive for underwriters. Unless the financial institution can ensure that the applications available to underwrite are assigned completely at random, delinquency is not a fair measurement of underwriting accuracy. I wonder whether the Chief Lending Officer would want their own incentive based on a comparison of their institution's delinquency ratio with other institutions' delinquency ratios, without taking into account the risk distribution of the portfolios. The trouble is that a delinquency ratio alone, without consideration of each borrower's probability of defaulting in the first place, is an inaccurate measurement of underwriting skill. For example, let's say that, as an underwriter, I'm completely comfortable underwriting loans across the entire risk spectrum and other underwriters are not. Perhaps my counterparts tend to 'skip' applications with lower credit scores, or perhaps they simply decline them. Do these underwriters have the best interests of the financial institution and the borrower in mind? Should they receive incentives for that?
On the other hand, suppose the incentive measure used is the relative difference between the actual performance of a loan and its predicted performance based on the credit score and other factors. If there is a 2% probability of default across my entire portfolio and the loans I originated have a 2.2% default rate, then the loans I originated are performing roughly as expected. If my portfolio has a 6% default rate, then I'm probably making poor underwriting decisions. If my portfolio has a 0.5% default rate, then I'm probably being too conservative. My counterparts could have entirely different targets based on the loans they have underwritten. Measuring performance in this way achieves the result we want: accurate and conscientious underwriting. The prior measurement encourages manipulation and counter-productive strategies.
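The risk-adjusted comparison described above can be sketched in a few lines of Python. The function and the sample portfolio below are invented for illustration; in practice the predicted probability of default (PD) would come from an institution's own scoring model:

```python
def risk_adjusted_performance(loans):
    """loans: list of (predicted_pd, defaulted) pairs for one underwriter.

    Returns the expected default rate implied by the predicted PDs, the
    actual observed default rate, and their ratio. A ratio near 1.0 means
    the loans are performing about as predicted; well above 1.0 suggests
    poor decisions, well below 1.0 may indicate excess conservatism.
    """
    expected = sum(pd for pd, _ in loans) / len(loans)
    actual = sum(1 for _, defaulted in loans if defaulted) / len(loans)
    return expected, actual, actual / expected

# Hypothetical portfolio: 500 loans, each with a 2% predicted PD,
# of which 10 actually defaulted (a 2% actual default rate).
portfolio = [(0.02, False)] * 490 + [(0.02, True)] * 10
expected, actual, ratio = risk_adjusted_performance(portfolio)
print(f"expected {expected:.1%}, actual {actual:.1%}, ratio {ratio:.2f}")
```

Because each underwriter is judged against the predicted risk of their own book, an underwriter who takes on riskier applications is not penalized simply for doing so.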
As a Data Scientist, it is important to me that measurements support their intended purpose and are never manipulated to support a personal bias. I've seen people lose their jobs and institutions fall short of their potential because of poorly employed measurements. Too often, we are tempted to use, or misuse, the measurements that are readily available. It is more important that the measurements we use are appropriate for what is being measured, even if they have to be constructed. Otherwise, we will find that our decisions do not have a proper fit for the question at hand.