Over the next several weeks, Arturo will be sharing a three-part series that takes a deep dive into the strength of our labeling process. The series shows how our team, technology, and processes separate us from the competition and enable our on-demand processing, resulting in the most accurate and up-to-date property information available to property and casualty insurance carriers.
The data labeling journey at Arturo touches many hands, from the machine learning team to the data quality team. But the devil is in the details, according to Principal Machine Learning Engineer Brad Sheneman. “It’s a time-sensitive process with several iterations of labeling,” he said. “We take the time to do it all as opposed to just grabbing 100,000 images and doing the best we can. We want to make sure the data is high quality and modelable.”
In this post, Brad, along with Applied Machine Learning Engineer Sardhendu Mishra, goes into detail about the training data and machine learning tools that Arturo uses to improve the overall labeling process. That process is another factor in Arturo’s becoming a leader in property analytics technology for property and casualty insurers.
Faster-than-average turnaround time
All the tooling is designed to train the model throughout the labeling process, so a process that used to take months is now trimmed down to weeks. “We find what data we need to evaluate to improve our models,” Brad said, adding that this is where the human element and expertise at Arturo come in. “It’s a cyclical process that becomes more automated to train models in much less time.” The combination of pipelines, tooling, and deployment infrastructure makes models operate a lot faster.
Highest level of expertise
“A lot of companies that do machine learning find it hard to get the right data for them; others get a lot of open-source data,” Brad said, pointing to the domain expertise on Arturo’s data science team. “We’re not treating it as purely a deep-learning problem. We’re taking the time to figure out what the solution is.” As an example, Sardhendu added that, on top of deep learning, the team tries other technical methods to actually solve the problem, such as drawing on other data sources, prior knowledge, or in-country domain experts who understand the relationships between imagery features at a deeper level.
“We train our models over and over again so the customer can see the best model that we have now,” Brad said. Arturo puts the trust and experience of its staff to the test when it comes to “the framework and knowledge built up around machine learning and satellite imagery,” he said, noting plans to double the team in the next six months. “The whole team has internalized a lot of these patterns we’ve used, knows where to turn, and how to contribute.”
“Getting high-quality data from our images is the most important part of the process; it’s the single most important thing we do that all our efforts are focused on,” Brad said. “It’s our biggest differentiator, and it all comes down to the interaction between us and the data quality team.”
To learn more about how Arturo’s training data and machine learning technology are brought to life in solutions that drive real business impact for insurance carriers, please request a demo.