This is FiveMinuteFriday, episode number 248, Data Science in Government.
Welcome back to the SuperDataScience Podcast, ladies and gentlemen. Today we are continuing our little saga of exploring data science applications in different industries, and the industry for today is government.
So how can data science be used in different areas in the government? As usual, we’re going to look at only five applications. There’s plenty more, but hopefully this will help inspire you about this field or it will give you some ideas of how you can use data science in your field. And off we go.
Number one, application number one of data science in government is tax fraud detection. It’s not necessarily the biggest application, just the first one we’re talking about today. Tax evasion and fraud are huge burdens to governments all around the world. For example, in the US alone, this cost is about $400 billion per year. That’s almost half a trillion dollars that the US bears every year because of tax evasion and fraud. So how can data science help here? Well, machine learning algorithms are increasingly being used to identify tax fraud and tax evasion.
So an example is that the IRS uses clustering models to put all tax returns into groups that have similarities and then identify returns falling outside these clusters as outliers that require additional investigation. Very, very cool application if you think about it, because clustering, unlike classification, is an unsupervised type of machine learning, and therefore you don’t really need to know anything in advance. You just have these input fields, of which there should be plenty in tax returns, and then you cluster them. Anything that falls outside those clusters is suspicious, because it’s not part of any group with similar attributes. Very, very cool approach. And that, as you can imagine, will reduce the amount of work. Instead of going through all the tax returns, the IRS can focus on those specific ones and start there.
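To make the clustering idea concrete, here is a minimal Python sketch. The episode does not say which clustering algorithm or which fields the IRS uses, so everything here is an assumption for illustration: the algorithm is a bare-bones DBSCAN (a density-based clustering method that labels points belonging to no cluster as noise), and the two features and all the numbers are made up.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 means noise (an outlier)."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # too isolated to start a cluster
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # reachable from a dense point: border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)   # j is dense too, so keep growing the cluster
        cluster += 1
    return labels

# Hypothetical returns: (income, claimed deductions), both in thousands of dollars.
returns = [
    (50, 5), (52, 6), (48, 5), (51, 4), (49, 6),            # one group of similar returns
    (120, 20), (118, 22), (122, 19), (121, 21), (119, 20),  # a second group
    (60, 45),                                               # unlike either group
]

labels = dbscan(returns, eps=5, min_pts=3)
outliers = [r for r, label in zip(returns, labels) if label == -1]
print(outliers)
```

Only the return that belongs to no group is flagged, so an investigator would start with that one instead of reading all eleven. With real data the features would need scaling first, and `eps` and `min_pts` would have to be tuned.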
Another very cool way of looking through tax returns with data science is Benford’s law. We won’t go into detail now, but it also allows you to spot fake tax returns. The thing is, when somebody fakes a tax return, or fakes numbers in a balance sheet or a profit and loss statement, those numbers are really hard to fake convincingly. There’s a distribution of digits that appears in a normal tax return or standard financial statements. We’re specifically talking about the first digit of each number. The digit one appears most often at the start of a number. The digit two appears less frequently, the digit three appears even less frequently, and so on. So look it up if you’re interested. It’s called Benford’s law. Won’t go into detail here, but there are certain statistical, data science, and machine learning methods that allow governments to detect tax fraud and tax evasion, and if you are not aware of them, you would never even think about them. So they’re very, very effective.
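For anyone who does want a little detail: Benford's law says the leading digit d appears with probability log10(1 + 1/d), which is about 30.1% for the digit one and about 4.6% for the digit nine. The Python sketch below compares the leading digits of a list of amounts against that distribution. The function names and both sample datasets are made up for illustration, and a chi-square statistic is only one of several ways such a comparison could be scored.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit d under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_shares(amounts):
    """Observed share of each leading digit among nonzero whole-number amounts."""
    digits = [int(str(abs(a))[0]) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

def chi_square_vs_benford(amounts):
    """Pearson chi-square statistic; a larger value means further from Benford's law."""
    n = sum(1 for a in amounts if a != 0)
    observed = leading_digit_shares(amounts)
    expected = benford_expected()
    return sum(n * (observed[d] - expected[d]) ** 2 / expected[d] for d in range(1, 10))

# Powers of two are a classic sequence that follows Benford's law closely.
natural_like = [2 ** k for k in range(100)]
# Amounts whose leading digits are all equally likely, as naive invented figures might be.
uniform_like = list(range(100, 1000))

print(chi_square_vs_benford(natural_like))
print(chi_square_vs_benford(uniform_like))
```

With nine digits there are eight degrees of freedom, for which the usual 5% critical value is about 15.5. A statistic well above that suggests the figures deserve a closer look. It is not proof of fraud by itself.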
Number two, public transport maintenance. Right now, about 55% of the global population lives in cities and that number is expected to go up to 68% by 2050. So right now it’s just over half, and by 2050 over two thirds of the global population will be living in cities. As you can imagine, that’s a massive strain on infrastructure and specifically on public transportation. That will cause more failures and more stress on the system, and therefore it needs more maintenance. So what we can do is use data science, machine learning, computer vision, and so on to actually track where this maintenance is going to be required. Here’s an example. The London Underground is one of the busiest in the world, so you’ve probably heard of the London Tube.
So that underground system actually has 1.4 billion journeys taken on it every single year. 1.4 billion journeys, crazy number. However, its Victorian-era infrastructure is aging, the carriages are aging, and it all needs regular maintenance. And actually more than half of the delays that happen on the London Tube are caused by malfunctions. So what has happened recently is that in London they’ve implemented a failure probability model based on heat maps to predict where breakdowns are likely to happen. And they’re hoping that this will significantly reduce the cost of repairs, which currently takes up almost 60% of their budget. So it’s kind of like a predictive maintenance application of data science and machine learning in public transportation.
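The episode gives no details of the London model, so the following Python sketch is not that model. It only illustrates the general shape of a failure probability estimate, under made-up assumptions: each track segment has a log of days on which it did or did not fail, and the probability of failure on a given day is estimated from those counts with simple Laplace smoothing. The segment names and numbers are hypothetical.

```python
from collections import defaultdict

def failure_probabilities(history, prior_failures=1, prior_ok=1):
    """Estimate a per-day failure probability for each segment.

    history is a list of (segment, failed) pairs, one per observed segment-day.
    The two prior counts act as Laplace smoothing, so a segment with very little
    data is not assigned a probability of exactly 0 or 1.
    """
    days = defaultdict(int)
    failures = defaultdict(int)
    for segment, failed in history:
        days[segment] += 1
        failures[segment] += int(failed)
    return {
        seg: (failures[seg] + prior_failures) / (days[seg] + prior_failures + prior_ok)
        for seg in days
    }

# Hypothetical maintenance log.
history = (
    [("segment_A", True)] * 2 + [("segment_A", False)] * 98
    + [("segment_B", True)] * 15 + [("segment_B", False)] * 85
    + [("segment_C", False)] * 5
)

probs = failure_probabilities(history)
ranking = sorted(probs, key=probs.get, reverse=True)  # most failure-prone first
print(ranking)
```

Colouring each segment on a map by its estimated probability would give a heat map of where to send maintenance crews first. Note that the segment with only five days of data ranks high purely because so little is known about it. A real model would use far richer inputs than a simple count.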
Use case number three of data science, machine learning, and AI in government is user assistance. If you’ve ever needed to apply for your driver’s license or a license renewal, or you needed to fill in some other form for the government, you’ve probably experienced that this can be a complex, stressful, lengthy process, and it can actually create more problems than there already are. For example, in a 2018 survey, it was found that 40% of taxpayers had made at least one mistake when filling out forms in the previous 24 months. A similar percentage answered that they weren’t sure they were paying the right amount of tax, and almost 70% of those asked said they would like an AI assistant to help them with forms, showing an increasingly positive reception of these types of services. So basically the idea is to use clever artificial intelligence that can improve user experience and make the process quicker, more personalized, and ultimately more understandable to the average user, so that these forms can be filled out more efficiently and actually correctly.
Use case number four of data science, machine learning, and AI in government is crime detection and prevention. Crime is extremely costly, both in terms of damage to people and property and in terms of law enforcement. For instance, the US alone spends over a hundred billion dollars a year on law enforcement. We’re definitely still quite a way away from things like Robocop, but nevertheless we can already use advanced machine learning algorithms to assist with crime prevention, prosecution, law enforcement, and things like that. So here’s an example. ShotSpotter is a tool that governments can use, and what it does is use acoustic sensors to detect and triangulate gunshots in real time. This allows law enforcement agents to approach the exact location of the shooting quickly and safely.
And ShotSpotter is actually used by quite a lot of cities around the world, including New York, Chicago, San Diego, San Antonio, and so on. So it’s a very interesting system, and actually it’s estimated that about 80% of gunshot incidents are never reported to 911. So if a city has a system like that, then most of the gunshots can be addressed by the police even if they’re not reported, and that can actually help save lives and catch more criminals. All right.
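How can microphones pin down where a sound came from? The episode does not describe ShotSpotter's actual algorithm, so the Python sketch below is only a generic illustration of one common idea: sound reaches nearer sensors sooner, so the differences in arrival times constrain the source location. The sensor layout, the grid search, and all the names are assumptions made for this example, and the simulated measurements contain no noise.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees Celsius

# Hypothetical sensor positions in metres, at the corners of a 500 m square.
SENSORS = [(0, 0), (500, 0), (0, 500), (500, 500)]

def delays(source):
    """Arrival time at each sensor minus the arrival time at the first sensor (seconds)."""
    times = [math.dist(source, s) / SPEED_OF_SOUND for s in SENSORS]
    return [t - times[0] for t in times[1:]]

def locate(measured, step=5, size=500):
    """Grid search for the point whose predicted delays best match the measured ones."""
    def cost(point):
        return sum((p - m) ** 2 for p, m in zip(delays(point), measured))
    grid = ((x, y) for x in range(0, size + 1, step) for y in range(0, size + 1, step))
    return min(grid, key=cost)

true_source = (120, 340)
measured = delays(true_source)  # simulated, noise-free measurements
estimate = locate(measured)
print(estimate)
```

Only time differences are used because the sensors do not know when the shot was fired, only when they each heard it. A production system would also have to handle noise, echoes, and telling gunshots apart from other loud sounds.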
Use case number five of data science and machine learning in government is traffic path planning. It used to be very difficult to get any sort of traffic data, and it was mostly obtained by random sampling by volunteers on street corners who would count cars and bicycles as they went past. This meant that a lot of traffic planning was based on gut feel, and that resulted in longer travel times for everybody. Also delays, traffic jams, bottlenecks, and things like that. However, with the rise of smartphones and apps that track movement, governments now have a massive amount of data to analyze and base their future decisions on. And here’s an example. So Strava, by the way, if you haven’t used this app, highly, highly recommend. I discovered it for a bicycle ride that I was doing with my dad from the Gold Coast to Brisbane and back. Amazing app. By the way, if you’re on there, find me there and connect. I think if you search Kirill Eremenko you should find me. So Strava, s t r a v a, is a really cool app with which you can track your sports activities. For instance, your bicycle riding, your jogging, your walking, your swimming, and other things like that.
For instance, with your jogging or hiking, you just put your phone in your pocket and it tracks where you’re going, and you can share your results with people. It’s really a cool app. I highly enjoy it. By the way, no affiliation with them. Just something that I really like. So Strava is an app that allows users to track their jogging or cycling paths and compare them to other people’s. They have a wealth of data, and in 2008 they published a public interactive heatmap based on over 13 trillion data points. This has allowed cities like Glasgow, Stockholm, and Brisbane, our fair Brisbane, Australia, to calculate return on investment on expenses like cycle lanes, and to figure out where to create new ones and what will best serve the community. Very, very cool app. And, you know, through this ride that we were doing, you can see how much the government spends on these bicycle lanes.
So we were riding and there’s usually a sign saying this bicycle lane was constructed in this year and this is how much the government spent. It ranges from like $8 million to $43 million for just a few kilometers of bicycle lane. Or not even a bicycle lane, more like a bicycle path somewhere off the highway. As you can imagine, that’s a massive cost for the government, and data-driven insights allow it to put that money to use effectively and actually build things that are useful for the community. And that’s just one of the examples of how that’s done.
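At its core, a heatmap like the Strava one is a count of how many tracked GPS points fall into each small cell of a grid. Here is a minimal Python sketch of that binning step. It is not Strava's actual pipeline. The cell size, the function name, and the coordinates (made-up points roughly in the Brisbane area) are all assumptions for illustration.

```python
import math
from collections import Counter

def heatmap(points, cell_deg=0.001):
    """Count GPS points per grid cell.

    cell_deg is the cell size in degrees; 0.001 degrees of latitude is about 111 m.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

# Hypothetical GPS fixes as (latitude, longitude) from a few rides.
points = [
    (-27.4701, 153.0212),
    (-27.4702, 153.0214),
    (-27.4703, 153.0216),  # the first three fixes fall into the same cell
    (-27.4755, 153.0305),  # a fix on a less-used route
]

heat = heatmap(points)
busiest_cell, visits = heat.most_common(1)[0]
print(busiest_cell, visits)
```

A planner could overlay these counts on a map and compare them with where cycle lanes already exist, which is the kind of evidence that can replace gut feel when deciding where the next few kilometers should go.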
So there we go. That’s data science in government, the five use cases. I hope you enjoyed this and hopefully I gave you some inspiration. As you can see, government is actually quite an interesting space to be in. If you have data science built in, you can help governments build their cities, fight crime, prevent [inaudible], and so on. So there you go. Five use cases of data science in government, and I look forward to seeing you back here next time. Until then, happy analyzing.