SDS 591: Simulations and Synthetic Data for Machine Learning

Podcast Guest: Mars Buttfield-Addison

July 12, 2022

Mars Buttfield-Addison, PhD Candidate at the University of Tasmania, joins Jon for a high-energy episode that sees them discuss her new book ‘Practical Simulations for Machine Learning,’ the mobile operating system language Swift, and space junk!

About Mars Buttfield-Addison
Mars Buttfield-Addison is a computer science and machine learning researcher, tech freelancer and educator. She is currently doing a PhD in Computer Engineering at the University of Tasmania and CSIRO, where she works to improve satellite- and space debris-tracking radars. On the side, Mars teaches programming, data science and computer science; runs developer conferences; freelances as a developer of smart systems and creator of STEM educational materials; and writes books about machine learning. Her new book “Practical Simulations for Machine Learning” looks at how the popular video game development tool Unity can be leveraged for training serious neural networks.
Overview
First, Jon and Mars dive into her new book ‘Practical Simulations for Machine Learning.’ Inside the newly published O’Reilly book, you’ll find Mars and her co-authors walking you through the basics of simulations and synthesis, synthetic data, and practical applications of simulations.
For those unfamiliar with simulations, Mars provides a quick primer to get you up to speed: “we’re using simulations to mean when we want to reinforcement learn something, teach it to perform some certain task…whereas synthesis is where we want to make some dataset that we then take out of engine and then apply using traditional machine learning models,” Mars explains.
Surprisingly, synthetic data derived from simulations can provide us with infinite quantities of potentially very high-quality data for training machine learning models. The simulations in the book rely heavily on the game engine Unity. And as Mars explains, its simulated bots can be used to solve any problem that can be expressed spatially, which in principle includes any computational problem.
Unity is helpful for training learning algorithms not only because it provides a lower level of abstraction, but also because it makes it easy to spot when your machine learning algorithm has “gone off the rails.”

Mars is currently pursuing a Ph.D. in computer engineering at the University of Tasmania, where she focuses on writing high-performance computing software to track space junk.
Rather than focus on machine learning during her Ph.D., Mars’s work leverages her background to focus on high-performance computing tasks related to space surveillance. For example, to predict the forces at work in space, she reframes algorithms to account for Earth’s rotation and Doppler shift. This is all done through GPU programming with CUDA, which is essential to the high-performance computing needed for tracking space objects with radio telescopes.
Tune in to hear more from Mars, including what it’s like creating video games in a “secret” Tasmanian games lab, and whether she thinks programming or statistical skills are more important in data science.
In this episode you will learn:    

  • What simulations and synthetic data are, and why they can be invaluable for real-life applications [5:47]
  • How simulated bots can solve any problem [9:07]
  • Practical uses of simulated data [21:49]
  • Why the mobile operating system language Swift is interesting for AI [25:46]
  • Why it’s critical to track the amount of junk in space [35:47]
  • Whether programming or statistical skills are more important in data science [47:05]
  • What it’s like creating video games in a “secret” games lab [56:45]
  • Why you might want to do a data science internship in industry before pursuing a career in academia [1:01:54]
 
Episode Transcript


Jon: 00:00:00

This is episode number 591 with Mars Buttfield-Addison, PhD candidate at the University of Tasmania.
Welcome to the SuperDataScience podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now, let’s make the complex simple. 
Welcome back to the SuperDataScience podcast where I’m joined today by the bewilderingly interesting and widely knowledgeable Mars Buttfield-Addison. Mars is co-author of two O’Reilly books, the newly released Practical Simulations For Machine Learning and Practical AI with Swift, which was released three years ago. Mars is pursuing a PhD in computer engineering from the University of Tasmania in Australia, in which she’s focused on writing high-performance computing software to track spacecraft and space junk. 
She teaches courses on AI and data science at the University of Tasmania. She’s a regular speaker at top tech conferences around the world, and she holds a bachelor’s degree in software development and data modeling.
Today’s episode should be equally fascinating to technical and non-technical folks alike. In this episode, Mars details what simulations and synthetic data are and why they can be invaluable for real-life applications. She talks about how simulated bots can solve any problem by representing the problem as a 3D visualization, why the mobile operating system language Swift is interesting for AI, how much junk there is in space and why it’s critical we track it, what it’s like creating video games in a secret Tasmanian games lab, whether programming or statistical skills are more important in data science, and why you might want to do a data science internship in industry if you’re thinking of having a career in academia. All right, you ready for this stimulating episode? Let’s go. 
Mars, bringer of war, welcome to the podcast episode. Where in the world are you calling in from?
Mars: 00:02:30
I’m calling in from Hobart, Tasmania, which is like a little heart-shaped island state that hangs off the bottom of Australia, way down near Antarctica.
Jon: 00:02:40
And I understand, as we are filming in June, that while this is summer for the Northern Hemisphere, for you it is something quite different.
Mars: 00:02:50
It’s the dark of winter. Actually, the city that I’m calling you from is famous for its mid-winter festivals, where they build big effigies and burn them and harvest all the apple trees and make warm cider. The city is filled with these like red inverted crosses. 
Jon: 00:03:09
What? 
Mars: 00:03:09
And every business turns its lights red. Yeah, it’s this big thing. The whole island turns creepy in the dead of winter. 
Jon: 00:03:16
Inverted crosses? 
Mars: 00:03:18
People come from all over the world to come and see a creepy festival. 
Jon: 00:03:23
Well, that sounds like fun. And it’s chilly, I guess. The skiing, can you ski on Tasmania? 
Mars: 00:03:28
No. 
Jon: 00:03:29
No. 
Mars: 00:03:30
We get this weird thing where the planet’s land is mostly bunched up in the north, so we think we are quite south. We’re one of the southernmost significant land masses in the world. It’s like us and Argentina. 
Jon: 00:03:45
Yeah. 
Mars: 00:03:45
But we still aren’t that far south. We’re only 42 degrees south. 
Jon: 00:03:49
Right. 
Mars: 00:03:50
We get snow up on our mountains, but we don’t get snow in the city. 
Jon: 00:03:54
Right. Not good enough weather. 
Mars: 00:03:56
So it’s an extreme event if you can get your skis out here. 
Jon: 00:03:57
Ah, lots of inverted crosses, but no snow. 
Mars: 00:04:00
Yes. 
Jon: 00:04:02
Well, I would take that trade. So we know each other through Suzanne Huston of O’Reilly. Suzanne has been kind enough to introduce me to several great O’Reilly authors such as yourself, and you are the first of those that she’s introduced me to that is on the show. You have a book that is being released… I mean, digitally, it was released. 
Mars: 00:04:27
Right now. 
Jon: 00:04:28
Yeah, at the time of recording. And it seems like you’re going to be able to get the book on Amazon in the United States at least as of today, the time that the episode is being released. So that’s super cool. Your book is called Practical Simulations for Machine Learning, and you co-authored it with Paris Buttfield-Addison who has the same last name as you by bizarre coincidence. Wait, no, no, no, he is your husband. And also Tim Nugent and Jon Manning. This is a super cool book. I loved going through it. I went through the pre-print to study for filming this episode, and I loved it. 
Mars: 00:05:14
Podcast homework. 
Jon: 00:05:15
Podcast homework. Yeah. Most of my homework these days is podcast homework. So it has three big sections. The first is on the basics of simulation and synthesis. And then the second big part is on simulating worlds for fun and profit, which is a really fun title for a part of the book. And then the third and final part is on synthetic data and real results. So it seems like, from these parts and from the table of contents, that there are two kinds of things that we’re covering in this book. There are simulations and then there are synthetic data. What are the differences between those things? Do you use simulations to create synthetic data? Fill us in. 
Mars: 00:06:04
So in this case, we’re using simulation to mean when we want to reinforcement learn something: teach it to perform some certain task, to reach some certain goal state, to animate when we haven’t given a specific animation. Whereas synthesis is where we want to make some dataset that we then take out of engine and then apply using traditional machine learning models. So say that you’re trying to make a model in a domain where you don’t have enough data. You might intuit that synthetic data created from the data you have can only be derivative of what you have, so why would that make it any better? It turns out it does. And all of a sudden you have an infinite source of data. 
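The synthesis idea Mars describes here can be sketched in a few lines of Python. This is a toy illustration with invented numbers, not code from the book: it stretches three real measurements into as many synthetic training points as you like by adding controlled random jitter.

```python
import random

def synthesize(real_samples, n, jitter=0.1, seed=42):
    """Generate n synthetic samples by randomly perturbing real ones.

    Each synthetic point is a real sample plus small uniform noise,
    so the output stays close to the true data distribution while
    providing as many training examples as requested.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        base = rng.choice(real_samples)
        synthetic.append([x + rng.uniform(-jitter, jitter) for x in base])
    return synthetic

# Only three real measurements, but as many training points as we like:
real = [[1.0, 2.0], [1.5, 2.2], [0.9, 1.8]]
fake = synthesize(real, n=1000)
```

In practice the “jitter” would be the domain-aware randomization the book builds in Unity (materials, lighting, poses), but the principle is the same: derivative data, made varied enough to be useful.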
Jon: 00:06:44
Wow, that is wild. I actually had a conversation many, many years ago, a phone call with this consulting company. They were talking to me about simulating financial data. And I thought it was a really bad idea. I should go back and phone them. I think I was a bit snippy with them. I actually often regret that phone call because I was quite dismissive. And yeah, it sounds like from what you just said, I made a big mistake, that in fact, synthetic data can be hugely valuable. They can provide you with maybe even better data than you could possibly collect in a particular situation. 
Mars: 00:07:30
Well, if you only have limited input data, you’re always going to be limited in what you can do, and there’s only so far that you can stretch it. Obviously, anything you simulate, anything you synthesize is going to be just reinforcing whatever’s there. 
Jon: 00:07:42
Right. 
Mars: 00:07:42
So if you have something that’s really messy or really ill-defined or something, it’s going to learn the wrong lesson. As we all know, machine learning is hard to wrangle. Sometimes it learns the wrong thing. But if you are smart about how you use it and you’re smart about the random variations that you introduce, that source of chaos, then you can make something that’s really robust. 
Jon: 00:08:01
That’s cool. Yeah. And as you say, infinite amounts of it then, if you can simulate your synthetic data, then you can theoretically have as much as you need. So this is something, for example, that we see with self-driving cars. There’s a lot of self-driving car footage out there of really boring situations. So going straight down a highway. Self-driving cars collect lots and lots of data to show that, but there isn’t really anything that needs to be learned, particularly in that case. What we want to be learning is the irregular circumstances, when a pedestrian jumps out in front of the road or a cow jumps in front of the road, or you’re in a city with all kinds of things going on. So a lot of self-driving car companies simulate data of the edge cases so that you have lots of examples of training data to train self-driving cars on. 
Mars: 00:08:55
That’s a great idea. 
Jon: 00:09:00
Yeah. That’s one that I was kind of aware of before going through your book. So in your book, you guys rely heavily on something called Unity. So what is Unity? Why was that so useful for the simulations that you did in your book? 
Mars: 00:09:18
Well, Unity is actually a game development engine. So if you think about when you want to make video games, and quite often they’re complex 3D worlds nowadays, you don’t want every time you’re writing a game to have to write the physics system. If you want someone to be able to chuck a cube or hold a weapon or push a button, you don’t want to have to program how something is going to arc and fall, how things are going to ricochet, how light works. So you can use one of these- 
Jon: 00:09:44
You want the wiggles to wiggle properly. 
Mars: 00:09:46
Yeah. You want to skip all that and you want it to be consistent across different platforms. So we get these things called game engines. There are two really big popular ones, but there are a bunch of open-source ones. So the two popular ones are Unreal Engine, which you might know is the thing that everyone used to make Fortnite, and most recently has been used for films, feature films as they’ve got that volume technology that they’re getting on, that they’re making all these digital sets, virtual sets for films. 
Jon: 00:10:16
Oh, cool. 
Mars: 00:10:17
And their competitor is Unity, which is a game development engine that mostly is used for mobile games and is the one that Pokemon GO is built in. And I guess because their competitor had gotten into films, which is a bigger market than games, Unity was like, “Crap. Well, we better find our second market.” And they found out that people were using Unity to simulate their robots, that they could then make work in a lab. So obviously, if you can simulate something in a computer, it can try things out and fail and try again and learn much, much faster than it would be able to in the real world, and you can parallelize it. So if you have a robot arm that’s trying to learn how to pick up a cube and you’re trying to do that with a real robot, that’s quite difficult to do. You have to wait for that thing to try and fail and try and fail. 
Jon: 00:11:04
Right. 
Mars: 00:11:04
But in a computer, you can make a million of them that can try a billion times a minute. And people had managed to hack this together themselves, and now Unity decided to go whole hog and support it officially. 
Jon: 00:11:17
Yeah, when I wrote my book Deep Learning Illustrated, that was one of the three main reinforcement learning engines that you could use. One was provided by OpenAI. Another one was provided by Google DeepMind. And then the third was Unity. 
Mars: 00:11:35
Oh. 
Jon: 00:11:35
So even years ago they were big in that space. So what’s the relationship between Unity and learning algorithms? Why is it that something for simulating 3D environments is so useful for training machine learning algorithms, particularly reinforcement learning algorithms, like these robot arms that you’re describing? So is this relationship between Unity and these reinforcement learning algorithms, these learning algorithms, related to the way that Unity can represent things visually and spatially? 
Mars: 00:12:20
Yes. So obviously, it makes the most sense if you’re trying to simulate something that’s trying to solve a visual, spatial problem, specifically if you’re trying to make something that will be translated into the real world as a physical thing. So if you want to make a robot that assembles cars, you would have to get the physicality of that represented in the environment that it’s learning in. So you want the thing it’s picking up to have weight, to be slippery. And all of those physical aspects are really hard if you are trying to make an abstract representation of that without a full physics simulation, which Unity has.
But also, you can abstract other kinds of problems into something visual. So the really beneficial thing about Unity isn’t just that it could provide a lower level of abstraction in a good way and that it’s more applicable to the thing it’s eventually going to do, but also that you can watch it. It’s very easy to see when your model has gone awry or is doing something weird when you’ve got a physical environment that has cameras in it that shows you snapshots of what it’s doing. When you’re watching something learn, you can see when it’s gone weird, which is often the hard part about machine learning: quite often something has started learning the wrong things, it’s gone off the rails, but you are just trying to look at a tensor graph and you have no idea. 
Jon: 00:13:36
This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience. It’s the namesake of this very podcast. On the platform, you’ll discover all of our 50-plus courses, which together provide over 300 hours of content, with new courses being added on average once per month. All of that and more you get as part of your membership at SuperDataScience. So don’t hold off, sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level. 
Yeah. I don’t have very much experience with reinforcement learning and watching algorithms learn. Most of my experience is through a simple 2D game called the CartPole game. So with the CartPole game, you control a cart that can move left or right, that’s it. And there’s a vertical pole that you’re trying to balance on top of the cart. The game starts where the pole is slightly off balance to the left or to the right, but if you can quickly move your cart, you can get the pole to balance above, but the pole is always kind of wobbling over. It’s very easy for it to fall over, so you’re constantly shifting the cart left and right to make the pole stand up. It’s a really simple 2D game to test how well reinforcement learning algorithms are learning. And it is really cool watching that algorithm learn. So I imagine that you have much more depth and breadth of experience with these kinds of visualizations. Do some particularly cool 3D visualizations come to mind for you, of some algorithm that you have seen learn, or not learn, effectively? 
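For readers who want to poke at CartPole itself, the dynamics Jon describes fit in a short function. A minimal sketch in plain Python (the constants are the commonly used textbook values; the bang-bang controller at the bottom is a crude stand-in for a learned policy, not something that balances indefinitely):

```python
import math

def cartpole_step(x, x_dot, theta, theta_dot, force, dt=0.02):
    """One Euler step of the classic cart-pole dynamics.

    x:     cart position      theta:     pole angle from vertical (rad)
    x_dot: cart velocity      theta_dot: pole angular velocity
    force: push applied to the cart (+right / -left)
    """
    g, m_cart, m_pole, length = 9.8, 1.0, 0.1, 0.5
    total = m_cart + m_pole
    temp = (force + m_pole * length * theta_dot**2 * math.sin(theta)) / total
    theta_acc = (g * math.sin(theta) - math.cos(theta) * temp) / (
        length * (4.0 / 3.0 - m_pole * math.cos(theta)**2 / total))
    x_acc = temp - m_pole * length * theta_acc * math.cos(theta) / total
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)

# Naive "bang-bang" control: always push toward the side the pole
# leans. It fights the fall for a while, unlike a learned policy.
state = (0.0, 0.0, 0.05, 0.0)  # pole starts slightly off balance
for _ in range(200):
    force = 10.0 if state[2] > 0 else -10.0
    state = cartpole_step(*state, force)
```

Watching `theta` wander under different controllers is the 2D analogue of the visual debugging Mars describes next.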
Mars: 00:15:28
I don’t know. I’m always the most satisfied with the things where we’re teaching them to do procedural animation. So you have some sort of little figure and you want it to learn something, usually to observe how it learns. And so there’s this quite interesting field of computational biology or… I can’t think what the alternative word is. There’s another term for that field, but basically they want to examine how we have evolved and how different animals have evolved. And they thought that they could simulate it and we could learn something about it. So you end up with these simulations. You give it some constraints like, “Here is an entity, and it has the ability to consume things for food and reproduce and change what it physically looks like. 
So let’s give it something to optimize for, say, survival and just let it go.” We can see something interesting in that. Whatever final form it establishes, we can say that has some utility that is in some ways ideal for the objective that it has.
We also ended up with, if we make a bipedal form that’s similar to a human form and we give it the concept of energy and the concept of locomotion and we tell it to propel itself forward but we don’t tell it how a human walks, how will it learn to walk? Is that more efficient than the way we walk? You get some really interesting things where, if you don’t give it a cost for energy so it’s not trying to conserve energy, then what they start doing is they start punching forward. They do this little run where they punch their fists forward, like a little windmill. It turns out that’s really efficient for forward momentum. If you’re not trying to conserve energy, that’s maybe how we would all be walking, running, and fist-pumping the whole way. 
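Mars’s point that the energy term changes the learned gait shows up even in an absurdly simplified model. A sketch (every number here is invented; a single `arm_swing` parameter stands in for a whole gait, and random hill-climbing stands in for evolution):

```python
import random

def evolve_gait(energy_weight, generations=200, seed=0):
    """Toy random-search 'evolution' of a single gait parameter.

    arm_swing in [0, 1] stands in for how much the walker pumps its
    fists. In this made-up model swinging adds forward momentum but
    burns energy; the optimizer keeps whatever candidate scores best.
    """
    rng = random.Random(seed)
    best, best_fitness = 0.5, None
    for _ in range(generations):
        candidate = min(1.0, max(0.0, best + rng.uniform(-0.2, 0.2)))
        distance = 1.0 + candidate       # fists forward -> more momentum
        energy = 2.0 * candidate         # ...but it costs more energy
        fitness = distance - energy_weight * energy
        if best_fitness is None or fitness > best_fitness:
            best, best_fitness = candidate, fitness
    return best

free_energy_gait = evolve_gait(energy_weight=0.0)  # ends up fist-punching
frugal_gait = evolve_gait(energy_weight=1.0)       # barely swings at all
```

With no energy cost the optimizer drives the swing to its maximum; add the cost term and the very same process settles on the opposite gait.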
Jon: 00:17:06
That’s cool. That was a really wonderful example. A great visual example. So when you’re working with Unity and you’re making simulations, I guess, in your book and probably in a lot of cases in the real world, the kind of glue that we would be using to get going in Unity would be Python, maybe PyTorch. 
Mars: 00:17:24
Yeah. 
Jon: 00:17:24
Nice. 
Mars: 00:17:25
Well, Unity has this ML-Agents framework. Basically all it does is open an inter-process bridge that can be hooked into with Python. So you can use it with any Python framework. You can imagine, you can write your own, you can do your own hand-coded perceptrons if you want to. It’s just hooking into that engine, and it’s giving you the ability to feed it inputs and receive outputs. So you can make it talk to whatever you like, whatever you can conceive of in Python. 
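That inputs-in, actions-out loop is easy to picture in code. A self-contained sketch with a stand-in environment class, not the real `mlagents_envs` API (its class and method names differ):

```python
import random

class FakeUnityEnv:
    """Stand-in for the Unity side of the bridge. The real ML-Agents
    setup talks to a running Unity process; here a tiny Python class
    plays that role so the loop below can run anywhere."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.steps = 0
        self.last_obs = None

    def reset(self):
        self.steps = 0
        self.last_obs = [self.rng.uniform(-1.0, 1.0)]
        return self.last_obs

    def step(self, action):
        # Reward the agent for matching the sign of the observation
        # it just acted on, then hand out the next observation.
        reward = 1.0 if action * self.last_obs[0] >= 0 else 0.0
        self.steps += 1
        self.last_obs = [self.rng.uniform(-1.0, 1.0)]
        done = self.steps >= 10
        return self.last_obs, reward, done

def policy(obs):
    """Anything in Python can sit on this side of the bridge: a
    PyTorch network, a hand-coded perceptron, or this trivial rule."""
    return 1.0 if obs[0] > 0 else -1.0

env = FakeUnityEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    obs, reward, done = env.step(policy(obs))
    total_reward += reward
```

Swap `FakeUnityEnv` for the real bridge and `policy` for a learner, and the shape of the loop stays the same.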
Jon: 00:17:50
Speaking of hand-coding, before we started filming, you described to me a really fun idea, which is that you could use a Unity video game simulation to have a completely visual video game-like, I’m going to use the word visual again, visualization of a neural network learning. 
Mars: 00:18:13
Yes. 
Jon: 00:18:14
You were describing you could have barrels representing weights increasing or biases increasing, and you can watch them- 
Mars: 00:18:24
Like some big Rube Goldberg machine of little balls that roll down ramps and then pull on weights or something. Yeah. It would be horribly inefficient, but it would be interesting to watch, because I keep saying you can abstract anything into a visual-spatial problem, and you can. It’s not a good idea sometimes, but you can. 
Jon: 00:18:41
Yeah. So that’s actually one of the main points of your book, is that not only can we be using these simulated bots to be doing things in a visual or spatial environment, say a 3D environment, to do work for us, but a central tenet of your book is that you could also convert other kinds of problems into a visual realm. Just like we described this neural network, you could convert it into barrels rolling and moving ropes, and those are controlling weights and you could watch a neural network learn in this visual-spatial way. And then you could have these bots do anything to solve any kind of problem theoretically, even though it is potentially less efficient in some cases, but maybe there’s the upside of visualization. Like you said earlier, we often don’t know what’s going wrong, so if we’re just watching our cost functions go down in some kind of tensor graph- 
Mars: 00:19:51
In TensorBoard. 
Jon: 00:19:51
TensorBoard would be the most common choice. We don’t really have a sense of what’s going on. We just know that it didn’t work. So we hope that by reinitializing with random weights again and sampling in a different order- 
Mars: 00:20:05
We’re going to stop her. 
Jon: 00:20:06
Yeah. Just don’t do that again. Or maybe we decreased our learning rate or something like that. But if we could see what’s going on, if we could say, “Oh no, the barrels over there in the Tasmania region of my visualization, they’re all breaking and shattering upside down, crosses all over the place,” that’s not what we want. 
Mars: 00:20:28
We really notice this even with the most simple simulations. Like you’re saying CartPole, there’s the 3D equivalent, which is, it controls a plate and there’s a ball that’s on it and has to keep the ball on top. So it’s the same way: it has to jiggle and jitter and balance. 
Jon: 00:20:42
What do they call that? The ball plate game? 
Mars: 00:20:46
I don’t even know. I think it’s just Balance? 
Jon: 00:20:47
I don’t know. 
Mars: 00:20:50
But occasionally, I’ve done something where even though I made the most minimal simulation, I’ve hooked it up wrong, the actions are wired in backwards or something. And the tensor graph looks like it’s kind of getting there for the first 10 minutes or something, but you’ll watch it and it’s acting weird. And so you go, “No, no, no, I’ve done something wrong.” We know it’s useful. Because we use it already, we see things go weird.
Jon: 00:21:14
The bipeds are all running around, punching the air. 
Mars: 00:21:16
Yes. 
Jon: 00:21:19
Anyway, so silly example. All right, so we’ve learned a lot about your book and about simulating data and then using those synthesized data and training machine learning models. We’ve learned about how we can be either using the simulated data for solving visual-spatial problems or contriving visual-spatial problems to take advantage of these kinds of simulation techniques. Do you have any particular examples from the book or from your life experience as to how we can use these simulated data practically? 
Mars: 00:22:01
Yeah. Well, we still have those two halves, where if you’re looking for purely simulation-based uses, it’s usually things like self-driving cars. It means you want to be able to have something drive around in a place where if it stuffs up, it’s not going to hurt someone. 
Jon: 00:22:15
Right. 
Mars: 00:22:16
So typically, you would make a street environment and you would have simulated pedestrians, and they’re crazy and they walk all over the streets and there are things down in their way. You can also use this to make inputs you wouldn’t be able to get in real data. One of the examples we show is, you’ve made a city that this car can drive around in and it’s got cameras on its front and then it has to conduct itself appropriately. But then one of the other inputs that it can receive as part of its feedback is a map where its inputs have been keyed by what layer that thing is on. So it’s a version of its camera input, but all the trees are purple and all the pedestrians are green.
And this is a layer key so it starts to understand what it’s looking at. 
So it knows, as it’s driving around, it’s kind of learning those road rules and proper conduct and what it’s looking at in parallel. You don’t have to go around tagging things yourself. Because it’s in a game engine, it can just query the object that’s there and go, “Oh, what are you? Oh, you’re a pedestrian. Cool.” And you can get things like, it’s teaching itself depth because it can guess how far away something is and you don’t have to go and measure it. It gets the reinforcement of, the engine just goes, “It’s exactly this far away,” because it knows. It has entire control over the environment and everything in it. And it knows everything in it and where it is and what its properties are. So complete knowledge of the environment that you don’t have to manufacture or measure yourself. 
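The “complete knowledge of the environment” Mars describes is exactly what makes the labels free. A toy sketch of the idea in Python (the scene objects, classes, and coordinates are invented; in Unity the renderer would supply the camera image alongside these records):

```python
import math
import random

def render_labeled_frame(rng, camera=(0.0, 0.0)):
    """In a game engine every object already knows what it is and
    where it is, so ground-truth labels cost nothing to produce."""
    scene = [
        {"cls": "pedestrian", "pos": (rng.uniform(-5, 5), rng.uniform(2, 20))},
        {"cls": "tree",       "pos": (rng.uniform(-5, 5), rng.uniform(2, 20))},
        {"cls": "car",        "pos": (rng.uniform(-5, 5), rng.uniform(2, 20))},
    ]
    frame = []
    for obj in scene:
        dx = obj["pos"][0] - camera[0]
        dy = obj["pos"][1] - camera[1]
        frame.append({
            "cls": obj["cls"],            # segmentation label, for free
            "depth": math.hypot(dx, dy),  # exact distance, no measuring
        })
    return frame

rng = random.Random(1)
dataset = [render_labeled_frame(rng) for _ in range(100)]
```

No human ever tags a pedestrian or measures a distance; the engine is queried for both.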
Jon: 00:23:46
Cool. 
Mars: 00:23:47
But you also get, with the synthesis side, it’s more for if you have something that’s tedious or totally infeasible to do. So one of the examples we have is like, you’re trying to teach something to identify a dice roll. So you roll a dice and it tells you what side is up. And so yes, you could go around taking photos, 10 billion photos of dice in all different lighting environments with all different colored dice, all different sizes of dice, et cetera, et cetera. Or you can put them in a 3D engine and then give it the ability to randomize the materials, the lighting, where they are in relation to the camera, et cetera.
And it can all of a sudden make you an infinite number of those. Maybe more varied than you would’ve been able to produce in real life, which means it’s going to be quite robust in its ability to detect different things. And that’s also good for like, you want it to identify trees, you can randomly generate trees. That’s really easy. You can make an infinite number of trees and you don’t have to go around taking pictures of trees. And yes, it will only work to a point because they’re simulated, but just adding in a little bit of real-world data actually makes it surprisingly effective. 
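The dice example amounts to sampling rendering parameters and keeping the label you chose. A sketch of the sampling side (parameter names and ranges are invented; in a real pipeline a renderer would turn each record into an image):

```python
import random

DIE_COLORS = ["white", "red", "black", "blue"]

def random_dice_sample(rng):
    """One synthetic training example: randomized rendering parameters
    plus the label, which we know because we chose the face-up side."""
    face_up = rng.randint(1, 6)  # the label comes for free
    return {
        "face_up": face_up,
        "color": rng.choice(DIE_COLORS),
        "size_mm": rng.uniform(8.0, 25.0),
        "light_intensity": rng.uniform(0.2, 2.0),
        "camera_angle_deg": rng.uniform(0.0, 60.0),
    }

rng = random.Random(7)
dataset = [random_dice_sample(rng) for _ in range(10_000)]
```

Ten thousand perfectly labeled examples in milliseconds, with far more variation in color, size, and lighting than a photo shoot would capture.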
Jon: 00:24:57
Very cool. I love those examples. Yeah. So it sounds like there’s a huge amount of value for listeners, for data scientists, to be learning about these kinds of Unity game engine-driven simulations and synthetic data. Very cool. And it seems like a super fun book with lots of hands-on practical examples. But, this is not your only book. So you also previously wrote a book called Practical AI with Swift. And that was also with the same co-author team. Is that right? 
Mars: 00:25:34
Yes. 
Jon: 00:25:34
It was Paris, Tim Nugent, and Jon Manning as well for that one. So what’s that about? Most listeners have probably heard of the Swift programming language, but maybe you should give us a couple sentences on Swift first and then why would we want to be doing practical artificial intelligence with Swift. 
Mars: 00:25:56
Yeah. So Swift is a programming language that came out of Apple. It’s what you make iOS and macOS apps in. It’s a successor to their Objective-C language. And it’s still something that can compile down to C. It’s very portable, very powerful. It’s often compared to Rust as a really good language for systems programming. It’s really elegant. I really like it. It’s liked by a lot of the same people that like Go, which is the similar thing that Google has produced. Because a lot of big companies reach that point at once where they’re like, “We want our own modern programming language,” and so we got a bunch of good ones at once. 
Swift is awesome. The thing about Swift is, it’s what people use to write iOS apps. And iOS apps were kind of getting this Renaissance where it really got to a point where basically half the people on the planet had an iPhone in their pocket that was capable of doing advanced augmented reality, of doing photogrammetry, of doing on-device really intense learning, like you can do on-device neural style transfer in an app you can whip up in five minutes. Yeah, it was just like all of a sudden, everyone has these really powerful, capable devices in their pockets. And yet even then, when we think about machine learning, we think about big complex academic things written in Python and not fun little apps on an iPhone that you can try yourself. And especially that iPhone development was a big area that had a lot of attention, particularly from younger programmers. And we just wanted to show them that they could start putting smarts. They could start putting machine learning into their apps really easily.
So yeah, we wrote another book. This was two years ago. I guess you can tell by now that my interest is taking frameworks that people don’t usually use for machine learning and showing how you can use them for machine learning. And I would write 10 million books on that topic with all different weird and wonderful things if I could, but it was just about showing the fun things. So again, it was more like a cookbook in structure, where it was 20 different apps that did all different things and looked at all the different things you could do on device with an iPhone. You can recognize people and you can put Memoji on their faces and you can detect drawings and stuff. 
Jon: 00:28:00
Oh, yeah. 
Mars: 00:28:00
And so it’s very similar in structure to now, Practical Simulations for Machine Learning, which is, you can make little agents do this and that, and they’re both very visual, very creative. So yeah, I think they’re very similar even though the technologies that they’re about are completely different. 
Jon: 00:28:16
Right. Yeah. That makes sense. So with your current book that’s coming out now, you’re using the Unity game engine to simulate data, but previously you were using Swift to do machine learning on devices, on things like iPhones and iPads. That’s super cool. And I like that thread that you drew between them, which I hadn’t thought of myself. That’s wonderful. It’s also interesting that you mentioned that you could do a million books because it seems like you might be on track to, in your lifetime, do about a million books. While you only have two so far, the gentlemen that you have co-authored them with, Paris, Tim and Jon, have authored dozens of books already. Tell us a bit about some of the books that they’ve written. 
Mars: 00:29:12
Yeah. So they also kind of stumbled into it. They were at a party sometime with someone who was writing a book series about Objective-C, I think, at the time, which was then the predecessor language that people were using to write iPhone apps when they first came out. This was somebody who had written the first two or three editions of a book, and he was like, “Oh, I don’t really want to write the next one.” They were like, “We’ll do it.” So they started writing with O’Reilly Media by writing the fourth edition of a book or something. And it turned out they were a fantastic fit as people who could cover iPhone topics, because historically, even though iPhones and Swift and app-related technologies are quite big in the real world, they actually haven’t had much of O’Reilly’s audience.
Jon: 00:30:01
Ah. 
Mars: 00:30:01
It’s weird: when we look at the analytics of the platform, Swift’s real-world popularity isn’t reflected in O’Reilly’s audience. Swift has been a weird side thing, and the demographic of iOS programmers particularly is much, much lower than we observe in the real world.
Jon: 00:30:12
Huh. 
Mars: 00:30:14
So they’re in this little niche where they’re among, I think, only three sets of authors who write about Swift and Apple-y stuff under the O’Reilly banner, which is both a blessing and a curse. Yeah, it’s quite niche. 
Jon: 00:30:28
That’s right. 
Mars: 00:30:28
The couple of people who are into Swift and into the O’Reilly learning platform don’t have many options, so they’re very good to us. But yeah, there are not as many as you would think. It’s really fun, though. They’ve also written most of the books that O’Reilly has released about the Unity game engine, just from a game development perspective. And they’ve done some fun little side things, like the Swift Pocket Reference, which is this cute tiny little book, and books on how to write apps for the Apple Watch and how to make games with GameKit on the iPhone. So there’s also the crossover between the two.
Jon: 00:31:08
Super cool. 
Mars: 00:31:09
Yeah. So between them coming up from game dev and me coming up through machine learning, we just kind of met in the middle and started writing about how you can use game technologies for machine learning. 
Jon: 00:31:20
Do you happen to have another book that you’re thinking about for the future that you can share with us on air? 
Mars: 00:31:29
Oh, we actually had a book canceled recently. We wanted to write about the Swift for TensorFlow project, which was… So the person who made Swift, Chris Lattner, wrote LLVM as his thesis, then started working for Apple making Swift, and then went to work for Google. And Google went, “Fantastic. We’re kind of outgrowing Python; there are some limitations to what you can do with Python for machine learning. We really want a strongly typed, really modern, fast language, particularly a compiled language. We’re going to start porting TensorFlow to something like that. What are our options? Oh, we’ve got Chris Lattner, let’s do Swift.” And they started the Swift for TensorFlow project and began porting all of TensorFlow to Swift.
But more importantly, they also made a bridge where you could just call into arbitrary Python code from Swift. So you could import Python libraries and you could use Python types. You could kind of wrap your existing Python-based libraries with some extra strength and robustness from Swift, particularly Swift’s error handling, which was really good. And so we decided to write a book on this, and we started writing a book on this, and then Google killed the project, as Google tends to do.
Jon: 00:32:45
Oh, no. 
Mars: 00:32:47
I mean, we just started. Obviously, the project still exists, there’s nothing stopping you from using it, but it’s not going to get ongoing development. So it wasn’t worth writing a whole thing to tell people that they should switch all that TensorFlow stuff to the Swift version if it doesn’t really have a future as a project, unfortunately. 
Jon: 00:33:03
It went the way of Google Wave and my beloved Google Inbox. 
Mars: 00:33:07
Yes. 
Jon: 00:33:09
I’ve griped for so long on-air about the loss of Google Inbox before and the hours of productivity I lose every week because of that. But I won’t do it again now. I’ve done it enough.
Mars: 00:33:18
Now we can commiserate together. 
Jon: 00:33:20
Oh, you’ve loved Inbox too? 
Mars: 00:33:22
Anti-Google fan club. No, but they do have a tendency to kill products that we rely on. 
Jon: 00:33:30
Yeah, I’m going to have my local Congressperson draft a bill that will really make the economy more efficient by never allowing widely used free products to be canceled. You must maintain them forever, and we’ll have this perfect economy where we’ll never have any inconsistencies again.
Mars: 00:33:53
It’s how they decide to kill things, that’s the weird thing. It’s not that they kill things; every tech company kills things. But theirs are just out of left field. You don’t see it coming. You can’t look at the usage patterns of a particular product and be like, “Yeah, they’re going to axe it.” They axe things randomly. It’s so strange.
Jon: 00:34:08
I have a hypothesis, which is that it’s monetization. 
Mars: 00:34:13
Ah. Yeah, that makes sense. 
Jon: 00:34:14
One of the really wonderful things about Google Inbox was, it was so minimalist, including it didn’t have any ads. There was no way for ads to kind of get into your inbox. So that’s my guess for that one, but anyway.
Mars: 00:34:30
Don’t even get me started. I have my old tirade that I can go on about how they’ve ruined email, the spec, by adding all of these UI layers that people don’t understand, like read receipts. That doesn’t exist in the email spec. You’re faking it. You’re making people think that it is part of email. Do you know how they give you read receipts in emails? They put a little pixel at the end, which is actually an image that you have to hit a web server to load. So when you open the email, it hits the web server and goes, “Ah, you’ve read that,” and then sends the read receipt to the other end.
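The tracking-pixel trick Mars describes can be sketched in a few lines of Python. This is a hypothetical illustration only: the server name, message ID, and handler are made up, and a real tracker would run an actual web server rather than this toy request function.

```python
# Hypothetical sketch of how email "read receipts" are faked with a tracking
# pixel: the sender embeds a unique, invisible 1x1 image, and when the mail
# client fetches it, the sender's web server learns the message was opened.

def add_tracking_pixel(html_body: str, server: str, message_id: str) -> str:
    """Append an invisible 1x1 tracking image to an HTML email body."""
    pixel = (f'<img src="https://{server}/open?id={message_id}" '
             'width="1" height="1" alt="" style="display:none">')
    return html_body + pixel

# On the "server" side, any request to /open?id=... is logged as a read.
opened = set()

def handle_request(path: str) -> None:
    """Toy request handler: record which message IDs have been fetched."""
    if path.startswith("/open?id="):
        opened.add(path.split("=", 1)[1])

email = add_tracking_pixel("<p>Hello!</p>", "tracker.example.com", "msg-42")
handle_request("/open?id=msg-42")  # simulates the mail client loading the pixel
```

This is also why many mail clients now block remote images by default: refusing to fetch the pixel is the only way to avoid sending the "read" signal.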
Jon: 00:35:04
Right. 
Mars: 00:35:06
It’s not part of an email. They’re just stalking you weirdly. 
Jon: 00:35:11
Well, that’s probably the only way that Google is stalking us weirdly. Yeah, so that book was unfortunately canceled, Swift for TensorFlow.
Mars: 00:35:29
Yeah. 
Jon: 00:35:29
Anything else that you- 
Mars: 00:35:30
If I could write a book about anything, I would, I guess, try to contrive something more about the space work that I’m doing just because space is cool. Everyone thinks space is cool. That’s universally known. 
Jon: 00:35:38
Okay, yeah. Perfect. So you’ve segued right into my next question. So you’re doing a PhD despite all these other things that we’ve already talked about you doing, like writing lots of books. And we’ve got a lot more things still to cover on the show, things that you do all at the same time. But one of those things is a PhD, and that PhD is about space. So you are making tools for the detection, tracking, and classification of space junk.
Mars: 00:36:02
Yes. 
Jon: 00:36:03
There’s a lot of junk hanging around in space in low Earth orbit. And so you’re doing a PhD with CSIRO, C-S-I-R-O, which is Australia’s premier scientific research organization. I know that there are lots of things that you can’t tell us about what you do in your secret space junk projects, but maybe there are some things you can tell us about this PhD work that you’re doing.
Mars: 00:36:30
Yeah. Well, CSIRO is like the national laboratories mixed with an NRAO equivalent in Australia. So they own most of the radio telescopes, and they’ve let me use some of those for my PhD. As for the topic: most people nowadays have heard of the space debris crisis. They’ve seen the movie Gravity and they’ve heard of Kessler syndrome, so they know that there’s junk in space. We’ve been putting junk up there since the fifties, and we don’t-
Jon: 00:36:58
The bit in Gravity where space junk hits whatever the space station that we’re in or whatever, that is one of the most frightening parts in the movie. 
Mars: 00:37:06
Absolutely. It brought a lot of attention to it actually. It’s been really quite effective as outreach because everyone’s like, “Oh yes, that Sandra Bullock movie. Now I know.” So it’s actually been great. 
Jon: 00:37:17
Yeah. It was very easy for me to remember about space junk because I always think about that time that I pooped my pants in the theater. I’m like, “Oh yeah, that time that I pooped in my pants. Space junk.”
Mars: 00:37:34
Oh no. 
Jon: 00:37:34
I didn’t actually poop my pants. I promise. But I mean, Gravity was a stunningly… I bet it happened to someone out there. It was a stunningly intense film.
Mars: 00:37:47
Visceral. Yes. 
Jon: 00:37:48
Indeed. 
Mars: 00:37:49
Oh yeah. So we know that we’ve put stuff up there in orbit. And for a long time we just went, “Oh, well, space is big. They won’t go near each other. It won’t matter.” And then we had this guy, Don Kessler, who was doing simulations about how asteroid fields form through cascade collisions. One asteroid will hit another asteroid, they break parts off, and those parts will hit other asteroids. And that’s how, given enough time, the sizes of asteroids in an asteroid belt will largely homogenize: they will have all ground each other down, and anything that’s much larger than the rest ends up being broken apart because of the speeds involved, et cetera, et cetera.
He started going, “Hey, what about these things that we keep putting in space that are nearer to us? Satellites, what’s up with those?” And he figured out that if you look at it in the same way, then yes, there are relatively few satellites in a very big space, but they end up in similar kinds of orbits. And particularly now, when we look at what’s fit for purpose, we want things to be in particular orbits. So they actually do get close enough that they can hit each other. And there are cases where they have spectacularly hit each other, because you’ve got to think about how something in low Earth orbit is moving at something like 10 times the speed of a bullet.
Jon: 00:39:01
Wow. 
Mars: 00:39:02
So something that’s just millimeters big can take out a much, much larger object. On average, if one piece of junk hits something else and it breaks apart, it will create about 10 other pieces that are capable of doing the same destruction as the original, which means you do get this cascade effect.
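The "about 10 pieces per breakup" figure Mars cites implies exponential growth, which is why the cascade is so dangerous. A deliberately toy sketch of that arithmetic (the multiplier is just the rough figure from the conversation, not a real debris model):

```python
# Toy model of collisional cascading ("Kessler syndrome"): each breakup
# yields roughly 10 fragments, each still capable of destroying another
# object. Purely illustrative numbers, not a real orbital-debris model.

def fragments_after(generations: int, pieces_per_breakup: int = 10) -> int:
    """Dangerous fragments after N collision generations, starting from one."""
    count = 1
    for _ in range(generations):
        count *= pieces_per_breakup
    return count

# After just five collision generations, one fragment has become 100,000.
```

The point of the sketch is that the growth is geometric: each generation multiplies the fragment count, so a handful of collisions can dominate the total debris population.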
Jon: 00:39:21
Every once in a while. 
Mars: 00:39:22
If something hits something the wrong way, it could flow on: that could hit something, and that could hit something, and we could have an ablation cascade in orbit, which would make orbit unusable. We won’t be able to have satellites anymore. We won’t get space exploration. We’ll just be surrounded by a fast-moving field of shards. It’s rather important to keep an eye on.
Jon: 00:39:43
That kind of seems like what we deserve at this point. I mean, we don’t worry about enough things. We just keep creating carbon dioxide in the atmosphere and producing more styrofoam. It just seems like that’s the future that we’ve earned. 
Mars: 00:40:03
Very human thing. 
Jon: 00:40:04
Yeah. We’re just going to fill space with junk, make it completely unusable. And some people, some states are helping us along by, every once in a while, firing a rocket at a satellite. 
Mars: 00:40:15
Oh, yeah. 
Jon: 00:40:17
And just making sure that they can do that. 
Mars: 00:40:19
Russia did that in November. 
Jon: 00:40:21
Yeah. 
Mars: 00:40:21
And we’ve had a few other times where I think China and India have also done anti-satellite weapons tests. 
Jon: 00:40:27
Those are the countries I was thinking of. Yeah. 
Mars: 00:40:29
Yeah. And every time, the US goes, “Ah, my goodness, I cannot believe they have done this. There should be laws about this,” when the US has done more of this than anyone else. Look up Project Starfish, where they were nuking the atmosphere just to see if they could.
Jon: 00:40:43
Right. 
Mars: 00:40:43
They’ve been doing it for decades. There’s this really awesome photograph of one of the first anti-satellite missile tests that was done by the US, which wasn’t a land-based missile like the little rocket you would think of. The Chinese ASAT test was a truck-mounted rocket, but the first one that the US did was actually from a fighter plane. They got someone to fly Mach 1 straight up and fire at a satellite in orbit, and he hit it.
Jon: 00:41:18
Wow. 
Mars: 00:41:18
Crazy. Someone got a photo of it. It’s awesome. But yeah, no one’s ever attacked another country’s satellites, but they attack their own just to show they can, just so no one messes with them. 
Jon: 00:41:30
Right. Back off. Okay, so we kind of understand the space junk problem really well now. 
Mars: 00:41:36
Yeah. 
Jon: 00:41:37
And so does what you do in your PhD have anything to do with machine learning? Or what are you up to? 
Mars: 00:41:44
Well, it was supposed to. But actually, it’s in this domain called space domain awareness or space situational awareness, or sometimes called space traffic management or space surveillance. There’s lots of different terms because it’s rather new and that’s how fields work. When they’re brand new, everyone makes their own names. We have air traffic control for planes, which has to keep track of where all the planes are, where they’re going, what their objectives are and then try to route them so that they all get to meet their objectives without running into each other. We have to do a similar thing for satellites. 
So we don’t have any robust, scalable, mature solutions for going up there, grabbing debris, and bringing it back down. Even if we could, we could only do that for low Earth orbit; we’ve got things out at Lagrange points that we’re just never going to be able to get to. So if we can’t remove debris, the best thing we can do is know where it is and try to move things out of the way if they’re about to hit, which means that all over the world we have sensors that look at the sky and try to keep track of the roughly 5,000 active satellites and the 35,000 large debris objects, down to the maybe 130 million tiny fragments of debris that we think are out there.
Jon: 00:42:55
Wow. 
Mars: 00:42:56
But the worst thing is that orbits aren’t stable. You’d think you could put something in an orbit and it would stay there, but it turns out that’s not the case, because the sun does stuff that creates pressure around Earth, and we’ve got these charged particle belts that push things outwards. And when something is in low Earth orbit, it drags along the top of the atmosphere, which means it slowly skews. It means that we actually need to re-identify and re-plot the trajectory of every object basically every 24 hours, or it will be completely wrong.
Jon: 00:43:29
Wow. 
Mars: 00:43:29
And also, at this distance, if we can’t figure out where it’s going to be, we probably won’t even be able to find it again. 
Jon: 00:43:35
Oh, my God. 
Mars: 00:43:36
It’s a constant, constant chase. And we’re trying to get better and better at predicting their trajectories, given what we know about the physical forces at work, but they’re so complicated and we can’t do it fast enough. We don’t have enough radios- 
Jon: 00:43:49
I had no idea.
Mars: 00:43:50 
… to pick everything up. So actually, my work slots into the more instrumental side of things. In astronomy, you have the people who are instrumentalists, the ones who work with the sensors, the instruments. And given that one radar built to look at space can cost a billion dollars and take five years to build, and we need them all over the world, enough of them to keep up with the growing number of satellites in space, we went, “Oh crap. We obviously can’t build them fast enough. What can we use? Well, we have radio telescopes.” But astronomical radio telescopes are made to look at deep, deep space. So if you get an array of them, all of the algorithms they use assume that there is basically an angular difference of zero, because everything is so far away. It’s like looking with your two eyes at something really, really far away: your two eyes are basically seeing the same thing. But if you look at something that’s right here, all of a sudden what your different eyes are seeing looks quite different.
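The two-eyes analogy is parallax: two sensors separated by a baseline see a nearby target from noticeably different angles, while a deep-space source subtends essentially zero angle. A minimal sketch with made-up but representative numbers (a 1 km baseline, an ~800 km low-Earth-orbit target, and a source ~1,000 light-years away; not any real telescope configuration):

```python
# Why far-field assumptions break for nearby targets: the angle subtended
# between two separated sensors is large for LEO objects and vanishingly
# small for astronomical sources. Illustrative numbers only.
import math

def parallax_angle_deg(baseline_m: float, distance_m: float) -> float:
    """Angle (degrees) between two sensors `baseline_m` apart,
    both pointed at a target `distance_m` away."""
    return math.degrees(2 * math.atan(baseline_m / (2 * distance_m)))

BASELINE = 1_000.0                       # two antennas 1 km apart
near = parallax_angle_deg(BASELINE, 800_000.0)  # satellite at ~800 km
far = parallax_angle_deg(BASELINE, 9.5e18)      # source ~1,000 light-years away

# `far` is effectively zero ("both eyes see the same thing"), while `near`
# is a measurable angle, so the zero-angular-difference assumption fails.
```

This is the sense in which the astronomy algorithms "assume an angular difference of zero": for deep space the assumption is harmless, but for an object at the end of the telescope's nose it is badly wrong.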
Jon: 00:44:44
Right. 
Mars: 00:44:45
I started out thinking that I would be working in machine learning to try and get better at predicting those orbits and also identifying what object we are seeing. 
Jon: 00:44:54
Yeah. 
Mars: 00:44:54
Because if we’re not using optical, we’re using radio. We have to identify objects based on transmissions alone. But now I’ve ended up looking at how we can adapt radio telescopes that were designed for astronomy to be able to see things right at the end of their noses, which means reframing the algorithms they use to account for Earth’s rotation while they’re observing, and to account for Doppler shift and all these complicated electromagnetic things, some of which are beyond my grasp. It’s quite hard.
Jon: 00:45:25
Well, it sounds fascinating. And so you’ve ended up not so much doing machine learning in your PhD like you anticipated, but instead high-performance computing. So taking advantage of your very strong programming background. 
Mars: 00:45:38
I love GPU programming. It’s the fun little package. If you haven’t ever done GPU programming, the easiest way to explain it is that you have to pass everything you want the GPU to do in one neat little package. You have to say, “Here are the million data points that we want you to operate on. Here is the operation that we want you to perform. Here is the hardware context that you’re going to operate in. And here is where you put your results.” And you pass it these little chunks. I guess it’s slightly closer to the metal than typical CPU programming. And yeah, it’s really fun. And the kind of speedup you can get is crazy.
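The "neat little package" Mars describes (data, operation, hardware context, output buffer) can be sketched conceptually in Python. To be clear, this is a hypothetical illustration of the shape of the workflow, executed on the CPU; real CUDA or OpenCL APIs differ in detail, and the names here (`GPUJob`, `launch`) are invented for the example.

```python
# Conceptual sketch of the "package" you hand to a GPU: the data, the
# operation (kernel), an execution context, and an output buffer.
# Executed on the CPU here purely to show the structure; real GPU APIs
# dispatch the kernel across thousands of threads in parallel.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class GPUJob:
    data: List[float]                    # the million data points to operate on
    kernel: Callable[[float], float]     # the operation to perform per element
    context: Dict = field(default_factory=dict)   # device, block size, ...
    out: List[float] = field(default_factory=list)  # where the results go

def launch(job: GPUJob) -> None:
    """Pretend dispatch: apply the kernel to every element 'in parallel'."""
    job.out = [job.kernel(x) for x in job.data]

job = GPUJob(data=[1.0, 2.0, 3.0], kernel=lambda x: x * x,
             context={"device": 0, "block_size": 256})
launch(job)
```

The design point is that everything the device needs travels together: because the GPU cannot reach back into your program mid-kernel, the inputs, the operation, and the destination for results must all be declared up front.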
Jon: 00:46:14
Is that CUDA that you end up using a lot of or something else? 
Mars: 00:46:18
Yeah. 
Jon: 00:46:18
Yeah, CUDA. 
Mars: 00:46:19
I mean, you can do OpenCL, but everyone has basically settled on NVIDIA as the default, now that you’ve got the awesome 3090s.
Jon: 00:46:29
Nice. So NVIDIA GPUs with the CUDA software on them. Yeah. Doing high-performance computing to detect space junk. That’s super cool. So yeah, you are clearly a super-strong programmer. I mean, you already have degrees in it. 
Mars: 00:46:45
Yes. 
Jon: 00:46:45
And now you’re working on the highest degree, a PhD, in computer engineering. So a little bit before we started recording, you and I were talking about different roots into data science. You have gotten into data science from programming. 
Mars: 00:47:03
Yes. 
Jon: 00:47:03
And I have, many times on air, said how the most valuable skills for data scientists to have are programming skills. And part of what drives my strong conviction behind this is that when we have guests on air who are the CEOs or heads of data science at fast-growing startups or big companies like NVIDIA, they are sometimes hiring data scientists, but they are always, 100% of the time, hiring software engineers. These could be roles tangentially related to data science, like machine learning engineers or data engineers, but those kinds of roles are more in demand than what you could call a pure data scientist: someone who just develops models based on data that have already been provided to them, and doesn’t have to worry about the production implementation downstream.
So yeah, I think people can come into data science in a lot of different ways, but I think that coming from a programming background like yours can be the most valuable in today’s job climate. And it sounds like you run into lots of scenarios as a PhD student where people value you hugely because of that programming background.
Mars: 00:48:34
In physics, definitely. There’s a lot of that, what we call science coding, where people have come up through the sciences and at a certain point in their degree or their career in academia, all of a sudden it’s like, “Now you need to write your own models,” and they’re like, “Well, crap. I’ve never written a program in my life.” So they quickly learn Python and they write these single scripts that have no documentation, no command line arguments; they comment out bits of code to get different control flow. And it’s really monstrous for the repeatability of science. But at the same time, I appreciate it. If that’s what gets it done… not everyone has time to be an amazing open source maintainer who makes a whole project out of every little script they write. At a certain point, you just need to get it done. If that’s what makes research happen, that’s fine.
And the same thing: I really appreciate the skills that learning computer science has given me. I love my programming background. I love it deeply. But at the same time, I don’t necessarily agree that it’s objectively the best way in. Data science takes all kinds. I can tell you that the programmers I know care not at all about statistics, and every time they have to do something data-y, they just pick the first approach they find on Google for whatever keywords they’ve got.
Jon: 00:49:50
Right. 
Mars: 00:49:50
And so in some ways, people who have come from a more statistics background, or who nowadays can study data science specifically, are going to have different skill sets, and both are good. You can always teach yourself the other thing. I think it’s slightly easier to teach yourself programming when you’re surrounded by programmers than it is to teach yourself a deep understanding of math. So if you have to study one at university, maybe pick the one where you’re going to learn more from other people than you would just by trying yourself.
Jon: 00:50:18
Cool. Well, that is a good argument coming from- 
Mars: 00:50:19
Everyone of all kinds. 
Jon: 00:50:21
Yeah. There’s lots of space for all different kinds of data scientists out there. Surely, however, we can agree that regardless, developing both of these things then would be wonderful. 
Mars: 00:50:34
Yeah. 
Jon: 00:50:34
So whatever background you came from, it could be completely different from programming or statistics or math. People come into data science from all kinds of places, and you can become hugely valuable by learning software development skills, programming skills, as well as the mathematical things: statistics, linear algebra, calculus, probability theory, statistical inference, all useful. And in your decades-long data science career, listener, you might as well have some fun rounding them all out and becoming a full-stack data scientist, as I’ve called them before on air. Cool.
Mars: 00:51:18
And the thing that remains consistent is the curiosity. People get into data science because they want to answer questions. Sometimes I see data scientists who have come from policy backgrounds because they want to know how something works. You often get people who have come to data science from sociology because they want to observe these different phenomena. And I think it’s fantastic. That’s one of the things I love about data science as a field. The only thing that unites us is curiosity and Python.
Jon: 00:51:50
Ooh. Our listener, cover your ears. The last episode that we filmed was at the New York R Conference. Live on stage, Mars. It was pretty cool. 
Mars: 00:52:03
I love R. I think that R is a nicer language, but Python is slightly easier to interface with other things. So if you’re just trying to do analysis, you can do it in R. If you need to talk to another system, or to output over HTTP, or to talk to a server or something, then you’re going to have a hard time doing that with R. Or MATLAB is the other thing that they’re getting people into at university. Learn this highly proprietary ecosystem; you will not be able to take these skills anywhere.
Jon: 00:52:31
Yeah. Yeah. Professionally, we don’t see that as much though. You probably still get exposed to that a lot because of your PhD, but yeah, I don’t know. We don’t see a lot of that. 
Mars: 00:52:39
And I teach at uni. Maybe my favorite part of my job is that I teach the [inaudible 00:52:43], Artificial Intelligence 101 and also Introduction to Data Science, which is everyone’s first, “Oh my God. You can answer questions with an Excel spreadsheet?” It’s actually really fun, getting them to respect these tools. There’s a lot of ego in first-year computer science; they’re like a bunch of 18-year-old boys who have taught themselves to program and are making their own compilers and stuff. They come in and they think they’re amazing, and then there are certain things that they just don’t think about. Especially in human factors, where it’s like: let’s think not about how we can make this amazing system that’s a technological marvel, but about whether it works, whether it’s really fit for purpose for the humans who are going to need to use the system. And the same thing in data science: they come in with a lot of ego where they’re like, “What? We’re going to use Excel, like office ladies do? That’s a tool for the basics,” basic the concept, not BASIC the language.
Jon: 00:53:39
To basic highly numerate people. 
Mars: 00:53:43
Yes. But we break down those things and we talk about how you can use any tool to answer questions. And I think that’s a really fundamental thing. It’s really fun. One of the classes I teach uses Prolog, and everyone’s like, “No one uses Prolog nowadays.” I’m like, “Yeah, but it teaches really important things.” It’s amazing to do. It’s great fun.
Jon: 00:54:03
What do you use Prolog for? What does it do? 
Mars: 00:54:08
It’s like a symbolic execution kind of language where you can give it a set of premises and a goal state, and it will work through all the different options. Have you seen Prolog the language? 
Jon: 00:54:19
No. 
Mars: 00:54:19
Weirdly, the thing that it ends up still being used for in production is airline systems, because you can give it rules. For example: “Jane is Bob’s mom and Tom is Jane’s dad. And someone being the parent of someone else who is a parent of someone else makes them their grandparent. So what is Tom to Bob? A grandparent.”
Jon: 00:54:44
Right. 
Mars: 00:54:44
So basically, it does all these iterative derivations. You can use it for procedural stories, because you can say that there are these characters that exist, these actions that they have access to, and these motivations. Given these actions, how can these characters achieve their goals? And it will execute all the different variants. It’s very powerful.
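Mars’s grandparent example can be mimicked in a few lines of Python as a tiny forward-chaining rule over a set of facts. This is only a sketch of the idea; real Prolog does this far more generally with unification and backtracking, and the fact names here mirror the conversation’s example.

```python
# Python sketch of the Prolog grandparent example from the conversation.
# Facts: parent(X, Y) means X is a parent of Y.
parents = {("jane", "bob"), ("tom", "jane")}

def grandparents():
    """Rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    Join the parent facts on the shared middle person Y."""
    return {(x, z)
            for (x, y) in parents
            for (y2, z) in parents
            if y == y2}

# Query: what is Tom to Bob? The rule derives grandparent(tom, bob).
```

In Prolog itself the whole thing is just `grandparent(X, Z) :- parent(X, Y), parent(Y, Z).`, and the engine searches for every binding of X, Y, and Z that satisfies it, which is the "execute all the different variants" behavior Mars describes.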
Jon: 00:55:06
That’s cool. I didn’t know that. Going back a little bit, one could argue that in addition to curiosity, the thing that unites all data scientists is SQL. 
Mars: 00:55:18
Yeah, yeah. 
Jon: 00:55:21
Which really, everyone uses SQL. 
Mars: 00:55:27
Well, everyone’s getting into NoSQL now, but it’s very hard to avoid SQL. I really like it. I have a deep passion for domain-specific programming languages, and for all programming languages. We talked earlier about how I love LaTeX, and a lot of telescope software is still written in Fortran. We did make modern programming languages for a reason, they have their purpose, but knowing why these older and especially really hyper-domain-specific languages were made is really interesting.
Recently I was writing a library that was about ingesting enormous amounts of telescope data. And we were looking at it going, well, we need to be able to stream data in, and it’s always going to be too big to live in memory. And all of a sudden, I’m back to reading about how database query languages work, because they have a real strength in handling a bunch of data: they structure a query in advance, so they can do a bunch of optimizations about what parts they’ll even need to load into memory before they load anything. And all of a sudden, we are using SQL approaches in telescope software.
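The query-planning idea Mars describes, declaring the query up front so only the needed data is ever held in memory, can be sketched with Python generators. This is a hypothetical illustration: the data stream and field names are invented, and a real system would read from disk or the network rather than synthesizing rows.

```python
# Sketch of the database-style idea: declare the query first, then stream
# rows through it, so only matching values are ever touched and nothing
# accumulates in memory. Field names and sizes are made up for illustration.

def read_samples(n: int):
    """Stand-in for a telescope data stream far too big to hold in memory."""
    for i in range(n):
        yield {"channel": i % 8, "power": float(i)}

def query(stream, channel: int):
    """'Planned' filter: applied as rows stream past, so rows from other
    channels are discarded immediately instead of being loaded."""
    for row in stream:
        if row["channel"] == channel:
            yield row["power"]

# Sum power on channel 3 across a million samples, in constant memory.
total = sum(query(read_samples(1_000_000), channel=3))
```

Because both functions are generators, the pipeline processes one row at a time; this is a pale shadow of what a SQL planner does (predicate pushdown, choosing which columns and pages to load), but it shows why declaring the query before touching the data is such a powerful pattern.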
Jon: 00:56:30
Cool. 
Mars: 00:56:31
That’s the beauty of computer science, taking approaches from everywhere and putting them everywhere. 
Jon: 00:56:37
I’m so glad that I brought it up. So when I was researching for this episode, I came across that on LinkedIn you had recently posted a photo of yourself, as well as, I believe, several of the co-authors of the books we’ve already talked about, Practical Simulations for Machine Learning and Practical AI with Swift, all together in an office with an Australian member of parliament. And so that MP was visiting Secret Lab.
Mars: 00:57:13
Yes, not the chair company, the other one. 
Jon: 00:57:20
Not the what company? 
Mars: 00:57:22
There’s a series of gaming chairs that Twitch streamers and stuff have, these really fancy expensive office chairs that are made by a company called Secretlab. That’s newer than our secret lab, but has definitely become the more famous secret lab. There’s also a third Secret Lab that is a special effects company that you sometimes see come up in the credits of sci-fi movies. 
Jon: 00:57:43
And there’s countless secret secret labs out there that we don’t even know about yet. 
Mars: 00:57:46
Yes. 
Jon: 00:57:49
But so your well-known Secret Lab, that’s not quite as well known as the chair Secret lab, what secrets are you guys up to? What are you labing up in there? What are you brewing up? 
Mars: 00:57:59
Well, so my husband started a video game development company with his high school best friend back when they both finished their computing degrees. And this is the thing they’re being very nice about: all my co-authors already have PhDs, and they don’t put “Doctor” on the cover because otherwise I’d be the not-doctor Mars, because I’ve not finished yet. But yeah, they all have PhDs. They did human-computer interaction and urban informatics, that kind of stuff. And yeah, they made a video game company. It’s actually Tasmania’s oldest video game company. And video game companies, I’ve said that phrase too many times in this sentence, but typically they don’t live very long. It’s a very gig economy. It sometimes gets like Hollywood, where a company will form to do a certain production, then they’ll break off and do something else.
Jon: 00:58:44
Right. 
Mars: 00:58:46
If that one’s not a hit, maybe they won’t do anything else, whereas these guys have been really stable. They’ve been running this company for 14 years. They’ve made games and apps for Australian children’s television shows and for Qantas, the airline; they did a bunch of in-flight games and stuff. And they really made a niche for themselves in that they do have computer science PhDs, and they also did a lot of work in early education. There was a niche when the iPad had just come out: everyone went, “Oh, my goodness, this is amazing for kids that don’t yet understand keyboards.” Even pre-literate children could have interactions with this digital thing. How do we make games for people who can’t read yet, who recognize shapes and colors but can’t be given instructions? How can we give them a meaningful experience? And yeah, they actually made an early niche writing games for really, really little kids, which then turned into a pretty great market of making children’s educational games for all ages. And now they even get contracted by companies to make games that are educational about what those companies do.
So recently, they wrote a game for the local power company where you play a quoll, which is a Tasmanian native animal, who has broken into someone’s house and is trying to make their power bill really high. And in doing so, you learn all about the different tariffs and what things consume the most power, so that kids know: if you leave that on, that’s going to cost your parents lots of money, way more than leaving that other thing on. So it’s quite fun. And yeah, on the side we write books. The other two co-authors are also generally contractors on basically all their projects, just because we all work in the same office. Tasmania is a small place. There are only so many people in IT on our island.
Jon: 01:00:32
It’s the only office that we could get. 
Mars: 01:00:36
Yeah, we do a bit of everything. Some of the contracts I’ve done with them have been getting a contract from a not-for-profit to build a Minecraft world where kids can simulate natural disasters to learn how to respond to a crisis and evacuate, or making a bunch of Swift playgrounds that show you how to collect space junk and also teach you how to code in Swift, which then went to National Science Week and the data science center. 
Jon: 01:01:05
You made a space junk game? 
Mars: 01:01:08
Well, it’s more of a code-along activity, but yeah. 
Jon: 01:01:11
Yeah, that’s cool. 
Mars: 01:01:12
It’s supposed to- 
Jon: 01:01:12
Is that a coincidence? 
Mars: 01:01:17
No, I think I had just started my PhD, and it was still very exciting. It’s still exciting, but at that time, that was definitely all I did. I was telling everyone, “Oh my God, I started a space PhD,” because I guess I thought that going into computer science was an active choice to not go into space instead. And like I said, our island’s very small; our university doesn’t have many courses. And I think they had just axed the astrophysics major just before I moved here. I was very sad, but I probably wouldn’t have had the maths chops. Let’s be honest. I work with the astrophysicists now and they’re all very smart. 
Jon: 01:01:54
So before jumping into that PhD, you did have a brief stint doing some kind of math stuff as a data science intern at the famous Australian graphic design platform, Canva. So you were working on design automation there. 
Mars: 01:02:10
Yes. 
Jon: 01:02:13
One of the many other things that you’re currently doing is, you’re also a unit design advisor for a post-grad mobile development course. So is there a meaningful connection between artificial intelligence and these different kinds of user experience work that you’ve done? Or is that something that’s just kind of happened randomly, something that you’ve happened to specialize in?
Mars: 01:02:43
I feel like I’ve done this terrible thing to you, making you try to draw connections between my absolutely chaotic career, which is just: I do whatever I like all the time. I went to Canva because I’d just finished my honors, which, in Australia, if you do an undergraduate degree, it’s only three years, not four, but you can do an extra fourth year that’s like a mini master’s. And once you’ve done that, you can go straight to a PhD. Typically, people do those because to move to the US, you have to have done a four-year degree. So I’d done that not because I was super passionate about research, but because I guess I was still hiding in university and I wanted to have options. So I’d done that.
And then I was like, “Okay, well, it’s time to leave university and go out into the big brave world.” And I just didn’t want to. I wanted to do a PhD instead. Everyone I knew had done a PhD, and I didn’t know if I really wanted to go into academia, but I loved the freedom of being a freelancer and a student and having those contracts where I just kind of do whatever work comes my way that I like. I take contracts in all sorts of things, like anyone on the island comes and goes, “You know computers. Come help me do blah with computers.” And it’s always something totally chaotic and it’s really fun. And I love that. 
But I went, “I better give getting a normal human job a go.”
So I went to the mainland, that’s what they call it, the big north island of Australia, for a few months. And I went, “I’ll take an internship and I’ll see how I feel about it. And then if I like a job, then I can save myself studying four to seven more years and just get a job. That’ll be great.” But that kind of fell to pieces. Canva was quite cool. It’s a cool platform. They had some really cool problems. I was on the design automation team, which is like: when someone adds an element to their design, how can we have really smart defaults? When you add text, what color is it going to be? When you add a shape, where is it going to be, what size is it going to be? When you open the panel to search for something, what are the things that are going to be recommended? And all of those things are impacted by what you’ve done so far in your design, and even what you’ve done in previous designs.
We want to learn how you make these designs so we can help make them for you. Because a big appeal of Canva is that it’s not the Photoshop interface where you’re just shown all of these things all the time. It’s supposed to be very transparent that it’s making what you would have made, but in fewer steps, because it knows you. And that’s great. That was a really awesome experience. 
But it was also during the Black Summer fires, when you couldn’t go outside without a smoke mask on. My husband got a lung infection in the first week. And all these people who lived in Sydney seemed used to it, I guess, because we are from the beautiful island of Tasmania that’s all pine forest. We have some of the cleanest air in the world. Actually, I think the northwest of Tasmania does scientifically have the cleanest air in the world. And so we’re really spoiled.
So we went there, and all the people who live in Sydney were still walking around outside in the apocalypse. We were like, “No, we’re not leaving the apartment.” And so it wasn’t as fun, I guess. I think I had fun, but any lack of fun was for non-work reasons. And I came back to Tasmania. We don’t have as many big startups like that down here. I told you earlier that the iPad app Procreate is actually from Tasmania. It’s made by a company called Savage Interactive. We have a few really big web platforms down here. There’s this kind of Canva-for-video type company, Biteable, where you make motion graphics and stuff. It’s this online video editor. They’re down here as well. But yeah, they’re all like 10 people. So getting a big startup job in Tasmania is just not an option. So you end up taking these chaotic freelancer gigs. 
Jon: 01:06:19
It sounds like you have managed to forge quite a career for yourself. In just your early career, you’re already doing so many different interesting projects that blend data, programming, writing, and teaching. It sounds like you have a fabulous career in Tasmania, where you can also avoid lung infections. 
Mars: 01:06:42
Yes. 
Jon: 01:06:43
So sounds pretty great. 
Mars: 01:06:44
Maybe the best part. 
Jon: 01:06:46
So as a regular listener of the show, which I was delighted to learn when we were talking before starting recording, you are already aware that I always ask for a book recommendation near the end of the program. You’d been thinking about this big stack of wonderful books that you’d put together, but you don’t have it with you right now. 
Mars: 01:07:08
That was my light Christmas reading. I decided to go back and read all of the computer science greats. I was going to read the compilers “Dragon Book,” MIT’s Structure and Interpretation of Computer Programs, and Secrets and Lies about cryptography, which is really, really cool. Academia so suffers from people just trying to put their information on paper without thinking about the narrative and the flow and the tempo. But Bruce Schneier is an absolutely amazing author. So even though I’m not into cryptography, he makes me be into cryptography because he’s so interesting. 
Jon: 01:07:39
Cool. 
Mars: 01:07:40
And none of those is my book recommendation. 
Jon: 01:07:44
So what is your book recommendation, Mars? 
Mars: 01:07:46
I decided that it would be one that I had physically in my office. So this is The Pattern on the Stone: The Simple Ideas that Make Computers Work by W. Daniel Hillis.
Jon: 01:07:57
Good old W. Daniel. 
Mars: 01:07:58
It’s not very big, but it’s a little book that’s kind of about how anything can be a computer. So basically, it goes through: if we have these logic gates, this is how they can make decisions, and this is how we can build up more complex logic. If we can do that with circuits, we could do that with pipes, we could do that with toothpicks. And it goes through how these simple building blocks make something capable of complex logic, and how the fundamentals of computing are not specific to electronics at all. It’s just a method of decision making, which is really interesting. I read this book entirely while in the waiting room for surgery last year. It wasn’t an emergency, but it was a medical thing. And so I guess I was kind of a little bit medicated at the time, which made the book even better. I came home to my husband and I was like, “This is the best book I’ve ever read.” 
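[Editor’s note: the substrate-independence idea Mars describes here can be sketched in a few lines of code. This is an illustrative aside, not from the book or the episode. Everything below is composed from a single NAND primitive, and the “substrate” (here, plain Python functions) could just as well be circuits, pipes, or toothpicks; only the NAND behavior matters.]

```python
# Hillis's point: all of computing reduces to one simple building block.
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

# Every other gate can be composed from NAND alone.
def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

# A one-bit half adder built purely from NAND-derived gates:
# the first real step from "decisions" toward arithmetic.
def half_adder(a: bool, b: bool) -> tuple:
    total = and_(or_(a, b), nand(a, b))  # XOR, composed from the gates above
    carry = and_(a, b)
    return total, carry
```

Chaining adders like this one gives full binary arithmetic, which is exactly the book’s build-up from gates to a working computer.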
Jon: 01:08:55
Nice. 
Mars: 01:08:55
And I’ve read it again since. It holds up, though maybe it wasn’t quite as life-changing the second time. 
Jon: 01:09:01
Well, a great recommendation for all you stoner listeners out there. Super fun. I’ve loved having you on the show, Mars. You are a brilliant communicator- 
Mars: 01:09:15
Thank you. 
Jon: 01:09:15
… which shouldn’t come as too much of a surprise given all of your teaching. You’re also just a lot of fun to talk to on deeply technical topics. You really know your stuff. And so I would love to have you on the show again someday. But in the meantime, how can listeners follow you online and keep up to date as you release the next in the series of your millions of books? 
Mars: 01:09:39
Well, I did make the decision to leave Facebook recently, which was actually great. It was fantastic for my mental health, but I’m still on Twitter. I’m a terrible Twitter gremlin. My handle is @TheMartianLife because I’m Mars, the Martian. I’m also on Mastodon on aus.social, as in the Australian instance. 
Jon: 01:09:59
What is Mastodon? 
Mars: 01:10:01
So Mastodon is an open-source, federated alternative to Twitter. So the idea is, with Twitter, you’re all on a single platform and all of your data lives on their servers. What if we went back to more of an IRC model, where you can host an instance of Mastodon, which is just some open-source software that you can download and put on your servers, and people who live near you or are part of your community, which might be the academic community, the telescope community, or whatever that is, can join your instance? Only you hold their data, and they can still follow people from other instances. So it creates this big meta network.
But the whole point was that it was supposed to give people more control, because if you run your own instance or server, you can make community-specific rules and you can have community-specific jokes and conventions. And we find that these smaller groups actually end up with more meaningful interactions. People post longer-form and more personal things on there than they necessarily do on Twitter. So yeah, they actually engage with it in quite a different way. 
Jon: 01:11:01
That’s cool. 
Mars: 01:11:02
I’m on Mastodon. It’s very good. It’s definitely less depressing than Twitter most days. Everyone’s very nice. I also have a website which you take one look at and you’re like, “She wrote this in handcrafted HTML.” And yes, I did because my web dev skills are from the ’90s. But I also believe that it’s a good thing in that it’s easy for a screen reader to read. It’s easy for you to scrape all of the styling out of, and it still holds up as pure text. You can access it in a terminal text-only browser on low data and everything. So I’m at themartianlife.com if you want to look at my incredibly retro website. 
Jon: 01:11:40
Nice. And then you are on LinkedIn as well, if people want to communicate there. 
Mars: 01:11:47
I am, but I don’t use it. I’m terrible. Everyone’s like, “LinkedIn’s the most important.” I’m like, “Yeah.” All the people I follow there are researchers from uni, and it’s just depressing watching how hard they have to spruik themselves to get grants. Knowing how overworked they are, that they go home and queue up LinkedIn posts and all this stuff, I’m like, “Just take a day off, man.” 
Jon: 01:12:08
Yeah. Well, you do you. You definitely shouldn’t feel like you should be on LinkedIn. And it sounds like you’re in enough places, especially Mastodon. That sounds super cool. All right, Mars, it’s been wonderful having you on the show. I look forward to catching up again sometime in the future. 
Mars: 01:12:23
I’ll have to have another chaotic career change so I can be interesting to talk to again. “Mars, why did you go into manufacturing obscure plastics?” I don’t know. I don’t know what my next career change will be. I’m not moving away from space. Who am I kidding? Space is the coolest. 
Jon: 01:12:38
Nice. All right. We will look forward to that manufacturing update. Catch you then. 
Mars: 01:12:46
Thanks for having me. 
Jon: 01:12:53
What a brilliant person Mars is. It’s staggering how many different topics from software to data science to space she can dive deep on. And she communicates so effectively too. What a treat it was to have her on the show today. In this episode, Mars filled us in on how synthetic data derived from simulations can provide us with infinite quantities of potentially very high-quality data for training machine learning models. She talked about how bots from simulation engines such as Unity can be used to solve any problem expressed spatially, which could be literally any computational problem. She talked about how GPU programming with CUDA is essential to the high-performance computing needed for tracking space objects with radio telescopes, and how, if you’re considering a career in academia, doing an internship in industry could be just what you need to convince yourself academic life is the better fit for you. 
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Mars’ social media profiles, as well as my own social media profiles at www.superdatascience.com/591. That’s www.superdatascience.com/591. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show. 
Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks, of course, to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng, and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another fascinating episode for us today. Keep on rocking it out there, folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 