Kirill Eremenko: This is episode number 263 with founder at Kyso.io, Eoin Murray.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur and each week we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today and now let’s make the complex simple.
Kirill Eremenko: This episode of the SuperDataScience podcast is brought to you by our very own Data Science Insider. The Data Science Insider is a weekly newsletter for data scientists, which is designed specifically to help you find out what have been the latest updates and what is the most important news in the space of data science, artificial intelligence and other technologies. It is completely free and you can sign up at www.superdatascience.com/dsi. And the way this works is that, every week there’s plenty of updates and seemingly important information coming out in the world of technology. But at the same time it is virtually impossible for a single person, on a weekly basis, to go through all this and find out what is actually really relevant to a career of a data scientist and what is actually very important. And that’s why our team curates the top five updates of the week, puts them into an email and sends it to you.
Kirill Eremenko: So once you sign up for The Data Science Insider, every single Friday you will receive this email in your inbox. It doesn’t spam your inbox it just arrives and has a top five updates with brief descriptions. And that’s what I liked the most about it, the descriptions. So you don’t actually even have to read every single article. So, our team has already read these articles for you and put the summaries into the email, so you can simply just read the updates in the email and be up to speed in a matter of seconds.
Kirill Eremenko: And if you like a certain article, you can click on it and read into it further. And so whether you want great ideas that can be used to boost your next project, or you’re just curious about the latest news in technology, The Data Science Insider is perfect for you. So once again, you can sign up at www.superdatascience.com/dsi. So make sure not to miss this opportunity and sign up for The Data Science Insider today. And that way you will join the rest of our community and start receiving the most important technology updates relevant to your career already this week.
Kirill Eremenko: Welcome back to the SuperDataScience podcast ladies and gentlemen, super excited to have you back here on the show. And I literally just got off the phone with Eoin Murray, who is one of the founders at Kyso.io. Kyso.io is an amazing tool which you will love hearing about. It’s a platform where you can blog about your data science projects using tools such as Jupyter notebooks. So it really makes sharing of projects very easy and creates a fantastic user experience for the readers who are going to be reading your projects. And this all ties in very well with the whole notion of building your online presence and online portfolio in order to progress your career forward and to impact people, to help people and make a statement out there in the world.
Kirill Eremenko: So I’m very excited about this product, not just the podcast, but Kyso.io. I think it’s a really cool thing, and in fact the base version is actually free, free forever as you’ll see on the website. So I’m sure you guys will love checking it out. And what we talked about on this podcast is, we started off with some very interesting conversations about startups and how you can jump into creating a startup, what accelerators are, what angel investors are, what venture capital funds are, and what Eoin’s journey has been like in that process. So this is the second company that he’s founded. He’s a serial entrepreneur. He’s been through the Techstars accelerator. He’ll tell you all about what it was like there, what mentor madness is, and what you get out of these experiences in the startup world. So if you are interested in, or even considering at some point, maybe down the line, getting into a startup or creating a startup, I think this will be very interesting to hear about.
Kirill Eremenko: Then we talked about Kyso.io, the actual website and product that they’ve created, what it means for data scientists, how important it is to communicate data science insights in a non-complex way, and how Kyso facilitates that journey. I recommend checking it out because I think Kyso has got a bright future. It’s like Github, but with a lot of additional layers that make the experience really cool. Plus it has integrations with Github anyway. So I think you’ll find it interesting. Kyso has probably got a very bright future ahead and you’ll be one of the first people to hear about it on a podcast. And finally at the end we talked about Eoin’s other interests. So Eoin is a really interesting person. He used to do quantum computing, he’s worked on really cool projects. So we talked about his view of where data science is going, what the future’s like, whether or not data science should be a certified profession.
Kirill Eremenko: And he gave us an example of a project from his past life dealing with the E. coli bacteria using lasers and data science. So I think you’ll find that interesting. On that note can’t wait for you to check out this podcast. And without further ado, I bring to you the founder at Kyso.io, Eoin Murray.
Kirill Eremenko: Welcome back to the SuperDataScience podcast ladies and gentlemen. Super excited to have you on this show today because I’ve got a very exciting and interesting guest calling in all the way from Valencia, Spain, Eoin Murray. Eoin, how are you going today?
Eoin Murray: I’m brilliant Kirill. Thanks for having me on this show.
Kirill Eremenko: It’s my pleasure. I’ve heard a bit about your work and we were introduced by Raul Popa who’s been on the podcast before, so I’m very excited about the things we’re going to talk about. How did you end up in Valencia? I’ve never asked you this. Like you’re from Ireland, what are you doing in Valencia?
Eoin Murray: Oh, cool. So my co-founder, Elena, is Spanish, and we actually founded Kyso in Andalusia in Spain. And then we moved to New York for a bit to do Techstars New York City. That’s where I met Raul, who was on the show with TypingDNA. So Techstars is a program where, if you’re starting a startup, they will give you some investment and tons of advice. And you go in and you grow really fast and then you maybe raise some more money from investors, and then you either stay in New York or you go back to wherever you were based previously. So we came to Valencia because it’s a great place to live. It’s next to the beach and the Internet connection is outstanding. And yeah, it’s a really good place to start a company.
Kirill Eremenko: Got you. By the way, I was learning Spanish a couple of weeks ago and I noticed that they don’t pronounce the letter V. So for them, Valencia is the same as Balencia. Do you hear that?
Eoin Murray: Yeah. It can be confusing sometimes. And then you have different regions of Spain have quite different Spanish. So in Barcelona, they’ll say Barcelona, but then Andalusia they’ll say, Barcelona.
Kirill Eremenko: Barcelona, yeah. Yeah. It’s Catalan versus, what’s the other one?
Eoin Murray: The Castilian.
Kirill Eremenko: Castilian.
Eoin Murray: It’s the Spanish that you’d maybe just call Spanish.
Kirill Eremenko: Yeah. Got you. So Techstars, that’s… First of all, congratulations. That’s really cool. We’ll talk about Kyso in a second, but Techstars, just so I understand that better. So there’s angel investors and there’s venture capital funds. Angel investors come earlier, venture capital funds come later. Where is Techstars, or, as you mentioned before the podcast, it’s similar to Y Combinator, where do those types of companies sit? Closer to angel investors or venture capitalists?
Eoin Murray: So Techstars would typically be your first investment or very, very close to your first investment. So when we did Techstars, there were 12 companies in our batch. They run the program in many different cities around the world. Many programs they’ll do twice a year, and they’ll have maybe 11 to 13 companies in each batch. And when we went there, we were very early stage, so we didn’t have revenue. We had only really just started the product. There were a few companies in the batch who actually hadn’t even started building their product by the time they got in. However, there were some companies who were doing, like, half a million in revenue so far that year. So there’s a mix. But they’re typically very early. I think the traditional thing is you come in to Techstars when you’ve maybe released your product and you’re ready to grow it really fast, and they’ll give you tons of support to grow it really fast.
Eoin Murray: And then at the end of the program, after the end of the three month program, there’s a demo day or an investor week where they’ll sit you down with 30 or 40 venture capitalists and angel investors and you try to raise more money.
Kirill Eremenko: So they come even before the angel investors.
Eoin Murray: Yeah. Well, yeah, roughly speaking. As always. I mean, every company is unique. Everybody has a unique story behind it.
Kirill Eremenko: And was it hard to get into Techstars? Were the [mental 00:10:06] prerequisites or the screening process difficult?
Eoin Murray: It’s quite a selective program. I think for ours it’s maybe 12 companies out of 1500 applications or something, but there’s a lot of other accelerators in the world. So for anybody who’s listening who’s interested in startups, it’s a pretty good way to get your startup off the ground. Especially if you’re thinking of starting a startup. Maybe you have a job and you’re thinking this is something you might enjoy doing. Accelerators are a really good way to de-risk the idea for yourself. Techstars is a really good one. It’s a very famous one. It was hard to get into. We were lucky because both myself and my co-founder are technical, so we can code, and we’ve had experience in data science, which is Kyso’s area, and we had also started a company before and raised money for a company before. So I think that gave us a bit of an edge.
Kirill Eremenko: So like they could see, you know what you’re doing?
Eoin Murray: Yeah. Yeah. But I mean in general accelerators are a really good way for anybody to kind of, even the interview process helps you refine your idea and let you know if you actually want to pursue something. So if anyone’s like listening, I would definitely say like if you have an interest in a startup, even I’m from a small city in Ireland. Ireland has a population of 4 million people and I think there’s like 12 or 15 accelerators in the country now, that you can apply to. And then there might be a country nearby you so, if you’re in EU, there’s plenty of accelerators you can apply to. So you just chat to loads of people and see if you get into one.
Kirill Eremenko: How come you went to the one in New York City then?
Eoin Murray: We got into a couple around Europe and even one in Hong Kong. Alex Iskold was the guy who ran that program, and he was just really, really helpful even in the interview process. And he liked strong technical skills, so he knew what we were about. It depends on the person, each accelerator is very unique. So Techstars even runs many programs, but who the specific team in your program is will completely change the experience that you have. So we were just drawn to the program Alex had set up, and that worked for us.
Kirill Eremenko: Interesting. And so once you get in and then you get there, is it like a several week process? What is the program? How is the program structured?
Eoin Murray: Yeah, so I think, each person or each MD or managing director of each program will have a specific flavor. So I know for example, Techstars and Y Combinator, have a quite different philosophy. So Y Combinator takes in about a hundred companies into a batch and they basically say, “Come in and talk to us once a week. But other than that you should be living and working in your flat, coding and building product every day.” Techstars is a little different. So what they do is when you come in, they do like what they call, mentor madness. So it’s like two week process where, they will find many, many experienced venture capitalists, experienced founders, experienced product people or experts in various sectors and sit you down and you have like half an hour meeting with about five people a day.
Eoin Murray: And then you pitch your idea and then they all give you feedback. And they do that in the first two weeks, and after the first two weeks you definitely will have either refined or changed your idea a little bit. So that’s the first two weeks of three months. Then the rest of the program is basically you set a weekly target and you do whatever you need to hit that target. That can be building product, that can be doing sales calls, that can be going to meet the customers. And you do that to the end of the program. And then the last two weeks is trying to raise more money. And there’s a lot of workshops along the way. And then you have a meeting with mentors every week to kind of help you solve whatever specific problem you’re facing right now.
Kirill Eremenko: Gotcha. And all they request in return is a share of your product.
Eoin Murray: Yeah. So Techstars, they give you about a hundred thousand dollars of investment. And then for that investment plus the program, they take about 8% of your business.
Kirill Eremenko: Oh, okay. Well that’s not too bad at all. Good. I think that’s pretty fair.
Eoin Murray: I mean, if you think of it as you’re coming to the program with a certain valuation and you leave with a higher one, you’ve already personally made money by the end of the program, if you think of what that valuation is.
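A rough sketch of the arithmetic behind those figures, assuming the roughly $100,000 and roughly 8% quoted above and ignoring whatever value is attributed to the program itself. The function names and the 2,000,000 exit valuation are hypothetical, chosen only to illustrate the point about leaving with a higher valuation.

```python
def implied_post_money(investment, equity_percent):
    """Valuation implied by selling `equity_percent` of a company for `investment`."""
    return investment * 100 / equity_percent


def stake_value(valuation, stake_percent):
    """Paper value of a stake of `stake_percent` at a given valuation."""
    return valuation * stake_percent / 100


investment = 100_000        # approximate figure quoted in the conversation
equity_percent = 8          # approximate figure quoted in the conversation

entry_valuation = implied_post_money(investment, equity_percent)
founders_percent = 100 - equity_percent

# Hypothetical valuation at the end of the program, for illustration only.
exit_valuation = 2_000_000

print(f"Implied valuation on entry: ${entry_valuation:,.0f}")
print(f"Founders' {founders_percent}% on entry: ${stake_value(entry_valuation, founders_percent):,.0f}")
print(f"Founders' {founders_percent}% on exit:  ${stake_value(exit_valuation, founders_percent):,.0f}")
```

Under these assumptions the founders keep 92%, and that stake is worth more on paper whenever the valuation at the end of the program is higher than the one implied on entry.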
Kirill Eremenko: Yeah. But like as you say, the connections you make and the learning you experience throughout the process is invaluable.
Eoin Murray: I mean, it’s ridiculous how much you gain. Even personally.
Kirill Eremenko: Yeah. No wonder there’s so many applications, 1200 and only 12 or something that get in. That’s crazy. Crazy one out of 100 makes it.
Eoin Murray: We were quite lucky.
Kirill Eremenko: What would you say contributed to this success of getting through? Was it like you knowing somebody or something about your idea or your application?
Eoin Murray: Oh yeah. This is actually a funny one, because it goes back to my first startup, which I started in the UK, where I was trying to scramble. At that point I really didn’t know what I was doing, but I would take a meeting with anybody, and I think that was the right approach. When I was trying to raise money for my first company, I got onto John Bradford in the UK, who actually previously ran Techstars London. He was trying to give me advice, giving me funding introductions in the UK, none of which panned out in [inaudible 00:16:11] a lot of money for that. But then later on he gave me another connection, who then gave me another connection, who then put me in touch with Alex Iskold. And when I first met the guys, I wasn’t thinking of how this would pan out almost two years later, that I’d be able to follow a network route to Alex, who then led us into Techstars.
Eoin Murray: And another point I think is important to make is we applied really, really early. Alex was running maybe a two month application process, and I’d say we spoke to him about two weeks before he really started doing that. And that helped us get in because he and the team were not talking to too many other companies at that point. There were still open spaces. Maybe if we had applied in the last week of the application window, it would have been a lot harder.
Kirill Eremenko: Got you. What is very impressive to me is that you mentioned you not only got into that New York, NYC chapter, you got into a couple of other ones in Europe, are they all linked? Or [crosstalk 00:17:21]
Eoin Murray: No, no, it was just other, different, unconnected accelerators. So basically myself and Elena, my co-founder, were based in Spain, running out of whatever little money we had that we were funding ourselves with, and we needed to raise money. So we applied to everything and got into some things and then chose Techstars.
Kirill Eremenko: Got you. Understood. Okay. Wow. Well thank you very much for the rundown. I’m sure if anybody’s looking to get into startup now, they’re very well equipped with the whole process that accelerators follow, how to get in on that.
Kirill Eremenko: And on that note, tell us about Kyso. Like I think there’s so much anticipation built up now. You got to tell us what this idea is and guys listen up this is pretty crazy. It’s really data science related, relevant. And I’m like really sure a lot of you are going to be using this after this podcast. So please Eoin take it away.
Eoin Murray: So very, very simply, Kyso is a place where you can blog your data science. So if you have a chart that you want to share or a dataset, or you want to write an article, a data journalism article, you can post all of this to Kyso. So it’s like Medium, but we want to focus on data science. And to make that even easier for data scientists, we actually support a lot of the data science tools. So for example, Jupyter notebooks or Markdown notebooks. A Jupyter notebook is a really, really common data science tool. It’s an interactive coding environment where you type code into a cell, you evaluate that code and the results appear to you live in the document. So this is super useful if you’re visualizing data.
Eoin Murray: So even if you’re making a line chart, you just type in the code, evaluate the cell, and the chart appears inside the document. I used to work with these so much in my past career. And they were a little bit difficult to share. So you can share them, for example, on Github. But then they look like this kind of technical document where the code and any terminal output is all visible. What we do in Kyso is we just hide the code by default. Now you can click a button to see it again, but you’d basically upload your hardcore data science document to Kyso and it just looks like a blog post. So why it’s so useful is because you can be writing a technical document and then you can trivially share it with a non-technical audience without needing to do any extra work.
Kirill Eremenko: That is really cool. And for those listening, if you’ve taken our Python A-Z Course, that whole course is done in Jupyter notebooks. And in fact, Jupyter notebooks is a very powerful tool. Some of the big companies like Google, Facebook and so on use Jupyter notebooks for some of their work. And you can do end to end even deep learning and AI in Jupyter notebooks. So if you haven’t heard of Jupyter notebooks then definitely check it out. It’s a really cool place where you can not only just code. What I like about it is that you not only just code, but along the way you can write comments, can annotate things, and what Eoin and the team at Kyso have created is that you just upload your Jupyter notebook and it renders really beautifully into something that people can read, and the user experience is really cool.
Eoin Murray: It’s one of the things that, I guess, when I was learning Python and data science in the beginning, I found super useful, because you type code into one box and evaluate that, and it just really allows you to interactively play with your code. You know what I mean? And you learn a lot faster and a lot more, and you can do super cool things. Like if you tab, is it command tab? Say I’m using pandas and I go pandas.DataFrame and I’m like, what are the docs for DataFrame? What’s the order of the arguments? I can either Google that or I can actually do command tab and a little pop-up appears with all the documentation for that specific function. It’s just a really, really helpful way to get started in data science.
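The documentation pop-up described here can be approximated outside a notebook with Python’s standard `inspect` module. What follows is a minimal sketch, not how Jupyter implements it: `quick_docs` and `moving_average` are hypothetical names made up for the illustration, and a small local function stands in for `pandas.DataFrame` so the example runs without pandas installed.

```python
import inspect


def quick_docs(obj, max_lines=5):
    """Return the call signature plus the first lines of an object's docstring."""
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        # Some built-ins do not expose a signature.
        signature = "(signature unavailable)"
    doc = inspect.getdoc(obj) or "(no docstring)"
    summary = "\n".join(doc.splitlines()[:max_lines])
    name = getattr(obj, "__name__", repr(obj))
    return f"{name}{signature}\n{summary}"


def moving_average(values, window=3):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Shows the argument order and the description without leaving the session.
print(quick_docs(moving_average))
```

With pandas installed, passing `pandas.DataFrame` to the same helper should show its constructor arguments in order, which is the question raised above.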
Eoin Murray: And then it’s cool because it’s actually still the tool you will use when you’re an expert in data science, when you’re doing it day to day.
Kirill Eremenko: How’d you come up with this idea?
Eoin Murray: So, in a past life I worked in science. So I used to work as a quantum computing researcher in Ireland and then in the UK. And basically the workflow that we had was, we would design the chip, then bring it to the lab. So these chips were interesting because a typical computer chip runs on electricity. This would run on light. So we would use optical fibers and plug light into these chips. And then we would measure the spectrum or various pieces of data about these chips. And then maybe me and other people on the team would take the data and have to process the data, maybe make a chart of the spectrum, a chart of the temperature, see how it’s working, and then share those charts with the rest of the team, so that then we could analyze yesterday’s experiment to design a new chip for next week. Does that make sense? And we played with a lot of tools, so I mean, you can always import your data into Excel. But that quickly just wasn’t quite powerful enough for all of the customized analysis that we needed to do.
Eoin Murray: So we stumbled upon Jupyter notebooks. And it’s such an amazing tool for this, where you can write your comments, you can format the document, you can have all of your plots and charts in the document. But we just found them a little difficult to share and a little difficult to reuse. So for example, if we’re collaborating on some projects and I’m doing a notebook today and then next week you want to use it, I mean you can use Github, and that’s currently a good way to reuse them. But maybe if you want to take a snippet, you need to be able to discover and see and read my documents or my notebooks before you’ll know exactly what you want to reuse. So we found that a little difficult.
Eoin Murray: And then I went to the UK. I was on a big team there and we had similar problems. So it was always in the back of my head. I wanted to do something around making these Jupyter notebooks easier to share. And just in general, making it easier to communicate data science, because that’s what these Jupyter notebooks are, they’re communication tools. Which is the most important part of data science in my opinion. So like, you know the phrase, if a tree falls in the forest and nobody is around to hear it, does it even make a sound? It’s exact same thing. If you gain an insight from data and you don’t tell anybody, did you even gain that insight? Did it even matter?
Kirill Eremenko: True.
Eoin Murray: Communication is the key point. And that’s why this technology is really useful.
Kirill Eremenko: Okay. Got you. And so would you say that that’s the main difference between Github and Kyso, that you can actually, as opposed to like forking a whole repository on Github, you can just read through the document, the Jupyter notebook on Kyso and select the elements that you want or are there any other differences?
Eoin Murray: The big one is that you can choose to show and hide the code for the Jupyter notebook. So what that means is that I can be writing an extraordinarily deep document with highly technical code about how to process a piece of data. But then if I write my comments properly and my output graphics look really nice, when I upload it to Kyso, showing the code is optional. So if you have the code hidden, the Jupyter notebook just looks like a blog post. It’s just text, graphs, more text, more graphs. So a nontechnical person can come along and read it, depending on what comments you’ve written. But if someone technical comes along and they see a graph or they see a technique that they really like because of how you’ve explained it, they can just click a button and it’ll show them the code that, let’s say, generated that graph or did that piece of processing. Does that make sense?
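The show-and-hide behaviour described above can be sketched in a few lines, since a Jupyter notebook file is JSON with a list of cells. This is an illustration of the idea only, not Kyso’s implementation: `render_as_post` and the sample notebook are hypothetical, and only Markdown cells and plain text (stream) outputs are handled, not charts or rich outputs.

```python
def render_as_post(notebook, show_code=False):
    """Render a notebook dict as plain text, hiding code cells unless asked."""
    parts = []
    for cell in notebook.get("cells", []):
        # `source` may be a single string or a list of lines.
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            if show_code:
                parts.append("[code]\n" + source)
            # Outputs are always kept, so the reader still sees the results.
            for output in cell.get("outputs", []):
                text = output.get("text")
                if text:
                    parts.append("".join(text).rstrip())
    return "\n\n".join(parts)


# Hypothetical two-cell notebook in the shape of the notebook JSON format.
notebook = {
    "cells": [
        {"cell_type": "markdown",
         "source": ["# CO2 emissions\n", "A short study."]},
        {"cell_type": "code",
         "source": ["print(sum([1, 2, 3]))"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["6\n"]}]},
    ]
}

print(render_as_post(notebook))                  # reads like a blog post
print(render_as_post(notebook, show_code=True))  # the "show code" button
```

Keeping the outputs while dropping the source is what lets the same document serve both a nontechnical reader and someone who wants to see how a result was produced.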
Kirill Eremenko: Yeah. Very cool. So it’s almost like a conspiracy. Somebody might end up on Kyso by accident and it looks like a regular blogging platform, but it’s in reality, it’s data scientists just having fun.
Eoin Murray: Actually that’s something that we were surprised by and we’ve actually had to work on. So, in the beginning, data scientists were coming to Kyso and they were like, “This is a really interesting article,” and we would be like, “Did you know that’s actually a Jupyter notebook?” They’re like, “Whoa, no way.” Because it wasn’t obvious enough. It just looked too much like a blog post.
Kirill Eremenko: Yeah. That’s very cool. So, what I really wanted to say is, I really like this idea for enabling people to build their online portfolios and presence. For me, this has been, people come and ask questions, how do I build a career in data science? How do I advance my career? How do I get a promotion? How do I break into this field? And my answer is always, “What is your online presence? Do you have projects that you’ve shared? Have you gone and published in a tableau public workbook? Do you have code on Github? Do you have articles on Medium? Do you have articles on Linkedin? What are you doing to share this knowledge, to show people out there that you are capable and the projects that you’re working on? Have you done Kaggle competitions?” And like Kyso in that sense, the way I see it, is an ideal place to go and share those projects that you’re working on in your free time. In order to just have that portfolio, first of all, other people can learn from you and ask you questions and you can explain things and learn it even better.
Kirill Eremenko: But on the other hand as well, so that either recruiters or employers or your employer or your manager, people can actually see that you are an expert in this field and you’re not afraid to position yourself as one, or you’re learning and you’re going to be an expert. Basically, they can see the passion of you putting time and effort into this. And that speaks a lot. With data science becoming so popular and the salaries going through the roof, there’s a lot of people who want to get in, but the people that make the best data scientists are the ones that are actually passionate about the field, that are not just talking about it. And one way to demonstrate it is through something like Kyso.io. So, I just want to thank you on behalf of our audience that you’re enabling this movement and people to share their work like that.
Eoin Murray: Yeah, no, Kirill, I really agree with that, and I think that actually is like a secret weapon that data scientists have, and this is really something we want to drive home, because here at Kyso you can share with a nontechnical audience. And what we’ve noticed actually is that a lot of the content shared on Kyso is very conversational, right? So, if you have a really nice Linkedin profile, you might get a message from a recruiter who will then put you in touch with the technical recruiter at a company, for example. And the first recruiter might not be a technical person. Right? And then if they’re looking at your Github profile and everything you’ve published looks very technical and code-heavy, it’s hard for them to parse it, whereas with these kinds of notebooks that we see people publishing on Kyso, they’re very conversational.
Eoin Murray: So one study is actually someone who’s used the Github API to measure and then predict the future number of Jupyter notebooks on Github. Or it’s things like looking at the GDP of countries versus their democracy index, so seeing how democratic they are, or things like looking at GDP per capita versus the Gini coefficient. There’s also lots of stuff about climate change. How many tons of CO2 per year are going into the atmosphere for different countries? And how is your country doing? And it’s very conversational work. So it’s kind of a secret weapon, I think, that data scientists have over other technical fields, that if you do it right, everybody can read your work, not just other programmers. Does that make sense?
Kirill Eremenko: Yup. Yup. From my perspective, as you say, secret weapon, really the most valuable data scientists are the ones who can bridge the gap between technical insights and the nontechnical business decision makers. And what I’m getting from your description of Kyso is that you can get into the habit of practicing speaking your insights in a nontechnical way or in a conversational way. And I think it’s a very important soft skill that a lot of data scientists miss out on but should be focusing on developing. Because for me in my career, I’m by far nowhere near the top data scientists in the world, but at the same time, I find I can actually explain complex things in a simple manner. And that’s what helps me get ahead. And I wish that to as many people as possible. So if you can practice that in a setting like this, I think that’s a really cool thing.
Eoin Murray: And I definitely agree with that point because I really think it is a learned skill. It’s not that you just wake up someday as a good communicator, it’s practice. We publish a lot of fun studies on Kyso and in the beginning, I don’t know, 500 people would read them if I posted them on Reddit, on Data Is Beautiful. And then we learned about how to make the graphs nicer to look at, more interesting and simple to look at, because people will comment and they’re like, “I don’t understand this or I don’t like this.” And you just get better at picking a proper title, the proper amount of description, not so much that it’s way too detailed and a little bit dull, not so little that there’s nothing to bite on.
Eoin Murray: Having the right amount of graphs in a report, for example. We kind of found it’s between one and three. That makes a lot of sense and you’ll get a lot of readers. We ourselves have actually learned this skill in the last year of Kyso, where in the beginning you’re only getting 500 people reading it and now you get 25,000 people reading an article. And it’s just that you post it in the same place. Actually, this is maybe something that your listeners might find useful. We had to learn ourselves in the beginning, like, “How do I actually share?” Because if I’m on Linkedin and I have a hundred connections, I’m on Twitter and I have a hundred people following me, I can host my report or my article on Kyso, but how do I go about actually getting people to read it? How do I go from maybe a hundred followers to lots more?
Eoin Murray: And what we’ve learned actually is Reddit, the subreddit Data Is Beautiful. It has 13 million people reading it, and it gets about 25 to 30 submissions a day. We’ve noticed that if something is good and people come along and comment on it, a lot of people will actually read it. On Kyso you can see the number of views you get, so you still get some analytics about your post as well. So if people are listening and they’re figuring out a place to post their work to get readers, Data Is Beautiful is a really good one. And Hacker News is obviously a good one as well. It’s a bit more hit and miss, but out of every five posts you publish, maybe one will hit the front page and then you’ll get a lot of readers from that.
Eoin Murray: And one thing we’ve noticed as well, if you want to rank high on Data Is Beautiful or Hacker News or the data science subreddit, and if you want somebody to read it, is to pick a topic where your graph is interesting. So if you write an economics article, say you look at the wealth per household of lots of different countries, right? Post that to the economics subreddit. And we’ve noticed that if it’s a good thing it’ll get ranked highly and people will share it in other places, and before you know it your post has cascaded, and then there’s like a hundred people tweeting about it. It’s on Hacker News as well, someone else has posted about it.
Eoin Murray: So that might be something that your listeners might be interested in. If they’re thinking of how to build a portfolio it’s just like, write about six or seven articles and then just post them to like about four different places. Don’t do too much, don’t be spammy. But if you do that every now and then, maybe you’re publishing an article every week or two weeks and you do the four steps for each article, you’re definitely going to start getting readers.
Kirill Eremenko: Got you. Got you. Very cool. I wanted to ask you on the flip side, let’s say, to your point earlier, when I’m inside an organization, I’m a data scientist, and I’m working on a project or our team is working on a project and we know that we will probably need to replicate this on a monthly basis, but with some alterations and some new changes, developments and so on. Can I use Kyso? Is it safe to upload projects with company-specific information, with maybe sensitive data and things like that? Because of course it’s valuable on the public side. But what about inside a company?
Eoin Murray: Yeah. Yeah. Cool. So maybe there’s two points there. I’ll just reference the one about reusing work. So in Kyso you can fork everything so for example, I want to look at, if you have a study about the carbon emissions of Germany for last year and I’m like, that’s amazing, I want to see that for my country Ireland, I can press the fork button, I can actually open that up. And a point to make is we recently launched it, so actually you can open up a Jupyter notebook server on Kyso so you can actually play with the code or you can download the notebook and run Jupyter notebook locally and then publish it. But you can download an existing study, swap the data in for say Ireland versus Germany and just republish that.
Eoin Murray: And the fork is tracked. So it’s a really, really cool way to reuse work, so that people can expand and extend each other’s work and remix stuff. And then to your point about internal, yeah, so about a month ago we launched Kyso for Teams, which is basically the full Kyso stack, ring-fenced to a private environment for teams, where you can share sensitive graphs and stuff that you don’t want the public to read obviously. And you can have permission controls. So for example, I can make a team on Kyso, and then I can add other editors. So these are people who are allowed to publish to that team’s scope. And then I can add viewers, the viewer permission being that people are only allowed to read stuff and comment on stuff and they’re not allowed to make submissions.
Eoin Murray: So this is useful then if you’re just trying to maybe run a reviewing process, where there’s a limited amount. So some people want everybody to be able to post everything. Some people want to restrict that. Some people want to review work so that Kyso acts as an internal journal as opposed to like a blog where you post everything. So yeah, it’s completely suitable for that purpose. And companies are now using it a lot and it seems it’s really, really useful.
Kirill Eremenko: Got you. So definitely that’s a very, very valuable feature, and it’s a corporate subscription type of offering. And what I want to ask, I’m just looking through Kyso.io, can you help me out? Let’s say you mentioned like the German study, is there like a search button where I can search for a specific study that I’m after? Because I don’t seem to see where to do that.
Eoin Murray: Oh yeah. So right now we have tags and we get a lot of questions about that. Our search functionality is coming really, really soon. We’ve been working on it. And there’ll be a big search bar there where you can get everything. It’s on our to-do list.
Kirill Eremenko: When did you start Kyso.io?
Eoin Murray: So we started it about a year and a half ago. But we did a big pivot about six months ago, which is what you see now as the current iteration.
Kirill Eremenko: Well, it’s very impressive for something that’s only a year old. It’s really cool. So yeah, for listeners, if you’re interested, it’s Kyso, K-Y-S-O.io. By the way, where does the name come from?
Eoin Murray: So we used to play this game. We used to ask like investors or just anybody who would ask us, I’d be like, “Look, I’ll give you 10 points or I’ll buy you a beer. If you could tell me what Kyso means.” And people would spend a month googling and trying to figure it out. And it doesn’t mean anything. It’s a four letter domain name that we were able to buy narrowly. And it sounds kind of catchy. Also in the very, very beginning Kyso was, we started out as a command line tool, to turn on and turn off Jupyter notebook servers on AWS. And because it started as a command line tool, we wanted the command line to have the same name as the website, like Gifs or some of those. So we really wanted to have a four letter word or even three, but that’s impossible.
Eoin Murray: And it had to be easy to type as well. I don’t know how to say it, but there’s a flow sometimes when you’re typing a word all the time. You want to be able to maybe type it with one hand, or you don’t want the letters to be, like you don’t want A and P and Q and M or something, they’re too far away on the keyboard.
Kirill Eremenko: Yeah. By the way, that’s a really cool tip, speaking of startups and people wanting to get into the space. Like that’s the same approach I’d take when starting a new business: first thing you do is you go and check for the domain name and then from what’s available, you pick out the name of your business pretty much. That’s because the domain name is important, right? It has to be memorable.
Eoin Murray: Yeah. I mean I think as well, if it’s a tool, you have to have the name tied to the tool.
Kirill Eremenko: Yeah, true. Okay, cool. So that’s Kyso.io, everybody who’s interested make sure to check it out. Upload your projects there and Eoin tell us a bit more about yourself. Like you’ve got a really cool, interesting background, not to even mention the quantum computing with lights and things like that that you’ve done, you’re a serial entrepreneur and things like that. What are some of the other things that you’re interested in these days?
Eoin Murray: So one thing I think is very interesting is to think about the evolution of data science as a subject. Not necessarily as an industry where you process data and present it at work, and make decisions there, but how it will, I think, influence the wider way society is processing information. So, a few years ago, right, before you had a smartphone and Wikipedia, you could be at a bar with a friend and you’d start arguing about some trivial fact and your friend has a different opinion.
Eoin Murray: “What’s the population of France?” Right? And you’d be like, “20 million,” and they’d be like, “It’s 120 million.” And we’d go on for ages and only like the next day would we actually be able to check, right? And people don’t really have these kinds of discussions anymore because you’ll just Google it. Right. So what happens there is that single point facts are now trivial to check. And that’s changed the types of discussions you’ll have with people. And I think what data science might do is like the same thing, but for more like multidimensional facts. Does that make sense? So before it’s, “What’s the population of France?” Now it’s like, “How has the population of France changed in the last few years? And how is it going to evolve in the future?” Or a question would become like, “How has the population of France changed and how have its demographics shifted?” Or “How has that population change related to its economic growth performance for the last few years?” And these are going to be things that are just more widely known by people. Does that make sense?
Kirill Eremenko: Do you think that’ll be in part also enabled by assistants like the Google Assistant and Alexa and so on, where they can just do those predictions for you?
Eoin Murray: Yeah. I think that’s going to happen. Like right now, you’ll check like a single fact on Wikipedia. Soon enough we’ll be getting charts and graphs, and the discussion will change towards having a more multidimensional view of things. And I definitely think it’s going to be parked on. People will demand this kind of stuff because data journalism is exploding, where you don’t have to go interview politicians or go out into the field to discover something. You can just process data that exists. So in the discussion, even in the news today, you see more and more charts posted.
Eoin Murray: And I definitely think, with Siri and stuff, you’ll be asking, “What’s the population of France?” It won’t tell you a number. It’ll show you a graph for the last five years.
Kirill Eremenko: Got you. And the other thing.
Eoin Murray: Oh, I think what’s a very interesting topic for discussion, and I’m nowhere near an expert on this, is the ethics of data science and AI. How they’re going to be going forward.
Kirill Eremenko: Okay. So what are your thoughts? How are they going to be going forward?
Eoin Murray: So I think it’s a very hard question. So, what’s that show, is it Little Britain? Where you’re trying to get your driver’s license and the person behind the desk just says like, “The computer says no.”
Kirill Eremenko: No, I haven’t seen that.
Eoin Murray: Computer Says No. Because sometimes I think that there’s this tendency of people to think that the computer is like an objective system that gives you like an objectively correct answer about something. Right? Whereas in actual fact, a computer or an AI system just reflects the biases of the input it was given or the decision-making capability it was given. Right? So you see that an AI can be very biased towards and against certain groups of people or certain types of behavior. Does that make sense? And I think you see world governments all over the place now saying this, because a big issue with neural nets is how knowable it is, right? So maybe there’s a nightclub and instead of having a bouncer, it has facial recognition. And then it doesn’t let me in. Right now it’s very hard to ask a neural net why it didn’t let Eoin get into the nightclub.
Eoin Murray: And I think making that transparent and knowing why the AI made that decision, and then being able to ask it to try and make a decision again, or being able to escalate your problem until you’re talking to a human, is something that’s very, very important.
Eoin Murray: You imagine this is an important thing to have. And I think I’m a bit worried. I think some people are worried that we’re actually going to have this system where we just let the AI make all the decisions and there’s no transparency into it or ability to escalate, to like petition a change in that. Because I think it’s an amazing technology, but we have to remember how it’s implemented and understand how it’s implemented, how it affects different groups of people.
Kirill Eremenko: That’s a whole discussion about interpretable AI. On one hand you can make AI more interpretable and you minimize that problem, but at the same time you lose in efficiency, right? Like, the less interpretable it is, the fewer restrictions and boundaries there are for what can be inside in terms of implementation.
Kirill Eremenko: And that just means more variety, more opportunities for artificial intelligence. Yeah, it’s a tough topic at the moment. Right?
Eoin Murray: Yeah. It’s like if you’re learning data science, you have to learn the skill, but also, in part, the philosophy around it.
Kirill Eremenko: Yeah. Got you.
Eoin Murray: Sometimes you can have this thing where you think you’re going to make a vision system to analyze cancer data and it could get used in a weapon and maybe how to [inaudible 00:47:49] with that. Or maybe you [inaudible 00:47:50], I don’t know.
Kirill Eremenko: Yeah. Got you. What’s your stand on data science being a certified profession? For instance, accountants, they have the chartered accountants, or in finance they have certain exams that they need to pass. Lawyers need to be certified in order to practice law, that’s probably the clearest example: you cannot be somebody’s lawyer, especially in certain circumstances, unless you have a certification or you’ve passed the bar. What do you think, should data scientists and people who develop AI be required to have certifications?
Eoin Murray: I’m not sure on the question of should it or shouldn’t it. On the question of will it, I don’t think so, for the simple fact that I think it’s going to become like a skill that everybody has in 10 or 15 years. You know what I mean? So to restrict it in that way, I don’t think it’d be feasible, because I think you’re going to have, now we’re seeing at the forefront of people in education of data science. Right? And then, you’re helping people get into the industry. I think in 10, 15 years, like it’s going to go from maybe there’s six or seven million people today learning data science, it’s going to be 120, 130 million people in 10 years. It’s going to be very hard to implement some regulations or certification system around that, you know?
Kirill Eremenko: Yeah. I see what you mean. Actually this question popped in my head. Like now being an entrepreneur and having started a business, or your second business now. You mentioned that back in the day, in another life, you were doing quantum computing and data science, as I imagine. Do you miss it? Do you miss being in the field and actually doing data science as opposed to entrepreneuring?
Eoin Murray: Yeah, yeah, I do. Sometimes it comes into my head like, “Oh, there was this beauty around that.” That I would have some data I can’t explain and I would have to read a book about how to simulate some system and another book about how to like actually do the data science of that simulation and I would then apply it. That was a very beautiful scientific process and it’s very satisfying to do that. You see that you’ve built a model and nine times out of 10 it doesn’t work. But when it works you’re like, “Oh my God, that’s amazing.” That’s such a great feeling.
Eoin Murray: What motivates me now though is that I think, I was one scientist. If I can make a thousand scientists 5% more efficient in the way they work, the overall impact is just so big. But there’s definitely something beautiful and satisfying about how when you have a lot of data coming in and you process it and you analyze it and you then finally fully understand it. It’s like you’ve taken this mess and ordered that system in a way that you can understand it and then give that understanding to other people. That’s a very, very satisfying process.
Kirill Eremenko: Okay, cool. Do you have any examples from your past life of interesting projects that you might be able to share with us?
Eoin Murray: Yeah. So there’s one project I was advising on, which was using microfluidics and photonics to try to identify contaminants in water. So E. coli and other bacteria like Legionella, for example. And what we did was there was a chip with little pipes in it and we would contaminate some water, with gloves on obviously, and we would put the water through these little pipes and we’d shine a laser at it. So the laser would hit the bacteria and it would reflect, and you’d measure the reflection in a spectrometer, so you’d get a histogram of the wavelength of the light versus its intensity. And every type of bacteria had a very specific spectrum. Like it was a unique identifier, right? We wanted to come up with an automated classifier so a robot could tell you what it is.
Eoin Murray: “I’m 90% sure this is E. coli versus Legionella.” You needed to know the specific bacteria, not just the existence of bacteria. So we used support vector machines. Basically we just did lots of repeated tests of taking spectra on the two different bacteria, or the five or six, and then used support vector classifiers to be able to run them through a model. And then it would just tell you what it thinks it is. And you know, when we started the project, we were only getting probably 50% success rates, which is not great because it’s effectively random. And then after about six months of just tweaking the way the data was processed, because we couldn’t exactly change the support vector machine algorithm, we actually just ended up applying a log10 transformation, which made it really, really accurate.
Eoin Murray: We were getting up to like 99%. So that meant that basically it was the beginnings of a system where you could run water through a pipe, shine a laser at it, gather the latest laser spectrum, and it would be able to tell you if there was bacteria present in that water and then within a group what kind of bacteria that was.
Kirill Eremenko: Wow. That’s very cool. Did that get implemented anywhere?
Eoin Murray: It was a research project, and this is about four years ago we did this, and I think they’re doing some small field trials now. I mean, there’s a lot of work in getting all of that system packaged.
Kirill Eremenko: What I find interesting about this is, well, first of all, why did you select SVM? What was the decision for that, if you remember? And the other one was, you selected SVM, you got 50% accuracy, but you still stuck with the support vector machine rather than switching to a different model, and you got the end result that you wanted. So I’m just curious about the thinking behind that.
Eoin Murray: I can’t remember specifically why we chose it. I think there was like a team standard of using that library. So I inherited that a little bit. And the 50% was basically, I think, a lot of it was to do with how the data was prepared before it went in. So what it was, was that the signals were so similar in intensity after being normalized. You have a lot of these different peaks in the histogram, and basically there were maybe 45 unique indicators of a bacteria, but then there were only two or three which would tell you the difference between two different bacteria. So you had to like amplify that difference, which is what a log transformation would do. It would make that look bigger.
Eoin Murray: So like multiply everything by a billion or something and you are different areas and you’d see that because it’s like yeah. I think that’s basically it. It’s basically that the difference was quite small between the identifiers that you had to somehow make the space between them bigger to separate them out.
Kirill Eremenko: And the log 10 transformation did the trick?
Eoin Murray: Yeah.
Kirill Eremenko: Yeah. Got you.
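[Illustration, not part of the conversation: the idea described above, that a log10 transformation can make a few weak but discriminating peaks visible to a support vector classifier, can be sketched on synthetic data. Everything below is an assumption made for the sketch: the peak intensities, the noise model, the sample counts, and the use of scikit-learn's `SVC` with its default RBF kernel. It is not the project's actual data or pipeline.]

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

N_PEAKS = 45        # peaks per spectrum (the "45 indicators" mentioned above)
N_PER_CLASS = 200   # hypothetical number of repeated measurements per class

# Hypothetical templates: 43 strong peaks shared by both bacteria,
# plus 2 weak peaks whose intensity differs between them.
shared = rng.uniform(100.0, 1000.0, size=N_PEAKS - 2)
template_a = np.concatenate([shared, [1.0, 1.0]])
template_b = np.concatenate([shared, [2.0, 2.0]])


def simulate(template, n):
    """Return n noisy spectra: the template times ~10% multiplicative noise."""
    noise = rng.lognormal(mean=0.0, sigma=0.1, size=(n, template.size))
    return template * noise


X = np.vstack([simulate(template_a, N_PER_CLASS),
               simulate(template_b, N_PER_CLASS)])
y = np.array([0] * N_PER_CLASS + [1] * N_PER_CLASS)


def svm_accuracy(features):
    """Train a default RBF support vector classifier; return held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, y, test_size=0.5, random_state=0, stratify=y
    )
    clf = SVC(kernel="rbf")
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# Same classifier, same data; only the preprocessing differs.
acc_raw = svm_accuracy(X)
acc_log = svm_accuracy(np.log10(X))

print(f"accuracy on raw intensities:   {acc_raw:.2f}")
print(f"accuracy on log10 intensities: {acc_log:.2f}")
```

[In this toy setup the noise on the strong shared peaks swamps the weak peaks in raw intensity space, so the classifier has little to work with. After log10, a doubling of a weak peak is as large as a doubling of a strong one, so the log10 version should score noticeably higher. The algorithm itself is untouched, which matches the point made above that only the data preparation was changed.]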
Kirill Eremenko: Okay. Well, interesting project. Hopefully that rolls out and helps people in their lives. Well, on that note, that actually brings us towards the end of this podcast. Really cool to hear your insights. And of course, the work that you guys are doing at Kyso.io. Could you share some links [inaudible 00:56:30] where they can get in touch, follow you, maybe ask you some questions and just see where your career takes you.
Eoin Murray: Yeah. Super. So, I mean, Kyso.io is Kyso, K-Y-S-O.io. If anybody wants to ask me a specific question, you can get me by my email. I’ll respond pretty quickly. eoin@Kyso.io. And I’m also on Twitter, that’s Eo_in, and I love when people send me datasets and I see if I can visualize them. It’s kind of a fun thing.
Kirill Eremenko: All right. Be careful what you wish for, you’ll get like 10,000 datasets after this podcast.
Eoin Murray: Well then, I can do an interesting study on coming on this podcast and then what kind of datasets get sent to me. I can tell you a lot about your listeners.
Kirill Eremenko: Oh, true, true. All right, cool. And also LinkedIn is okay for people to connect with you there?
Eoin Murray: Oh yeah. Super. What’s my LinkedIn unique code?
Kirill Eremenko: We’ll add it to the show notes.
Eoin Murray: Super. Yeah. Happy for that.
Kirill Eremenko: Awesome. Okay. Well, one more question actually before you go, is there any book that you can recommend to our listeners that has helped you in your career?
Eoin Murray: Yeah, there is. So I learned data science by doing during my physics career, but a lot of data science fundamentally is just linear algebra. So I think I’d recommend, this is a very difficult book, but if you can read the first chapter of it, you’ll definitely walk away with a lot more knowledge than when you went in. And it’s a book called Quantum Computation and Quantum Information by Isaac Chuang and Michael Nielsen. And I wouldn’t really recommend the whole book. It’s like the bible of quantum information. It’s a very, very big book, but the first chapter of it is by far the best introduction I’ve come across to linear algebra, which is an advanced step in data science, but it’s very, very useful.
Kirill Eremenko: Okay. Got you. Quantum Information and Quantum Computation, right?
Eoin Murray: Yeah. By Chuang and Nielsen.
Kirill Eremenko: By Chuang and Nielsen. Perfect. All right, Eoin thanks so much again for coming on the show. Sharing your insights and keep up the great work you guys are doing with Kyso.io.
Eoin Murray: Thanks so much. Thanks for having me on.
Kirill Eremenko: So there you have it, ladies and gentlemen. That was Eoin Murray from Kyso.io. I hope you enjoyed this conversation as much as I did and got some valuable takeaways. For me, probably the most interesting part was the whole conversation around startups and accelerators, different types of investments and what you get out of these programs that you can participate in. I don’t know if I’ll ever be in one of them, if I’ll ever apply, but it is just good to know this whole world because startups are on the rise. There are so many interesting things happening in the startup world. So I got a really good share of knowledge from that part of the conversation. And of course, needless to say, the whole concept of Kyso.io, the tool where you can share your data science projects. I’m very grateful Eoin’s looking into that and it’s really cool also to see that the base level of pricing on the platform is free and, as it says there right now, free forever.
Kirill Eremenko: So that’s very admirable that they’re creating this tool for us data scientists to actually share our work and experiences. And I look forward to seeing how it’s going to develop. In its first year of existence they’re already so cool, so I can only see a bright future ahead for it.
Kirill Eremenko: On that note, you can get all the show notes for this episode at www.superdatascience.com/263, that’s www.superdatascience.com/263. There you’ll get all the links that were mentioned on this episode, a URL to Eoin’s LinkedIn and other social media where you can follow him and connect with him, plus a transcript for this episode and anything else that might be required in order for you to get the maximum out of this podcast episode, so check it out. On that note, thanks so much for being here and I look forward to seeing you back here next time. Until then, happy analyzing.