If you are at the start of your data science journey, one of your first decisions is whether to script in R or Python. The general consensus is to learn both as the need arises: use R for what R does best, and Python for what Python does best.

What about introducing R or Python at your workplace? First, why would you want to apply data science on the job? Showcasing your data science insights can position you as an advocate of data-driven decision-making in your company, and developing your skills both during and after work hours means you progress faster. But do remember to hug a tree, pick leaves, or whatever people do outside.
Below are some considerations when selecting R or Python at the workplace. Your choice will be a practical one, though you may often be nudged or pigeon-holed into a particular way of working.
Use what is available at work
I’ve heard the following from colleagues in the past: “IT won’t let me install R/Python at work.” Your IT department may tightly control the programs installed on your work machine. I am sympathetic: IT has valid risk-mitigation reasons, but the restriction creates a barrier to exploring open-source tools.
If you are unable to install R or Python, then you have to make do with the permitted analytics tools. Your workplace may use SAS, SAP, SPSS or even Matlab if they prefer to pay for software that doesn’t begin with “S”. Currently, the data science career path requires R, Python or both. In the future, we are likely to see the rise of data science GUI platforms.
Installing R is generally accompanied by installing RStudio, a fantastic IDE that makes using R easier and a delight. Everyone uses RStudio with R, except that one weird guy who likes to make life more challenging for himself. Install Python with Anaconda, an open-source distribution of Python (and R) that simplifies package management.
If you cannot install open-source platforms, or your boss simply says no, then explore on your own machine with your own datasets on your own time for now. We all want to work for a company that embraces data science, and I have had former colleagues resign when this desire was not met.
Make your pitch
Sometimes it’s not enough to say there’s a better way of doing things; you have to show people. Find someone, preferably your manager, who can back your data science approach for a project. Impress them with R/Python first, and they can then invite others to see you present your work. Show how you can gather data, wrangle it, analyse it, model it and present insights. Bonus points if you can automate the process. Your pitch: R/Python can save time and money, generate insights and automate reporting.

Build a prototype. Before I present new methods and techniques, I build a small prototype that shows stakeholders how the new initiative could work. When I worked in a role that did not embrace data-driven decision-making, I built prototypes in Excel, using VBA to organise data and apply logic. The VBA prototype read in updated data and ranked clients by “priority of need” based on demographic and circumstantial features. My audience of non-technical staff was impressed as they watched, in real time, a program automate the data organisation process. Now I combine R with the visualisation software Power BI for prototyping. When you are making your pitch, emphasise the clear message of your functional prototype: there is a way to save time and effort and reduce errors. I should add: don’t use VBA for your data science needs. Choose R, choose Python!
Sadly, there’s no guarantee that your work will be well received. A former colleague of mine presented results from a market basket analysis, highlighting which online products customers purchased together (use the “arules” R package for market basket analysis of transaction data). She had uncovered some interesting associations that could improve online deals. A senior colleague asked, “How do you intend to use these results?” She replied that we could redevelop the store website: when a customer places an item in their shopping cart, the page could suggest products frequently bought together with it as the customer continues to shop. The senior colleague said, “That won’t happen anytime soon.” Web development requires resources that she was not in a position to ask for, even though it was an obvious implementation that could drive sales. Instead, her analysis could produce a report showing merchants which products are purchased together at a given time. The analysis was potentially useful, but her desired website solution could not come to fruition. Manage your expectations: a report isn’t so bad. It still conveys your data science message.
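The prototype’s logic was to read in updated data, score each client and rank clients by “priority of need”. Below is a minimal Python sketch of that logic. The field names (`dependants`, `months_unemployed`, `open_cases`), the weights and the weighted-sum scoring rule are hypothetical placeholders. They are not the features or logic of the original VBA prototype.

```python
import csv
import io

# Hypothetical weights: the original prototype's features and scoring logic are not described.
WEIGHTS = {"dependants": 2.0, "months_unemployed": 1.5, "open_cases": 3.0}


def priority_score(client: dict) -> float:
    """Weighted sum of a client's (hypothetical) need indicators."""
    return sum(weight * float(client[field]) for field, weight in WEIGHTS.items())


def rank_clients(csv_text: str) -> list[tuple[str, float]]:
    """Read client records from CSV text; return (client_id, score), highest need first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    scored = [(row["client_id"], priority_score(row)) for row in reader]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up stand-in for the "updated data" the prototype read in on each run.
sample = """client_id,dependants,months_unemployed,open_cases
A101,0,2,0
B202,3,6,1
C303,1,12,0
"""

for client_id, score in rank_clients(sample):
    print(f"{client_id}: {score:.1f}")
```

Re-running `rank_clients` on a fresh export is the “automation” part. Changing the weights in one place re-prioritises every client.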
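At its core, market basket analysis counts how often items appear together. The arules package mines association rules with the Apriori algorithm. The Python sketch below is not Apriori. It is a simpler brute-force count restricted to pairs of items, computing support and confidence for single-item rules of the form “A => B”. Support is the share of baskets containing both items. Confidence is support divided by the share of baskets containing A. The baskets are made up.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.0):
    """Return {(antecedent, consequent): (support, confidence)} for single-item rules."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each co-occurring pair yields two directional rules: a => b and b => a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            rules[(antecedent, consequent)] = (support, confidence)
    return rules


# Made-up transactions for illustration.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]

rules = pair_rules(baskets, min_support=0.5)
for (a, b), (support, confidence) in sorted(rules.items(), key=lambda kv: -kv[1][1]):
    print(f"{a} => {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In the website scenario, the suggestion for an item just added to the cart could be the consequent with the highest confidence for that item. The same table, grouped by time period, would feed the merchant report instead.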
Use what your colleagues are using and will be using
Your teammates will influence the choice between R and Python. If your colleagues use Python, and you’re bringing R to the coding table, learn Python. Speak the same language as your colleagues.
Consider the colleagues you don’t have yet. I am currently writing R code to wrangle data for presentation as dashboards. I have thousands of open-ended comments that I’d like to organise into “topics” and then count the comments per topic. I had neither the time nor the tenacity to read through each comment and decide which topic it belonged to. Instead, I used Python to apply a topic modelling approach called Latent Dirichlet Allocation (LDA) to organise the comments. My manager asked whether I could code LDA in R instead. The answer was yes: example code exists for both Python and R. “Why do you prefer I code it in R?” I asked. He replied, “There’s no need to introduce another language into the pipeline if you don’t have to.”

From my personal perspective, since I am learning Python, it would have been great to build a new Python data product. From a business perspective, it wasn’t a great idea, for two reasons. First, when we explain my data pipeline to others, they may question the use of multiple languages, and simplicity is key for production. Second, I will be handing over knowledge of how the data pipeline works to a colleague in the future. I can save this person time by limiting the number of languages and tools I use.
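The LDA step mentioned above organises comments into topics and then counts the comments per topic. Below is a toy version in plain Python: a minimal collapsed Gibbs sampler for LDA that uses only the standard library. It is a teaching sketch, not the code the author used. In practice you would reach for a maintained implementation, such as scikit-learn’s `LatentDirichletAllocation` in Python or the topicmodels package in R. The comments, stopword list, number of topics, priors (`alpha`, `beta`) and iteration count are all illustrative assumptions.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of token lists.

    Returns n_dk, where n_dk[d][k] is the number of tokens in doc d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]        # doc-topic counts
    n_kw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    n_k = [0] * n_topics                         # tokens per topic
    z = []                                       # topic assignment of every token
    for d, doc in enumerate(docs):               # random initial assignments
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return n_dk


def dominant_topics(doc_topic_counts):
    """Assign each document to the topic holding most of its tokens."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic_counts]


# Made-up comments and a tiny, hypothetical stopword list.
comments = [
    "parking is expensive and parking spaces are scarce",
    "the staff were friendly and helpful",
    "not enough parking near the entrance",
    "helpful friendly staff at reception",
]
stopwords = {"is", "and", "are", "the", "were", "not", "near", "at"}
docs = [[w for w in c.split() if w not in stopwords] for c in comments]

topics = dominant_topics(lda_gibbs(docs, n_topics=2))
print(Counter(topics))  # number of comments per topic
```

The final `Counter` is the “quantify the comments per topic” step. Topic labels such as “parking” or “staff” still have to be assigned by a human, who inspects the words most often assigned to each topic.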
Use what comes more naturally
It’s overwhelming when you’re starting out in data science: R, Python, SQL, Tableau, Power BI, Hadoop, Spark (and so much more). Your route into a new field should be the path of least resistance. A common adage is that those from an academic or statistics background may prefer R, while those from computer science or engineering may prefer Python. That’s certainly true for me. I used the numerical computing environment Matlab during my PhD research. Years later I took an online data science course that used R, and I have primarily used R in the workplace since. Your first programming language can plant the seed for growth.
R came naturally to me because of its similarities with Matlab. For example, both support vectorisation. Consider an R-bloggers post showing three different methods in R for replacing values in a data frame column. It is tempting to use a for-loop to iterate through each row and replace the value. A faster alternative is vectorisation using a lookup vector. The voodoo of vectorisation may not suit everyone. I admit I still use loops when I probably shouldn’t, because they are easier to interpret and to code when sleepy.
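The same loop-versus-lookup contrast carries over to Python. The sketch below replaces coded values in two ways. The first uses a row-by-row for-loop. The second uses a vectorised lookup array, indexing it with the whole column at once via NumPy integer-array indexing, analogous to indexing a lookup vector in R. The codes and labels are made up, and this does not reproduce the R-bloggers post’s specific methods.

```python
import numpy as np

# Hypothetical column of integer codes (e.g. survey responses 0-3).
codes = np.array([2, 0, 3, 3, 1, 0, 2])

# Lookup vector: position i holds the label that replaces code i.
labels = np.array(["none", "low", "medium", "high"])


def replace_with_loop(values):
    """Row-by-row replacement: easy to read, slow on large columns."""
    out = []
    for v in values:
        out.append(labels[v])
    return np.array(out)


def replace_vectorised(values):
    """Index the lookup array with the whole column in one operation."""
    return labels[values]


print(replace_vectorised(codes))
```

On a handful of values the difference is invisible. On large arrays the loop runs element by element in the Python interpreter, while the indexing runs in NumPy’s compiled code, which is where the speed-up comes from.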

Use the language that is more natural to you, and you’ll come to understand data science in the language of R or Python. Once you understand data science, you will be able to pick up the other language later. “Machine Learning A-Z” here at SuperDataScience uses both languages. I can transfer my understanding of R to Python, since the data wrangling and data science principles are the same.
Use what is easier
I am distinguishing between what comes naturally and what is “easier”. When playing with data science, it’s common to use an approach that isn’t the easiest simply because you aren’t aware that an easier one exists. I was tasked with building dashboards that could be distributed to staff across my company. I initially coded up Shiny dashboards in R. I built some really cool custom dashboards and got into the zone creating ggplot2 visualisations. However, building the dashboards was time-consuming, and sharing them with users was challenging. A colleague (the same colleague who performed the market basket analysis) suggested I try building dashboards with Power BI.
It was so much easier.
I still plot with ggplot2, but I haven’t built a Shiny dashboard since. There is probably an easier way to achieve your goal via another approach, and hopefully it’s available as open source. By asking around and exploring new techniques, you can be pointed to the easier solution that will save time and effort.
The argument is the same for packages and libraries. If a Python library can perform an action better than an R package, use Python, and vice versa. Choose easy.
Since I primarily use R, I wanted to know what Python does better than R according to Python users. I didn’t conduct a survey, so I went to the source of information on the internet: Reddit (and Quora, the scrawny little cousin of Reddit that makes you register before you can read posts). As I discerned from users’ comments, Python is easier to learn than R [1, 2], with data analysis and machine learning facilitated by “pandas” and “scikit-learn”. Python is a general-purpose, object-oriented programming language, whereas R is a complex programming environment that one needs to master [3]. Because Python is general-purpose, users may already be familiar with it from coding websites and other apps, whereas R is focused on data analysis. The Python ecosystem is better suited to production, as opposed to one-off use cases [1]. Thus, Python can be used for both development and analysis, removing the need to switch languages between different parts of a project and reducing overhead [3]. Data scientists use the best tool for the job: R can be used for prototyping, but Python is for production in the real world [1, 4, 5]. Python is generally faster than R [6], at least when using for-loops [7]. Python beats R when it comes to APIs [1, 8], scraping web data [9], natural language processing [6, 10] and deep learning [6].
That’s not to say these tasks are impossible in R. Both R and Python have become well-rounded languages/environments for data science, with the perceived weaknesses of each being addressed by packages. Thus, some of the user statements I paraphrased here may have become outdated with subsequent R and Python improvements.
Conclusion
If you can, make a case for data science in the workplace. Most industries are hungry for insights from the mass of data being gathered and stored. If someone at work is driving the data science initiatives, find out whether and how you can contribute. Otherwise, consider how you will become the person who introduces data science to your workplace.
References
[1] R + Python vs R + VBA vs R + Java
[2] Which is better for data analysis: R or Python?
[3] Python Displacing R As The Programming Language For Data Science
[4] R vs Python for Data Science: Summary of Modern Advances
[5] Difference in usage of Python vs R for data science?
[6] R vs. Python
[7] Is Python faster than R?
[8] Learning R vs. Python, or both?
[9] Python or R for new language to learn? Or should I be working on learning another language?
[10] Difficulty of learning R after python and use of both?