TL;DR Popular tools include Python (Pandas, Seaborn), Excel, R, Databricks, Snowflake, AWS Batch, EMR, and specialized platforms like http://siren.io.
Data Manipulation and Analysis
Python is a staple in data science for data manipulation and analysis, with libraries such as Pandas and Seaborn being frequently used [3:1]. Excel remains a popular tool for simpler tasks or when sharing results with non-technical stakeholders [3:2][5:2]. R also has its place, particularly among those who appreciate its statistical capabilities [3:5].
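As a rough illustration of that Pandas-plus-Seaborn workflow, here is a minimal sketch of a typical notebook cell; the file name and column names are hypothetical:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load and lightly clean a dataset (hypothetical file and columns)
df = pd.read_csv("sales.csv")
df = df.dropna(subset=["region", "revenue"])

# Aggregate with Pandas, then visualize the result with Seaborn
summary = df.groupby("region", as_index=False)["revenue"].sum()
sns.barplot(data=summary, x="region", y="revenue")
plt.tight_layout()
plt.show()
```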
ETL and Data Warehousing
For ETL processes, Mage was mentioned as a useful tool [1:4]. Databricks and Snowflake are highlighted for their user-friendly interfaces and powerful querying capabilities, making them favorites for handling large datasets [4:1][4:3]. These platforms are praised for enabling efficient data processing and collaboration.
Cloud Computing and Big Data
AWS Batch and EMR are essential for distributing computing tasks over clusters, especially when working with big data frameworks like Spark [5:8][5:10]. These tools facilitate running jobs in parallel, optimizing resource usage, and automating workflows. They are integral to managing large-scale data operations.
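To make that concrete, here is a minimal sketch of the kind of PySpark job that might be submitted to an EMR cluster; the S3 paths and column name are hypothetical:

```python
from pyspark.sql import SparkSession

# Minimal PySpark job: read, aggregate within groups, write back out
spark = SparkSession.builder.appName("daily-event-counts").getOrCreate()

events = spark.read.parquet("s3://my-bucket/events/")   # hypothetical input path
counts = events.groupBy("event_type").count()           # hypothetical column
counts.write.mode("overwrite").parquet("s3://my-bucket/reports/event_counts/")

spark.stop()
```

The same script also runs against a local SparkSession, which is part of why Spark is popular for scaling analysis up without rewriting it.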
Specialized Platforms
http://Siren.io is noted for its ability to rapidly develop data models and understand feature relations [5:1]. This platform, along with http://cogility.io, helps advance analytics by providing insights into complex data structures. Such specialized tools can significantly enhance the depth of analysis and model stability.
User Experience and Accessibility
Tools like Databricks and Snowflake are recognized for their excellent user experience, making complex data operations more accessible [4:1]. VS Code, with extensions like Jupyter and Copilot, is appreciated for enhancing notebook programming [4:1]. The ease of use and intuitive interfaces of these tools contribute to their popularity among data scientists.
I get the sense that this list is more about maximising the diversity of tools than their actual practicality and value from an organisational perspective. The comments confirm that.
how would it be more useful to you? these are all types of tools I use as a machine learning engineer
Well first you'll need to define what "best" is, set a list of metrics each tool is scored on, and an overall weighted score.
u/alexellman
Mage - ETL
Polars - Data Manipulation
Folium - maps
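For anyone who hasn't tried Polars for data manipulation, here is a minimal sketch of the expression-chaining style it encourages; the file and columns are hypothetical (recent versions spell the method group_by, older ones groupby):

```python
import polars as pl

# Filter, aggregate, and sort in one expression chain (hypothetical data)
df = pl.read_csv("sales.csv")
summary = (
    df.filter(pl.col("revenue") > 0)
      .group_by("region")
      .agg(pl.col("revenue").sum().alias("total_revenue"))
      .sort("total_revenue", descending=True)
)
print(summary)
```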
Gotta have tidymodels and timetk here
That’s awesome, thanks!
R?
I don't think so...
Modern data science tools blend code, cloud, and AI—fueling powerful insights and faster decisions. They're the backbone of predictive models, data pipelines, and business transformation.
Explore what tools are expected of you as a seasoned data science expert in 2025
Um ... Microsoft?
SQL, Python. And Excel to publish the results to outsiders.
Pandas profiling reports
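For anyone unfamiliar, generating one of these profiling reports is roughly a two-liner; a sketch assuming a hypothetical CSV (the package formerly published as pandas-profiling is now ydata-profiling):

```python
import pandas as pd
from ydata_profiling import ProfileReport  # formerly pandas_profiling

# Build an HTML EDA report for a hypothetical dataset
df = pd.read_csv("data.csv")
ProfileReport(df, title="EDA report").to_file("report.html")
```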
Google search and Reddit search, because I’m absolutely sure we just had a thread about this recently
I actually googled it but couldn’t find anything recent. Maybe over a year ago and not what I was thinking?
R
Depends on the amount of data, how complex the analysis is, and whether it's a one-time thing or needs an automated output.
Needs any automation - probably Tableau. Try to give it to the BI team if it’s a straightforward ask.
Not very complex, no automation, can open the dataset in Excel - then probably Excel
No automation but big dataset or needs cleaning or feature engineering or lots of exploration - Python, mostly Pandas and Seaborn, in a Jupyter notebook.
Data Science community, I've got a question for you:
Which data science tools do you find most user-friendly?
I just went live with a project I've been working on. I feel like the configuration process is easy but would love to compare it with some of your favorite data science tools. The project I'm working on is a simple cluster compute tool. All you do is add a single line of code to your python script and then you're able to run your code on thousands of separate VMs in the cloud. I built this tool so I could stop relying on DevOps for batch inference and hyperparameter tuning. At the moment we are managing the cluster but in the future I plan to allow users to deploy on their own private cloud. If you are interested I can give you 1k GPU hours for testing it :). I honestly wouldn't mind a few people ripping everything that sucks with the user experience.
Anyways, I'd love to learn about everyone's favorite data science tools (specifically python ones). Ideally I can incorporate a config process that everyone is familiar with and zero friction.
Project link: https://www.burla.dev/
Databricks, snowflake, vscode with the right extensions
Why do you like these tools? Also, what vscode extensions are you using?
Databricks has an amazing UI that makes it really easy to share compute. Snowflake has the best querying tool I've ever seen and they are constantly making strides to be the king of the big data space. Vscode with the Jupyter and copilot extensions makes notebook programming a bit more enjoyable.
tidyverse, duckdb, quarto. Haven't tried GitHub copilot, but it looks amazing.
And from a user experience perspective why do you like using them? They are easy to use and deliver value?
Define easy to use, and for whom. To someone who's unfamiliar with programming they'll be much harder than a GUI-based solution. But to someone who knows these they'll be much easier. Similarly, someone who can program will learn these easily. Someone who doesn't but has the drive to learn will also find these easier once they get over the initial learning curve. Context matters.
Your project sounds like Metaflow to me, which is what we use. Seems kinda excessive that you guys are building it from the ground up when there are already good tools out there.
Could you tell me more about the other tools out there that you think address this problem? I want to see if it is aligned with my research.
It is based on Kubernetes and it allows you to essentially define a flow, such as a series of methods that Metaflow calls steps, and run it directly on a Kubernetes cluster with whatever parameters you choose. This means that with a simple Python decorator, our team can essentially start a thousand training jobs in parallel.
It even supports defining exact resources for each step in your flow, so you can allocate more memory to data fetching and preprocessing and only attach a GPU when you arrive at the step/method that starts training
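For anyone who hasn't seen Metaflow, here is a minimal sketch of what that looks like; the flow name, hyperparameter values, and resource numbers are made up for illustration:

```python
from metaflow import FlowSpec, step, resources

class TrainFlow(FlowSpec):

    @step
    def start(self):
        # Fan out: one training step per hyperparameter value (hypothetical values)
        self.learning_rates = [0.001, 0.01, 0.1]
        self.next(self.train, foreach="learning_rates")

    @resources(memory=16000, cpu=4, gpu=1)  # per-step resources, as described above
    @step
    def train(self):
        self.lr = self.input
        self.score = 0.0  # placeholder for a real training metric
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [(run.lr, run.score) for run in inputs]
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TrainFlow()
```

Run it locally with `python train_flow.py run`, or on a Kubernetes cluster with `python train_flow.py run --with kubernetes`.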
i've used coiled before - kinda frustrating to get used to
Most data scientists spend 2-4 years as a DS and get promoted to senior positions. During this early period, what are the common tools, other than doing software engineering with Python, that you use in your work? What are the best tools nowadays which help build stable models in production?
Excel, and selectively Pandas.
Seems Excel is the winner forever 😀
Excel with a sprinkling of R.
Seems excel is evergreen
Excel.
Most crucial functions/vba modules?
AWS Batch and EMR jobs using docker containers
Sounds more data engineering-heavy to me!
We do everything end to end in our team
Distributing work over clusters to run in parallel. It's commonly used with Spark, idk about other frameworks but pretty sure there's wide support.
Elastic Map Reduce. Allows you to distribute computing over a cluster of machines. We use it to run Spark mostly.
The ‘job’ refers to us using job queues. We build a job pipeline and then execute it when needed. AWS takes care of getting the compute resources and then shuts them down when the job is finished
For me, the biggest leap in DS was finding good tools to build data models, I love siren.io for this. It helps rapidly develop an understanding of the features and relations between different data structures. Once you have a great model, it’s possible to advance the level of analytics using a platform like cogility.io.
I'd like to get an idea of what the popular tools are for data analysis. If you use or like a tool not on the list please provide a comment and why.
For some context, I work at a small engineering firm. We have a mix of technical engineers that would be open to more modern tools like Jupyter, but also a number of engineers that will take Excel to the grave unless convinced otherwise (mostly because they know it). I'm trying to find a tool that can provide a modern alternative to Excel for number crunching / visualizations and is more inviting than local notebooks, but not limiting for the more technical staff.
I put Databricks on here because I know it's a popular platform, but wonder if it is a little overkill for the volume of data (gigabytes) we are analyzing at the moment. Also, I feel Databricks/Spark adds more complexity (if my assumption is wrong, please correct me).
Also, project collaboration and integration with central resources would be a big plus.
Libre Office / Open Office Calc ( Spreadsheet)
Hex all the way! Sql & Python combined in one notebook
I currently use Alteryx for Data Analysis, but Hex looks really good. I was excited to see that it even has functionality to load CSVs, as much of my data comes to me as Excel spreadsheets. I love the cells, as they are so similar to the tools I am used to working with in Alteryx.
The one thing I would miss is the graphical display of my flow. I had a project that lasted 2+ years with 4 constantly changing data sources and 8 occasionally changing data sources. That visual representation was necessary then. I would probably resort to post-it notes and yarn with hex 🤣
They do actually have a graph view. It doesn't really highlight the sources but it shows which cells are dependent on other cells
Sql + any sql client would be my vote.
Yep this is the answer for me too, writing a small script and running it via your preferred client I find to be far faster and more efficient than exporting data to analyze elsewhere.
R or Python ... SQL to query your data for analysis. Are you all using SQL to analyze your data?!
Depending on the size of your data you may want to run your analysis via Databricks.
A lot of analysis is counting stuff within groups. I always start by using squeal - usually within a spark framework.
That makes sense! When you say "spark framework" what do you mean by that?
This is the way.
Sql + Excel
Hmm no aws tools
Very Azure-centric. Weird list.
Don't love this list.
Just 50? :p
Just know them. Not be good with them.
I know right? A chunk of these are flawed BI startups.
Was having a conversation with a colleague about the breadth of libraries available in our work which made us both wonder: What are some tools that are missing from the data science ecosystem?
A report documenter. Goes through the data, calls out the routine key points, and draws pretty charts so we can focus on the deep dives.
Something that’ll take SQL queries and convert it to DAX
A good data annotation software where multiple people can label and annotate data collaboratively
Have you seen maxqda? It's kind of expensive, but many researchers use it. I think the main purpose was supposed to be for qualitative data, but I've seen it used on data sets.
Some datasets are sensitive in nature and you wouldn’t want people outside of your company to view them though.
Google earth engine is amazing! Wish there were more platforms like it for other datasets too
I like it, but I can't use it in my company as I believe it doesn't work with GitLab.
Access to the access management platform that grants access to my access management platform.
We have that!! But have you submitted for access to the access management panel behind the access control with the access administrator? They usually grant it after the access meeting. You just need access to attend first.
Such a timely request. Here’s what I recently built. You can connect your notebook to a GitHub repo and the tool will automatically version every modeling iteration and show the code diff in GitHub.
This is great! I recommend looking at Ploomber (https://ploomber.io) for a more robust integration between the two. It has native integration with Git and it does some of this work for you: the Jupyter plugin allows you to read it as a notebook, and it lets you track the notebooks within git without a change in every commit (caused by the notebook outputs and their metadata). It does the heavy lifting for you.
This is a thing for Observable notebooks. But the language is JavaScript specific I think.
TBH I haven't used a notebook in years; it's just a commonly requested format for management when they want a presentation of data in a format they can easily understand.
In my opinion, the following tools are great to get started with data visualization, even if you only have basic design/programming skills!
The first is Streamlit: I use it to build interactive web apps that allow users to not only see the data, but also input their own information into the app (a small sketch of such an app follows at the end of this post).
Now to use it, you do need to have some basic Python knowledge, but it’s very easy to use, and it’s free!
One of the best tools to create various diagrams (flowcharts), it's free, and you don't even need to log in or register to use it.
This is a great tool to take your infographics to the next level, as it allows you to make them interactive.
It’s a freemium tool, but the free plan is pretty generous, and pricing plans start from only 9.90!
I use it when I want to make whiteboard videos, works quite well if you want to make Youtube explainer videos or even ads. Paid plans start at 39$, which is pretty fair in my opinion.
This tool is kinda like Canva on steroids, you can use it to make presentations, social media graphics, infographics… I love to use it to make video infographics, like the ones that the Youtube channel Vox does for example. Compared to the other tools I have listed, it’s slightly more expensive (paid plans start at $49), but totally worth it in my opinion!
PS: I love dataviz so much, I made my own product called Polymer Search. I use it to create interactive web apps and charts (scatter plots, heat maps, etc.). Check it out and let me know what you think!
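As promised above, here is a minimal sketch of a Streamlit app (run with `streamlit run app.py`); the widgets and data are just illustrative:

```python
import streamlit as st
import pandas as pd

st.title("Quick data explorer")

# Let the user bring their own data into the app
uploaded = st.file_uploader("Upload a CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    column = st.selectbox("Column to summarize", df.columns)
    st.bar_chart(df[column].value_counts())
```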
You forgot the big one https://flourish.studio/
oh yes that's right, thanks!
why not put them in the post?
I use draw.io for flowcharts for the same reasons as you mention. But it’s good to know a back up!
Thanks for the suggestions bud
How do you deploy streamlit apps? Do you have to use their service?
I deploy python code in GitHub repos as Streamlit apps.
You can deploy them in lots of places. I have used Heroku in the past, and more recently a Dokku instance on my own VPS. Someone with more experience in that sort of thing can comment about other solutions.
Nice list. Have you also tried Flourish or Data Wrapper? What did you think?
Best data science and analytics tools
Key Considerations for Data Science and Analytics Tools
Ease of Use: Look for tools with user-friendly interfaces, especially if you're new to data science. Tools that offer drag-and-drop features can simplify the process.
Integration Capabilities: Ensure the tool can easily integrate with other software and data sources you use, such as databases, cloud services, and APIs.
Scalability: Choose tools that can handle large datasets and scale as your data needs grow.
Community and Support: A strong community and good support resources (documentation, forums, tutorials) can be invaluable for troubleshooting and learning.
Cost: Consider your budget. Some tools are open-source and free, while others may require a subscription or one-time purchase.
Recommended Tools:
Python with Libraries (Pandas, NumPy, Scikit-learn): Python is a versatile programming language widely used in data science. Its libraries provide powerful data manipulation, analysis, and machine learning capabilities.
R: R is another popular programming language for statistical analysis and data visualization. It has a rich ecosystem of packages for various data science tasks.
Tableau: A leading data visualization tool that allows you to create interactive and shareable dashboards. It's user-friendly and great for business intelligence.
Power BI: Microsoft's analytics service that provides interactive visualizations and business intelligence capabilities with an easy-to-use interface.
Apache Spark: An open-source distributed computing system that is excellent for big data processing and analytics. It supports multiple programming languages, including Python, R, and Scala.
Jupyter Notebooks: An open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. Great for exploratory data analysis.
Google Analytics: Essential for web analytics, it helps track and report website traffic, providing insights into user behavior.
Recommendation: If you're just starting out, I recommend beginning with Python and its libraries due to its versatility and strong community support. For visualization, consider Tableau or Power BI based on your organization's needs. If you're dealing with large datasets, look into Apache Spark for its powerful processing capabilities.
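Following that recommendation, here is a minimal sketch of a first model built with Pandas and scikit-learn; the file name and the "target" column are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a hypothetical dataset with a binary "target" column
df = pd.read_csv("data.csv")
X = df.drop(columns=["target"])
y = df["target"]

# Hold out a test set, fit a simple baseline model, and evaluate it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```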