TL;DR Popular tools include Python (Pandas, Seaborn), Excel, R, Databricks, Snowflake, AWS Batch, EMR, and specialized platforms like http://siren.io.
Data Manipulation and Analysis
Python is a staple in data science for data manipulation and analysis, with libraries such as Pandas and Seaborn being frequently used [3:1]. Excel remains a popular tool for simpler tasks or when sharing results with non-technical stakeholders [3:2][5:2]. R also has its place, particularly among those who appreciate its statistical capabilities [3:5].
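As a rough illustration of that Pandas-plus-Seaborn workflow, here is a minimal sketch of a typical notebook cell; the file name and column names are hypothetical:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load and lightly clean a dataset (hypothetical file and columns)
df = pd.read_csv("sales.csv")
df = df.dropna(subset=["region", "revenue"])

# Aggregate with Pandas, then visualize the result with Seaborn
summary = df.groupby("region", as_index=False)["revenue"].sum()
sns.barplot(data=summary, x="region", y="revenue")
plt.tight_layout()
plt.show()
```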
ETL and Data Warehousing
For ETL processes, Mage was mentioned as a useful tool [1:4]. Databricks and Snowflake are highlighted for their user-friendly interfaces and powerful querying capabilities, making them favorites for handling large datasets [4:1][4:3]. These platforms are praised for enabling efficient data processing and collaboration.
Cloud Computing and Big Data
AWS Batch and EMR are essential for distributing computing tasks over clusters, especially when working with big data frameworks like Spark [5:8][5:10]. These tools facilitate running jobs in parallel, optimizing resource usage, and automating workflows. They are integral to managing large-scale data operations.
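To make that concrete, here is a minimal sketch of the kind of PySpark job that might be submitted to an EMR cluster; the S3 paths and column name are hypothetical:

```python
from pyspark.sql import SparkSession

# Minimal PySpark job: read, aggregate within groups, write back out
spark = SparkSession.builder.appName("daily-event-counts").getOrCreate()

events = spark.read.parquet("s3://my-bucket/events/")   # hypothetical input path
counts = events.groupBy("event_type").count()           # hypothetical column
counts.write.mode("overwrite").parquet("s3://my-bucket/reports/event_counts/")

spark.stop()
```

The same script also runs against a local SparkSession, which is part of why Spark is popular for scaling analysis up without rewriting it.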
Specialized Platforms
http://Siren.io is noted for its ability to rapidly develop data models and understand feature relations [5:1]. This platform, along with http://cogility.io, helps advance analytics by providing insights into complex data structures. Such specialized tools can significantly enhance the depth of analysis and model stability.
User Experience and Accessibility
Tools like Databricks and Snowflake are recognized for their excellent user experience, making complex data operations more accessible [4:1]. VS Code, with extensions like Jupyter and Copilot, is appreciated for enhancing notebook programming [4:1]. The ease of use and intuitive interfaces of these tools contribute to their popularity among data scientists.
I get the sense that this list is more about maximising the diversity of tools than their actual practicality and value from an organisational perspective. The comments confirm that.
how would it be more useful to you? these are all types of tools I use as a machine learning engineer
Well first you'll need to define what "best" is, set a list of metrics each tool is scored on, and an overall weighted score.
u/alexellman
Mage - ETL
Polars - Data Manipulation
Folium - maps
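For anyone who hasn't tried Polars for data manipulation, here is a minimal sketch of the expression-chaining style it encourages; the file and columns are hypothetical (recent versions spell the method group_by, older ones groupby):

```python
import polars as pl

# Filter, aggregate, and sort in one expression chain (hypothetical data)
df = pl.read_csv("sales.csv")
summary = (
    df.filter(pl.col("revenue") > 0)
      .group_by("region")
      .agg(pl.col("revenue").sum().alias("total_revenue"))
      .sort("total_revenue", descending=True)
)
print(summary)
```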
Gotta have tidymodels and timetk here
That’s awesome, thanks!
R?
I don't think so...
Modern data science tools blend code, cloud, and AI—fueling powerful insights and faster decisions. They're the backbone of predictive models, data pipelines, and business transformation.
Explore what tools are expected of you as a seasoned data science expert in 2025
Um ... Microsoft?
SQL, Python. And Excel to publish the results to outsiders.
Pandas profiling reports
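For anyone unfamiliar, generating one of these profiling reports is roughly a two-liner; a sketch assuming a hypothetical CSV (the package formerly published as pandas-profiling is now ydata-profiling):

```python
import pandas as pd
from ydata_profiling import ProfileReport  # formerly pandas_profiling

# Build an HTML EDA report for a hypothetical dataset
df = pd.read_csv("data.csv")
ProfileReport(df, title="EDA report").to_file("report.html")
```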
Google search and Reddit search, because I’m absolutely sure we just had a thread about this recently
I actually googled it but couldn’t find anything recent. Maybe over a year ago and not what I was thinking?
R
Depends on the amount of data, how complex the analysis is, and whether it's a one-time thing or needs an automated output.
Needs any automation - probably Tableau. Try to give it to the BI team if it’s a straightforward ask.
Not very complex, no automation, can open the dataset in Excel - then probably Excel
No automation but big dataset or needs cleaning or feature engineering or lots of exploration - Python, mostly Pandas and Seaborn, in a Jupyter notebook.
Data Science community, I've got a question for you:
Which data science tools do you find most user-friendly?
I just went live with a project I've been working on. I feel like the configuration process is easy but would love to compare it with some of your favorite data science tools. The project I'm working on is a simple cluster compute tool. All you do is add a single line of code to your python script and then you're able to run your code on thousands of separate VMs in the cloud. I built this tool so I could stop relying on DevOps for batch inference and hyperparameter tuning. At the moment we are managing the cluster but in the future I plan to allow users to deploy on their own private cloud. If you are interested I can give you 1k GPU hours for testing it :). I honestly wouldn't mind a few people ripping everything that sucks with the user experience.
Anyways, I'd love to learn about everyone's favorite data science tools (specifically python ones). Ideally I can incorporate a config process that everyone is familiar with and zero friction.
Project link: https://www.burla.dev/
Databricks, snowflake, vscode with the right extensions
Why do you like these tools? Also, what vscode extensions are you using?
Databricks has an amazing UI that makes it really easy to share compute. Snowflake has the best querying tool I've ever seen and they are constantly making strides to be the king of the big data space. Vscode with the Jupyter and copilot extensions makes notebook programming a bit more enjoyable.
tidyverse, duckdb, quarto. Haven't tried GitHub copilot, but it looks amazing.
And from a user experience perspective why do you like using them? They are easy to use and deliver value?
Define easy to use, and for whom. To someone who's unfamiliar with programming they'll be much harder than a GUI-based solution. But to someone who knows these they'll be much easier. Similarly, someone who can program will learn these easily. Someone who doesn't but has the drive to learn will also find these easier once they get over the initial learning curve. Context matters.
Your project sounds like Metaflow to me, which is what we use. Seems kinda excessive that you guys are building it from the ground up when there are already good tools out there.
Could you tell me more about the other tools out there that you think address this problem? I want to see if it is aligned with my research.
It is based on Kubernetes and it allows you to essentially define a flow, such as a series of methods that Metaflow calls steps, and run it directly on a Kubernetes cluster with whatever parameters you choose. This means that with a simple Python decorator, our team can essentially start a thousand training jobs in parallel.
It even supports defining exact resources for each step in your flow, so you can allocate more memory to data fetching and preprocessing and only attach a GPU when you arrive at the step/method that starts training
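For anyone who hasn't seen Metaflow, here is a minimal sketch of what that looks like; the flow name, hyperparameter values, and resource numbers are made up for illustration:

```python
from metaflow import FlowSpec, step, resources

class TrainFlow(FlowSpec):

    @step
    def start(self):
        # Fan out: one training step per hyperparameter value (hypothetical values)
        self.learning_rates = [0.001, 0.01, 0.1]
        self.next(self.train, foreach="learning_rates")

    @resources(memory=16000, cpu=4, gpu=1)  # per-step resources, as described above
    @step
    def train(self):
        self.lr = self.input
        self.score = 0.0  # placeholder for a real training metric
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [(run.lr, run.score) for run in inputs]
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TrainFlow()
```

Run it locally with `python train_flow.py run`, or on a Kubernetes cluster with `python train_flow.py run --with kubernetes`.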
i've used coiled before - kinda frustrating to get used to
Most data scientists spend 2-4 years as a DS and get promoted to senior positions. During this early period, what are the common tools, other than doing software engineering with Python, that you use in your work? What are the best tools nowadays which help build stable models in production?
Excel, and selectively Pandas.
Seems Excel is the winner forever 😀
Excel with a sprinkling of R.
Seems excel is evergreen
Excel.
Most crucial functions/vba modules?
AWS Batch and EMR jobs using docker containers
Sounds more data engineering-heavy to me!
We do everything end to end in our team
Distributing work over clusters to run in parallel. It's commonly used with Spark, idk about other frameworks but pretty sure there's wide support.
Elastic Map Reduce. Allows you to distribute computing over a cluster of machines. We use it to run Spark mostly.
The ‘job’ refers to us using job queues. We build a job pipeline and then execute it when needed. AWS takes care of getting the compute resources and then shuts them down when the job is finished
For me, the biggest leap in DS was finding good tools to build data models, I love siren.io for this. It helps rapidly develop an understanding of the features and relations between different data structures. Once you have a great model, it’s possible to advance the level of analytics using a platform like cogility.io.
I'd like to get an idea of what the popular tools are for data analysis. If you use or like a tool not on the list please provide a comment and why.
For some context, I work at a small engineering firm. We have a mix of technical engineers that would be open to more modern tools like Jupyter, but also a number of engineers that will take Excel to the grave unless convinced otherwise (mostly because they know it). I'm trying to find a tool that can provide a modern alternative to Excel for number crunching / visualizations and is more inviting than local notebooks, but not limiting for the more technical staff.
I put Databricks on here because I know it's a popular platform, but wonder if it is a little overkill for the volume of data (gigabytes) we are analyzing at the moment. Also, I feel Databricks/Spark adds more complexity (if my assumption is wrong, please correct me).
Also, project collaboration and integration with central resources would be a big plus.
Libre Office / Open Office Calc ( Spreadsheet)
Hex all the way! Sql & Python combined in one notebook
I currently use Alteryx for Data Analysis, but Hex looks really good. I was excited to see that it even has functionality to load CSVs, as much of my data comes to me as Excel spreadsheets. I love the cells, as they are so similar to the tools I am used to working with in Alteryx.
The one thing I would miss is the graphical display of my flow. I had a project that lasted 2+ years with 4 constantly changing data sources and 8 occasionally changing data sources. That visual representation was necessary then. I would probably resort to post-it notes and yarn with hex 🤣
They do actually have a graph view. It doesn't really highlight the sources but it shows which cells are dependent on other cells
Sql + any sql client would be my vote.
Yep this is the answer for me too, writing a small script and running it via your preferred client I find to be far faster and more efficient than exporting data to analyze elsewhere.
R or Python ... SQL to query your data for analysis. Are you all using SQL to analyze your data?!
Depending on the size of your data you may want to run your analysis via Databricks.
A lot of analysis is counting stuff within groups. I always start by using squeal - usually within a spark framework.
That makes sense! When you say "spark framework" what do you mean by that?
This is the way.
Sql + Excel
Hmm no aws tools
Very Azure-centric. Weird list.
Don't love this list.
Just 50? :p
Just know them. Not be good with them.
I know right? A chunk of these are flawed BI startups.
Was having a conversation with a colleague about the breadth of libraries available in our work which made us both wonder: What are some tools that are missing from the data science ecosystem?
A report documenter. Goes through the data, calls out the routine key points, and draws pretty charts so we can focus on the deep dives.
Something that’ll take SQL queries and convert it to DAX
A good data annotation software where multiple people can label and annotate data collaboratively
Have you seen maxqda? It's kind of expensive, but many researchers use it. I think the main purpose was supposed to be for qualitative data, but I've seen it used on data sets.
Some datasets are sensitive in nature and you wouldn’t want people outside of your company to view them though.
Google earth engine is amazing! Wish there were more platforms like it for other datasets too
I like it, but I can't use it in my company as I believe it doesn't work with GitLab.
Access to the access management platform that grants access to my access management platform.
We have that!! But have you submitted for access to the access management panel behind the access control with the access administrator? They usually grant it after the access meeting. You just need access to attend first.
Such a timely request. Here’s what I recently built. You can connect your notebook to a GitHub repo and the tool will automatically version every modeling iteration and show the code diff in GitHub.
This is great! I recommend looking at Ploomber (https://ploomber.io) for a more robust integration between the two. It has native integration with Git and it does some of this work for you: the Jupyter plugin allows you to read it as a notebook, and it lets you track the notebooks within git without a change in every commit (caused by the notebook outputs and their metadata). It does the heavy lifting for you.
This is a thing for Observable notebooks. But the language is JavaScript specific I think.
TBH I haven't used a notebook in years; it's just a commonly requested format for management when they want a presentation of data in a format they can easily understand.
In my opinion, the following tools are great to get started with data visualization, even if you only have basic design/programming skills!
The first is Streamlit: I use it to build interactive web apps that allow users to not only see the data, but also input their own information into the app (a small sketch of such an app follows at the end of this post).
Now to use it, you do need to have some basic Python knowledge, but it’s very easy to use, and it’s free!
One of the best tools to create various diagrams (flowcharts), it's free, and you don't even need to log in or register to use it.
This is a great tool to take your infographics to the next level, as it allows you to make them interactive.
It’s a freemium tool, but the free plan is pretty generous, and pricing plans start from only 9.90!
I use it when I want to make whiteboard videos, works quite well if you want to make Youtube explainer videos or even ads. Paid plans start at 39$, which is pretty fair in my opinion.
This tool is kinda like Canva on steroids, you can use it to make presentations, social media graphics, infographics… I love to use it to make video infographics, like the ones that the Youtube channel Vox does for example. Compared to the other tools I have listed, it’s slightly more expensive (paid plans start at $49), but totally worth it in my opinion!
PS: I love dataviz so much, I made my own product called Polymer Search. I use it to create interactive web apps and charts (scatter plots, heat maps, etc.). Check it out and let me know what you think!
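As promised above, here is a minimal sketch of a Streamlit app (run with `streamlit run app.py`); the widgets and data are just illustrative:

```python
import streamlit as st
import pandas as pd

st.title("Quick data explorer")

# Let the user bring their own data into the app
uploaded = st.file_uploader("Upload a CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    column = st.selectbox("Column to summarize", df.columns)
    st.bar_chart(df[column].value_counts())
```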
You forgot the big one https://flourish.studio/
oh yes that's right, thanks!
why not put them in the post?
I use draw.io for flowcharts for the same reasons as you mention. But it’s good to know a back up!
Thanks for the suggestions bud
How do you deploy streamlit apps? Do you have to use their service?
I deploy python code in GitHub repos as Streamlit apps.
You can deploy them in lots of places. I have used Heroku in the past, and more recently a Dokku instance on my own VPS. Someone with more experience in that sort of thing can comment about other solutions.
Nice list. Have you also tried Flourish or Data Wrapper? What did you think?
Best data science and analytics tools
Key Considerations for Data Science and Analytics Tools
Ease of Use: Look for tools with user-friendly interfaces, especially if you're new to data science. Tools that offer drag-and-drop features can simplify the process.
Integration Capabilities: Ensure the tool can easily integrate with other software and data sources you use, such as databases, cloud services, and APIs.
Scalability: Choose tools that can handle large datasets and scale as your data needs grow.
Community and Support: A strong community and good support resources (documentation, forums, tutorials) can be invaluable for troubleshooting and learning.
Cost: Consider your budget. Some tools are open-source and free, while others may require a subscription or one-time purchase.
Recommended Tools:
Python with Libraries (Pandas, NumPy, Scikit-learn): Python is a versatile programming language widely used in data science. Its libraries provide powerful data manipulation, analysis, and machine learning capabilities.
R: R is another popular programming language for statistical analysis and data visualization. It has a rich ecosystem of packages for various data science tasks.
Tableau: A leading data visualization tool that allows you to create interactive and shareable dashboards. It's user-friendly and great for business intelligence.
Power BI: Microsoft's analytics service that provides interactive visualizations and business intelligence capabilities with an easy-to-use interface.
Apache Spark: An open-source distributed computing system that is excellent for big data processing and analytics. It supports multiple programming languages, including Python, R, and Scala.
Jupyter Notebooks: An open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. Great for exploratory data analysis.
Google Analytics: Essential for web analytics, it helps track and report website traffic, providing insights into user behavior.
Recommendation: If you're just starting out, I recommend beginning with Python and its libraries due to its versatility and strong community support. For visualization, consider Tableau or Power BI based on your organization's needs. If you're dealing with large datasets, look into Apache Spark for its powerful processing capabilities.
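Following that recommendation, here is a minimal sketch of a first model built with Pandas and scikit-learn; the file name and the "target" column are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a hypothetical dataset with a binary "target" column
df = pd.read_csv("data.csv")
X = df.drop(columns=["target"])
y = df["target"]

# Hold out a test set, fit a simple baseline model, and evaluate it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```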