TL;DR: Start with basic plots using plt.plot(), explore the object-oriented interface via fig, ax = plt.subplots(), and consider integrating Seaborn for enhanced functionality.
Basic Plotting with Matplotlib
Matplotlib is a versatile library for creating static, interactive, and animated visualizations in Python. The simplest way to create a plot is plt.plot() for line graphs. This function mirrors the plotting functions in MATLAB and draws on the current figure and axes [1:11]. For more complex layouts, use fig, ax = plt.subplots(), which allows multiple plots within one figure [1:9].
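The two styles described above can be compared side by side. This is a minimal sketch: the data values and the names ax_line and ax_bar are illustrative assumptions, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display); omit in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]    # illustrative data
y = [2, 3, 5, 7, 11]

# Implicit (pyplot) style: draws on whatever the current figure/axes are.
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")

# Explicit (object-oriented) style: one figure holding two axes side by side.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(x, y, marker="o")
ax_line.set_title("line")
ax_bar.bar(x, y)
ax_bar.set_title("bar")
fig.tight_layout()
```

In the explicit style every call names the axes it draws on, so nothing depends on which figure happens to be current.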
Handling Large Datasets
While Matplotlib can handle large datasets, some users have reported performance issues with very large numbers of data points [1:4]. If you're dealing with millions of points, it's recommended to aggregate or sample your data [1:5]. Alternatively, libraries like Bokeh or Plotly are suggested for larger datasets due to their interactivity and scalability [1:10], [5:6].
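One way to aggregate before plotting is to bin a long series and draw only the per-bin minimum, maximum, and mean. The sketch below uses a synthetic random walk, and the bin count of 2,000 is an arbitrary assumption; it is one possible approach, not a method taken from the thread.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display)
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_points = 1_000_000
t = np.arange(n_points)
signal = np.cumsum(rng.normal(size=n_points))  # synthetic random walk

n_bins = 2_000                    # arbitrary choice, roughly one bin per pixel column
bin_size = n_points // n_bins
binned = signal[: n_bins * bin_size].reshape(n_bins, bin_size)
t_mid = t[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)
low, high = binned.min(axis=1), binned.max(axis=1)

fig, ax = plt.subplots()
ax.fill_between(t_mid, low, high, alpha=0.4, label="min/max per bin")
ax.plot(t_mid, binned.mean(axis=1), linewidth=0.8, label="mean per bin")
ax.legend()
```

The figure now contains a few thousand vertices instead of a million, while the min/max band keeps spikes visible that plain subsampling could miss.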
Choosing Between Libraries
Matplotlib is often used alongside Seaborn, which simplifies many statistical plots and offers built-in functionalities that require less code compared to raw Matplotlib [5:4]. Seaborn is particularly useful for statistical data visualization and can be a good starting point for beginners
[3:1],
[5:12]. For interactive dashboards, Plotly or Tableau might be preferable
[3:3],
[4:7].
Customization and Flexibility
Matplotlib provides extensive customization options, allowing pixel-level control over your visualizations [4:6]. However, this flexibility comes at the cost of increased complexity and potential for more code. Users who prefer GUI-based tools may find Tableau easier for quick and interactive visualizations
[4:10].
Additional Resources
For comprehensive learning, several tutorials and resources are recommended:
These resources provide detailed insights into Matplotlib's capabilities and how to effectively use it for various types of data visualization tasks.
Newbie here. I am always confused about using plt.plot(...), or fig, ax = plt.subplots() and then ax.plot(..), or just using dataframe.plot()/series.plot(), or using seaborn. It would be great if someone covered every related function/method to make plots. It's kinda tough searching for them, since I always end up with a solution solving the problem with a different method than the one I chose.
I Don't have time now to watch these vids, but would surely watch the playlist when my exams get over.
plt.plot is an analog to a function in matlab. Don't have history with matlab? Don't use plt.plot where possible. The plot function creates an "artist", which is a fancy way of saying a line, polygon, collection of dots, or other thing that is a visual element of a figure. When you use plt.plot, it is bound to whatever gcf ("get current figure") and gca ("get current axes") point to. In a short script, this is pretty deterministic (only one figure), but in a more complicated rendering workflow this can cause unexpected behavior.
plt.subplots is a way of generating a figure and one or more axes. ax.plot is identical to plt.plot, but the result is bound to ax explicitly.
df.plot, etc, are just plotting code that wraps matplotlib built into pandas. These functions usually return the axis they drew on, and also take an axis to draw on as an argument (ax). Same for seaborn.
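A small sketch of that behaviour, assuming a toy DataFrame with made-up columns a and b: pandas draws on the axes passed through ax and hands the same object back, so it can be mixed freely with plain matplotlib calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display)
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [4, 2, 3, 1]})  # illustrative data

fig, (ax_left, ax_right) = plt.subplots(1, 2)

returned = df["a"].plot(ax=ax_left)    # pandas draws on the axes we pass in
ax_right.plot(df.index, df["b"])       # plain matplotlib on the other axes

returned.set_title("drawn by pandas")  # the returned object is that same Axes
ax_right.set_title("drawn by matplotlib")
```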
Generally, if you have a statistically-oriented plotting task, you can comfortably get by just with the tools in pandas or seaborn (pick one, seaborn is somewhat preferred) and ignore the existence of matplotlib altogether, except when it comes time to save a figure.
If you do more scientific visualizations, or otherwise draw stuff more complicated than what seaborn, etc, can do out of the box, you will end up using matplotlib directly more often.
People have mixed views, but when I want to alter axes limits, labels, and/or titles I like to use ax.set. You can do something like
ax = sns.boxplot(...)
ax.set(xlim=(1, 2), xlabel='foo',
       ylim=(3, 4), ylabel='bar',
       title='baz')
which is a style I find very clean.
Yepp i also made code like the one you said in the end .
I am currently in the learning phase (got myself a free scholarship) and in the videos they taught a little about the pandas' plot function , and a relatively more about matplotlib and the plt.plot() functions.
However ,am always trying to find a swiss knife kinda thing out of these methods like, "which of them would help me customize the most in my graphs?" And weirdly its always a mix of 2-3 libraries than a single silver bullet.
Like recently I had to plot a time series. When plotted with plt.plot (matplotlib function) or ax.plot(), the time was in terms of daily date strings (around 400) and the dates showed up like a big black blob. So I had to use pd.to_datetime(..) (pandas function) to get a somewhat better graph. But that didn't look satisfying, so I removed the pandas function, created a manual loop, filtered the dates and then used them as xticks. But suddenly I came across a Stack Overflow answer saying you can just plot using series.plot() without a black blob, and that was it: it handled all the dates without any conversions and showed a plot in terms of months.
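For reference, a matplotlib-only route to month-level ticks might look like the sketch below. The 400 daily date strings and their values are synthetic stand-ins for the data described, and the two-month tick interval is an arbitrary assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Synthetic stand-in: 400 daily dates stored as strings.
date_strings = pd.date_range("2023-01-01", periods=400, freq="D").strftime("%Y-%m-%d")
values = list(range(400))

# Strings are categorical to matplotlib (one tick per string, hence the blob);
# real datetimes let it place ticks on a continuous time axis.
dates = pd.to_datetime(date_strings)

fig, ax = plt.subplots()
ax.plot(dates, values)
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))  # a tick every second month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```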
You see the problem? It's always like that. I always thought that pyplot was the most advanced, go-to solution for making graphs. But it turns out it isn't always the best bet, and sometimes it can't produce a good graph without manual tweaks or tweaks from other libraries. Also, if series.plot / dataframe.plot are merely wrappers for matplotlib, how are they able to make better graphs without any data tweaking?
Hey there.... No worries!! Watch it on your free time.
The difference among the 3 is in your usage. plt.plot only allows you to plot one figure. However, ax.plot allows you to plot multiple plots in one figure. If you just want to plot one figure, you don't need to use ax.plot.
Seaborn is a python library that is built using matplotlib to visualize complicated data. Sometimes if you do that using matplotlib, you will need to write a lot of code. Seaborn simplifies that by providing you with some built-in methods. So it all depends on what you want. I will make some tutorials on seaborn very soon.
Btw, if you go through the tutorials, you will exactly know the difference! So make sure you subscribe so that you can come back whenever you want.
Just a bit of advice: Matplotlib is good... if your data is small. I switched to Bokeh because Matplotlib couldn't handle plots with >1000 data points.
? Of course it can. I've done plots with millions of data points. Obviously with that many points you need to start thinking about what you are trying to achieve and whether aggregation can convey the message in a simple way but matplotlib can technically handle it fine.
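As one illustration of the aggregation idea, a dense scatter can be replaced by a hexagonal-bin count plot. The data here is synthetic and the gridsize value is an arbitrary assumption.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display)
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_points = 500_000
x = rng.normal(size=n_points)                         # synthetic, correlated cloud
y = 0.5 * x + rng.normal(scale=0.5, size=n_points)

fig, ax = plt.subplots()
# Count points per hexagonal cell instead of drawing half a million markers;
# mincnt=1 leaves empty cells blank.
hb = ax.hexbin(x, y, gridsize=60, mincnt=1)
fig.colorbar(hb, ax=ax, label="points per cell")
```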
Same, I use matplotlib in a tool at work that visualizes multiple lines with thousands of points and matplotlib is in no way a bottleneck.
Interesting. Could have been the type of plot or the data, but I know I'm not the only one who has had problems with matplotlib and large data... thats how i found bokeh.
Subplots video is uploading ! Styles video is in progress !! Axes and figures and object oriented APIs are advanced level. So they would be shown while doing a financial data analysis experiment.
I wouldn't say better, but there are several packages in R that make visualization quite easy. The primary package that comes to mind is ggplot2 (available in python as ggplot although I haven't used it).
R can be a bit counter-intuitive at first, but like any tool it has its strengths. Hadley Wickham has made tremendous improvements via the tidyverse.
Context: I'm learning data science, I use python. For now, only notebooks but I'm thinking about making my own portfolio site in flask at some point. Although that may not happen.
During my journey so far, I've seen authors using matplotlib, seaborn, plotly, holoViews... And now I'm studying a rather academic book where the authors are using ggplot from the plotnine library (I guess because they are more familiar with R)...
I understand there's no obvious right answer but I still need to decide which one I should invest the most time in to start with. And I have limited information to do that. I've seen rather old discussions about the same topic in this sub but given how fast things are moving, I thought it could be interesting to hear some fresh opinions from you guys.
Thanks!
I have tried matplotlib, plotly/dash, holoviews/hvplot/panel and altair. And I must say: matplotlib is king. Plus, there's mpld3, that supposedly translates matplotlib code to D3.js code (but I haven't tried it).
Here’s what I use.
Polished and intended for others: Plotly or Tableau
Exploratory analysis: seaborn
Quick ad-hoc stuff: matplotlib
Best answer tbh
Second this view!
I like ggplot and learned that one first but matplotlib and seaborn are super common. I know those too. I'd probably start with matplotlib and add on seaborn. I've used plotly for Dash dashboards.
IDK I guess I tend to think that the plotting library isn't as important as understanding how to properly display data. So just pick one.
plotly is fantastic for interactivity, and if you learn a bit about dash (a library for dashboards) you can do some pretty cool things. It can get messy when working at a low level directly with the graph objects API though.
I typically see matplotlib or seaborn in use in my current company
What are your opinions for using either? A lot of the times I avoid using Matplotlib because I can just display data faster with tableau (theres so much customization). And I end up writing less code as a result. In what use case will it be better to use Matplotlib instead? 3D models?
Once, when I was being interviewed, the recruiter couldn't repeat "mat-plot-lib"; she kept saying "this lib" :)
that plotting library
For me they have different use cases. If I am making a web app, for public domain or company use, Tableau/Plotly are my go-to tools. For general purpose data visualization/exploration, I prefer matplotlib/seaborn/pandas plots. Mostly because I think they are faster to produce/iterate and little interactivity is required for such a plot. Edit: Spelling.
What are the limitations of Tableau compared to Matplotlib?
Tableau shines when you need visualizations that are interactive. You can create a dashboard with a bunch of filters and nice looking tooltips in like 5 minutes. There are open source alternatives, but Tableau gives you more out of the box than any of them. Of course Tableau becomes a god damn nightmare as soon as you stray even a little bit from whatever they've implemented for you. It's also a pain to version control.
Personally, I stick to ggplot/matplotlib for most things and use Tableau when I need user interactivity. I know some data scientists I respect a lot who have abandoned ggplot/matplotlib and use Tableau for pretty much everything.
I like what you said in your last part. I am a civil engineer who is learning Python, pandas and matplotlib to do data analysis of the ready-mix concrete production where I am currently working (not enough budget to hire a proper data analyst, third-world country issues). All the data is managed with Excel files, and I am in search of a more powerful tool for data analysis and data visualization, and a dashboard to present the data to my bosses. I managed to make a dashboard in Excel but I feel it's lacking something, and it doesn't feel like the proper way either.
The part of user interactivity is what attracted me the most. Do you recommend me to start using (learning) tableau for what I am aiming for?
Also take a look at Altair. Better looking than Matplotlib (at least without customization). Very similar to how ggplot2 works in R.
I just started using Tableau after sucking with Matplotlib and wish I had tried it sooner. Tableau is great when you want to give someone the ability to explore their own questions on a dashboard with filters.
Ask yourself this: how much flexibility and creative control do you want over your visualisations? If you want to retain full control, choose Matplotlib. Just know that full control also usually means more work involved to get a basic result. But you're in total control on a pixel level. In Tableau you're much more constrained. But Tableau can also be more fun to use because of the GUI.
Laughs in plt.style.use('ggplot')
ggplot can do a lot more than matplotlib right? Faceting, linear fitting , binning and plotting aggregates etc are single line statements in ggplot. I don't know if all that is possible in matplotlib/seaborn with that much ease. Plotting with ggplot is so much easier than anything I've encountered on python. You spend less time fiddling with labels ticks etc..
I use PowerBI not Tableau, but as far as I’ve heard they’re pretty similar. I disagree that it’s just “click and point”. The hard part is figuring out how to structure the data and create a data model that properly joins multiple datasets, then figuring out how to use the visualization language (in my case Dax) to structure the data to create the graphs you want. On some level it can just be dragging components around, but if you want to make anything one level deeper it certainly becomes more complex (and annoying). I have a cs background, so having all of the flexibility to structure the data in a language I’m really familiar with is easier for me. Then I just send in the plot parameters. So I’d argue tools like Tableau aren’t inherently easier than matplotlib, people might think so just because the latter is backed by a real programming language.
My current favorite ones are Seaborn and Plotly. What are your usual go-to when it comes to plotting basic charts and complex ones ?
Thanks
Agree. First I use pandas plot to quickly prototype, then I rewrite using altair. However, for image based plots I still use matplotlib.
Have you ever tried using D3.js or Plot? It is JavaScript, but if you are rewriting anyway these libraries can be even more flexible than Altair in my opinion. But I do a lot of front end work so I like using JavaScript.
Yeah Altair (Vega-Lite.js) just extended Wilkinson’s Grammar of Graphics to include Interactivity.
https://idl.cs.washington.edu/files/2017-VegaLite-InfoVis.pdf
I also find this way of programming charts very intuitive.
I dunno. Matplotlib.pyplot is a pain, but it's the only one I know so far and I haven't run into anything I couldn't do with it yet.
Give seaborn a go!
Seaborn is really just a set of convenience functions which make pyplot plots for you. If you're using dataframes and like their style, then it's basically just making pyplot easier to use.
I'm a pyplot fan but often find that seaborn will either do everything I was going to do very easily or at least be a good jumping off point, giving me a plot I can do final tweaks to.
Your comment somehow underestimated the capabilities of Seaborn. I would call Seaborn statisticians' plotting library because almost every Seaborn plot API has statistical functionalities built in. Matplotlib is great for raw plotting. Seaborn offers much more.
For example -
x = [7.5, 7.5, 17.5]
y = [393.198, 351.352, 351.352]
plt.plot(x,y)
vs
sns.lineplot(x=x,y=y)
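Both calls look similar but draw different things: plt.plot connects the points exactly as given, while sns.lineplot by default aggregates repeated x values (here the two points at x = 7.5) into their mean and sorts by x. Seaborn has much more functionality built in, and depending on the parameters it will plot differently.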
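The difference can be reproduced without seaborn. In the sketch below, the grouped mean is computed with pandas as a stand-in for what sns.lineplot(x=x, y=y) draws with its default mean estimator (seaborn would also add an error band where an x value has several observations, which is not reproduced here); the marker and label choices are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display)
import matplotlib.pyplot as plt
import pandas as pd

x = [7.5, 7.5, 17.5]
y = [393.198, 351.352, 351.352]

fig, ax = plt.subplots()

# plt.plot / ax.plot: every point, connected in input order.
ax.plot(x, y, marker="o", label="all points, input order")

# Stand-in for seaborn's default aggregation: one mean y per distinct x, sorted by x.
means = pd.Series(y).groupby(pd.Series(x)).mean()
ax.plot(means.index, means.values, marker="s", label="mean of y per x")
ax.legend()
```

The first line includes a vertical segment at x = 7.5, while the second starts from the single averaged point (7.5, 372.275).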
Another great thing about Seaborn is that it plays very nicely with Pandas dataframes.
I'm going to check this out, thanks. I have struggled to "get" matplotlib in a way that makes it easy to integrate. I haven't practiced too much with it since I have data visualization platforms that my org likes to use (and Jupyter notebooks scare people over 45)
Don’t even get me started on matplotlib. For loops just to get all the data on the plot at once? Like I get it but it’s super clunky if you’ve never used it before. That said there are a lot of nice example on the website you can just paste your code into which is what I end up doing most of the time. Seaborn is nice though.
I've never used a for loop with matplotlib. I'm usually working with a gui or an excel workbook though.
It is!
Only drawback imo is that it gets really slow when a lot of datapoints are visualized.
As my research group works mainly with large time series data, we developed an extension that solves this problem.
For large sequences (scatter plots), Plotly Resampler makes it possible to visualize tons of datapoints (through adaptive resampling).
Sorry for the shilling, but plotly with plotly-resampler truly is my daily driver :)
This is the answer, I tried learning matplotlib and seaborn but the end result is just not as nice as something I can whip up in tableau public
I'm also an R trained scientist learning python and God I miss ggplot2. Matplotlib is as capable as ggplot2, but it's not as intuitive.
I happen to work at a genomics lab and was wondering if there are any known examples of applying C4D to representing data, much like the graphs, heat maps, plots etc one would see in biomedical journals but in 3 dimensions and specifically rendered in C4D.
I only ever did very primitive stuff using existing polygons or splines and controlling them via imported csv values (via structure manager), but that was before Python. Now you should be able to read in any ascii data using python and create points for curves from it. A quick google on the topic found these
http://www.plugincafe.com/forum/forum_posts.asp?TID=6622
https://stackoverflow.com/questions/51914080/how-to-read-massive-csv-files-with-python-inside-cinema4d
How does it know which points are higher or lower? Do you have to give it the color scale?
The RGB information from a photo is first converted into a hue. The hue range is normalized between 0 and 1 and used to displace the vertices of a mesh. The height axis (z) is not to scale with the x&y but I'll add functionality to customize that. I have ideas for a CNN to automatically detect the color values and create a normalized image...that's still in my head though... Otherwise you have to specify which color scale inversion you want (viridis, jet, binary, etc..)
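The hue-to-height step can be sketched in a few lines of plain Python. This is not the app's shader code (the app is written in Unity): the function name hue_height, the scale parameter, and the 2x2 test image are hypothetical, and which end of the hue range counts as "high" is an assumption.

```python
import colorsys

def hue_height(rgb, scale=1.0):
    """Map an (r, g, b) pixel with 0-255 channels to a height via its hue."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)  # hue is already in [0, 1)
    return hue * scale

# A tiny illustrative "photo": one (r, g, b) tuple per pixel.
image = [
    [(255, 0, 0), (255, 255, 0)],  # red, yellow
    [(0, 255, 0), (0, 0, 255)],    # green, blue
]

# One height per pixel; in a mesh these would displace the matching vertices.
heights = [[hue_height(pixel) for pixel in row] for row in image]
```

Because hue runs through the rainbow in a fixed order, this only recovers values from rainbow-like colormaps, which matches the limitation mentioned for the app.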
Oh nice. So you could integrate with like, d3-scale-chromatic on the other side of it.
My calculus teacher would die for this
Any chance of sharing the source code? Would love to have a play with that myself.
You can follow the dev progress here: https://github.com/pearsonkyle/Data-VisualizAR
I'll open source the app in 2-3 weeks or so...I need to polish the color-map inversion because it only works for rainbow colormaps at the moment.
This would be such a good teaching tool.
One of the coolest projects I've seen. Can you give us a tldr on how you did it and with what framework/library?
The app is built with ARFoundation in Unity3D. ARfoundation is a higher level interface for the hardware specific AR libraries on android and iOS, basically code once and deploy to both platforms. The mesh is procedurally generated using a shader I wrote to displace the vertices based on the color
Something something computers. Got it, thanks!
Seriously, I'm not competent to judge your work, but goddamn the results are impressive!
There's a lot of awesome stereograms here and on /r/MagicEye made for fun, but I've never seen any made for practical purposes. This is a huge mistake! Stereographic images can significantly enhance the interpretability of 3D data by leveraging human binocular vision. Instead of looking at a flat projection on a page, stereograms give us “3D glasses” for 2D data with just our eyes.
I’ve been on a data visualization kick recently, working to overhaul Matplotlib’s 3D plotting capabilities with bug fixes and new features. But the end result of those is still a flattened image – I wanted to actually see my plots in 3D space. So, I made a Matplotlib extension called mpl_stereo to do just that! Along with stereograms, there is also support for making anaglyphs that can be used with regular old red-blue 3D glasses for those who don't know the stereogram viewing technique, as well as wiggle stereograms for people with neither.
Check it out on GitHub! All the info on using the library is in the readme there. Note that the docs there all use parallel view, but when generating your own it's easy to flip to cross view.
Thanks! The usage is as simple as it gets, super intuitive and ergonomic!
Check it out on GitHub! All the info on using the library is in the readme there. Note that the docs there all use parallel view, but when generating your own it's easy to flip to cross view.
TL;DR: use negative ipd (inter-pupillary distance) to generate cross-view instead of parallel-view.
Thanks! I tried to make it as easy to use as possible so that means a lot to hear!
...but I've never seen any made for practical purposes. This is a huge mistake!
The only one I've seen is a program I was using years ago, I don't even remember the name, where you could see the structure of proteins as a stereogram. It was incredible, since proteins have a very interesting shape. I agree with you, this should be used more.
awesome! Now, if only I could prompt my peers to do the cross-eye technique..
Absolutely glorious! Thank you!
Fantastic! Genuinely looking for excuses to try this now. Thank you!
Love to see it! There's something incredibly tangible about observing data this way. Stereoscopy is so cool for this; I make some data visualizations too, I've also uploaded stereo footage of the sun. ::)
I was heavily inspired by Magic Eye as well so it's nice to see those books still resonating.
I made the first graph in R. You can find the code on my Github. I made it pretty in Photoshop, the PSD file is also on Github.
Excel, Tableau, R, Power BI..
Python :)
How do you do this in Python
By far the most used is Excel
I like how most of the replies have nothing to do with the question
Didn't realise India was that close to China in population size.
We are in it to win it. 🏆
According to UN reports, India is going to surpass china in 2023
That's a lovely looking futbol
I'm curious how people interact with data in their daily PM lives, how often do you use SQL? What do you use data for: feature prioritization? user behavior and engagement? product health?
I work on a low-volume data B2B software, and there's more of a qualitative data gathering/analysis than quantitative. I'd like to find a new job and anticipate having data-rich product. Curious what it is like?
I've noticed a few Product Manager job descriptions mentioning SQL skills as a requirement. I've never used it routinely as I've always been able to get the data I need from an engineer or data scientist in the team. I've learnt a bit of SQL in the past as I used to design SQL tools for developers. But as a PM I'm now revisiting it and doing a basic SQL course on Codecademy.
You’ve hit on a point which probably job descriptions are looking for. I’d use SQL for cases where I wanted an ad hoc answer without burdening others with the work.
My queries were never that advanced, and it’s easier with AI to help, but some basic skills can go far to enable self sufficiency.
I’d then only bother the Data Engineering lead after attempting and showing my attempt at which point they were always very willing to use their superior skills to help.
But yeah, it’s also feasible you can get the answers in your org without SQL, yet some companies will value it. It’s probably a mix of self sufficiency for some companies and others expecting more advanced skills beyond the use cases I’ve mostly used it for.
Do you find that relying on technical experts slows down delivery cycles for your product? Do your technical experts get burdened with everyone going to them and queuing requests, so you always have to follow up with them like, hey, any updates? And they are like, no, not yet? No? Just me? Okay
I can see that being a problem which is why I want to be more self-reliant. In my previous two organisations the engineers and data scientist I worked with were very quick to retrieve the data I asked for and didn't need any chasing at all. One of the team was a database developer and wasn't always part of the sprint cycles as there was quite a lot of maintenance to do, fixing pipelines etc.
I don’t use SQL really ever at all. I typically use Google Sheets to analyze data!
I only use SQL on personal stuff. It's been a long time since I've been given access to a production database.
If the devs don't give us a dump, we riot! (Or make them do the pulls)
SQL is bread and butter.
For any metric: revenue, performance, etc…
I spend 20% of my bandwidth in pulling & analyzing data. I use SQL & python + AI (Gemini). Having the right data handy always gave me an edge in stakeholder alignment.
As a PM ideally I should get it from my data analyst, but my DAs are often pulled for “urgent” leadership asks. DIY is better than waiting for someone to give me the data.
What kind of product do you work on? And what kind of analysis do you do?
E-commerce product, I mainly do analysis on the transactional data
A bit, I use Claude code to write the SQL for me but know enough to correct it if it’s wrong
how to use matplotlib for data visualization
Key Considerations for Using Matplotlib for Data Visualization
Installation:
pip install matplotlib
Basic Structure:
import matplotlib.pyplot as plt
Creating a Simple Plot:
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
plt.show()
Customizing Plots:
plt.plot(x, y, color='blue', marker='o', linestyle='--')
Multiple Plots (using subplot):
plt.subplot(1, 2, 1)  # 1 row, 2 columns, 1st subplot
plt.plot(x, y)
plt.subplot(1, 2, 2) # 1 row, 2 columns, 2nd subplot
plt.bar(x, y)
plt.show()
Saving Figures:
plt.savefig("my_plot.png")
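The same simple plot can also be built and saved through the object-oriented interface. In this sketch the temporary output directory and the dpi and bbox_inches values are assumptions chosen for illustration. In a script, saving before any plt.show() call is the safer order, since the figure may already be closed afterwards.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display)
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

fig, ax = plt.subplots()
ax.plot(x, y, color="blue", marker="o", linestyle="--")
ax.set(title="Simple Line Plot", xlabel="X-axis Label", ylabel="Y-axis Label")

# Assumed location: the system temp directory, so the example leaves no stray files.
out_path = os.path.join(tempfile.gettempdir(), "my_plot.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")  # higher resolution, trimmed margins
plt.close(fig)  # release the figure once it is written
```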
Recommendation: Start with simple plots to familiarize yourself with the syntax and gradually explore more complex visualizations like histograms, scatter plots, and 3D plots. Utilize the extensive Matplotlib documentation and examples available online to enhance your skills. This will help you create effective visualizations tailored to your data analysis needs.