Python - Seaborn: countplot() with frequencies
I have a pandas DataFrame with a column called "axles", which can take an integer value between 3 and 12. I am trying to use seaborn's countplot() to achieve the following plot:
- The left y-axis shows the frequencies of these values occurring in the data. The axis extends over [0%-100%], with tick marks at every 10%.
- The right y-axis shows the actual counts; its values correspond to the tick marks determined by the left y-axis (marked at every 10%).
- The x-axis shows the categories for the bar plots [3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
- An annotation on top of each bar shows the actual percentage of that category.
The following code gives me a plot with the actual counts, but I could not find a way to convert them into frequencies. I can get the frequencies using df.axles.value_counts()/len(df.index), but I am not sure how to plug this information into seaborn's countplot().
I found a workaround for the annotations, but I am not sure if it is the best implementation.
Any help would be appreciated!
Thanks
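    import matplotlib.pyplot as plt
    import seaborn as sns

    # dfwim is the DataFrame holding the "axles" column
    plt.figure(figsize=(12,8))
    ax = sns.countplot(x="axles", data=dfwim, order=[3,4,5,6,7,8,9,10,11,12])
    plt.title('distribution of truck configurations')
    plt.xlabel('number of axles')
    plt.ylabel('frequency [%]')

    # annotate each bar with its height (a raw count at this point, not a percentage)
    for p in ax.patches:
        ax.annotate('%{:.1f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))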
Edit:
I got closer to what I need with the following code, using pandas' bar plot and ditching seaborn. It feels like I'm using too many workarounds, and there has to be an easier way to do it. The issues with this approach:
- There is no order keyword in pandas' bar plot function as seaborn's countplot() has, so I cannot plot all categories from 3-12 as I did in countplot(). I need to have them shown even if there is no data in that category.
- The secondary y-axis messes up the bars and the annotation for some reason (see the white gridlines drawn over the text and bars).
    import numpy as np
    import matplotlib.pyplot as plt

    plt.figure(figsize=(12,8))
    plt.title('distribution of truck configurations')
    plt.xlabel('number of axles')
    plt.ylabel('frequency [%]')

    # percentage of rows per axle count, sorted by category
    ax = (dfwim.axles.value_counts()/len(dfwim)*100).sort_index().plot(kind="bar", rot=0)
    ax.set_yticks(np.arange(0, 110, 10))

    # secondary axis with the counts matching each 10% tick
    ax2 = ax.twinx()
    ax2.set_yticks(np.arange(0, 110, 10)*len(dfwim)/100)

    for p in ax.patches:
        ax.annotate('{:.2f}%'.format(p.get_height()), (p.get_x()+0.15, p.get_height()+1))
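Regarding the missing order keyword, one possible workaround is to reindex the frequency series over the full category list before plotting. This is only a minimal sketch: the sample data is made up (the real dfwim is not shown in the question), and value_counts(normalize=True) is used in place of dividing by the row count by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: axle counts 3..9 only, so 10, 11 and 12 are absent.
rng = np.random.default_rng(0)
dfwim = pd.DataFrame({"axles": rng.integers(3, 10, size=1000)})

# Every category that should appear on the x-axis, in plotting order.
categories = list(range(3, 13))

# Relative frequency in percent; categories without data are filled in as 0.
freq = (
    dfwim["axles"]
    .value_counts(normalize=True)
    .mul(100)
    .reindex(categories, fill_value=0.0)
)
print(freq)
```

Calling freq.plot(kind="bar", rot=0) on the result should then draw one bar slot per category, including the empty ones.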
You can do this by making a twinx axes for the frequencies. You can switch the two y-axes around so the frequencies stay on the left and the counts on the right, without having to recalculate the counts axis (here we use tick_left() and tick_right() to move the ticks and set_label_position to move the axis labels).
You can then set the ticks using the matplotlib.ticker module, specifically ticker.MultipleLocator and ticker.LinearLocator.
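To illustrate how the two locators line up, here is a small standalone sketch. The total of 5000 rows is an assumed value, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

ncount = 5000  # hypothetical total number of rows

fig, ax = plt.subplots()
ax2 = ax.twinx()

# Counts run from 0 to ncount, frequencies from 0 to 100 percent.
ax.set_ylim(0, ncount)
ax2.set_ylim(0, 100)

# LinearLocator(11) places 11 evenly spaced ticks across the count range.
ax.yaxis.set_major_locator(ticker.LinearLocator(11))
# MultipleLocator(10) places a tick at every multiple of 10 on the percent axis.
ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))

count_ticks = np.asarray(ax.get_yticks())
# get_yticks() can also report locations just outside the limits, so keep
# only the ones that fall inside the visible 0-100 range.
freq_ticks = np.array([t for t in ax2.get_yticks() if 0 <= t <= 100])

print(count_ticks)
print(freq_ticks)
```

With both axes starting at 0 and ending at ncount and 100 respectively, each 10% tick sits at the same height as the matching count tick.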
As for your annotations, you can get the x and y locations of the corners of the bar with patch.get_bbox().get_points(). This, along with setting the horizontal and vertical alignment correctly, means you don't need to add any arbitrary offsets to the annotation location.
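As a small illustration of the bounding-box approach, the sketch below annotates three made-up bars. get_points() returns a 2x2 array of the form [[x0, y0], [x1, y1]], so the mean of the first column is the horizontal centre of the bar and the element at [1, 1] is its top. The bar values and the total are assumptions for the example only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([3, 4, 5], [120, 300, 80], width=0.8)  # made-up counts
total = 500  # sum of the made-up counts

centers = []
tops = []
for p in ax.patches:
    pts = p.get_bbox().get_points()  # [[x0, y0], [x1, y1]]
    x_center = pts[:, 0].mean()      # midpoint between left and right edge
    y_top = pts[1, 1]                # upper edge of the bar
    centers.append(x_center)
    tops.append(y_top)
    # Centre the text horizontally and rest it on top of the bar.
    ax.annotate('{:.1f}%'.format(100.0 * y_top / total),
                (x_center, y_top), ha='center', va='bottom')
```

Because the text is anchored at the centre of the bar's top edge, the label position stays correct if the bar width or the y-axis scale changes.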
Finally, you need to turn the grid off for the twinned axis, to prevent grid lines showing up on top of the bars (ax2.grid(None)).
Here is a working script:

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    import matplotlib.ticker as ticker

    # Some random data
    dfwim = pd.DataFrame({'axles': np.random.normal(8, 2, 5000).astype(int)})
    ncount = len(dfwim)

    plt.figure(figsize=(12,8))
    ax = sns.countplot(x="axles", data=dfwim, order=[3,4,5,6,7,8,9,10,11,12])
    plt.title('distribution of truck configurations')
    plt.xlabel('number of axles')

    # Make twin axis
    ax2 = ax.twinx()

    # Switch so the count axis is on the right, frequency on the left
    ax2.yaxis.tick_left()
    ax.yaxis.tick_right()

    # Also switch the labels over
    ax.yaxis.set_label_position('right')
    ax2.yaxis.set_label_position('left')

    ax2.set_ylabel('frequency [%]')

    for p in ax.patches:
        x = p.get_bbox().get_points()[:,0]
        y = p.get_bbox().get_points()[1,1]
        ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y),
                    ha='center', va='bottom')  # set the alignment of the text

    # Use a LinearLocator to ensure the correct number of ticks
    ax.yaxis.set_major_locator(ticker.LinearLocator(11))

    # Fix the frequency range to 0-100
    ax2.set_ylim(0,100)
    ax.set_ylim(0,ncount)

    # And use a MultipleLocator to ensure a tick spacing of 10
    ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))

    # Need to turn the grid on ax2 off, otherwise the gridlines end up on top of the bars
    ax2.grid(None)

    plt.savefig('snscounter.pdf')