group by - pandas groupby aggregate with grand total in the bottom -
here code:
import stringio pandas import * import numpy np df = read_csv(stringio.stringio('''col1 col2 b d 1 6 e 2 7 b d 3 8 b e 4 9 c d 5 19'''), delimiter='\t') df['buc1'] = cut(df['a'], bins = [0, 2, 6, 8]) aggfunc = {'a': sum, 'b': np.average }
after running:
df.groupby(['buc1']).agg(aggfunc)
i get:
b buc1 (0, 2] 3 6.5 (2, 6] 12 12.0 (6, 8] nan nan
my questions are:
- how rid of bottom (6, 8] bucket, given there no values there?
- how add grand total row @ bottom, in pivot table in excel?
for instance:
buc1 b (0, 2] 3 6.5 (2, 6] 12 12 total 15 9.8
note total row second column average, not sum.
to drop na
records, can use .dropna()
dataframe method.
df['buc1'] = df['buc1'].astype(object) result = df.groupby(['buc1']).agg(aggfunc).dropna() result b buc1 (0, 2] 3 6.5 (2, 6] 12 12.0
as marginal totals, in theory should work:
result_alternative = pd.pivot_table(df,index='buc1', aggfunc=aggfunc, values=['a','b'], margins=true) result_alternative b buc1 (0, 2] 3 6.5 (2, 6] 12 12.0 15 9.8
but throws error related , think bug should fixed may work on machine (although running version 0.17)
that same link has workaround related categorical variable--just cast object first, , above call should good.
edit:
if want stay groupby semantics, can add row total so:
rowtotal = result.sum() rowtotal.name = 'rowtotal' result.append(rowtotal)
Comments
Post a Comment