group by - pandas groupby aggregate with grand total in the bottom -

- April 15, 2013

here code:

import stringio pandas import * import numpy np  df = read_csv(stringio.stringio('''col1 col2      b   d   1   6   e   2   7 b   d   3   8 b   e   4   9 c   d   5   19'''), delimiter='\t')   df['buc1'] = cut(df['a'], bins = [0, 2, 6, 8])  aggfunc = {'a': sum,            'b': np.average            }

after running:

df.groupby(['buc1']).agg(aggfunc)

i get:

             b buc1             (0, 2]   3   6.5 (2, 6]  12  12.0 (6, 8] nan   nan

my questions are:

how rid of bottom (6, 8] bucket, given there no values there?
how add grand total row @ bottom, in pivot table in excel?

for instance:

buc1      b (0, 2]  3   6.5 (2, 6]  12  12 total   15  9.8

note total row second column average, not sum.

to drop na records, can use .dropna() dataframe method.

df['buc1'] = df['buc1'].astype(object)  result = df.groupby(['buc1']).agg(aggfunc).dropna() result                b buc1             (0, 2]   3   6.5 (2, 6]  12  12.0

as marginal totals, in theory should work:

result_alternative = pd.pivot_table(df,index='buc1',                                     aggfunc=aggfunc,                                     values=['a','b'],                                     margins=true) result_alternative               b buc1             (0, 2]   3   6.5 (2, 6]  12  12.0     15   9.8

but throws error related , think bug should fixed may work on machine (although running version 0.17)

that same link has workaround related categorical variable--just cast object first, , above call should good.

edit:

if want stay groupby semantics, can add row total so:

rowtotal = result.sum() rowtotal.name = 'rowtotal' result.append(rowtotal)

Search This Blog

WIKI

group by - pandas groupby aggregate with grand total in the bottom -

Comments

Post a Comment

Popular posts from this blog

jquery - ReferenceError: CKEDITOR is not defined -

javascript - Chart.js (Radar Chart) different scaleLineColor for each scaleLine -

apache - Error with PHP mail(): Multiple or malformed newlines found in additional_header -