python - Efficiently Labeling Variable Values in Pandas -

- September 15, 2015

I have DataFrame variables coded as integers, and I'd like to replace them with the actual value labels.

For example, I have the following DataFrame:

>>> df = pd.DataFrame([[1,3],[2,2],[3,2]], columns=['q1','q2'])
>>> df
   q1  q2
0   1   3
1   2   2
2   3   2

If the numbers 1, 2, 3 represented the same value in both columns, I could have a dictionary that looked like this:

labels={1:'yes',2:'no',3:'unsure'}

and recode with applymap:

>>> df.applymap(labels.get)
       q1      q2
0     yes  unsure
1      no      no
2  unsure      no
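A note for readers on newer pandas: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map. The following is a minimal sketch of the same recode that picks whichever method is available. The name `elementwise` is only an illustrative helper, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame([[1, 3], [2, 2], [3, 2]], columns=['q1', 'q2'])
labels = {1: 'yes', 2: 'no', 3: 'unsure'}

# DataFrame.map exists from pandas 2.1 onward; older versions only
# have DataFrame.applymap. `elementwise` is a hypothetical helper name.
elementwise = df.map if hasattr(df, 'map') else df.applymap

# labels.get is called once per cell; codes missing from the dict
# come back as None.
recoded = elementwise(labels.get)
print(recoded)
```

Either branch applies the function to every cell, so this only works when all columns share one set of labels.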

However, the integers code for a different label in each column. For example, the dictionary of value labels may look like this:

labels2={'q1':{1:'yes',2:'no',3:'unsure'},
         'q2':{1:'very', 2:'a little', 3:'not at all'}}

What is the most efficient way of recoding the values in this scenario?

I am using apply and a loop (see below), but it is pretty clunky. Is there a better way?

>>> import pandas as pd
>>> dfs=[]
>>> for question in labels2:
...     d=df[question].map(labels2[question].get)
...     dfs.append(d)
...
>>> pd.concat(dfs, axis=1)
       q1          q2
0     yes  not at all
1      no    a little
2  unsure    a little

You can use apply, and use each column's name attribute as the key into the outer dictionary:

>>> df.apply(lambda col: col.map(labels2[col.name]))
       q1          q2
0     yes  not at all
1      no    a little
2  unsure    a little
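For anyone who wants to run this outside the REPL, here is a self-contained sketch of the apply approach, next to one possible alternative that was not part of the answer above: passing the nested dictionary straight to DataFrame.replace. The names `by_apply` and `by_replace` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 3], [2, 2], [3, 2]], columns=['q1', 'q2'])
labels2 = {'q1': {1: 'yes', 2: 'no', 3: 'unsure'},
           'q2': {1: 'very', 2: 'a little', 3: 'not at all'}}

# apply passes each column as a Series; col.name selects the inner
# dict for that column, and Series.map does the lookup.
# Codes that are missing from the inner dict become NaN.
by_apply = df.apply(lambda col: col.map(labels2[col.name]))

# Alternative (an assumption about what suits your data, not the
# answer's method): replace reads a nested dict as
# {column: {old_value: new_value}}. Codes without a label are left
# unchanged instead of becoming NaN.
by_replace = df.replace(labels2)

print(by_apply)
print(by_replace)
```

The two differ mainly in how they treat unlabeled codes, so the choice depends on whether you would rather see NaN or the original integer for a code with no label. Note that the apply version raises a KeyError if the DataFrame has a column that is absent from labels2.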
