python - How to append columns based on other column values to pandas dataframe


I have the following problem: I want to append columns to a dataframe. These columns are the unique values found in column2 of the dataframe, each filled with the number of occurrences of that value in the row. It looks like this:

df:
       column1  column2
    0     1       a,b,c
    1     2       a,e
    2     3       a
    3     4       c,f
    4     5       c,f

What I am trying to get is:

       column1  column2  a  b  c  e  f
    0     1       a,b,c   1  1  1
    1     2       a,e     1        1
    2     3       a       1
    3     4       c,f           1     1
    4     5       c,f           1     1

(The empty cells can be NaN or 0; it does not matter.)

I have written some code to achieve this, but instead of appending columns it appends rows, so my output looks like this:

       column1  column2
    0     1       a,b,c
    1     2       a,e
    2     3       a
    3     4       c,f
    4     5       c,f
    a     1        1
    b     1        1
    c     1        1
    e     1        1
    f     1        1

The code looks like this:

    def newcols(x):
        for i, value in df['column2'].items():
            listi = value.split(',')
            for value in listi:
                string = value
                x[string] = listi.count(string)
        return x

    df1 = df.apply(newcols)

What I am trying to do here is iterate through each row of the dataframe and split the string (a,b,c) contained in column2 at the comma, so that the variable listi is a list containing the separated string values. For each of these values I want to make a new column and fill it with the number of occurrences of that value in listi. I am confused why the code appends rows instead of columns. Does somebody know why, and how I can correct that?

While you could do this using get_dummies, you can also cheat and use pd.value_counts directly:

>>> import pandas as pd
>>> df = pd.DataFrame({'column1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'column2': {0: 'a,b,c', 1: 'a,e', 2: 'a', 3: 'c,f', 4: 'c,f'}})
>>> df.join(df.column2.str.split(",").apply(pd.value_counts).fillna(0))
   column1 column2  a  b  c  e  f
0        1   a,b,c  1  1  1  0  0
1        2     a,e  1  0  0  1  0
2        3       a  1  0  0  0  0
3        4     c,f  0  0  1  0  1
4        5     c,f  0  0  1  0  1
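For comparison, here is a minimal sketch of the get_dummies route mentioned above, assuming the same example frame. It uses `Series.str.get_dummies`, which splits on a separator and returns 0/1 indicator columns. Note that these are indicators rather than counts, so a value repeated within one cell (e.g. "a,a") would still show 1. The names `dummies` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

# Split each string on "," and build one 0/1 indicator column
# per distinct token found anywhere in column2.
dummies = df["column2"].str.get_dummies(sep=",")

# Attach the indicator columns to the original frame by index.
result = df.join(dummies)
print(result)
```

If real counts are needed (tokens can repeat inside a cell), the value_counts approach is the safer choice.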

Step-by-step, we have

>>> df.column2.str.split(",")
0    [a, b, c]
1       [a, e]
2          [a]
3       [c, f]
4       [c, f]
dtype: object
>>> df.column2.str.split(",").apply(pd.value_counts)
    a   b   c   e   f
0   1   1   1 NaN NaN
1   1 NaN NaN   1 NaN
2   1 NaN NaN NaN NaN
3 NaN NaN   1 NaN   1
4 NaN NaN   1 NaN   1
>>> df.column2.str.split(",").apply(pd.value_counts).fillna(0)
   a  b  c  e  f
0  1  1  1  0  0
1  1  0  0  1  0
2  1  0  0  0  0
3  0  0  1  0  1
4  0  0  1  0  1
>>> df.join(df.column2.str.split(",").apply(pd.value_counts).fillna(0))
   column1 column2  a  b  c  e  f
0        1   a,b,c  1  1  1  0  0
1        2     a,e  1  0  0  1  0
2        3       a  1  0  0  0  0
3        4     c,f  0  0  1  0  1
4        5     c,f  0  0  1  0  1
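As for why the original function appended rows: `DataFrame.apply` defaults to `axis=0`, so the function receives each column as a Series, and assigning a new label to that Series adds an index entry, which ends up as a row. A minimal sketch of a row-wise variant follows, assuming the same example frame. The helper name `count_tokens` and the variables `counts` and `fixed` are hypothetical, not from the question. It uses `Series.value_counts` rather than the top-level `pd.value_counts`, which newer pandas releases deprecate.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

def count_tokens(row):
    # With axis=1, `row` is a single row of df as a Series.
    # Returning a Series makes its labels become columns in the result.
    return pd.Series(row["column2"].split(",")).value_counts()

# One row of counts per input row; tokens absent from a row come out as NaN.
counts = df.apply(count_tokens, axis=1)

# Replace NaN with 0 and restore integer dtype before joining.
counts = counts.fillna(0).astype(int)

fixed = df.join(counts)
print(fixed)
```

The only real change from the question's approach is `axis=1` plus returning the counts instead of mutating the Series that was passed in.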
