python - How to append columns based on other column values to pandas dataframe
I have the following problem: I want to append columns to a dataframe. These columns are the unique values found in another column of this dataframe, filled with the number of occurrences of that value in the row. It looks like this:
df:

       column1 column2
    0        1   a,b,c
    1        2     a,e
    2        3       a
    3        4     c,f
    4        5     c,f
What I am trying to get is:
       column1 column2  a  b  c  e  f
    0        1   a,b,c  1  1  1
    1        2     a,e  1        1
    2        3       a  1
    3        4     c,f        1     1
    4        5     c,f        1     1
(The empty spaces can be NaN or 0, it does not matter.)

I have written some code to achieve this, but instead of appending columns, it appends rows, so that my output looks like this:
       column1 column2
    0        1   a,b,c
    1        2     a,e
    2        3       a
    3        4     c,f
    4        5     c,f
    a        1       1
    b        1       1
    c        1       1
    e        1       1
    f        1       1
The code looks like this:
    def newcols(x):
        # Series.iteritems() was removed in pandas 2.0; use .items() there
        for i, value in df['column2'].iteritems():
            listi = value.split(',')
            for value in listi:
                string = value
                x[string] = listi.count(string)
        return x

    df1 = df.apply(newcols)
What I am trying to do here is iterate through each row of the dataframe and split the string (a,b,c) contained in column2 at the comma, so that the variable listi is a list containing the separated string values. For each of these values I then want to make a new column and fill it with the number of occurrences of that value in listi. I am confused why the code appends rows instead of columns. Does somebody know why, and how I can correct that?
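A likely explanation, offered as a sketch rather than a confirmed diagnosis: DataFrame.apply defaults to axis=0, so the function receives each column as a Series, and assigning to a label that does not exist yet on that Series adds a new index entry, which shows up as a new row. Passing axis=1 hands the function one row at a time instead, so new labels become new columns. The function name new_cols below is hypothetical, and the column order of the result is left to pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

def new_cols(row):
    # With axis=1, `row` is one row of df, indexed by column name.
    # Work on a copy so the object pandas passes in is not mutated.
    row = row.copy()
    parts = row["column2"].split(",")
    for value in set(parts):
        # A new label on a row Series becomes a new column in the result.
        row[value] = parts.count(value)
    return row

# Values missing from a row come out as NaN.
df1 = df.apply(new_cols, axis=1)
print(df1)
```

Row-wise apply is easy to read but slow on large frames, so it is best treated as a way to understand the behaviour rather than as the preferred solution.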
While you could do this using get_dummies, you can also cheat and use pd.value_counts directly:
    >>> import pandas as pd
    >>> df = pd.DataFrame({'column1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'column2': {0: 'a,b,c', 1: 'a,e', 2: 'a', 3: 'c,f', 4: 'c,f'}})
    >>> df.join(df.column2.str.split(",").apply(pd.value_counts).fillna(0))
       column1 column2  a  b  c  e  f
    0        1   a,b,c  1  1  1  0  0
    1        2     a,e  1  0  0  1  0
    2        3       a  1  0  0  0  0
    3        4     c,f  0  0  1  0  1
    4        5     c,f  0  0  1  0  1
Step-by-step, we have
    >>> df.column2.str.split(",")
    0    [a, b, c]
    1       [a, e]
    2          [a]
    3       [c, f]
    4       [c, f]
    dtype: object
    >>> df.column2.str.split(",").apply(pd.value_counts)
         a    b    c    e    f
    0    1    1    1  NaN  NaN
    1    1  NaN  NaN    1  NaN
    2    1  NaN  NaN  NaN  NaN
    3  NaN  NaN    1  NaN    1
    4  NaN  NaN    1  NaN    1
    >>> df.column2.str.split(",").apply(pd.value_counts).fillna(0)
       a  b  c  e  f
    0  1  1  1  0  0
    1  1  0  0  1  0
    2  1  0  0  0  0
    3  0  0  1  0  1
    4  0  0  1  0  1
    >>> df.join(df.column2.str.split(",").apply(pd.value_counts).fillna(0))
       column1 column2  a  b  c  e  f
    0        1   a,b,c  1  1  1  0  0
    1        2     a,e  1  0  0  1  0
    2        3       a  1  0  0  0  0
    3        4     c,f  0  0  1  0  1
    4        5     c,f  0  0  1  0  1
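For comparison, here is a minimal sketch of two alternatives to the value_counts one-liner. They may be useful because recent pandas releases have deprecated the top-level pd.value_counts. The first variant is the get_dummies route mentioned above, using Series.str.get_dummies; it yields 0/1 indicators rather than counts, which is equivalent only as long as no value repeats within a row. The second variant builds real per-row counts with collections.Counter, which is a swapped-in technique and not part of the original answer. The variable names are illustrative.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

# Variant 1: 0/1 indicator columns straight from the delimited strings.
dummies = df["column2"].str.get_dummies(sep=",")
with_dummies = df.join(dummies)

# Variant 2: true per-row counts, one Counter per row.
# Values absent from a row become NaN first, then 0 after fillna.
counts = pd.DataFrame(
    [Counter(s.split(",")) for s in df["column2"]],
    index=df.index,
).fillna(0).astype(int)
with_counts = df.join(counts)

print(with_dummies)
print(with_counts)
```

On this data both variants give the same table, since every value occurs at most once per row. If a row could contain something like a,a,b, only the Counter variant would report 2 for a.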