python - Pandas: join data frames on similar but not identical strings using lowercase letters only


I need to join data frames on columns that are similar but not identical. Fortunately, the lowercase letters are identical between the columns. I am trying to isolate the lowercase letters in each column and create new columns to join on.

    df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan']})

    df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                        'zeta': ['17', '23', '116']})

This is what I have tried:

    def joinnames(df):
        filelist = []
        for c in df:
            if c.islower():
                filelist.append(c)
        return filelist

    df1['joinhere'] = df1['beta'].apply(joinnames)
    df2['joinhere'] = df2['gamma'].apply(joinnames)
    pd.merge(df1, df2, how='left', left_on='joinhere', right_on='joinhere')

This is the output I am trying to achieve:

    final = pd.DataFrame({'alpha': ['1', '2', '3'],
                          'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                          'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan'],
                          'zeta': ['17', '23', '116']})

You could use Series.str.extract to find the first run of lowercase letters in each value:

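    import pandas as pd

    df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan']})

    df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                        'zeta': ['17', '23', '116']})

    # Capture the first run of ASCII lowercase letters in each name;
    # expand=False returns a Series rather than a one-column DataFrame.
    df1['lower'] = df1['beta'].str.extract(r'([a-z]+)', expand=False)
    df2['lower'] = df2['gamma'].str.extract(r'([a-z]+)', expand=False)

    # With no keys given, merge joins on the shared columns: alpha and lower.
    final = pd.merge(df1, df2)
    print(final)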

which yields

      alpha             beta      lower             gamma zeta
    0     1       JRLeparoux    eparoux        Leparoux R   17
    1     2  BJHernandez,Jr.   ernandez  Hernandez,B, Jr.   23
    2     3     SXBridgmohan  ridgmohan    Bridgmohan S X  116

Note that this assumes that collecting the ASCII characters from a to z suffices to produce values to join on. If your beta and gamma columns contain non-ASCII lowercase characters (such as characters with accent marks), you may need to add them to the regex character class, [a-z].
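If accented names are a concern, one possible alternative is to build the key with Python's Unicode-aware str.islower instead of a regex. The sketch below is only an illustration: the helper name lowercase_key, the key column, and the sample rows (including the accented name) are hypothetical and not part of the question's data. Unlike str.extract, it keeps every lowercase character rather than only the first run, and it returns a string rather than a list, so the result can be used as a merge key.

```python
import pandas as pd


def lowercase_key(text):
    """Return every lowercase character in text, in order, as one string."""
    # str.islower() is Unicode-aware, so accented letters such as 'é' are kept.
    return ''.join(ch for ch in text if ch.islower())


# Hypothetical sample data with an accented name, for illustration only.
left = pd.DataFrame({'beta': ['JRLeparoux', 'MJosé']})
right = pd.DataFrame({'gamma': ['Leparoux R', 'José M'],
                      'zeta': ['17', '23']})

# Build a plain-string key on each side, then join on it.
left['key'] = left['beta'].apply(lowercase_key)
right['key'] = right['gamma'].apply(lowercase_key)

merged = pd.merge(left, right, on='key', how='left')
print(merged)
```

Because this variant collects all lowercase characters, suffixes such as "Jr." contribute to the key as well, so both columns need to contain the same lowercase letters in the same order for the rows to match.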

