python - pandas: join data frames on similar but not identical strings using lowercase letters only
I need to join data frames on columns that are similar but not identical. Fortunately, the lowercase letters are identical between the columns. I am trying to isolate the lowercase letters in each column and create new columns to join on.
    import pandas as pd

    df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan']})
    df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                        'zeta': ['17', '23', '116']})

This is what I have tried:
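    def joinnames(s):
        # collect the lowercase characters of one name
        filelist = []
        for c in s:
            if c.islower():
                filelist.append(c)
        # note: this returns a list, which is unhashable,
        # so pd.merge cannot use it as a join key
        return filelist

    df1['joinhere'] = df1['beta'].apply(joinnames)
    df2['joinhere'] = df2['gamma'].apply(joinnames)
    pd.merge(df1, df2, how='left', left_on='joinhere', right_on='joinhere')

This is the output I am trying to achieve: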
    final = pd.DataFrame({'alpha': ['1', '2', '3'],
                          'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                          'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan'],
                          'zeta': ['17', '23', '116']})
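The joinnames attempt can likely be made to work with one change: build the key as a string instead of a list, because lists are unhashable and cannot serve as merge keys. The following is a minimal sketch under that assumption; the helper name `lowercase_key` and the `'_right'` suffix for the second `alpha` column are illustrative choices, not part of the question. Because it keeps every lowercase letter (not just the first run), both Hernandez names map to the key 'ernandezr'.

```python
import pandas as pd

df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                    'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan']})
df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                    'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                    'zeta': ['17', '23', '116']})


def lowercase_key(s):
    """Hypothetical helper: keep only the lowercase characters of s, joined into one string."""
    return ''.join(c for c in s if c.islower())


df1['joinhere'] = df1['beta'].apply(lowercase_key)
df2['joinhere'] = df2['gamma'].apply(lowercase_key)

# Join on the string key only; both frames have 'alpha', so the right one gets a suffix.
merged = pd.merge(df1, df2, how='left', on='joinhere', suffixes=('', '_right'))

# Keep and order the columns like the desired `final` frame.
final = merged[['alpha', 'gamma', 'beta', 'zeta']]
print(final)
```

A left merge keeps the row order of df1, so the result lines up with the desired `final` frame row for row.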
You could use Series.str.extract to find the lowercase letters:
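    import pandas as pd

    df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan']})
    df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                        'zeta': ['17', '23', '116']})

    # take the first run of lowercase ASCII letters from each name;
    # expand=False returns a Series rather than a one-column DataFrame
    df1['lower'] = df1['beta'].str.extract(r'([a-z]+)', expand=False)
    df2['lower'] = df2['gamma'].str.extract(r'([a-z]+)', expand=False)

    # with no `on` argument, merge joins on the shared columns 'alpha' and 'lower'
    final = pd.merge(df1, df2)
    print(final)

yields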
       alpha             beta      lower             gamma  zeta
    0      1       JRLeparoux    eparoux        Leparoux R    17
    1      2  BJHernandez,Jr.   ernandez  Hernandez,B, Jr.    23
    2      3     SXBridgmohan  ridgmohan    Bridgmohan S X   116

Note that this assumes that collecting the ASCII characters a through z suffices to produce the values to join on. If the beta or gamma columns contain non-ASCII lowercase characters (such as characters with accent marks), you may need to add them to the regex character class, [a-z].
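Instead of listing accented characters in [a-z] by hand, one alternative is to filter each name with Python's `str.islower`, which is Unicode-aware. This is a minimal sketch, not the answer's method: the helper `lowercase_only` is hypothetical, and the second row ('ÉLéger' / 'Léger É') is made-up sample data for illustration only. Note that it keeps every lowercase character, whereas `str.extract` returns only the first run (for example 'ernandez' rather than 'ernandezr' for 'BJHernandez,Jr.').

```python
import pandas as pd

# Row 1 is from the question; row 2 is hypothetical data with accented letters.
df1 = pd.DataFrame({'alpha': ['1', '2'],
                    'beta': ['JRLeparoux', 'ÉLéger']})
df2 = pd.DataFrame({'alpha': ['1', '2'],
                    'gamma': ['Leparoux R', 'Léger É']})


def lowercase_only(s):
    """Hypothetical helper: keep every character for which str.islower() is true.

    str.islower() is Unicode-aware, so 'é' counts as lowercase but 'É' does not.
    """
    return ''.join(c for c in s if c.islower())


df1['lower'] = df1['beta'].map(lowercase_only)
df2['lower'] = df2['gamma'].map(lowercase_only)

# As in the answer, merge joins on the shared columns 'alpha' and 'lower'.
final = pd.merge(df1, df2)
print(final)
```

For ASCII-only data, `df1['beta'].str.replace(r'[^a-z]', '', regex=True)` builds the same all-lowercase key without a Python-level function.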