Python - Pandas: join data frames on similar but not identical strings using lowercase letters only
I need to join two data frames on columns that are similar but not identical. Fortunately, the lowercase letters are identical between the columns. I am trying to isolate the lowercase letters in each column and create new columns to join on.

    df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan']})
    df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                        'zeta': ['17', '23', '116']})

This is what I have tried:

    def joinnames(df):
        filelist = []
        for c in df:
            if c.islower():
                filelist.append(c)
        return filelist

    df1['joinhere'] = df1['beta'].apply(joinnames)
    df2['joinhere'] = df2['gamma'].apply(joinnames)
    pd.merge(df1, df2, how='left', left_on='joinhere', right_on='joinhere')

This is the output I am trying to achieve:

    final = pd.DataFrame({'alpha': ['1', '2', '3'],
                          'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                          'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan'],
                          'zeta': ['17', '23', '116']})
You could use Series.str.extract to find the lowercase letters:

    import pandas as pd

    df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan']})
    df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                        'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
                        'zeta': ['17', '23', '116']})

    # Extract the first run of lowercase ASCII letters from each name.
    df1['lower'] = df1['beta'].str.extract(r'([a-z]+)')
    df2['lower'] = df2['gamma'].str.extract(r'([a-z]+)')

    # With no keys given, merge joins on the shared columns: alpha and lower.
    final = pd.merge(df1, df2)
    print(final)

yields

      alpha             beta      lower             gamma zeta
    0     1       JRLeparoux    eparoux        Leparoux R   17
    1     2  BJHernandez,Jr.   ernandez  Hernandez,B, Jr.   23
    2     3     SXBridgmohan  ridgmohan    Bridgmohan S X  116
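
The attempt in the question most likely fails because joinnames returns a list, and lists are unhashable, so they cannot serve as merge keys. As a minimal sketch of one way to repair that idea, assuming the mixed-case names implied by the printed output, the lowercase characters can be joined into a single string. The helper name lowercase_key is hypothetical, and unlike the str.extract call it keeps every lowercase character rather than only the first run.

```python
import pandas as pd

# Assumed inputs: the question's frames with mixed-case names.
df1 = pd.DataFrame({
    'alpha': ['1', '2', '3'],
    'beta': ['JRLeparoux', 'BJHernandez,Jr.', 'SXBridgmohan'],
})
df2 = pd.DataFrame({
    'alpha': ['1', '2', '3'],
    'gamma': ['Leparoux R', 'Hernandez,B, Jr.', 'Bridgmohan S X'],
    'zeta': ['17', '23', '116'],
})


def lowercase_key(name):
    """Hypothetical helper: keep every lowercase character of name as one string."""
    return ''.join(c for c in name if c.islower())


df1['joinhere'] = df1['beta'].apply(lowercase_key)
df2['joinhere'] = df2['gamma'].apply(lowercase_key)

# Drop alpha from df2 so the merge does not create alpha_x / alpha_y,
# then discard the helper key once the join is done.
result = (
    pd.merge(df1, df2.drop(columns='alpha'), how='left', on='joinhere')
    .drop(columns='joinhere')
)
print(result)
```

Because str.islower is Unicode-aware, this variant would also keep accented lowercase letters, at the cost of a Python-level function call per row.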
Note that the str.extract call assumes that collecting just the ASCII characters between a and z suffices to produce values on which to join. If the beta or gamma columns contain non-ASCII lowercase characters (such as characters with accent marks), you may need to add them to the regex character class, [a-z].
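
To illustrate the point about accented characters, here is a small sketch using made-up names (they are not from the question). It compares the plain [a-z] class with a class widened by the Latin-1 lowercase ranges ß-ö and ø-ÿ. Which extra characters to include depends on the data, so the widened class is only one possible choice.

```python
import pandas as pd

# Hypothetical names containing accented lowercase letters.
names = pd.Series(['JRLépäroux', 'Lépäroux R'])

# Plain ASCII class: the first run stops at every accented letter.
ascii_only = names.str.extract(r'([a-z]+)', expand=False)

# Widened class: a-z plus Latin-1 lowercase letters (skipping the ÷ sign at U+00F7).
widened = names.str.extract(r'([a-zß-öø-ÿ]+)', expand=False)

print(ascii_only.tolist())  # ['p', 'p']
print(widened.tolist())     # ['épäroux', 'épäroux']
```

With the ASCII-only class both names collapse to the key p, which would collide with many other names. The widened class recovers the full lowercase run.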