optimization - Pandas DataFrame - faster apply?
I've got the following code:
    from dateutil import parser
    df.local_time = df.local_time.apply(lambda x: parser.parse(x))
It seems to be taking a prohibitively long time. How can I make it faster?
You should use pd.to_datetime for faster datetime conversion. For example, imagine you have this data:
In [1]: import pandas as pd
   ...: dates = pd.date_range('2015', freq='min', periods=1000)
   ...: dates = [d.strftime('%d %b %Y %H:%M:%S') for d in dates]
   ...: dates[:5]
Out[1]:
['01 Jan 2015 00:00:00',
 '01 Jan 2015 00:01:00',
 '01 Jan 2015 00:02:00',
 '01 Jan 2015 00:03:00',
 '01 Jan 2015 00:04:00']
You can get the datetime objects this way:
In [2]: pd.to_datetime(dates[:5])
Out[2]:
DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 00:01:00',
               '2015-01-01 00:02:00', '2015-01-01 00:03:00',
               '2015-01-01 00:04:00'],
              dtype='datetime64[ns]', freq=None)
But this can still be slow in some cases. To be really fast when converting dates from strings, if you know your dates all have the same format, you can either specify a format argument (e.g. here, format='%d %b %Y %H:%M:%S') or, more automatically, use infer_datetime_format=True so that the format is inferred once and then used on the rest of the entries. This can result in great speedups as the size of the array grows (but it only works if the formats are consistent!).
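To make the consistency caveat concrete, here is a minimal sketch of what happens when one entry does not match the given format. The sample strings are made up for illustration; errors='coerce' is a standard pd.to_datetime option that was not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: two strings in the expected format, one in another format.
raw = pd.Series(['01 Jan 2015 00:00:00', '01 Jan 2015 00:01:00', '2015-01-01 00:02'])
fmt = '%d %b %Y %H:%M:%S'

# With the default errors='raise', the mismatched entry raises ValueError.
try:
    pd.to_datetime(raw, format=fmt)
    strict_ok = True
except ValueError:
    strict_ok = False

# errors='coerce' converts entries that do not match the format to NaT instead.
parsed = pd.to_datetime(raw, format=fmt, errors='coerce')
print(strict_ok)
print(parsed.isna().tolist())
```

If NaT values show up after coercion, the column probably mixes formats, and those rows need separate handling before the fast path can be relied on.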
For example, timing each approach on the 1000 date strings defined above:
from dateutil import parser
ser = pd.Series(dates)

%timeit ser.apply(lambda x: parser.parse(x))
10 loops, best of 3: 91.1 ms per loop

%timeit pd.to_datetime(dates)
10 loops, best of 3: 139 ms per loop

%timeit pd.to_datetime(dates, format='%d %b %Y %H:%M:%S')
100 loops, best of 3: 5.96 ms per loop

%timeit pd.to_datetime(dates, infer_datetime_format=True)
100 loops, best of 3: 6.79 ms per loop
We get roughly a factor of 20 speedup by specifying or inferring the datetime format in pd.to_datetime().
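Applied to the column from the question, the change might look like the sketch below. The DataFrame here is a hypothetical stand-in built from the same sample dates; only the column name local_time comes from the question. It uses the explicit format argument because newer pandas releases have deprecated infer_datetime_format.

```python
import pandas as pd
from dateutil import parser

# Hypothetical stand-in for the question's DataFrame.
dates = pd.date_range('2015', freq='min', periods=1000)
df = pd.DataFrame({'local_time': [d.strftime('%d %b %Y %H:%M:%S') for d in dates]})

# Slow path from the question: one dateutil call per row.
slow = df['local_time'].apply(parser.parse)

# Fast path: a single vectorized call with an explicit format.
fast = pd.to_datetime(df['local_time'], format='%d %b %Y %H:%M:%S')

# Check that both paths agree before replacing the column.
same = bool((slow == fast).all())
df['local_time'] = fast
print(same, df['local_time'].dtype)
```

Comparing the two results once on a sample is a cheap way to confirm that the format string really describes the data before switching over.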