[Python] Time handling in Pandas [creation (DatetimeIndex(), to_datetime()), arithmetic, indexing]

mscha 2022. 5. 6. 15:21
import pandas as pd

Pandas offers two ways to create the datetime64 values used for time handling: DatetimeIndex() and to_datetime().

 

pandas.DatetimeIndex()

>>> dates = ['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22']
>>> dates
['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22']

>>> pd.DatetimeIndex(dates)
DatetimeIndex(['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22'], dtype='datetime64[ns]', freq=None)

pandas.to_datetime()

>>> pd.to_datetime(dates)
DatetimeIndex(['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22'], dtype='datetime64[ns]', freq=None)
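
As a quick check that the two approaches agree, here is a minimal sketch. The variable names `via_index`, `via_func`, and `single` are illustrative and do not come from the original post.

```python
import pandas as pd

dates = ['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22']

via_index = pd.DatetimeIndex(dates)  # class constructor
via_func = pd.to_datetime(dates)     # conversion function

# For a list of strings, both return a DatetimeIndex with the same timestamps.
print(type(via_func).__name__)       # DatetimeIndex
print(via_index.equals(via_func))    # True

# to_datetime() also accepts a single string, in which case it returns a Timestamp.
single = pd.to_datetime('2022-01-04')
print(type(single).__name__)         # Timestamp
```

In practice to_datetime() tends to be the more flexible choice because it handles scalars, lists, and Series alike, while DatetimeIndex() always builds an index.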

Creating a Pandas Series

>>> dates
['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22']
>>> date_index = pd.DatetimeIndex(dates)
>>> date_index
DatetimeIndex(['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22'], dtype='datetime64[ns]', freq=None)
>>> pd.Series(data = [2000, 35000, 18000, 22000], index=date_index)
2022-01-04     2000
2022-01-07    35000
2022-01-08    18000
2022-01-22    22000
dtype: int64
# an index built with to_datetime() gives the same result
>>> pd.Series(data = [2000, 35000, 18000, 22000], index=pd.to_datetime(dates))
2022-01-04     2000
2022-01-07    35000
2022-01-08    18000
2022-01-22    22000
dtype: int64
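
Once a Series has a DatetimeIndex, it can be indexed with date strings. The following is a minimal sketch of that idea using the same data; the names `sales`, `window`, `january`, and `late` are made up for illustration.

```python
import pandas as pd

dates = ['2022-01-04', '2022-01-07', '2022-01-08', '2022-01-22']
sales = pd.Series([2000, 35000, 18000, 22000], index=pd.DatetimeIndex(dates))

# Slice with date strings; both endpoints are included.
window = sales.loc['2022-01-07':'2022-01-08']
print(window.tolist())   # [35000, 18000]

# Partial string indexing: a year-month string selects the whole month.
january = sales.loc['2022-01']
print(len(january))      # 4

# A boolean mask on the index selects everything after a given date.
late = sales[sales.index > pd.Timestamp('2022-01-07')]
print(late.tolist())     # [18000, 22000]
```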

 

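Arithmetic

datetime64 values work together with NumPy arrays, so date arithmetic can be done by adding timedeltas built from a NumPy array.

>>> import numpy as np
>>> any_date = np.array('2022-05-11', dtype = np.datetime64)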
>>> any_date
array('2022-05-11', dtype='datetime64[D]')

# Create timedeltas from np.arange
>>> pd.to_timedelta(np.arange(10), 'D')
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days', '5 days',
                '6 days', '7 days', '8 days', '9 days'],
               dtype='timedelta64[ns]', freq=None)

# With 'D' as the second argument, the arithmetic is in days
>>> any_date + pd.to_timedelta(np.arange(10), 'D')            
DatetimeIndex(['2022-05-11', '2022-05-12', '2022-05-13', '2022-05-14',
               '2022-05-15', '2022-05-16', '2022-05-17', '2022-05-18',
               '2022-05-19', '2022-05-20'],
              dtype='datetime64[ns]', freq=None)
# With 'W' as the second argument, the arithmetic is in weeks
>>> any_date + pd.to_timedelta(np.arange(10), 'W')
DatetimeIndex(['2022-05-11', '2022-05-18', '2022-05-25', '2022-06-01',
               '2022-06-08', '2022-06-15', '2022-06-22', '2022-06-29',
               '2022-07-06', '2022-07-13'],
              dtype='datetime64[ns]', freq=None)
              
# With 'H' as the second argument, the arithmetic is in hours
>>> any_date + pd.to_timedelta(np.arange(10), 'H')
DatetimeIndex(['2022-05-11 00:00:00', '2022-05-11 01:00:00',
               '2022-05-11 02:00:00', '2022-05-11 03:00:00',
               '2022-05-11 04:00:00', '2022-05-11 05:00:00',
               '2022-05-11 06:00:00', '2022-05-11 07:00:00',
               '2022-05-11 08:00:00', '2022-05-11 09:00:00'],
              dtype='datetime64[ns]', freq=None)
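
The same arithmetic also works when the starting point is a pandas Timestamp, and subtraction goes the other way, from dates back to durations. This is a small sketch under that assumption; `start`, `offsets`, `shifted`, `gap`, and `elapsed` are illustrative names.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp('2022-05-11')

# Timestamp + TimedeltaIndex -> DatetimeIndex, one entry per offset.
offsets = pd.to_timedelta(np.arange(3), unit='D')
shifted = start + offsets
print(shifted.strftime('%Y-%m-%d').tolist())
# ['2022-05-11', '2022-05-12', '2022-05-13']

# Subtracting two timestamps gives a Timedelta.
gap = pd.Timestamp('2022-05-20') - start
print(gap.days)  # 9

# Subtracting a timestamp from a DatetimeIndex gives a TimedeltaIndex.
elapsed = shifted - start
print(elapsed.days.tolist())  # [0, 1, 2]
```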

 

pandas.date_range()

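Given a start date and an end date, this function fills in the dates in between automatically. The default frequency is daily (freq='D').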

>>> pd.date_range('2022-05-04', '2022-06-21')
DatetimeIndex(['2022-05-04', '2022-05-05', '2022-05-06', '2022-05-07',
               '2022-05-08', '2022-05-09', '2022-05-10', '2022-05-11',
               '2022-05-12', '2022-05-13', '2022-05-14', '2022-05-15',
               '2022-05-16', '2022-05-17', '2022-05-18', '2022-05-19',
               '2022-05-20', '2022-05-21', '2022-05-22', '2022-05-23',
               '2022-05-24', '2022-05-25', '2022-05-26', '2022-05-27',
               '2022-05-28', '2022-05-29', '2022-05-30', '2022-05-31',
               '2022-06-01', '2022-06-02', '2022-06-03', '2022-06-04',
               '2022-06-05', '2022-06-06', '2022-06-07', '2022-06-08',
               '2022-06-09', '2022-06-10', '2022-06-11', '2022-06-12',
               '2022-06-13', '2022-06-14', '2022-06-15', '2022-06-16',
               '2022-06-17', '2022-06-18', '2022-06-19', '2022-06-20',
               '2022-06-21'],
              dtype='datetime64[ns]', freq='D')
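
date_range() does not need both endpoints; a start or an end plus a count (`periods`) is enough. The sketch below illustrates this with the dates used above; the variable names `full`, `five`, and `back` are illustrative.

```python
import pandas as pd

# Start and end: both endpoints are included, one entry per calendar day.
full = pd.date_range('2022-05-04', '2022-06-21')
print(len(full))   # 49

# Start plus a count of periods.
five = pd.date_range('2022-05-04', periods=5)
print(five[-1])    # 2022-05-08 00:00:00

# End plus a count of periods, filled backwards from the end date.
back = pd.date_range(end='2022-06-21', periods=3)
print(back[0])     # 2022-06-19 00:00:00
```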

# freq='H' -> every hour within the range
>>> pd.date_range('2022-05-04', '2022-06-21', freq='H')
DatetimeIndex(['2022-05-04 00:00:00', '2022-05-04 01:00:00',
               '2022-05-04 02:00:00', '2022-05-04 03:00:00',
               '2022-05-04 04:00:00', '2022-05-04 05:00:00',
               '2022-05-04 06:00:00', '2022-05-04 07:00:00',
               '2022-05-04 08:00:00', '2022-05-04 09:00:00',
               ...
               '2022-06-20 15:00:00', '2022-06-20 16:00:00',
               '2022-06-20 17:00:00', '2022-06-20 18:00:00',
               '2022-06-20 19:00:00', '2022-06-20 20:00:00',
               '2022-06-20 21:00:00', '2022-06-20 22:00:00',
               '2022-06-20 23:00:00', '2022-06-21 00:00:00'],
              dtype='datetime64[ns]', length=1153, freq='H')
              
# freq='W' -> every week within the range (Sundays by default)
>>> pd.date_range('2022-05-04', '2022-06-21', freq='W')
DatetimeIndex(['2022-05-08', '2022-05-15', '2022-05-22', '2022-05-29',
               '2022-06-05', '2022-06-12', '2022-06-19'],
              dtype='datetime64[ns]', freq='W-SUN')
              
# freq='W-WED' -> every Wednesday within the range
>>> pd.date_range('2022-05-04', '2022-06-21', freq='W-WED')
DatetimeIndex(['2022-05-04', '2022-05-11', '2022-05-18', '2022-05-25',
               '2022-06-01', '2022-06-08', '2022-06-15'],
              dtype='datetime64[ns]', freq='W-WED')
              
# freq='B' -> every business day within the range
>>> pd.date_range('2022-05-04', '2022-06-21', freq='B')
DatetimeIndex(['2022-05-04', '2022-05-05', '2022-05-06', '2022-05-09',
               '2022-05-10', '2022-05-11', '2022-05-12', '2022-05-13',
               '2022-05-16', '2022-05-17', '2022-05-18', '2022-05-19',
               '2022-05-20', '2022-05-23', '2022-05-24', '2022-05-25',
               '2022-05-26', '2022-05-27', '2022-05-30', '2022-05-31',
               '2022-06-01', '2022-06-02', '2022-06-03', '2022-06-06',
               '2022-06-07', '2022-06-08', '2022-06-09', '2022-06-10',
               '2022-06-13', '2022-06-14', '2022-06-15', '2022-06-16',
               '2022-06-17', '2022-06-20', '2022-06-21'],
              dtype='datetime64[ns]', freq='B')
              
# timedelta_range with freq='2H30T' -> durations spaced 2 hours 30 minutes apart
>>> pd.timedelta_range(0, periods =10, freq = '2H30T')
TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00',
                '0 days 22:30:00'],
               dtype='timedelta64[ns]', freq='150T')
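
To confirm what each frequency alias selects, the weekday of every generated date can be inspected. The sketch below is only a sanity check, with illustrative variable names. It writes the 2 hour 30 minute step as '150min' rather than '2H30T', because pandas 2.2 and later deprecate the 'H' and 'T' aliases in favor of 'h' and 'min'.

```python
import pandas as pd

start, end = '2022-05-04', '2022-06-21'

weekly = pd.date_range(start, end, freq='W')
wednesdays = pd.date_range(start, end, freq='W-WED')
business = pd.date_range(start, end, freq='B')

# dayofweek numbering: Monday=0 ... Sunday=6
print(weekly.dayofweek.unique().tolist())      # [6]  -> Sundays only
print(wednesdays.dayofweek.unique().tolist())  # [2]  -> Wednesdays only
print(business.dayofweek.max())                # 4    -> no Saturday or Sunday

# timedelta_range yields durations; adding them to a Timestamp gives datetimes.
steps = pd.timedelta_range(start='0 days', periods=10, freq='150min')
stamps = pd.Timestamp(start) + steps
same = pd.date_range(start, periods=10, freq='150min')
print(stamps.equals(same))                     # True
```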