儒略历到格里高利历的日期与OutOfBoundsDatetime

发布于 2021-01-29 16:47:45

我正在使用熊猫数据框,并且我有一个包含朱利安日期的DATE列。我想将该列的每个值转换为公历。

为此,我使用了以下代码:

df[['DATE']] = df[['DATE']].apply(lambda x: pd.to_datetime(x - pd.Timestamp(0).to_julian_date(), unit='D'))

不幸的是,我收到一个看起来像这样的错误:

OutOfBoundsDatetime: ("cannot convert input 381088.5 with the unit 'D'", u'occurred at index MXPLD_DATE')

当我在数据帧中查找导致问题的输入值时,它根本不存在,我也不知道381088.5从哪里来。

你能告诉我我在做什么错吗?

谢谢。

编辑1

我尝试了@jezrael解决方案,但仍然遇到类似的错误。

df['DATE'] = pd.to_datetime(df['DATE'], unit='D', origin='julian')

错误:

---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-18-4353e2be1ced> in <module>()
----> 1 df['DATE'] = pd.to_datetime(df['DATE'], unit='D', origin='julian')

/opt/anaconda2/lib/python2.7/site-packages/pandas/core/tools/datetimes.pyc in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin)
    469             raise tslib.OutOfBoundsDatetime(
    470                 "{original} is Out of Bounds for "
--> 471                 "origin='julian'".format(original=original))
    472 
    473     elif origin not in ['unix', 'julian']:

OutOfBoundsDatetime: 0          2457184
1          2457155
2          2457155
3          2457155
4          2457155
5          2457155
6          2457155
7          2457155
8          2457155
9          2457155
10         2457155
11         2457155
12         2457155
13         2457155
14         2457155
15         2457155
16         2457155
17         2457155
18         2457155
19         2457155
20         2457155
21         2457155
22         2457155
23         2457155
24         2457155
25         2457155
26         2457155
27         2457155
28         2457701
29         2457701
            ...   
4597928    2457724
4597929    2457724
4597930    2457724
4597931    2457724
4597932    2457724
4597933    2457724
4597934    2457724
4597935    2457724
4597936    2457724
4597937    2457724
4597938    2457724
4597939    2457724
4597940    2457724
4597941    2457724
4597942    2457724
4597943    2457724
4597944    2457724
4597945    2457724
4597946    2457724
4597947    2457724
4597948    2457724
4597949    2457724
4597950    2457724
4597951    2457724
4597952    2457724
4597953    2457724
4597954    2457724
4597955    2457724
4597956    2457724
4597957    2457724
Name: DATE, Length: 4597958, dtype: int64 is Out of Bounds for origin='julian'
关注者
0
被浏览
51
1 个回答
  • 面试哥
    面试哥 2021-01-29
    为面试而生,有面试问题,就找面试哥。

    我相信你需要to_datetime参数origin

    df = pd.DataFrame({'julian':[2458072.5, 2458073.5]})
    
    df['date'] = pd.to_datetime(df['julian'], unit='D', origin='julian')
    print (df)
          julian       date
    0  2458072.5 2017-11-15
    1  2458073.5 2017-11-16
    

    编辑:

    某个日期时间有问题OutOfBounds

    因此,首先检查时间戳限制

    In [66]: pd.Timestamp.min
    Out[66]: Timestamp('1677-09-21 00:12:43.145225')
    
    In [67]: pd.Timestamp.max
    Out[67]: Timestamp('2262-04-11 23:47:16.854775807')
    

    然后获得最小的朱利安日期时间(通过convertin在线,例如在此处):

    maxdate = 2547338
    mindate = 2333836
    

    然后添加NaN超出范围的日期,例如where

     df = pd.DataFrame({'julian':[2821676, 2547338, 1, 2333836]})
    maxdate = 2547338
    mindate = 2333836
    
    clean_dates = df['julian'].where(df['julian'].between(mindate, maxdate))
    print (clean_dates)
    0          NaN
    1    2547338.0
    2          NaN
    3    2333836.0
    
    df['date'] = pd.to_datetime(clean_dates, unit='D', origin='julian')
    print (df)
        julian                date
    0  2821676                 NaT
    1  2547338 2262-04-10 12:00:00
    2        1                 NaT
    3  2333836 1677-09-21 12:00:00
    

    最后将解决方案应用于您的数据-有2个值转换为NaT

    print (df['MXPLD_DATE'][~df['MXPLD_DATE'].between(mindate, maxdate)])
    
    1217806    2821676
    3167148    2821676
    Name: MXPLD_DATE, dtype: int64
    
    clean_dates = df['MXPLD_DATE'].where(df['MXPLD_DATE'].between(mindate, maxdate))        
    df['MXPLD_DATE'] = pd.to_datetime(clean_dates, unit='D', origin='julian')
    print (df['MXPLD_DATE'])
    0         2015-06-10 12:00:00
    1         2015-05-12 12:00:00
    2         2015-05-12 12:00:00
    3         2015-05-12 12:00:00
    4         2015-05-12 12:00:00
    5         2015-05-12 12:00:00
    6         2015-05-12 12:00:00
    7         2015-05-12 12:00:00
    8         2015-05-12 12:00:00
    9         2015-05-12 12:00:00
    10        2015-05-12 12:00:00
    


知识点
面圈网VIP题库

面圈网VIP题库全新上线,海量真题题库资源。 90大类考试,超10万份考试真题开放下载啦

去下载看看