Recently we ran into a situation where we needed to create a Hive view that works in both Spark and Trino, and the view has to filter on a partition field that is a date stored as a string.
Spark SQL date and timestamp functions
date_format(timestamp, fmt)
to_date(date_str[, fmt])
to_timestamp(timestamp_str[, fmt])
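For example (a quick sketch, assuming default session settings), the Spark side typically looks like this:
-- Spark SQL: patterns follow Java DateTimeFormatter syntax
SELECT to_date('2022-10-17', 'yyyy-MM-dd');             -- string -> date
SELECT date_format(current_timestamp(), 'yyyy-MM-dd');  -- timestamp -> string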
Trino Date and time functions and operators
format_datetime(timestamp, format) → varchar
parse_datetime(string, format) → timestamp with time zone
date_format(timestamp, format) → varchar
date_parse(string, format)
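And the Trino equivalents (a sketch for comparison; note that date_parse/date_format take MySQL-style % patterns, while parse_datetime/format_datetime take Java/Joda-style patterns):
-- Trino: MySQL-style % patterns
SELECT date_parse('2022-10-17', '%Y-%m-%d');              -- string -> timestamp
SELECT date_format(current_timestamp, '%Y-%m-%d');        -- timestamp -> varchar
-- Trino: Java/Joda-style patterns
SELECT parse_datetime('2022-10-17', 'yyyy-MM-dd');        -- string -> timestamp with time zone
SELECT format_datetime(current_timestamp, 'yyyy-MM-dd');  -- timestamp -> varchar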
The most annoying fact is that both engines have a date_format function, but the format pattern syntax is different: Spark uses Java-style datetime patterns, while Trino uses MySQL-style % specifiers. Passing the wrong style does not necessarily fail, it can simply produce a silent error.
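To illustrate (a hedged sketch; the Trino result assumes its MySQL-compatible behavior of copying non-% characters through literally):
-- Spark interprets 'yyyy-MM-dd' as a datetime pattern
SELECT date_format(current_timestamp(), 'yyyy-MM-dd');  -- e.g. '2022-10-17'
-- Trino only expands %-specifiers, so the same call likely returns the literal text 'yyyy-MM-dd'
SELECT date_format(current_timestamp, 'yyyy-MM-dd');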
So what does it mean to be SQL compliant?
The ANSI SQL standards specify qualifiers and formats for character representations of DATETIME and INTERVAL values. The standard qualifier for a DATETIME value is YEAR TO SECOND, and the standard format is as follows: YYYY-MM-DD HH:MM:SS
The standards for an INTERVAL value specify two classes of intervals:
The YEAR TO MONTH class has the format YYYY-MM; any subset of this format is also valid (for example, just a month interval).
The DAY TO FRACTION class has the format DD HH:MM:SS.F; any subset of contiguous fields is also valid (for example, MINUTE TO FRACTION).
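In literal form (a sketch; interval literal support varies by engine and version), those standard formats look like:
-- DATETIME in the standard YYYY-MM-DD HH:MM:SS format
SELECT TIMESTAMP '2022-10-17 00:00:00';
-- YEAR TO MONTH interval: 1 year 2 months
SELECT INTERVAL '1-2' YEAR TO MONTH;
-- DAY TO SECOND interval: 3 days 04:05:06
SELECT INTERVAL '3 04:05:06' DAY TO SECOND;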
https://www.ibm.com/docs/en/informix-servers/12.10?topic=types-ansi-sql-standards-datetime-interval-values
The standard is actually about the conversion between the types; see Chapter 8 – Temporal values.
So the most portable way to convert between a string and a DATE or TIMESTAMP is actually CAST. For example, the queries below work in both Spark and Trino.
-- date to string
SELECT cast(CURRENT_DATE AS varchar(10));
-- string to date
SELECT cast('2022-10-17' AS date);
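Applied back to the original problem, a minimal sketch of the view (the table events and the string partition column dt are hypothetical names):
-- hypothetical table `events` partitioned by a string column `dt` in 'YYYY-MM-DD' form
CREATE VIEW recent_events AS
SELECT *
FROM events
WHERE cast(dt AS date) >= DATE '2022-10-10';
If the engine cannot prune partitions when the partition column is wrapped in a cast, the comparison can be flipped so the cast sits on the literal side instead (string comparison stays chronological because the format is zero-padded): WHERE dt >= cast(DATE '2022-10-10' AS varchar(10)).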