Skip to main content

Set Returning Functions

This section describes functions that possibly return more than one row. Set returning functions are usually used in SQL queries in the FROM clause.

generate_series

Generate a series of values, from start to stop with a given step size

`generate_series(<start>, <stop> [, <step>])`

If step is not specified, the default step size of 1 will be used. When step is positive, zero rows are returned if start is greater than stop. Conversely, when step is negative, zero rows are returned if start is less than stop. Zero rows are also returned for NULL inputs. It is an error for step to be zero. Some examples follow:

SELECT * FROM generate_series(2,4);
generate_series
-----------------
2
3
4
(3 rows)

SELECT * FROM generate_series(5,1,-2);
generate_series
-----------------
5
3
1
(3 rows)

SELECT * FROM generate_series(4,3);
generate_series
-----------------
(0 rows)

SELECT generate_series(1.1, 4, 1.3);
generate_series
-----------------
1.1
2.4
3.7
(3 rows)

-- this example relies on the date-plus-integer operator
SELECT current_date + s.a AS dates FROM generate_series(0,14,7) AS s(a);
dates
------------
2004-02-05
2004-02-12
2004-02-19
(3 rows)

unnest

The unnest function expands the elements from an array:

SELECT * FROM unnest(ARRAY['Mon','Tue','Wed','Thu','Fri'])
unnest
-----------------
Mon
Tue
Wed
Thu
Fri
(5 rows)

unnest can also be called with multiple array paramters (potentially of different data types). In this case, the arrays are expanded pairwise. If the arrays are not all the same length then the shorter ones are padded with NULLs.

SELECT * FROM unnest(
ARRAY['Mon','Tue','Wed','Thu','Fri'],
ARRAY[1,2,3,4,5,6,7]
) AS days(name, nr);
name | nr
------+----
Mon | 1
Tue | 2
Wed | 3
Thu | 4
Fri | 5
NULL | 6
NULL | 7
(7 rows)

external

The external function reads data stored in an external file format from one or multiple external locations. It takes the form:

external(<source_location> [, <option> [, ...] ] )

where <option> can be one of:

FORMAT => <format_name>
<format_specific_option> => <value>
<source_location>
Location to read the data from. See External Location documentation for more information.
FORMAT => format_name
Selects the data format to be read. This option can be omitted in case the format can be inferred from the file extension. In case of a list of sources, all of them need to share this extension. Supported formats are depicted in detail in External Formats.
format_specific_option => value
A format-specific option. The available options for each respective format can be found in External Formats.

Some examples:

Read a local CSV file from the working directory of the Hyper server, having two columns and a custom delimiter. A filter is applied to the price column:

SELECT name FROM external(
'./products.csv',
COLUMNS => DESCRIPTOR(
name text NOT NULL,
price int
),
FORMAT => 'csv',
DELIMITER => '|'
) WHERE price > 100;

Same but reading from multiple CSV files:

SELECT name FROM external(
ARRAY['./products1.csv', './products2.csv', './products3.csv'],
COLUMNS => DESCRIPTOR(
name text NOT NULL,
price int
),
FORMAT => 'csv',
DELIMITER => '|'
) WHERE price > 100;

Reading an Apache Parquet file from Amazon S3 using empty credentials and inferring the bucket region. The schema of the file is inferred from the schema information in the file. The file format is inferred from the file extension:

SELECT name FROM external(
s3_location(
's3://mybucket/mydirectory/products.parquet',
access_key_id => '',
secret_access_key => ''
)
) WHERE price > 100;

Same but with explicit Amazon S3 credentials and bucket region:

SELECT name FROM external(
s3_location(
's3://mybucket/mydirectory/products.parquet',
access_key_id => 'ACCESSKEYID12EXAMPLE',
secret_access_key => 'sWfssWSmnME5X/36dsf3G/cbyDzErEXAMPLE123',
region => 'us-east-1'
)
) WHERE price > 100;