OpenResty XRay™ YSQL User Manual

Documentation of YSQL Language when creating dynamic tracing analyzers

This manual documents the YSQL language from the user’s perspective.

Table of Contents

Convention

We use the word ysql for the Ops language throughout this document for convenience.

Example code with problems will get a question mark, ?, at the beginning of each of its lines. For example:

? select *;

Description

The YSQL language is inspired by the SQL language but is never meant to be 100% compatible with the SQL standard nor some SQL dialect. Nevertheless, YSQL code may look very similar to SQL code.

The YSQL language is provided as part of the OpenResty XRay platform. It is a means to quickly create new dynamic tracing analyzers against any running processes without log data analysis or special data API collaborations from the target applications.

Back to TOC

Program Layout

A YSQL program consists of multiple statements separated by semicolons. Currently only a single select query statement is allowed, but it may change in the future. Below is an example:

select uri, host
from ngx.reqs;

The last statement’s trailing semicolon may be omitted.

Back to TOC

Lexical Constructs

Integers

Integers can be expressed as decimal or hexadecimal literals, as in

-3
519
0xdeadbeef

Back to TOC

Floating-Point Numbers

Floating point numbers can be expressed like below:

3.1415926
2e8
4.2E2

Back to TOC

String Literals

Character string literals are quoted by single-quotes, just like the standard SQL language, as in

'hello, world!'

Additionally we support the following special escape sequences to specify special characters:

\a          bell, ascii code 0x07
\b          backspace, ascii code 0x08
\f          form feed, ascii code 0x0c
\n          new line
\r          carriage return
\t          tab
\v          vertical tab
\\          backslash (\)
\'          single-quote character (')
\"          double-quote character (")
\<newline>  new line character, just like \n

Below is an example using escape sequences:

'hello\nworld\t!'

Back to TOC

Regex Literals

Regex literals are a way to represent regular expressions for pattern matching (as with the like binary operator). A regex literal, say, /[0-9]+/, can be quoted by different delimiters, as in

/[0-9]+/

rx/[0-9]+/

rx{[0-9]+}

rx([0-9]+)

rx[[0-9]+]

rx'[0-9]+'

rx"[0-9]+"

rx![0-9]+!

rx#[0-9]+#

You can choose the best delimiters to minimize the necessary escapes inside the regex pattern itself.

Currently only the common subset of the Perl compatible regex and POSIX regex syntax is allowed with the exception of the back-reference syntax (like \1, \2, and etc). We may lift this restriction in the future.

Submatch captures using parentheses are supported.

Back to TOC

Boolean Literals

The boolean literals in YSQL are just true and false, just like in SQL.

Back to TOC

Null

The special null keyword indicates a null value (similar to the null value in SQL).

YSQL does not usually involve null values and thus this literal is rarely used. It is included just for the sake of completeness.

Back to TOC

Tables

Table names must always be specified in their fully qualified form unless an alias is given by the user (like with the as ALIAS syntax in a from clause).

YSQL currently supports many built-in tables like ngx.reqs and cpu.profile. See “Builtin Tables” section for details.

Back to TOC

Columns

Column names can be indicated by their bare names like uri and host when there is no ambiguity, or by their fully qualified form with the table name, as in

select ngx.reqs.uri
from ngx.reqs

Where the uri column is specified using the fully qualified form, ngx.reqs.uri.

Columns can also be referenced indirectly via a user alias (as with the as ALIAS syntax in the select list).

Back to TOC

Operators and Expressions

YSQL supports a lot of SQL-like operators and expression types with some of its own additions.

The precedence table for all the operators is as follows:

3    unary !
4    - (unary minus), + (unary plus)
5    || (string concatenation)
7    *, /, %
8    binary -, +
12    = (comparison), >=, >, <=, <, <>, !=, is, like, in, prefix, suffix, contains
13    between
14    not
15    and
17    or
19    left join, inner join
20    , (cross join)

Usually parentheses (...) can be used to group operators with lower precedence.

The not ... variants like not between and is not are not listed in the operator precedence table.

Back to TOC

Arithmetic Expressions

YSQL supports the following binary arithmetic operators: add (+), subtract (’-’), multiple (*), division (/), and modulo (%).

It also supports the unary arithmetic operators - and +.

Back to TOC

String Concatenation

The SQL-style string concatenation operator, ||, is supported.

Back to TOC

Relational Expressions

YSQL supports most of the common SQL-style relational operators:

  • = (equal)
  • != or <> (not equal)
  • <=
  • <
  • >
  • >=
  • between ... and
  • in (value...)
  • like and not like
  • is or is not

The like operator does support two forms of patterns:

  1. SQL-style patterns with meta-characters like % (matching zero or more arbitrary characters) and _ (matching one arbitrary character). The meta-characters can be escaped by a backslash (/) to mean the % and _ characters themselves.
  2. A regex literal like /.../ and rx{...}.

Additionally YSQL supports the following string matching operators to test if the RHS string is a prefix, substring, or suffix in the LHS string, respectively:

  • prefix
  • contains
  • suffix

Back to TOC

Logical Expressions

The following SQL-style logical operators are supported:

  • binary and
  • binary or
  • unary not

Back to TOC

Subqueries

Subqueries are currently not supported by YSQL though the YSQL compiler can parse them just fine.

Back to TOC

Select Query Statement

This statement contains a single select query with one or more of the following clauses. Note that the select clause is always required.

Back to TOC

Select Clause

The SQL-style select clause is supported. Below are some examples:

select 3 + 2;

select 3 > 2;

select 'hello' || 'world';

Relational expressions, arithmetic expressions, and string concatenation expressions can all appear in the select list. Logical expressions are not allowed here, however.

Each selected value can take an optional alias right after the value expression, just as in standard SQL, as in

select 3 > 2 res;

select 3 > 2 as res;

The user can also use asterisk (*) to mean all the columns in the table or tables appeared in the from clause, as in

select *
from ngx.reqs;

Aggregate functions like sum(), avg(), min(), max(), and count() can also be used in the selected value expressions, as in

select count(*), sum(latency_s)
from ngx.reqs;

Back to TOC

From Clause

The from clause is also similar to the standard SQL language. One can specify a single table, as in

select uri
from ngx.reqs;

or multiple tables joined together, as in

select user_bt, count(user_bt) cnt
from cpu.profile inner join proc
group by user_bt
order by count(user_bt) desc
limit 1000

Currently YSQL only supports left join and inner join. Other join types are not supported (though they may be parsed fine).

Back to TOC

Where Clause

The where clause is the same as in the standard SQL language. One can specify a logical expression as a search condition here. Below is an example:

select count(*)
from ngx.reqs
where uri prefix '/css/';

Back to TOC

Group By Clause

The group by clause is similar to the standard SQL language, as in

select count(*) as cnt, host
from ngx.reqs
group by host
order by cnt desc
limit 10;

Back to TOC

Having Clause

The having clause is similar to the standard SQL language. Unlike the where clause, the search condition in the having clause can use aggregate functions like count() and sum(). Below is an example:

select max(latency_s), host
from ngx.reqs
group by host
having max(latency_s) > 0.5

It is also possible to reference an alias in the select list, as in

select max(latency_s) max, host
from ngx.reqs
group by host
having max > 0.5;

Back to TOC

Order By Clause

The order by clause is similar to the standard SQL language. YSQL currently has some limitations though:

  1. Only one column is allowed in the order by clause.
  2. The order by column can only be a column name or a count() aggregate function.

Below is an example:

select count(*), host
from ngx.reqs
group by host
order by count(*) desc;

It is also possible to reference the selected value expression’s alias (if any) in the order by clause, as in

select count(*) cnt, host
from ngx.reqs
group by host
order by cnt desc;

Back to TOC

Limit Clause

The limit clause is similar to the standard SQL language. Below is an example:

select *
from ngx.reqs
limit 10;

Back to TOC

Offset Clause

The offset clause syntax is similar to the PL/SQL language provided by PostgreSQL. Below is an example:

select *
from ngx.reqs
offset 1
limit 10;

This will skip the first row and only output the 2nd through the 10th rows.

Back to TOC

Builtin Tables

Below is a detailed list of the built-in tables currently supported by the YSQL compiler. We are adding new tables and new columns quickly.

Back to TOC

cpu.profile

This table represent CPU profiling. It can only used as the first table in the from clause. It does not have any columns. It is usually joined with other tables like ngx.reqs and etc.

Back to TOC

ngx.reqs

This table is used to inspect nginx (main) requests. Nginx sub-requests are excluded here.

It has the following columns:

  • args

    The URI query-string of the HTTP request.

  • host

    The host name of the HTTP request.

  • latency_ms

    An integer number indicating the request latency in milliseconds, like 120.

  • latency_s

    A floating-point number indicating the request latency in seconds, like 0.32.

  • rewrite_phase_latency_s

    A floating-point number indicating the rewrite phase latency in seconds, like 0.32.

  • rewrite_phase_latency_ms

    A floating-point number indicating the rewrite phase latency in milliseconds, like 0.32.

  • access_phase_latency_s

    A floating-point number indicating the access phase latency in seconds, like 0.32.

  • access_phase_latency_ms

    A floating-point number indicating the access phase latency in milliseconds, like 0.32.

  • content_phase_latency_s

    A floating-point number indicating the content phase latency in seconds, like 0.32.

  • content_phase_latency_ms

    A floating-point number indicating the content phase latency in milliseconds, like 0.32.

  • req_header_latency_s

    A floating-point number indicating the latency in receiving the HTTP request headers in seconds, like 0.32.

  • req_header_latency_ms

    A floating-point number indicating the latency in receiving the HTTP request headers in milliseconds, like 0.32.

  • req_body_latency_s(todo)

    A floating-point number indicating the latency in receiving the HTTP request body in seconds, like 0.32.

  • req_body_latency_ms(todo)

    A floating-point number indicating the latency in receiving the HTTP request body in milliseconds, like 0.32.

  • keepalive

    A boolean value indicating whether the current request enables HTTP keep-alive for the current TCP connection.

  • method

    The method string of the HTTP request.

  • scheme

    The scheme string like http and https of the current HTTP request.

  • uri

    HTTP request URI (excluding any query-string).

  • raw_uri

    HTTP raw request URI (include query-string).

  • client_ip

    String. IP address (IPv4 or IPv6 or unix socket file path).

  • req_body_size

    Integer. The request body size in bytes.

  • req_size

    Integer. The whole request size (including header and body) in bytes.

  • resp_size

    Integer. The whole response size (including header and body) in bytes (if compression is involved, it is the compressed size).

  • resp_body_size

    Integer. The response body size in bytes.

  • server_port

    Integer. The server port the current request is accessing.

  • status

    Integer. The HTTP response status code (like 502 and 200) or the internal nginx error code if no meaningful HTTP status code is available.

  • is_ssl

    Boolean. Whether the current request uses SSL or TLS.

  • tls_ver

    Number. The TLS protocol version number like 1.2 and 1.3.

  • ssl_ver

    Number. The SSL protocol version number like 2 and 3.

  • resp_is_chunked

    Boolean. Whether the response body uses the chunked encoding.

  • req_is_chunked

    Boolean. Whether the request body uses the chunked encoding.

  • http_ver

    Number. The HTTP protocol version (0.9, 1.0, 1.1, 2.0).

  • is_websocket

    Boolean. Whether the current request has upgraded to the WebSocket protocol.

  • resp_is_compressed

    Boolean. Whether the current request’s response body is compressed (by gzip or other encodings).

  • resp_gzip_level

    Integer. The gzip compression level actually used by the response body` (only meaningful when the current response is actually compressed).

  • mime_type

    String. Should be values like application/javascript. Any trailing parameters like ; charset=utf-8 should be excluded.

  • uses_lua

    Boolean. Whether the current request uses any Lua code handlers in any of the request processing phases (like log_by_lua, rewrite_by_lua, and etc).

  • for_static_file

    Boolean. Whether the current request uses the static file module to serve static files on the current machine’s file system.

  • has_upstream

    Boolean. Whether the current request uses a proxy (i.e., using the nginx upstream module).

  • access_log_buf_size

    Integer. The access log buffer size in bytes for the current request.

  • uses_error_page

    Boolean. Whether the current request serves an error page.

  • uri_changes

    Integer. The number of URI changes done in the request processing phases (corresponding to r->uri_changes).

  • uri_changed

    Boolean. Whether the current request’s URI has ever changed by the request processing phases (corresponding to r->uri_changed).

  • is_pipelined

    Boolean. Whether the current request is a pipelined request (i.e., r->pipeline on the C land).

  • resp_header_only

    Boolean. Whether the current response has a header only (r->header_only).

  • location

    String. The nginx location name last serving the current request.

  • user_agent

    String. The User-Agent string.

  • conn_is_reused

    Boolean. Whether the current request reuses a connection created by an earlier request.

  • req_pool_size

    Integer. The current request’s nginx memory pool size in bytes.

  • conn_pool_size

    Integer. The current connection’s nginx memory pool size in bytes.

Back to TOC

proc

This table is used to inspect a running operating system process.

It has the following columns:

  • user_bt

    Userland backtrace, usually for C/C++/Rust programs.

  • kern_bt

    Kernel-space backtrace.

  • res_mem

    Integer. Resident memory size in bytes.

  • virt_mem

    Integer. Virtual memory size in bytes.

  • shr_mem

    Integer. Shared memory size in bytes.

  • txt_mem

    Integer. Text memory size in bytes.

  • page_size

    Integer. Page size in bytes.

  • data_size

    Integer. Program data size in bytes (data + stack)

  • pid

    Integer. Process ID.

  • pgid

    Integer. Process group ID.

  • ppid

    Integer. Parent process ID.

  • exec_name

    String. Executable name of the target process (mapped to execname() in stap).

  • pexec_name

    String. Executable name of the parent process of the target process (mapped to pexecname() in stap).

  • cpu_id

    Integer. The CPU ID for the CPU where the target process is current running on.

Back to TOC

luajit.vm

This table is used to inspect luajit vm.

It has the following columns:

  • lua_bt

    The Lua function-level backtrace

  • lua_line_bt

    The Lua line-level backtrace

  • vm_state

    Should return integer values for the LuaJIT VM state constants.

  • gc_count

    The size output of lj-gc.y tool, in bytes.

  • is_gc64

    Boolean type, for GC64 mode.

  • jit_maxtrace

    The value for the LuaJIT JIT option maxtrace

  • jit_hotloop

    The value of the LuaJIT JIT option hotloop

  • jit_maxmcode

    The value of the LuaJIT JIT option maxmcode

  • jit_sizemcode

    The value of the LuaJIT JIT option sizemcode

  • jit_maxside

    The value of the LuaJIT JIT option maxsize.

  • jit_maxsnap

    The value of the LuaJIT JIT option maxsnap.

  • jit_minstitch The value of the LuaJIT JIT option minstitch

Back to TOC

vfs.reads

This table is used to inspect vfs read activity.

It has the following columns:

  • data_size

    Integer. The data size read from the VFS in bytes.

  • file_path

    String. The file path from which we are reading data.

  • latency_s

    Number. Latency of the current VFS read operation in seconds. Nanosecond precision. Decimal part allowed.

  • latency_ms

    Number. Latency of the current VFS read operation in seconds. Nanosecond precision. Decimal part allowed.

Back to TOC

vfs.writes

This table is used to inspect vfs write activity.

It has the following columns:

  • data_size

    Integer. The data size written to the VFS in bytes.

  • file_path

    String. The file path to which we are writing data.

  • latency_s

    Number. Latency of the current VFS write operation in seconds. Nanosecond precision. Decimal part allowed.

  • latency_ms

    Number. Latency of the current VFS write operation in seconds. Nanosecond precision. Decimal part allowed.

Back to TOC

Author

Yichun Zhang <yichun@openresty.com>, OpenResty Inc.

Back to TOC

Copyright (C) 2020-2021 by OpenResty Inc. All rights reserved.

This document is proprietary and contains confidential information. Redistribution of this document without written permission from the copyright holders is prohibited at all times.

Back to TOC