LibreOffice 24.2 SDK C/C++ API Reference
uri.h File Reference
#include "sal/config.h"
#include "rtl/textenc.h"
#include "rtl/ustring.h"
#include "sal/saldllapi.h"
#include "sal/types.h"


Enumerations

enum  rtl_UriCharClass {
  rtl_UriCharClassNone, rtl_UriCharClassUric, rtl_UriCharClassUricNoSlash, rtl_UriCharClassRelSegment,
  rtl_UriCharClassRegName, rtl_UriCharClassUserinfo, rtl_UriCharClassPchar, rtl_UriCharClassUnoParamValue,
  rtl_UriCharClass_FORCE_EQUAL_SIZE = SAL_MAX_ENUM
}
 Various predefined URI 'char classes'.
 
enum  rtl_UriEncodeMechanism {
  rtl_UriEncodeIgnoreEscapes, rtl_UriEncodeKeepEscapes, rtl_UriEncodeCheckEscapes, rtl_UriEncodeStrict,
  rtl_UriEncodeStrictKeepEscapes, rtl_UriEncode_FORCE_EQUAL_SIZE = SAL_MAX_ENUM
}
 The mechanism describing how escape sequences in the input of rtl_uriEncode() are handled.
 
enum  rtl_UriDecodeMechanism {
  rtl_UriDecodeNone, rtl_UriDecodeToIuri, rtl_UriDecodeWithCharset, rtl_UriDecodeStrict,
  rtl_UriDecode_FORCE_EQUAL_SIZE = SAL_MAX_ENUM
}
 The mechanism describing how rtl_uriDecode() translates (part of) a URI into a Unicode string.
 

Functions

SAL_DLLPUBLIC sal_Bool const * rtl_getUriCharClass (rtl_UriCharClass eCharClass) SAL_THROW_EXTERN_C()
 Map a predefined rtl_UriCharClass to a form usable by rtl_uriEncode().
 
SAL_DLLPUBLIC void rtl_uriEncode (rtl_uString *pText, sal_Bool const *pCharClass, rtl_UriEncodeMechanism eMechanism, rtl_TextEncoding eCharset, rtl_uString **pResult) SAL_THROW_EXTERN_C()
 Encode a text as (part of) a URI.
 
SAL_DLLPUBLIC void rtl_uriDecode (rtl_uString *pText, rtl_UriDecodeMechanism eMechanism, rtl_TextEncoding eCharset, rtl_uString **pResult) SAL_THROW_EXTERN_C()
 Decode (a part of) a URI.
 
SAL_DLLPUBLIC sal_Bool rtl_uriConvertRelToAbs (rtl_uString *pBaseUriRef, rtl_uString *pRelUriRef, rtl_uString **pResult, rtl_uString **pException) SAL_THROW_EXTERN_C()
 Convert a relative URI reference into an absolute URI.
 

Enumeration Type Documentation

◆ rtl_UriCharClass

Various predefined URI 'char classes'.

A 'char class' defines which (ASCII) characters can be written 'as they are' in a part of a URI, and which characters have to be written using escape sequences ('%' followed by two hex digits). Characters outside the ASCII range are always written using escape sequences.

If there are other frequently used char classes, they can be added to this enumeration; the function rtl_getUriCharClass() has to be adapted then, too.
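
As an illustration of the char-class idea, the following standalone C++ sketch models a class as an array of 128 booleans and escapes every byte the class does not keep. It does not call the SDK: CharClass, makeCharClass and percentEncode are hypothetical names, and the input is assumed to be already converted to bytes (UTF-8 here), whereas rtl_uriEncode() works on an rtl_uString and a target charset.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a char class: one flag per ASCII code point.
// true = write the character as it is, false = write it as an escape sequence.
using CharClass = std::array<bool, 128>;

// Build a class that keeps digits, letters and the given ASCII punctuation.
inline CharClass makeCharClass(std::string_view punctuation) {
    CharClass cls{};  // value-initialized: every flag is false (the empty class)
    for (char c = '0'; c <= '9'; ++c) cls[static_cast<unsigned char>(c)] = true;
    for (char c = 'A'; c <= 'Z'; ++c) cls[static_cast<unsigned char>(c)] = true;
    for (char c = 'a'; c <= 'z'; ++c) cls[static_cast<unsigned char>(c)] = true;
    for (char c : punctuation) {
        const unsigned char u = static_cast<unsigned char>(c);
        assert(u < 128 && u != '%');  // ASCII only; '%' is never written as it is
        cls[u] = true;
    }
    return cls;
}

// Escape every byte that the class does not keep ('%' plus two hex digits).
// Bytes outside the ASCII range are always escaped.
inline std::string percentEncode(std::string_view bytes, const CharClass& cls) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 128 && cls[b]) {
            out += static_cast<char>(b);
        } else {
            out += '%';
            out += hex[b >> 4];
            out += hex[b & 0x0F];
        }
    }
    return out;
}
```

With the punctuation listed below for rtl_UriCharClassPchar, percentEncode("a b;c", makeCharClass("!$&'()*+,-.:=@_~")) yields a%20b%3Bc in this sketch, and an empty class escapes every character.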

Enumerator
rtl_UriCharClassNone 

The empty char class.

All characters are written using escape sequences.

rtl_UriCharClassUric 

The RFC 2732 <uric> char class.

    The 'valid' characters are !$&'()*+,-./:;=?@[]_~ plus digits and
    letters.

    This differs from RFC 3986 <fragment> in additionally allowing []
rtl_UriCharClassUricNoSlash 

The RFC 2396 <uric_no_slash> char class.

    The 'valid' characters are !$&'()*+,-.:;=?@_~ plus digits and letters.

    This differs from RFC 3986 <fragment> in additionally encoding /
    This differs from RFC 3986 <pchar> in additionally allowing ?
rtl_UriCharClassRelSegment 

The RFC 2396 <rel_segment> char class.

    The 'valid' characters are !$&'()*+,-.;=@_~ plus digits and letters.

    This is the same as RFC 3986 <segment-nz-nc>
rtl_UriCharClassRegName 

The RFC 2396 <reg_name> char class.

    The 'valid' characters are !$&'()*+,-.:;=@_~ plus digits and letters.

    This differs from RFC 3986 <reg_name> in additionally allowing @
rtl_UriCharClassUserinfo 

The RFC 2396 <userinfo> char class.

    The 'valid' characters are !$&'()*+,-.:;=_~ plus digits and letters.

    This is the same as RFC 3986 <userinfo>
rtl_UriCharClassPchar 

The RFC 2396 <pchar> char class.

    The 'valid' characters are !$&'()*+,-.:=@_~ plus digits and letters.

    This differs from RFC 3986 <pchar> in additionally encoding ;
rtl_UriCharClassUnoParamValue 

The char class for the values of uno URL parameters.

    The 'valid' characters are !$&'()*+-./:?@_~ plus digits and letters.
rtl_UriCharClass_FORCE_EQUAL_SIZE 

◆ rtl_UriDecodeMechanism

The mechanism describing how rtl_uriDecode() translates (part of) a URI into a Unicode string.

Enumerator
rtl_UriDecodeNone 

The text is returned completely unmodified.

rtl_UriDecodeToIuri 

The text is returned in the form of an IURI (cf. draft-masinter-url-i18n-05.txt).

All escape sequences representing ASCII characters (%00–%7F) are kept, all other escape sequences are interpreted as UTF-8 characters and translated to Unicode, if possible.

rtl_UriDecodeWithCharset 

The text is decoded.

All escape sequences representing characters from the given charset are decoded and translated to Unicode, if possible.

rtl_UriDecodeStrict 

Like rtl_UriDecodeWithCharset, but indicating failure when converting unmappable characters.

Since
UDK 3.2.0
rtl_UriDecode_FORCE_EQUAL_SIZE 

◆ rtl_UriEncodeMechanism

The mechanism describing how escape sequences in the input of rtl_uriEncode() are handled.

Enumerator
rtl_UriEncodeIgnoreEscapes 

The special meaning of '%' is ignored (i.e., there are by definition no escape sequences in the input).

This mechanism is useful to encode user input as part of a URI. For example, in the user-supplied password of an FTP URL, '%20abcde' is a valid password, so do not assume that the '%20' is an escaped space.

rtl_UriEncodeKeepEscapes 

All escape sequences ('%' followed by two hex digits) are kept intact, even if they represent characters that need not be escaped or if they do not even map to characters in the given charset.

This mechanism is useful when passing on complete URIs more or less unmodified (e.g., within an HTTP proxy): missing escape sequences are added, but existing escape sequences are not touched (except that any lower case hex digits are replaced by upper case hex digits).

rtl_UriEncodeCheckEscapes 

All escape sequences ('%' followed by two hex digits) are resolved in a first step; only those that represent characters that need to be escaped are kept intact.

This mechanism is useful to properly encode complete URIs entered by the user: the URI is brought into a 'canonical form', but care is taken not to damage (valid) escape sequences the (careful) user already entered as such.

rtl_UriEncodeStrict 

Like rtl_UriEncodeIgnoreEscapes, but indicating failure when converting unmappable characters.

Since
UDK 3.2.0
rtl_UriEncodeStrictKeepEscapes 

Like rtl_UriEncodeKeepEscapes, but indicating failure when converting unmappable characters.

Also, any escape sequences that are present are always considered to be (potentially broken) UTF-8. This mechanism is meant to be used on the result of a rtl_UriDecodeToIuri decoding, which will thus only contain escape sequences representing either ASCII characters or broken UTF-8 sequences, and which will all be kept as-is.

Since
UDK 3.2.7
rtl_UriEncode_FORCE_EQUAL_SIZE 

Function Documentation

◆ rtl_getUriCharClass()

SAL_DLLPUBLIC sal_Bool const* rtl_getUriCharClass ( rtl_UriCharClass  eCharClass)

Map a predefined rtl_UriCharClass to a form usable by rtl_uriEncode().

The function rtl_uriEncode() expects an array of 128 booleans, and this function maps rtl_UriCharClass enumeration members to such arrays.

Parameters
eCharClass  Any valid member of rtl_UriCharClass.
Returns
An array of 128 booleans, to be used in calls to rtl_uriEncode().

◆ rtl_uriConvertRelToAbs()

SAL_DLLPUBLIC sal_Bool rtl_uriConvertRelToAbs ( rtl_uString *  pBaseUriRef,
rtl_uString *  pRelUriRef,
rtl_uString **  pResult,
rtl_uString **  pException 
)

Convert a relative URI reference into an absolute URI.

This function uses the strict parser algorithm described in RFC 3986, section 5.2.

This function signals exceptions by returning false and letting pException point to a message explaining the exception.

Parameters
pBaseUriRef  An absolute URI that serves as the base URI. If it has to be inspected (i.e., pRelUriRef is not an absolute URI already), and it is not an absolute URI (i.e., does not begin with a <scheme ":"> part), an exception will be signalled.
pRelUriRef  A URI reference that may be either absolute or relative. If it is absolute, it will be returned unmodified.
pResult  Returns an absolute URI. Must itself not be null, and must point to either null or a valid string. If an exception is signalled, it is left unchanged.
pException  Returns an explanatory message in case an exception is signalled. Must itself not be null, and must point to either null or a valid string. If no exception is signalled, it is left unchanged.
Returns
True if no exception is signalled, otherwise false.
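
RFC 3986, section 5.2 contains a 'remove dot segments' step (section 5.2.4) that is applied to the merged path during resolution. The following standalone C++ sketch covers that one step only, as an aid to reading the RFC. It is not LibreOffice's implementation and does not call the SDK; startsWith and removeDotSegments are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

inline bool startsWith(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

// Sketch of RFC 3986, section 5.2.4: collapse "." and ".." path segments.
inline std::string removeDotSegments(std::string_view in) {
    std::string out;
    while (!in.empty()) {
        const std::size_t before = in.size();
        if (startsWith(in, "../")) {
            in.remove_prefix(3);                       // step A
        } else if (startsWith(in, "./")) {
            in.remove_prefix(2);                       // step A
        } else if (startsWith(in, "/./")) {
            in.remove_prefix(2);                       // step B: "/./x" becomes "/x"
        } else if (in == "/.") {
            in = "/";                                  // step B
        } else if (startsWith(in, "/../") || in == "/..") {
            // step C: keep a leading "/" in the input, drop the last output segment
            in = (in == "/..") ? std::string_view("/") : in.substr(3);
            const std::size_t slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") {
            in = std::string_view();                   // step D
        } else {
            // step E: move the first segment (with its leading "/", if any) to the output
            const std::size_t next = in.find('/', 1);
            const std::size_t len = (next == std::string_view::npos) ? in.size() : next;
            out.append(in.data(), len);
            in.remove_prefix(len);
        }
        assert(in.size() < before);  // every step consumes input, so the loop ends
    }
    return out;
}
```

For the two examples given in that section of the RFC, removeDotSegments("/a/b/c/./../../g") returns "/a/g" and removeDotSegments("mid/content=5/../6") returns "mid/6".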

◆ rtl_uriDecode()

SAL_DLLPUBLIC void rtl_uriDecode ( rtl_uString *  pText,
rtl_UriDecodeMechanism  eMechanism,
rtl_TextEncoding  eCharset,
rtl_uString **  pResult 
)

Decode (a part of) a URI.

Parameters
pText  Any Unicode string. Must not be null. (If the input is indeed part of a valid URI, this string will only contain a subset of the ASCII characters, but this function also handles other Unicode characters properly.)
eMechanism  The mechanism describing how the input text is translated into a Unicode string.
eCharset  When the decode mechanism is rtl_UriDecodeWithCharset, all escape sequences in the input text are interpreted as characters from this charset. Those characters are translated to Unicode characters in the resulting output, if possible. When the decode mechanism is rtl_UriDecodeNone or rtl_UriDecodeToIuri, this parameter is ignored (and is best specified as RTL_TEXTENCODING_UTF8).
pResult  Returns a decoded representation of the input text. Must itself not be null, and must point to either null or a valid string.

If the decode mechanism is rtl_UriDecodeStrict, and pText cannot be converted to eCharset because it contains (encodings of) unmappable characters (which implies that pText is not empty), then an empty string is returned.
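
The difference between the decode mechanisms can be modelled with a small standalone C++ sketch. It is a simplification under stated assumptions and does not call the SDK: DecodeMode, hexValue and decodeSketch are hypothetical names, the charset is fixed to UTF-8, the result is a UTF-8 byte string rather than an rtl_uString, and decoded bytes are not checked for being well-formed UTF-8 (so the 'if possible' condition and the strict mechanism are not modelled).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Value of a hex digit, or -1 if c is not a hex digit.
inline int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical counterparts of rtl_UriDecodeNone, rtl_UriDecodeToIuri and
// rtl_UriDecodeWithCharset (with the charset assumed to be UTF-8).
enum class DecodeMode { None, ToIuri, WithUtf8 };

inline std::string decodeSketch(std::string_view text, DecodeMode mode) {
    if (mode == DecodeMode::None) return std::string(text);  // unmodified
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        int hi = -1, lo = -1;
        if (text[i] == '%' && i + 2 < text.size()) {
            hi = hexValue(text[i + 1]);
            lo = hexValue(text[i + 2]);
        }
        if (hi < 0 || lo < 0) {       // not an escape sequence: copy as it is
            out += text[i];
            continue;
        }
        const int byte = hi * 16 + lo;
        assert(byte >= 0 && byte <= 0xFF);
        if (mode == DecodeMode::ToIuri && byte < 0x80) {
            out.append(text.data() + i, 3);   // escapes for %00 to %7F are kept
        } else {
            out += static_cast<char>(byte);   // decoded (here: raw UTF-8 byte)
        }
        i += 2;                               // skip the two hex digits
    }
    return out;
}
```

In this sketch, the input a%20b%C3%A9 stays unchanged with DecodeMode::None, keeps %20 but decodes %C3%A9 to the two UTF-8 bytes of 'é' with DecodeMode::ToIuri, and has both escapes decoded with DecodeMode::WithUtf8.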

◆ rtl_uriEncode()

SAL_DLLPUBLIC void rtl_uriEncode ( rtl_uString *  pText,
sal_Bool const *  pCharClass,
rtl_UriEncodeMechanism  eMechanism,
rtl_TextEncoding  eCharset,
rtl_uString **  pResult 
)

Encode a text as (part of) a URI.

Parameters
pText  Any Unicode string. Must not be null.
pCharClass  A char class, represented as an array of 128 booleans (true means keep the corresponding ASCII character unencoded, false means encode it). Must not be null, and the boolean corresponding to the percent sign (0x25) must be false. (See rtl_getUriCharClass() for a function mapping from rtl_UriCharClass to such arrays.)
eMechanism  The mechanism describing how escape sequences in the input text are handled.
eCharset  When Unicode characters from the input text have to be written using escape sequences (because they are either outside the ASCII range or do not belong to the given char class), they are first translated into this charset before being encoded using escape sequences. Also, if the encode mechanism is rtl_UriEncodeCheckEscapes, all escape sequences already present in the input text are interpreted as characters from this charset.
pResult  Returns an encoded representation of the input text. Must itself not be null, and must point to either null or a valid string.

If the encode mechanism is rtl_UriEncodeStrict, and pText cannot be converted to eCharset because it contains unmappable characters (which implies that pText is not empty), then an empty string is returned.
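
How the mechanism changes the treatment of existing escape sequences can be modelled with a standalone C++ sketch. It covers only the behaviour described for rtl_UriEncodeIgnoreEscapes and rtl_UriEncodeKeepEscapes, works on bytes instead of an rtl_uString, and does not call the SDK; CharClass, EscapeMode, appendEscape and encodeSketch are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

using CharClass = std::array<bool, 128>;  // true = keep the ASCII character unencoded
enum class EscapeMode { Ignore, Keep };   // hypothetical counterparts of the two mechanisms

inline void appendEscape(std::string& out, unsigned char b) {
    static constexpr char hex[] = "0123456789ABCDEF";
    out += '%';
    out += hex[b >> 4];
    out += hex[b & 0x0F];
}

inline std::string encodeSketch(std::string_view text, const CharClass& cls,
                                EscapeMode mode) {
    assert(!cls['%']);  // the flag for the percent sign (0x25) must be false
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(text[i]);
        const bool isEscape = mode == EscapeMode::Keep && b == '%'
            && i + 2 < text.size()
            && std::isxdigit(static_cast<unsigned char>(text[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(text[i + 2]));
        if (isEscape) {
            // Keep the existing escape sequence, upper-casing its hex digits.
            out += '%';
            out += static_cast<char>(std::toupper(static_cast<unsigned char>(text[i + 1])));
            out += static_cast<char>(std::toupper(static_cast<unsigned char>(text[i + 2])));
            i += 2;
        } else if (b < 128 && cls[b]) {
            out += text[i];        // character belongs to the class
        } else {
            appendEscape(out, b);  // everything else, including a bare '%'
        }
    }
    return out;
}
```

With a class that keeps only lower-case letters and digits, the password example '%20abcde' becomes %2520abcde under EscapeMode::Ignore, while a%2fb c becomes a%2Fb%20c under EscapeMode::Keep: the existing escape is preserved (upper-cased) and only the space gains an escape sequence.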