Table Dialect

Peter Desmet

Table Dialect (previously called CSV dialect) is a simple format to describe the dialect of a tabular data file, including its delimiter, header rows, escape characters, etc.

In this document we use the terms “package” for Data Package, “resource” for Data Resource, “dialect” for Table Dialect, and “schema” for Table Schema.

General implementation

Frictionless supports most dialect properties to read Tabular Data Resources. Dialect manipulation is limited to setting a delimiter. When writing resources, it (mainly) makes uses of default dialect properties, removing the necessity to define them.

Read

read_resource() uses the dialect property of a resource to parse a tabular data file. Only properties that deviate from the default need to be specified. E.g. a tab-delimited file without header rows must have the following dialect:

"dialect": {
  "delimiter": "\t",
  "header": false
}

Manipulate

Frictionless does not support direct manipulation of the dialect. add_resource() allows to set one property (dialect$delimiter) when data are provided as a file, all other properties are assumed to be the default.

Write

write_package() writes a package to disk as a datapackage.json file. This file includes the metadata of all the resources, including the dialect (if defined). write_package() writes resources created from a data frame to CSV files, but no dialect property is set for those, since only defaults are used.

Properties implementation

delimiter

delimiter is used by read_resource() and defaults to ",". It is passed to delim in readr::read_delim(). add_resource() does not set delimiter, unless provided in delim and different from the default ",":

library(frictionless)
package <- example_package()

path <- system.file("extdata", "observations_1.tsv", package = "frictionless")
package <- add_resource(package, "observations", data = path, delim = "\t", replace = TRUE)
package$resources[[2]]$dialect$delimiter
#> [1] "\t"

lineTerminator

lineTerminator is ignored by read_resource(). It relies on readr::read_delim() instead, which interprets line terminator LF and CRLF automatically and does not support CR (used by Classic Mac OS, final release 2001).

quoteChar

quoteChar is used by read_resource() and defaults to ". It is passed to quote in readr::read_delim().

doubleQuote

doubleQuote is used by read_resource() and defaults to true, but can be overruled by escapeChar. It is passed to escape_double in readr::read_delim().

escapeChar

escapeChar is ignored by read_resource() unless it is "\\". It is passed as escape_backslash = TRUE and escape_double = FALSE in readr::read_delim(). Note that escapeChar and doubleQuote are mutually exclusive, so you cannot escape with \" and "" in the same file.

nullSequence

nullSequence is ignored by read_resource(). Provide as missingValues in the schema instead (see vignette("table-schema")).

skipInitialSpace

skipInitialSpace is used by read_resource() and defaults to false. It is passed to trim_ws in readr::read_delim().

commentChar

commentChar is used by read_resource() and defaults to undefined. It is passed to comment in readr::read_delim().

caseSensitiveHeader

caseSensitiveHeader is ignored by read_resource().

csvddfVersion

csvddfVersion is ignored by read_resource().