Functions

Use built-in functions for data manipulation and analysis to operate on the underlying database storing the chain data. These functions are useful for operations like DataChain.filter and DataChain.mutate.

Functions are organized by category and accessed through their respective modules. For example, string functions are accessed via func.string.length(), array functions via func.array.contains(), etc.

Global Function Access

Only a subset of functions are available directly from datachain.func (e.g., func.length). Most functions should be accessed through their specific module namespace (e.g., func.string.length) to avoid naming conflicts.

Function Categories

DataChain provides several categories of functions for different types of operations:

Aggregate Functions - Functions for aggregating data like sum, count, avg, etc.
Array Functions - Functions for working with arrays and lists
Conditional Functions - Functions for conditional logic like ifelse, case, etc.
Numeric Functions - Functions for numeric operations and computations
Path Functions - Functions for working with file paths
Random Functions - Functions for generating random values
String Functions - Functions for string manipulation and processing
Window Functions - Functions for window operations

Usage

from datachain.func import aggregate, array, conditional, numeric, path, random, string, window

# Access functions through their module namespaces
dc.mutate(
    text_length=string.length("text_column"),
    contains_item=array.contains("array_column", "value"),
    file_extension=path.file_ext("file_path")
)

# Some commonly used functions are also available directly
from datachain.func import sum, count, length, ifelse
dc.mutate(total=sum("amount"))