GTN

Documentation for GTN, a library for automatic differentiation with weighted finite-state transducers (WFSTs).

Building and Installing

Build Requirements

  • A C++ compiler with good C++14 support (e.g. g++ >= 5.0)

  • cmake – version 3.5.1 or later, and make

Building and Installing with CMake

Building

Currently, GTN must be built and installed from source.

First, clone gtn from its repository on Github:

git clone https://github.com/facebookresearch/gtn.git && cd gtn

Create a build directory and run CMake and make:

mkdir -p build && cd build
cmake ..
make -j $(nproc)

Run tests with:

make test

Install with:

make install

Build Options

Options                      Configurations     Default Value
---------------------------  -----------------  -------------
GTN_BUILD_TESTS              ON, OFF            ON
GTN_BUILD_BENCHMARKS         ON, OFF            ON
GTN_BUILD_EXAMPLES           ON, OFF            ON
GTN_BUILD_PYTHON_BINDINGS    ON, OFF            OFF
CMAKE_BUILD_TYPE             See CMake Docs     Debug

Linking your Project with CMake

Once GTN is built and installed, including it in another project is simple with a CMake imported target. In your CMakeLists.txt, add:

find_package(gtn REQUIRED)

# Create myCompiledTarget, etc.
# ...

target_link_libraries(myCompiledTarget PUBLIC gtn::gtn)

Your target’s files will be linked with the library and can include headers (e.g. #include <gtn/gtn.h>) directly.
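
As a quick sanity check that linking works, a minimal program along these lines (the file name is a placeholder) should compile against the installed library and run:

// main.cpp -- minimal check that the installed gtn library links and runs.
#include <gtn/gtn.h>

#include <iostream>

int main() {
  gtn::Graph g;
  g.addNode(true);         // start node
  g.addNode(false, true);  // accept node
  g.addArc(0, 1, 0, 0, 1.5);
  std::cout << g << std::endl;  // print the graph to stdout
  return 0;
}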

Python Bindings

To build and install the Python bindings, see the Installing Python Bindings section below.

Basic Usage

Construction

Start by constructing a gtn::Graph and adding nodes and arcs.

Graph graph;

// Add a start node
graph.addNode(true);

// Add an accept node
graph.addNode(false, true);

// Add an internal node
graph.addNode();

// Add an arc from node 0 to 2 with label 0
graph.addArc(0, 2, 0);

// Add an arc from node 0 to 2 with input label 1 and output label 1
graph.addArc(0, 2, 1, 1);

// Add an arc from node 2 to 1 with input label 0, output label 0 and weight 2
graph.addArc(2, 1, 0, 0, 2);

We can visualize the graph with draw().

[Figure: the simple FSA constructed above]

By default a gtn::Graph requires a gradient. To disable gradient computation, use the constructor parameter (e.g. Graph(false)) or the method Graph::setCalcGrad(false).

Nodes and arcs have unique integer ids in the graph. When you add a node or an arc you can capture the id as the return value. This can make adding arcs from node ids simpler in some cases.

Graph g;
auto n1 = g.addNode();
auto n2 = g.addNode();
auto a1 = g.addArc(n1, n2, 0, 1);

Input and Output

The GTN framework provides some convenience functions to visualize graphs. We can print them to stdout or write them to Graphviz’s .dot format with draw().

// print graph to stdout
std::cout << graph << std::endl;

// Draw the graph in dot format to file
draw(graph, "simple_fsa.dot", {{0, "a"}, {1, "b"}, {2, "c"}});

Graphs in .dot format can be compiled to different image formats. For example, to make a PDF image use

dot -Tpdf graph.dot -o graph.pdf

We can save graphs in binary with save() or as a text file with saveTxt(). Similarly we can load graphs in binary format with load() or from a text file with loadTxt().

// save in binary format
save("myGraph.bin", graph);

// save in text format
saveTxt("myGraph.txt", graph);

// load in binary format
graph = load("myGraph.bin");

// load in text format
graph = loadTxt("myGraph.txt");

We can save and load graphs from input streams as well as files. The text format of the graph is described below.

std::stringstream in(
    // First line is space separated start states
    "0\n"
    // Second line is space separated accept states
    "1\n"
    // The remaining lines are a list of arcs:
    // <source node> <dest node> <ilabel> [olabel] [weight]
    // where the olabel defaults to the ilabel if it is not specified
    // and the weight defaults to 0.0 if it is not specified.
    "0 2 0\n" // olabel = 0, weight = 0.0
    "0 2 1 1\n" // olabel = 1, weight = 0.0
    "2 1 0 0 2\n"); // olabel = 0, weight = 2.0

Graph other_graph = load(in);

Comparisons

There are several ways to compare two graphs in GTN. Use equal() to check for an exact match between two graphs; in this case, nodes with the same id must have the same arcs. The function isomorphic() checks that the graph structures are the same regardless of the actual node ids. This can be expensive for large graphs.

WFSAs and WFSTs of different structure can be equivalent in the sense that they accept or transduce the same paths with the same scores. To check this condition, use randEquivalent(). The function randEquivalent() uses a Monte Carlo sampling strategy to assess the equivalence between two graphs and may incorrectly assume the graphs are equivalent if not enough samples are used.
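
A rough sketch of the three checks (the two graphs below are illustrative, and the sample count is arbitrary):

Graph a;
a.addNode(true);
a.addNode(false, true);
a.addArc(0, 1, 0, 0, 1.0);

Graph b;
b.addNode(true);
b.addNode(false, true);
b.addArc(0, 1, 0, 0, 1.0);

bool exact = equal(a, b);                 // true: same node ids with the same arcs
bool iso = isomorphic(a, b);              // true: same structure up to node ids
bool equiv = randEquivalent(a, b, 1000);  // almost certainly true, up to sampling error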

Example Graphs

Graphs in GTN can have multiple start and accept nodes. Start nodes have bold circles and accepting nodes are denoted by concentric circles in the figure below.

[Figure: a graph with multiple start and accept nodes]

Graphs can also have cycles, though not every function supports cyclic graphs.

[Figure: a cyclic FSA]

GTN also allows \(\epsilon\) transitions in graphs. Use the epsilon constant to add \(\epsilon\) labels to arcs.

[Figure: an FSA with epsilon transitions]

Common Functions

GTN supports a range of functions on WFSAs and WFSTs from the simple rational functions like concatenation to the powerful composition operation. Unless explicitly noted, every function in GTN is differentiable with respect to its inputs.

Union

Use union_() to compute the union of a std::vector of graphs.

The union of two graphs \(\mathcal{A}_1 + \mathcal{A}_2\) accepts any sequence accepted by either input graph.

_images/union_g1.svg

The graph g1 recognizes \(aba^*\).

_images/union_g2.svg

The graph g2 recognizes \(ba\).

_images/union_g3.svg

The graph g3 recognizes \(ac\).

_images/union_graph.svg

The union graph, union_({g1, g2, g3}), recognizes any of \(aba^*\), \(ba\), or \(ac\).
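
A sketch of how g1, g2, and g3 above might be constructed and combined; the label ids (a = 0, b = 1, c = 2) and the exact node layout are illustrative assumptions:

Graph g1; // recognizes aba*
g1.addNode(true);
g1.addNode();
g1.addNode(false, true);
g1.addArc(0, 1, 0);  // a
g1.addArc(1, 2, 1);  // b
g1.addArc(2, 2, 0);  // a (self-loop)

Graph g2; // recognizes ba
g2.addNode(true);
g2.addNode();
g2.addNode(false, true);
g2.addArc(0, 1, 1);  // b
g2.addArc(1, 2, 0);  // a

Graph g3; // recognizes ac
g3.addNode(true);
g3.addNode();
g3.addNode(false, true);
g3.addArc(0, 1, 0);  // a
g3.addArc(1, 2, 2);  // c

Graph unioned = union_({g1, g2, g3});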

Concatenation

Use concat() to compute the concatenation of two graphs or a std::vector of graphs.

The concatenation of two graphs, \(\mathcal{A}_1\mathcal{A}_2\), accepts any path \({\bf p}{\bf r}\) such that \({\bf p}\) is accepted by \(\mathcal{A}_1\) and \({\bf r}\) is accepted by \(\mathcal{A}_2\).

_images/concat_g1.svg

The graph g1 recognizes \(ba\).

_images/concat_g2.svg

The graph g2 recognizes \(ac\).

_images/concat_graph.svg

The concatenated graph, concat(g1, g2), recognizes \(baac\).
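
A sketch with the same assumed label ids (a = 0, b = 1, c = 2):

Graph g1; // recognizes ba
g1.addNode(true);
g1.addNode();
g1.addNode(false, true);
g1.addArc(0, 1, 1);  // b
g1.addArc(1, 2, 0);  // a

Graph g2; // recognizes ac
g2.addNode(true);
g2.addNode();
g2.addNode(false, true);
g2.addArc(0, 1, 0);  // a
g2.addArc(1, 2, 2);  // c

Graph concatenated = concat(g1, g2); // recognizes baac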

Closure

Use closure() to compute the Kleene closure of a graph.

The Kleene closure, \(\mathcal{A}^*\), accepts any sequence accepted by the original graph repeated zero or more times (zero repeats gives the empty sequence, \(\epsilon\)).

_images/closure_input.svg

The graph g recognizes \(aba\).

_images/closure_graph.svg

The closed graph, closure(g), recognizes \(\{aba\}^*\).
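
A sketch, again assuming a = 0 and b = 1 as label ids:

Graph g; // recognizes aba
g.addNode(true);
g.addNode();
g.addNode();
g.addNode(false, true);
g.addArc(0, 1, 0);  // a
g.addArc(1, 2, 1);  // b
g.addArc(2, 3, 0);  // a

Graph closed = closure(g); // recognizes {aba}*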

Intersection

Use intersect() to compute the intersection of two acceptors.

The intersection, \(\mathcal{A}_1 \circ \mathcal{A}_2\), accepts any path which is accepted by both input graphs. The score for a path in the intersected graph is the sum of the scores of that path from each input graph.

_images/simple_intersect_g1.svg

Graph g1.

_images/simple_intersect_g2.svg

Graph g2.

_images/simple_intersect.svg

The intersected graph, intersect(g1, g2).
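
A minimal sketch of the score addition (the graphs here are illustrative, not the ones in the figures):

Graph g1;
g1.addNode(true);
g1.addNode(false, true);
g1.addArc(0, 1, 0, 0, 1.0);
g1.addArc(0, 1, 1, 1, 2.0);

Graph g2;
g2.addNode(true);
g2.addNode(false, true);
g2.addArc(0, 1, 0, 0, 0.5);

// Only label 0 is accepted by both graphs; its score is 1.0 + 0.5 = 1.5.
Graph both = intersect(g1, g2);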

Compose

Use compose() to compute the composition of two transducers.

The composition, \(\mathcal{T}_1 \circ \mathcal{T}_2\), transduces \({\bf p} \rightarrow {\bf u}\) if the first graph transduces \({\bf p} \rightarrow {\bf r}\) and the second graph transduces \({\bf r} \rightarrow {\bf u}\). As in intersection, the score of the transduction in the composed graph is the sum of the scores of the transduction from each input graph.

_images/transducer_compose_g1.svg

Graph g1.

_images/transducer_compose_g2.svg

Graph g2.

_images/transducer_compose.svg

The composed graph, compose(g1, g2).
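
A minimal sketch (illustrative graphs, not the ones in the figures):

Graph g1; // transduces 0 -> 1 with weight 1.0
g1.addNode(true);
g1.addNode(false, true);
g1.addArc(0, 1, 0, 1, 1.0);

Graph g2; // transduces 1 -> 2 with weight 0.5
g2.addNode(true);
g2.addNode(false, true);
g2.addArc(0, 1, 1, 2, 0.5);

// The composition transduces 0 -> 2 with weight 1.0 + 0.5 = 1.5.
Graph composed = compose(g1, g2);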

Forward Score

Use forwardScore() to compute the forward score of a graph.

The forward algorithm computes the log-sum-exp of the scores of all accepting paths in a graph. The graph must not have any cycles. Use Graph::item() on the output of forwardScore() to access the score.

_images/simple_forward.svg

The forward score is given by forwardScore(g).

In the above example the graph has three paths:

  • \(0 \rightarrow 1 \rightarrow 2 \rightarrow 3\) with a score of 4.6

  • \(0 \rightarrow 2 \rightarrow 3\) with a score of 5.3

  • \(1 \rightarrow 2 \rightarrow 3\) with a score of 3.5

The resulting forward score is \(\log(e^{4.6} + e^{5.3} + e^{3.5}) = 5.81\).
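
A sketch which reproduces the three path totals; the individual arc weights below are an illustrative choice (only the path sums 4.6, 5.3, and 3.5 come from the example):

Graph g;
g.addNode(true);          // node 0 (start)
g.addNode(true);          // node 1 (start)
g.addNode();              // node 2
g.addNode(false, true);   // node 3 (accept)
g.addArc(0, 1, 0, 0, 1.1);
g.addArc(1, 2, 1, 1, 2.5);
g.addArc(0, 2, 2, 2, 4.3);
g.addArc(2, 3, 3, 3, 1.0);

auto score = forwardScore(g);
float val = score.item();  // approximately 5.81
// viterbiScore(g).item() on the same graph gives 5.3 (see Viterbi Score below).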

Viterbi Score

Use viterbiScore() to compute the Viterbi score of a graph.

The Viterbi algorithm gives the max of the scores of all accepting paths in a graph. The graph must not have any cycles. Use Graph::item() on the output of viterbiScore() to access the score.

The Viterbi score of the figure in the section Forward Score is 5.3, the highest score over all paths.

Viterbi Path

Use viterbiPath() to compute the Viterbi path of a graph.

The Viterbi algorithm gives the highest scoring path of all accepting paths in a graph. The graph must not have any cycles. The output of viterbiPath() is a chain graph representing the highest scoring path.

_images/simple_viterbi_path.svg

The Viterbi path of the graph in section Forward Score, computed using viterbiPath(g).

Autograd Basics

Whenever we apply a function to one or more gtn::Graphs, if at least one of the inputs requires a gradient, the output gtn::Graph records its inputs and a method by which to compute its gradient. The gradients are computed with a GradFunc, a function that takes the gradient with respect to the output gtn::Graph and computes the gradients with respect to its inputs.

To recursively compute the gradient of a gtn::Graph with respect to all of its inputs in the computation graph, call gtn::backward() on the gtn::Graph. This iteratively computes the gradients by repeatedly applying the chain rule in a reverse topological ordering of the computation graph.

Here is a simple example:

auto g1 = Graph(true);
g1.addNode(true);
g1.addNode(false, true);
g1.addArc(0, 1, 0, 0, 1.0);

auto g2 = Graph(false);
g2.addNode(true);
g2.addNode(false, true);
g2.addArc(0, 1, 0, 0, 2.0);

auto g3 = Graph(true);
g3.addNode(true);
g3.addNode(false, true);
g3.addArc(0, 1, 0, 0, 3.0);

auto g4 = negate(subtract(add(g1, g2), g3));

g4.backward();

auto g1Grad = g1.grad(); // g1Grad.item() = -1

auto g3Grad = g3.grad(); // g3Grad.item() = 1

Warning

Calling g2.grad() will throw an exception since calcGrad is set to false for that graph.

Retaining the Computation Graph

The gtn::backward() function takes an optional boolean parameter, retainGraph, which is false by default. When the argument is false, each gtn::Graph is cleared from the computation graph during the backward pass as soon as it is no longer needed. This reduces peak memory usage while computing gradients. Setting retainGraph to true is not recommended unless there is a need to call backward multiple times without redoing the forward computation.

Let’s consider the computation graph built by the example above:

[Figure: the computation graph for the example above]

Assuming we need to call gtn::backward() only once on this graph, we can see that the intermediate Graph gSub can be deleted as soon as the gradients of g3 and gSum are computed. This happens when retainGraph is set to false.
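
If backward does need to run more than once, a rough sketch (continuing the g1, g2, g3 example above) would be:

auto out = negate(subtract(add(g1, g2), g3));

// Retain the computation graph so backward can be called again:
backward(out, /* retainGraph = */ true);

g1.zeroGrad();
g3.zeroGrad();

// Second pass over the retained graph; with the default
// retainGraph = false the graph is freed during this call.
backward(out);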

Interfacing with PyTorch

Adding a GTN function or layer to PyTorch is just like adding any custom extension to PyTorch. For the details, take a look at an example which constructs a custom loss function in PyTorch with GTN. We’ll go over the example here at a high level, with attention to the bits specific to GTN.

First declare the class which should inherit from torch.autograd.Function.

class GTNLossFunction(torch.autograd.Function):
    """
    A minimal example of adding a custom loss function built with GTN graphs to
    PyTorch.

    The example is a sequence criterion which computes a loss between a
    frame-level input and a token-level target. The tokens in the target can
    align to one or more frames in the input.
    """

The GTNLossFunction requires static forward and backward methods. The forward method, with some additional comments, is copied below:

@staticmethod
def forward(ctx, inputs, targets):
    B, T, C = inputs.shape
    losses = [None] * B
    emissions_graphs = [None] * B

    # Move data to the host as GTN operations run on the CPU:
    device = inputs.device
    inputs = inputs.cpu()
    targets = targets.cpu()

    # Compute the loss for the b-th example:
    def forward_single(b):
        emissions = gtn.linear_graph(T, C, inputs.requires_grad)
        # *NB* A reference to the `data` should be held explicitly when
        # using `data_ptr()` otherwise the memory may be claimed before the
        # weights are set. For example, the following is undefined and will
        # likely cause serious issues:
        #   `emissions.set_weights(inputs[b].contiguous().data_ptr())`
        data = inputs[b].contiguous()
        emissions.set_weights(data.data_ptr())

        target = GTNLossFunction.make_target_graph(targets[b])

        # Score the target:
        target_score = gtn.forward_score(gtn.intersect(target, emissions))

        # Normalization term:
        norm = gtn.forward_score(emissions)

        # Compute the loss:
        loss = gtn.subtract(norm, target_score)

        # We need to save the `loss` graph to call `gtn.backward` and we
        # need the `emissions` graph to access the gradients:
        losses[b] = loss
        emissions_graphs[b] = emissions

    # Compute the loss in parallel over the batch:
    gtn.parallel_for(forward_single, range(B))

    # Save some graphs and other data for backward:
    ctx.auxiliary_data = (losses, emissions_graphs, inputs.shape)

    # Put losses back in a torch tensor and move them back to the device:
    return torch.tensor([l.item() for l in losses]).to(device)

To perform the backward computation, we save losses, a list which holds the individual graphs storing the loss for each example. We also save the emissions_graphs so that we can access the gradients in order to construct the full gradient with respect to the input tensor.

GTN provides parallel_for() to run computations in parallel. We use it here to parallelize the forward and backward computations over examples in the batch. Using parallel_for() requires some care. For example, notice the lines:

losses = [None] * B
emissions_graphs = [None] * B

These lists must be preconstructed so that each thread writes to its own index rather than concurrently appending to a shared list, which could cause a race condition during the execution of forward_single.

The backward method is very simple. It just calls backward() on each loss graph and accumulates the gradients into a torch.Tensor.

@staticmethod
def backward(ctx, grad_output):
    losses, emissions_graphs, in_shape = ctx.auxiliary_data
    B, T, C = in_shape
    input_grad = torch.empty((B, T, C))

    # Compute the gradients for each example:
    def backward_single(b):
        gtn.backward(losses[b])
        emissions = emissions_graphs[b]
        grad = emissions.grad().weights_to_numpy()
        input_grad[b] = torch.from_numpy(grad).view(1, T, C)

    # Compute gradients in parallel over the batch:
    gtn.parallel_for(backward_single, range(B))

    return grad_output.unsqueeze(1).unsqueeze(1) * input_grad.to(grad_output.device), None

Graph

class Graph

A Graph class to perform automatic differentiation with weighted finite-state acceptors (WFSAs) and transducers (WFSTs).

Example:

Graph graph;
graph.addNode(true); // Add a start node
graph.addNode(); // Add an internal node
graph.addNode(false, true); // Add an accept node

// Add an arc from node 0 to 1 with ilabel 0, olabel 1 and weight 2.0
graph.addArc(0, 1, 0, 1, 2.0);

// Add an arc from node 1 to 2 with ilabel 1, olabel 2 and weight 1.0
graph.addArc(1, 2, 1, 2, 1.0);

// Compute the Viterbi score of the graph
auto score = viterbiScore(graph);

print(score); // Print the score graph to std out

backward(score); // Compute the gradient
graph.grad(); // Access the gradient
graph.zeroGrad(); // Clear the gradient

All operations are in the log or tropical semirings. The default score for an arc is 0 (the multiplicative identity) and the additive identity is -infinity. The score of a path is accumulated by adding the scores of its arcs, and scores across paths are combined with log-sum-exp (log semiring) or max (tropical semiring) operations.

Graph-level Methods

Graph(bool calcGrad = true)

Construct a Graph.

Parameters

calcGrad – Whether or not to compute gradients with respect to this graph when calling gtn::backward.

int addNode(bool start = false, bool accept = false)

Adds a node to the graph.

Parameters
  • start – Indicates if the node is a starting node.

  • accept – Indicates if the node is an accepting node.

Returns

The id of the node (used for e.g. adding arcs).

size_t addArc(size_t srcNode, size_t dstNode, int label)

Add an arc between two nodes.

This assumes the graph is an acceptor; the input label on the arc is the same as the output label.

Parameters
  • srcNode – The id of the source node.

  • dstNode – The id of the destination node.

  • label – The arc label.

Returns

The id of the added arc.

size_t addArc(size_t srcNode, size_t dstNode, int ilabel, int olabel, float weight = 0.0)

Add an arc between two nodes.

Parameters
  • srcNode – The id of the source node.

  • dstNode – The id of the destination node.

  • ilabel – The arc input label.

  • olabel – The arc output label.

  • weight – The arc weight.

Returns

The id of the added arc.

inline size_t numArcs() const

The number of arcs in the graph.

inline size_t numNodes() const

The number of nodes in the graph.

inline size_t numStart() const

The number of starting nodes in the graph.

inline size_t numAccept() const

The number of accepting nodes in the graph.

float item() const

Get the weight on a single arc graph.

static Graph deepCopy(const Graph &src)

A deep copy of a graph src which is not recorded in the autograd tape.

For a version which is recorded in the autograd tape see gtn::clone.

void arcSort(bool olabel = false)

Sort the arcs entering and exiting every node in increasing order by input label, or by output label if olabel == true.

This function is intended to be used prior to calls to intersect and compose to improve the efficiency of the algorithm.
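
For example, a rough sketch (g1 and g2 are assumed to be existing graphs, and which label to sort on is an illustrative guess rather than a documented requirement):

g1.arcSort(true);  // sort g1's arcs by output label
g2.arcSort();      // sort g2's arcs by input label
auto out = compose(g1, g2);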

inline void markArcSorted(bool olabel = false)

Mark a graph’s arcs as sorted.

If olabel == false then the graph will be marked as sorted by arc input labels, otherwise it will be marked as sorted by the arc output labels.

inline bool ilabelSorted() const

Check if the arcs entering and exiting every node are sorted by input label.

inline bool olabelSorted() const

Check if the arcs entering and exiting every node are sorted by output label.

inline float *weights()

Returns an array of weights from a graph.

The array will contain Graph::numArcs() elements.

inline const float *weights() const

A const version of Graph::weights.

void setWeights(const float *weights)

Set the arc weights on a graph.

The weights array must have Graph::numArcs() elements.

void labelsToArray(int *out, bool ilabel = true)

Extract an array of labels from a graph.

The array should have space for Graph::numArcs() elements.

Parameters
  • out [out] – A pointer to the buffer to populate with labels.

  • ilabel [in] – Retrieve ilabels if true, otherwise retrieve olabels.

std::vector<int> labelsToVector(bool ilabel = true)

Extract a std::vector of labels from the graph.

See Graph::labelsToArray.

Autograd Methods

void addGrad(std::vector<float> &&other)

Add a std::vector of gradients to the gradient graph weights without making a copy of other.

The Graph::addGrad methods are intended for use by the autograd. This overload is used with an rvalue or std::move to avoid an extra copy:

graph.addGrad(std::move(graphGrad));

void addGrad(const std::vector<float> &other)

Add a std::vector of gradients to the gradient graph weights.

The Graph::addGrad methods are intended for use by the autograd.

void addGrad(const Graph &other)

Add a Graph of gradients to the gradient graph.

The Graph::addGrad methods are intended for use by the autograd.

inline bool calcGrad() const

Check if a graph requires a gradient.

inline bool isGradAvailable() const

Check if a graph’s gradient is computed.

Graph &grad()

Get the gradient graph.

const Graph &grad() const

A const version of Graph::grad.

void setCalcGrad(bool calcGrad)

Specify if the gradient for this graph should be computed.

void zeroGrad()

Clear the graph’s gradients.

std::uintptr_t id()

A unique identifier for a graph.

Intended for use by the autograd.

inline GradFunc gradFunc()

Get the gradient function of a graph.

Intended for use by the autograd.

inline void setGradFunc(GradFunc gradFunc)

Set the gradient function of a graph.

Intended for use by the autograd.

inline std::vector<Graph> &inputs() const

Get the vector of inputs used in the autograd computation graph.

Intended for use by the autograd.

void setInputs(std::vector<Graph> inputs)

Sets the vector of inputs used in the autograd computation graph.

Intended for use by the autograd.

inline Graph withoutWeights() const

Clear the weights on a graph if they are no longer needed.

Intended for use by the autograd.

Node Accessors

inline const std::vector<int> &start() const

Get the indices of the start nodes of the graph.

inline const std::vector<int> &accept() const

Get the indices of the accepting nodes of the graph.

inline bool isStart(size_t i) const

Check if the i-th node is a start node.

inline bool isAccept(size_t i) const

Check if the i-th node is an accepting node.

inline void makeAccept(size_t i)

Make the i-th node an accepting node.

inline size_t numOut(size_t i) const

The number of outgoing arcs from the i-th node.

inline const std::vector<int> &out(size_t i) const

Get the indices of outgoing arcs from the i-th node.

inline int out(size_t i, size_t j) const

Get the index of the j-th outgoing arc from the i-th node.

inline size_t numIn(size_t i) const

The number of incoming arcs to the i-th node.

inline const std::vector<int> &in(size_t i) const

Get the indices of incoming arcs to the i-th node.

inline size_t in(size_t i, size_t j) const

Get the index of the j-th incoming arc to the i-th node.

Arc Accessors

inline int srcNode(size_t i) const

The source node of the i-th arc.

inline int dstNode(size_t i) const

The destination node of the i-th arc.

inline int label(size_t i) const

The label of the i-th arc (use this for acceptors).

inline int ilabel(size_t i) const

The input label of the i-th arc.

inline int olabel(size_t i) const

The output label of the i-th arc.

inline float weight(size_t i) const

The weight of the i-th arc.

inline void setWeight(size_t i, float weight)

Set the weight of the i-th arc.

Autograd

namespace gtn

Functions

void backward(Graph g, const Graph &grad, bool retainGraph = false)

Compute the gradients of any inputs w.r.t. g, using the chain rule to incorporate grad, a gradient of another function with respect to g.

If retainGraph = false (default) then the autograd graph will not be saved and un-referenced nodes in the autograd graph may be destroyed.

Parameters
  • g – The graph to compute gradients with respect to.

  • grad – A seed gradient, typically set to be a gradient of another function with respect to graph.

  • retainGraph – Whether or not to save the autograd graph. Setting this to false is more memory efficient as temporary Graphs created during the forward computation may be destroyed.

void backward(Graph g, bool retainGraph = false)

Compute the gradients of any inputs w.r.t. g.

If retainGraph = false (default) then the autograd graph will not be saved and un-referenced nodes in the autograd graph may be destroyed.

See the overload gtn::backward().

Functions

enum Projection

Projection type used with gtn::clone.

Values:

enumerator NONE

Keep both input and output labels.

enumerator INPUT

Keep only the input labels.

enumerator OUTPUT

Keep only the output labels.

Graph negate(const Graph &g)

Negate a scalar graph.

Graph add(const Graph &g1, const Graph &g2)

Add two scalar graphs.

Graph subtract(const Graph &g1, const Graph &g2)

Subtract one scalar graph from another.

Graph clone(const Graph &g, Projection projection = Projection::NONE)

Performs a deep clone of a graph with an option to project to either the input or output labels.

The operation is recorded in the autograd tape. For a version which is not recorded on the autograd tape, see Graph::deepCopy.

Graph projectInput(const Graph &g)

Removes the output labels from the graph and records the operation in the autograd tape.

This function makes a copy of the input graph.

Graph projectOutput(const Graph &g)

Removes the input labels from the graph and records the operation in the autograd tape.

This function makes a copy of the input graph.
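
A small sketch of projection on a one-arc transducer:

Graph t;
t.addNode(true);
t.addNode(false, true);
t.addArc(0, 1, 0, 1, 0.5);  // transduces 0 -> 1

Graph inAcceptor = projectInput(t);    // acceptor over the input labels (0)
Graph outAcceptor = projectOutput(t);  // acceptor over the output labels (1)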

Graph concat(const Graph &g1, const Graph &g2)

Create the concatenation of two graphs.

This operation is recorded in the autograd tape.

Equivalent to concat({g1, g2}), see gtn::concat(const std::vector<Graph>&).

Graph concat(const std::vector<Graph> &graphs)

Concatenate a vector of graphs.

This operation is recorded in the autograd tape.

If x_i is a sequence accepted (or x_i:y_i is transduced) by graphs[i] then the concatenated graph accepts the sequence x_1x_2...x_n if graphs contains n graphs. The score of the path x_1...x_n is the sum of the scores of the individual x_i in graphs[i]. The concatenated graph is constructed by connecting every accepting state of graphs[i-1] to every starting state of graphs[i] with an epsilon transition. The starting states of the concatenated graph are the starting states of graphs[0] and the accepting states are the accepting states of graphs.back().

Note that the concatenation of 0 graphs, gtn::concat({}), is the graph which accepts the empty string (epsilon). The concatenation of a single graph is equivalent to a clone.

Graph closure(const Graph &g)

Create the (Kleene) closure of the graph.

Graph union_(const std::vector<Graph> &graphs)

Create the union of a vector of graphs.

Graph remove(const Graph &g, int label = epsilon)

Create the equivalent graph without epsilon transitions.

If label is specified then instead of removing epsilon transitions, arcs with the matching label are removed. The removed arc labels are treated as if they were epsilon transitions. Note this is different than simply pruning the arc.

Graph remove(const Graph &g, int ilabel, int olabel)

Create the equivalent graph without ilabel:olabel transitions.

The removed arc labels are treated as if they were epsilon transitions. Note this is different than simply pruning the arc.
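
A minimal sketch of epsilon removal:

Graph g;
g.addNode(true);
g.addNode();
g.addNode(false, true);
g.addArc(0, 1, epsilon);     // an epsilon transition
g.addArc(1, 2, 0, 0, 1.0);

Graph noEps = remove(g);     // equivalent graph without epsilon arcs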

Graph compose(const Graph &g1, const Graph &g2)

Compose two transducers.

This operation is recorded in the autograd tape. If x:y is transduced by g1 and y:z is transduced by g2 then the composition will transduce x:z. The arc scores are added in the composed graph.

Graph intersect(const Graph &g1, const Graph &g2)

Intersect two acceptors.

This operation is recorded in the autograd tape. This function only works on acceptors, calling it on a graph where graph.ilabel(a) != graph.olabel(a) for some a is undefined and may yield incorrect results. The intersected graph accepts any path x which is accepted by both g1 and g2. The arc scores are added in the intersected graph.

The result of gtn::compose will be equivalent; however, this function should be preferred since the implementation may be faster.

Graph forwardScore(const Graph &g)

Compute the forward score of a graph.

Returns the score in a scalar graph which can be accessed with Graph::item(). This is equivalent to the shortest distance from the start nodes to the accept nodes in the log semiring. NB: This assumes the input graph is acyclic.

Graph viterbiScore(const Graph &g)

Compute the Viterbi score of a graph.

Returns the score in a scalar graph which can be accessed with Graph::item(). This is equivalent to the shortest distance from the start nodes to the accepting nodes in the tropical semiring. NB: This assumes the input graph is acyclic.

Graph viterbiPath(const Graph &g)

Compute the Viterbi shortest path of a graph and return it in a single chain graph with the labels and weights of the shortest path.

This is equivalent to the shortest path from the start nodes to the accepting nodes in the tropical semiring. NB: This assumes the input graph is acyclic.

Graph sample(const Graph &g, size_t maxLength = 1000)

Attempt to sample an accepting path from the graph.

If the graph has dead-ends or does not accept any paths, then an empty path may be returned. Note that the empty path “{}” is different from the path which is the empty string “{ε}”.

Parameters
  • g – The graph to sample from.

  • maxLength – The maximum length of a sampled path.

Parallel

template<typename FuncType, typename ...Args>
auto parallelMap(FuncType &&function, Args&&... inputs)

Executes a function in parallel.

Parameters
  • function [in] – A function pointer to execute in parallel.

  • inputs [in] – Variadic arguments of iterable/indexable containers, such as std::vectors, i.e. vector<T1>, vector<T2>, .... The types must match the input types of function exactly, i.e. function must take arguments T1, T2, ....

Returns

A vector of type T, where T is the output type of function. If the given function returns void, the return type is void.

Creations

Creation functions facilitate making graphs with common predetermined structure.

Graph scalarGraph(float val, bool calcGrad = true)

Creates a scalar graph - a graph with a single arc between two nodes with a given weight value and an epsilon label.

Graph linearGraph(int M, int N, bool calcGrad = true)

Create a linear chain graph with M + 1 nodes and N edges between each node.

The labels of the edges between each node are the integers [0, ..., N - 1].
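
For example, a brief sketch:

// A single-arc graph with weight 3.0 and an epsilon label:
Graph s = scalarGraph(3.0);

// A chain with 4 nodes; between each pair of consecutive nodes there
// are 3 arcs labeled 0, 1, and 2 (M = 3, N = 3):
Graph lin = linearGraph(3, 3);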

Utils

Comparisons

bool randEquivalent(const Graph &g1, const Graph &g2, size_t numSamples = 100, double tol = 1e-4, size_t maxLength = 1000)

Compare two graphs by sampling paths and checking they have the same scores in both graphs.

Parameters
  • g1 – A graph to be compared.

  • g2 – A graph to be compared.

  • numSamples – The number of samples to use. The more samples the more likely the result is accurate.

  • tol – The largest allowed absolute difference between the path score from each graph.

  • maxLength – The maximum length of sampled paths.

bool equal(const Graph &g1, const Graph &g2)

Checks if two graphs are exactly equal (not isomorphic).

bool isomorphic(const Graph &g1, const Graph &g2)

Checks if two graphs are isomorphic.

Note this function will be very very slow for large graphs.

Input and Output

using SymbolMap = std::unordered_map<int, std::string>

User-defined map from integer ids to arc label strings, for use with e.g. gtn::draw.

void save(const std::string &fileName, const Graph &g)

Save a graph to a file in binary format.

void save(std::ostream &out, const Graph &g)

Save a graph to an output stream in binary format.

Graph load(const std::string &fileName)

Load a graph from a file in binary format.

Graph load(std::istream &in)

Load a graph from an input stream in binary format.

Graph load(std::istream &&in)

Load a graph from an input stream in binary format.

void saveTxt(const std::string &fileName, const Graph &g)

Save a graph to a file in txt format.

The first line will be a space separated list of the start nodes (Graph::start) and the second line will be a space separated list of accepting nodes (Graph::accept). The following lines contain the arcs in the format:

srcNode dstNode ilabel olabel weight

void saveTxt(std::ostream &out, const Graph &g)

Save a graph to an output stream in txt format.

Graph loadTxt(const std::string &fileName)

Load a graph from a file in txt format.

The expected format is the first two lines contain a list of space separated start and accept nodes respectively. The following lines should contain the arcs in the format:

srcNode dstNode ilabel [olabel=ilabel] [weight=0.0]

where [x=y] indicates an optional value for x with a default value of y.

For example:

0
1
0 1 1

is a two node graph with an arc from start node 0 to accept node 1 with an input and output label of 1,

0
1
0 1 1 2

is a two node graph with an arc from node 0 to node 1 with input label 1 and output label 2, and

0
1
0 1 1 2 3.0

is a two node graph with an arc from node 0 to node 1 with input label 1, output label 2, and a weight of 3.0.

The returned Graph will contain “maxNodeIdx” nodes where “maxNodeIdx” is the largest index of a node in the file, regardless of whether there are arcs for every node.

Graph loadTxt(std::istream &in)

Load a graph from an input stream in txt format.

Graph loadTxt(std::istream &&in)

Load a graph from an input stream in txt format.

std::ostream &operator<<(std::ostream &out, const Graph &g)

Overload of the << operator so a Graph can be written to std::cout.

void draw(const Graph &g, const std::string &filename, const SymbolMap &isymbols = SymbolMap(), const SymbolMap &osymbols = SymbolMap())

Write a graph in the Graphviz DOT format to a file.

Arc labels are of the format ilabel/olabel:weight. If the output symbols are not specified then the olabel is omitted and arc labels are of the format ilabel:weight. If the input symbols are not specified then integer ids are used as the label.

Compile to pdf with:

dot -Tpdf graph.dot -o graph.pdf

Parameters
  • g – The graph to draw

  • filename – The name of the file to write to (e.g. graph.dot).

  • isymbols – A map of integer ids to strings used for arc input labels

  • osymbols – A map of integer ids to strings used for arc output labels

void draw(const Graph &g, std::ostream &out, const SymbolMap &isymbols = SymbolMap(), const SymbolMap &osymbols = SymbolMap())

Write a graph in graphviz dot format to an output stream.

See gtn::draw.

Installing Python Bindings

The Python package may be installed with pip:

pip install gtn

To build and install from source, first clone the repo:

git clone https://github.com/facebookresearch/gtn.git

Setup your environment:

conda create -n gtn_env
conda activate gtn_env

Install dependencies:

cd bindings/python
conda install setuptools

Use one of the following commands for installation:

python setup.py install

or, to install in editable mode (for dev):

python setup.py develop

Running Python Tests

Python binding tests can be run with make test, or with

python -m unittest discover bindings/python/test

Run a simple example:

python bindings/python/examples/simple_graph.py

gtn

Autograd

backward(g, grad, retain_graph=False)

Compute the gradients of any inputs with respect to g.

Parameters
  • g (Graph) – The graph to compute gradients with respect to.

  • grad (Graph) – A seed gradient, typically set to be a gradient of another function with respect to g.

  • retain_graph (bool) – Whether or not to save the autograd graph. Setting this to False is more memory efficient as temporary graphs created during the forward computation may be destroyed.

backward(g, retain_graph=False)

Same as backward() but with the initial grad set to be ones.

Functions

add(g1, g2)

Add two scalar graphs.

closure(g)

Compute the (Kleene) closure of the graph. This operation is recorded in the autograd tape.

compose(g1, g2)

Compose two transducers. This operation is recorded in the autograd tape. If x:y is transduced by g1 and y:z is transduced by g2 then the composed graph will transduce x:z. The arc scores are added in the composed graph.

Both compose() and intersect() can be much faster when operating on graphs with sorted arcs. See Graph.arc_sort().

concat(graphs)

Concatenate a list of graphs. This operation is recorded in the autograd tape.

If x_i is a sequence accepted (or x_i:y_i is transduced) by graphs[i] then the concatenated graph accepts the sequence x_1x_2...x_n if graphs contains n graphs. The score of the path x_1...x_n is the sum of the scores of the individual x_i in graphs[i]. The concatenated graph is constructed by connecting every accepting state of graphs[i-1] to every starting state of graphs[i] with an \(\epsilon\) transition. The starting states of the concatenated graph are the starting states of graphs[0] and the accepting states are the accepting states of graphs[-1].

Note that the concatenation of 0 graphs, concat([]), is the graph which accepts the empty string (\(\epsilon\)). The concatenation of a single graph is equivalent to a clone.

concat(g1, g2)

Equivalent to concat([g1, g2]), see concat().

forward_score(g)

Compute the forward score of a graph. Returns the score in a scalar graph which can be accessed with Graph.item(). This operation is recorded in the autograd tape.

The forward score is equivalent to the shortest distance from the start nodes to the accept nodes in the log semiring.

NB: graph must be acyclic.

intersect(g1, g2)

Intersect two acceptors. This operation is recorded in the autograd tape. This function only works on acceptors, calling it on a graph where graph.ilabel(a) != graph.olabel(a) for some a is undefined and may yield incorrect results. The intersected graph accepts any path x which is accepted by both g1 and g2. The arc scores are added in the intersected graph.

The result of compose() will be equivalent; however, this function should be preferred since the implementation may be faster.

Both compose() and intersect() can be much faster when operating on graphs with sorted arcs. See Graph.arc_sort().

negate(g)

Negate a scalar graph.

project_input(other)

Removes the output labels from the graph and records the operation in the autograd tape. This function makes a copy of the input graph.

project_output(other)

Removes the input labels from the graph and records the operation in the autograd tape. This function makes a copy of the input graph.

remove(other, label=gtn.epsilon)

Construct the equivalent graph without \(\epsilon\) transitions. The \(\epsilon\) closure of each node in the graph is computed and the required transitions are added to yield the \(\epsilon\)-free equivalent graph. If label is specified then instead of removing epsilon transitions, arcs with the matching label are removed. The removed arc labels are treated as if they were \(\epsilon\) transitions.

subtract(g1, g2)

Subtract one scalar graph from another.

union(graphs)

Construct the union of a list of graphs.

viterbi_score(g)

Compute the Viterbi score of a graph. Returns the score in a scalar graph which can be accessed with Graph.item(). This operation is recorded in the autograd tape.

This is equivalent to the shortest distance from the start nodes to the accepting nodes in the tropical semiring.

NB: graph must be acyclic.

viterbi_path(g)

Compute the Viterbi shortest path of a graph and return it in a single chain graph with the labels and weights of the shortest path. This operation is recorded in the autograd tape.

The Viterbi shortest path is equivalent to the shortest path from the start nodes to the accepting nodes in the tropical semiring.

NB: graph must be acyclic.

Creations

scalar_graph(weight, calc_grad=True)

Creates a scalar graph - a graph with a single arc between two nodes with a given weight value and an \(\epsilon\) label (epsilon).

linear_graph(M, N, calc_grad=True)

Create a linear chain graph with M + 1 nodes and N edges between each node. The labels of the edges between each node are the integers [0, ..., N - 1].

Comparisons

equal(g1, g2)

Checks if two graphs are exactly equal (not isomorphic).

isomorphic(g1, g2)

Checks if two graphs are isomorphic. This function will be extremely slow for large graphs.

Parallel

parallel_for(function, int_list)

Runs the given function, which takes a single int argument, in parallel over the given list of ints.

Returns nothing, even if the passed function has a return value.

Input and Output

draw(g1, file_name, isymbols={}, osymbols={})

Draw a graph to an image. This function requires a working installation of Graphviz. Arc labels are of the format ilabel/olabel:weight. If the output symbols are not specified then the olabel is omitted and arc labels are of the format ilabel:weight. If the input symbols are not specified then integer ids are used as the label.

The format of the image is determined by the file_name extension and can be any dot supported extension (check with dot -T?).

Parameters
  • graph (Graph) – The graph to draw

  • file_name (str) – The name of the file to write to

  • isymbols (dict) – A map of integer ids to strings used for arc input labels

  • osymbols (dict) – A map of integer ids to strings used for arc output labels

load(file_name)

Load a graph from a file. The first two lines contain the list of space separated start and accept nodes respectively. The following lines contain the arcs in the format:

srcNode dstNode ilabel [olabel=ilabel] [weight=0.0]

where [x=y] indicates an optional value for x with a default value of y.

For example:

0
1
0 1 1

is a two node graph with an arc from start node 0 to accept node 1 with input and output label of 1,

0
1
0 1 1 2

is a two node graph with an arc from node 0 to node 1 with input label 1 and output label 2, and

0
1
0 1 1 2 3.0

is a two node graph with an arc from node 0 to node 1 with input label 1, output label 2, and a weight of 3.0.

write_dot(g, file_name, isymbols={}, osymbols={})

Write the graph in Graphviz DOT format. See draw().

epsilon: int

Use the epsilon constant to refer to \(\epsilon\) transitions.

gtn.Graph

class Graph(calc_grad=True)

A weighted finite-state acceptor or transducer.

Parameters

calc_grad (bool) – Specify if a gradient is required for the graph.

add_node(start=False, accept=False)

Add a node to the graph.

Parameters
  • start (bool) – Mark the node as a starting state

  • accept (bool) – Mark the node as an accepting state

Returns

The integer id of the node.

Return type

int

add_arc(src_node, dst_node, label)

Add an accepting arc to the graph.

Parameters
  • src_node (int) – The id of the source node

  • dst_node (int) – The id of the destination node

  • label (int) – Label for the arc

Returns

The integer id of the arc.

add_arc(src_node, dst_node, ilabel, olabel, weight=0.0)

Add a transducing arc to the graph.

Parameters
  • src_node (int) – The id of the source node

  • dst_node (int) – The id of the destination node

  • ilabel (int) – Input label for the arc

  • olabel (int) – Output label for the arc

  • weight (float) – Weight for the arc

Returns

The integer id of the arc.

arc_sort(olabel=False)

Sort the arcs entering and exiting a node by label.

This function is intended to be used prior to calls to intersect() and compose() to improve the efficiency of the algorithms.

Parameters

olabel (bool) – Sort by increasing order on the arc input label (default) or output label if olabel == True.

mark_arc_sorted(olabel=False)

Mark the arcs entering and exiting the nodes of a graph as sorted. This method is intended to be used when a graph is constructed in sorted order to avoid paying for a call to arc_sort().

Parameters

olabel (bool) – Mark as sorted by input label (default) or output label if olabel == True.

item()

Get the weight on a single arc graph.

num_arcs()

Get the number of arcs in the graph.

num_nodes()

Get the number of nodes in the graph.

num_start()

Get the number of starting nodes in the graph.

num_accept()

Get the number of accepting nodes in the graph.

weights()

Get a pointer to an array of the graph’s arc weights.

weights_to_list()

Get a list of the graph’s arc weights.

weights_to_numpy()

Get a 1D numpy.ndarray with the graph’s arc weights.

set_weights(weights)

Set all of the arc weights of the graph.

Parameters

weights (int or list or numpy.ndarray) – The weights of the arcs to set. An int type is treated as the pointer to the first entry of an array of weights.

labels_to_list(ilabel=True)

Get the graph’s arc labels as a list.

Parameters

ilabel (bool) – If True return the input labels, otherwise return the output labels.

calc_grad: bool

Set to True to compute the gradient for the graph.

grad()

Access the graph’s gradient Graph.

zero_grad()

Clear the graph’s gradient.
