CAUSAL DISCOVERY FROM CURRENT DATA

In this chapter, we present our approach for causal discovery from data with a directional flow. First, we introduce the assumed data generating process. Second, we show how to manipulate the data so that an adapted version of the FCI algorithm can subsequently be applied.

3.1 Data generating process

We assume that the data is generated by the following process. Let X(k) = (X1(k), ..., XN(k)) be a set of N continuous variables measured at locations ▇ ▇. Locations are connected by currents, which together form a spatial network (fig. 1a).

Figure 1: An example of the type of system that we consider. It is described by two separate graphs: (a) a graph describing the spatial network, indicating relations between locations, and (b) a graph describing the causal network, indicating relations between variables at the same location. We intentionally use different symbols in the two figures to stress their different function.

This network is similar to a Markov chain in that each location depends only on itself and on the directly preceding locations in the direction of the current (Breiman (1992)). Importantly, locations are not influenced by locations further down the current. In contrast to a chain, however, the current arms of the network can split and unite. The entire set of locations directly preceding location k is denoted by Pre(k).

In addition to the spatial network, the system is described by a causal network (fig. 1b). We assume spatial invariance, meaning that the causal network is shared among all locations k. The following definition describes the different types of variables in the system that we consider.

Definition 1 Let X be a set of variables measured in a system with a directional current. Then there is a partitioning (I, O, R) with:
1. The subset I(k) = (I1(k), ..., InI(k)) of variables that are an effect of previous locations and a cause of subsequent locations (i.e., the variables that are affected by the current, e.g., chemical concentrations).
2. The subset O(k) = (O1(k), ..., OnO(k)) of variables that are exogenous to the system: these may be a cause of the other variables, but not an effect (e.g., riverside activities). Exogenous variables of different locations are assumed to be uncorrelated.
3. The subset R(k) = (R1(k), ..., RnR(k)) of variables that do not fit either of these categories (e.g., the substrate of a river).

The spatial structure of the system is captured by an additional set of variables, constructed from potentially multiple...
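
As an illustration of the spatial network and the predecessor set Pre(k) introduced in Section 3.1, the following is a minimal sketch in Python. The location names, the edge list, and the helper functions `build_pre` and `upstream_of` are hypothetical and chosen only for this example; they are not part of the original method. The sketch assumes that the spatial network is a directed acyclic graph whose edges point in the direction of the current.

```python
# Hypothetical spatial network: each edge points in the direction of the current.
# Arms unite at C (A and B both flow into C) and split after C (into D and E).
EDGES = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")]


def build_pre(edges):
    """Map every location k to Pre(k), the set of directly preceding locations."""
    pre = {}
    for upstream, downstream in edges:
        pre.setdefault(upstream, set())  # sources get an empty Pre(k)
        pre.setdefault(downstream, set()).add(upstream)
    return pre


def upstream_of(k, pre):
    """Collect all locations that can influence k, found by walking against the current."""
    seen = set()
    stack = list(pre[k])
    while stack:
        loc = stack.pop()
        if loc not in seen:
            seen.add(loc)
            stack.extend(pre[loc])
    return seen


PRE = build_pre(EDGES)

if __name__ == "__main__":
    for k in sorted(PRE):
        print(k, "Pre(k) =", sorted(PRE[k]), "upstream =", sorted(upstream_of(k, PRE)))
```

In this toy network, Pre(C) contains both A and B because two arms unite at C, while D and E never appear among the locations upstream of C, reflecting the assumption that locations are not influenced by locations further down the current.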