{"component": "clause", "props": {"groups": [{"snippet": "For the visualization, the EEG was bandpass filtered between 0.3 and 40 Hz. For deep learning classifiers and cluster encoders, the EEG was bandpass filtered between 1 and 40 Hz, re-referenced to the common average, and normalized by dividing by the 99th percentile of the absolute amplitude. All filters were implemented in Python as 5th order Butterworth filters using scipy.signal (\u2587\u2587\u2587\u2587\u2587\u2587\u2587\u2587 et al., 2020) and zero-phase filtering.", "snippet_links": [{"key": "deep-learning", "type": "clause", "offset": [80, 93]}], "samples": [{"hash": "fkyjj2YKn58", "uri": "/contracts/fkyjj2YKn58#preprocessing", "label": "Pilot Study", "score": 24.0951403149, "published": true}, {"hash": "4jOkpQFLBBx", "uri": "/contracts/4jOkpQFLBBx#preprocessing", "label": "Pilot Study", "score": 24.0951403149, "published": true}], "size": 2, "hash": "29824a4e31b5a8c33365648f243f29fc", "id": 1}, {"snippet": "Before the Bayesian analysis, we cleaned the data and visualized general tendencies present in the data as summary plots using the tidyverse package system in R (\u2587\u2587\u2587\u2587\u2587\u2587\u2587 et al., 2019). In the data-cleaning process, we had several criteria for exclusion. The first criteria was participants\u2019 native language: we excluded participants whose native language is not Turkish. The second criteria was their accuracy in practice items: if they give wrong answers to more than half of the questions, we excluded them from the analysis. We also excluded participants that answered the questions too fast, that is below 200 milliseconds. Finally, we excluded participants with too many inaccurate answers in control conditions. We did not include missing data points or exclusions in our analysis and assumed that data were missing completely at random (\u2587\u2587\u2587 \u2587\u2587\u2587\u2587\u2587\u2587, 2018). 
In this thesis, we do not report the rates of missing data, but our raw data is available.", "snippet_links": [{"key": "the-data", "type": "clause", "offset": [41, 49]}, {"key": "criteria-for-exclusion", "type": "clause", "offset": [230, 252]}, {"key": "native-language", "type": "definition", "offset": [291, 306]}, {"key": "excluded-participants", "type": "clause", "offset": [311, 332]}, {"key": "in-practice", "type": "clause", "offset": [410, 421]}, {"key": "control-conditions", "type": "definition", "offset": [698, 716]}, {"key": "missing-data", "type": "definition", "offset": [737, 749]}, {"key": "rates-of", "type": "clause", "offset": [900, 908]}, {"key": "raw-data", "type": "clause", "offset": [931, 939]}], "samples": [{"hash": "6NX7RTZUcXV", "uri": "/contracts/6NX7RTZUcXV#preprocessing", "label": "Thesis Submission Agreement", "score": 31.5631905481, "published": true}, {"hash": "3UIUBSQVkpI", "uri": "/contracts/3UIUBSQVkpI#preprocessing", "label": "Thesis", "score": 25.629705681, "published": true}], "size": 2, "hash": "5041693d94c014840cb4f36dd1a8d4a1", "id": 2}, {"snippet": "Before inputting data into the generative network, some preprocessing is performed on the human trajectory dataset. This includes feature extraction and the creation of a structured dataset. Assuming that there are coordinates of each person\u2019s trajectory in the selected dataset, each one of these points should be able to be defined as a potential goal point depending on the time of observation of the trajectory. This claim is valid during training and inference of the model, with the aim being, the inclusion of trajectory mid-points across the overall trajectory. Although the goal points don\u2019t need to be explicitly defined in the dataset, it is important to be able to locate the people in the scene. By locating where people are in the scene, you can trace their trajectory through the environment and arbitrarily set a frame to such a state. 
Moreover, it is critical to be able to determine the kinds of information that can be used as input for the model. An individual\u2019s position in the scene can be one of the data points, but more can be extracted. Velocities can also be determined by looking at the distance travelled between frames.", "snippet_links": [{"key": "the-human", "type": "clause", "offset": [86, 95]}, {"key": "feature-extraction", "type": "clause", "offset": [130, 148]}, {"key": "each-person", "type": "clause", "offset": [230, 241]}, {"key": "the-selected", "type": "clause", "offset": [258, 270]}, {"key": "depending-on-the", "type": "clause", "offset": [360, 376]}, {"key": "the-model", "type": "clause", "offset": [469, 478]}, {"key": "inclusion-of", "type": "clause", "offset": [504, 516]}, {"key": "the-dataset", "type": "clause", "offset": [634, 645]}, {"key": "the-people", "type": "definition", "offset": [684, 694]}, {"key": "the-environment", "type": "clause", "offset": [791, 806]}, {"key": "determine-the", "type": "clause", "offset": [891, 904]}, {"key": "of-information", "type": "definition", "offset": [911, 925]}, {"key": "an-individual", "type": "clause", "offset": [967, 980]}], "samples": [{"hash": "bqpyE5WyKJX", "uri": "/contracts/bqpyE5WyKJX#preprocessing", "label": "Grant Agreement", "score": 33.7491278718, "published": true}], "size": 1, "hash": "59caa13edd6db7c0e172b955f9df4d31", "id": 3}, {"snippet": "In order to fuse the input data sets together, geolocate the transactions, and then create the origin/destination and transfer files, a number of preprocessing steps must first be performed. These steps standardize the geographic references and create a number of look-up tables that greatly speed the complex data processing. 
The following are the pre-processing tasks: \u2022 Import and standardize the AVL files \u2022 Create a stop location table \u2022 Create a table that correlates the ORCA transaction record\u2019s directional variable (i.e., inbound or outbound) with the cardinal directions used by the transit agency\u2019s directional variable (i.e., north/south/east/west) \u2022 Create (or update) the quarter-mile look-up table \u2022 Update the off-board stop location table \u2022 Preprocess ORCA transactions data and reformat date and time variables \u2022 Create a subsidy table \u2022 Link the subsidy table to ORCA cards (CSNs) \u2022 Hash the CSNs and Business IDs in the subsidy table, maintaining the link between the subsidy table and the hashed CSNs \u2022 Remove duplicate boarding records. These tasks are described below. Schema for each of the data sets are presented in Appendix C.", "snippet_links": [{"key": "in-order-to", "type": "clause", "offset": [0, 11]}, {"key": "data-sets", "type": "definition", "offset": [27, 36]}, {"key": "the-transactions", "type": "clause", "offset": [57, 73]}, {"key": "transfer-files", "type": "clause", "offset": [118, 132]}, {"key": "number-of", "type": "clause", "offset": [136, 145]}, {"key": "the-complex", "type": "definition", "offset": [298, 309]}, {"key": "data-processing", "type": "definition", "offset": [310, 325]}, {"key": "the-pre", "type": "clause", "offset": [345, 352]}, {"key": "transaction-record", "type": "clause", "offset": [483, 501]}, {"key": "transit-agency", "type": "definition", "offset": [594, 608]}, {"key": "date-and-time", "type": "clause", "offset": [806, 819]}, {"key": "the-subsidy", "type": "clause", "offset": [862, 873]}, {"key": "the-link", "type": "clause", "offset": [968, 976]}, {"key": "the-data", "type": "clause", "offset": [1112, 1120]}], "samples": [{"hash": "4raSAN4ARNx", "uri": "/contracts/4raSAN4ARNx#preprocessing", "label": "Research Report Agreement", "score": 24.7686491925, "published": true}], "size": 
1, "hash": "03a462726afa947f3ce156f4a1a4a151", "id": 4}, {"snippet": "The log data are formatted as Apache log files (see \u2587\u2587\u2587\u2587\u2587://\u2587\u2587\u2587\u2587\u2587.\u2587\u2587\u2587\u2587\u2587\u2587.\u2587\u2587\u2587/docs/1.3/logs.html for a definition of the format; DOI: \u2587\u2587\u2587\u2587://\u2587\u2587.\u2587\u2587\u2587.\u2587\u2587\u2587/10.1145/2911451.2914667). We filtered the raw data as follows: We removed all requests that did not result in a successful response (status codes starting with 3 or higher); all requests that are no GET requests; and all requests for images and other files that do not result from a navigational process. In addition, we removed all requests that supposedly come from web bots, using the regular expression .*(Yahoo! Slurp|bingbot|Googlebot).* on the log entry. We anonymized the data by taking the following measures: We replaced all occurrences of the same IP address by a unique random identifier (a 10-digit string). We removed the last part of each log entry \u2013 the User-Agent HTTP request header \u2013 which is the identifying information that the client browser reports about itself. If the referrer is a search engine, we removed everything after the substring /search?. We are aware that queries can provide valuable information about pages in the domain [2], but queries are also known to potentially be personally identifiable information [1]; for that reason, we will postpone a decision on releasing filtered query information, and first gain experience with the external usage of the data without search queries. We removed requests for URLs that only occur once in the 3-month-dataset to reduce the chance of unmasking specific users. This is an additional security step since extremely low-frequent URLs are highly specific and therefore often unique for a person. The effect of each of the filtering steps is shown in Table 1. 
The information that is retained per entry is: unique user id, timestamp, GET request (URL), status code, the size of the object returned to the client, and the referrer URL. A \u2587\u2587\u2587- ple of the resulting data is shown in Figure 1. The sample illustrates that the content (URLs and referrers) is multilingual: predominantly Dutch, and English and German in smaller proportions.", "snippet_links": [{"key": "log-data", "type": "definition", "offset": [4, 12]}, {"key": "raw-data", "type": "clause", "offset": [67, 75]}, {"key": "status-codes", "type": "clause", "offset": [158, 170]}, {"key": "requests-for", "type": "clause", "offset": [376, 388]}, {"key": "other-files", "type": "clause", "offset": [400, 411]}, {"key": "in-addition", "type": "clause", "offset": [461, 472]}, {"key": "the-data", "type": "clause", "offset": [636, 644]}, {"key": "ip-address", "type": "definition", "offset": [721, 731]}, {"key": "identifying-information", "type": "definition", "offset": [879, 902]}, {"key": "search-engine", "type": "definition", "offset": [973, 986]}, {"key": "valuable-information", "type": "clause", "offset": [1080, 1100]}, {"key": "personally-identifiable-information", "type": "definition", "offset": [1179, 1214]}, {"key": "query-information", "type": "definition", "offset": [1289, 1306]}, {"key": "usage-of-the", "type": "clause", "offset": [1352, 1364]}, {"key": "additional-security", "type": "clause", "offset": [1530, 1549]}, {"key": "extremely-low", "type": "clause", "offset": [1561, 1574]}, {"key": "a-person", "type": "clause", "offset": [1640, 1648]}, {"key": "effect-of", "type": "definition", "offset": [1654, 1663]}, {"key": "table-1", "type": "clause", "offset": [1704, 1711]}, {"key": "the-information", "type": "clause", "offset": [1713, 1728]}, {"key": "unique-user", "type": "definition", "offset": [1760, 1771]}, {"key": "the-object", "type": "clause", "offset": [1831, 1841]}, {"key": "to-the-client", "type": "clause", "offset": [1851, 1864]}, {"key": 
"a-\u2587", "type": "clause", "offset": [1888, 1891]}, {"key": "resulting-data", "type": "definition", "offset": [1906, 1920]}, {"key": "figure-1", "type": "definition", "offset": [1933, 1941]}, {"key": "the-content", "type": "clause", "offset": [1971, 1982]}], "samples": [{"hash": "dIQuKCsYlho", "uri": "/contracts/dIQuKCsYlho#preprocessing", "label": "End User Agreement", "score": 26.106091718, "published": true}], "size": 1, "hash": "5a4317f217f9d2e435e8b0356ad1e20c", "id": 5}, {"snippet": "preprocessing steps are required. These steps standardize the geographic references and create a number of look-up tables that greatly speed the complex data processing. The following pre-processing tasks are performed: \u2022 Import and standardize AVL files, \u2022 Create stop location table \u2022 Update off-board stop location table \u2022 Create (or update) the quarter-mile look-up table \u2022 Create subsidy table \u2022 Link the subsidy table to ORCA cards (CSNs) \u2022 Hash the CSNs and Business IDs in the subsidy table, maintaining the link between the subsidy table and the hashed CSNs \u2022 Preprocess date and time values in the transaction data \u2022 Remove duplicate boarding records. 
Each of these tasks is described below.", "snippet_links": [{"key": "number-of", "type": "clause", "offset": [97, 106]}, {"key": "the-complex", "type": "definition", "offset": [141, 152]}, {"key": "data-processing", "type": "definition", "offset": [153, 168]}, {"key": "the-subsidy", "type": "clause", "offset": [406, 417]}, {"key": "the-link", "type": "clause", "offset": [512, 520]}, {"key": "date-and-time", "type": "clause", "offset": [580, 593]}, {"key": "transaction-data", "type": "definition", "offset": [608, 624]}], "samples": [{"hash": "4raSAN4ARNx", "uri": "/contracts/4raSAN4ARNx#preprocessing", "label": "Research Report Agreement", "score": 24.7686491925, "published": true}], "size": 1, "hash": "4af7453de3ae598a6d7e725cf9ceff02", "id": 6}, {"snippet": "The terrain in forests shows significant variations in height and contains substantial under-canopy vegetation. Our segmentation approach considers no semantics and is aimed solely at identifying trees. We preprocess an input point cloud with the aim of filtering out the ground, bushes, and any small near-ground structures. We first minimally denoise the cloud and apply the cloth simulation algorithm proposed by \u2587\u2587\u2587\u2587\u2587 et al. [45] to compute a ground segmentation. Their method inverts the z-axis of the point cloud P and simulates the interaction of a rigid cloth covering the inverted ground surface, extracting the set of ground points PG. (Fig. 2: Results of ground segmentation and height normalization steps. In the top image, points in red denote identified ground points. The ground segmentation is used to normalize the height, as shown in the image below.) For points p = [px, py, pz]\u22a4 \u2208 P and pi \u2208 PG, we interpolate the ground elevation of a point h(p) as h(p) = \u03a3_{pi \u2208 N} w(p, pi) pi / \u03a3_{pi \u2208 N} w(p, pi) (1). For clustering, we use the Quickshift++ density-based clustering algorithm [39]. Following is a brief summary of Quickshift++ while illustrating how we use it in the context of our problem. For more details, we refer the reader to the work by \u2587\u2587\u2587\u2587\u2587 et al. [14]. 
Let rk(p) for a point p \u2208 P be the distance of p to its k-th nearest neighbor. For the true density f(p) of a point p, the k-NN density estimate of it is defined as fk(p) = k / (n v rk(p)^3) (3).", "snippet_links": [{"key": "in-height", "type": "clause", "offset": [52, 61]}, {"key": "ground-surface", "type": "definition", "offset": [592, 606]}, {"key": "as-shown", "type": "definition", "offset": [890, 898]}, {"key": "brief-summary", "type": "clause", "offset": [975, 988]}, {"key": "the-context", "type": "clause", "offset": [1041, 1052]}, {"key": "more-details", "type": "clause", "offset": [1073, 1085]}, {"key": "the-work", "type": "definition", "offset": [1110, 1118]}, {"key": "ground-elevation", "type": "definition", "offset": [1323, 1339]}], "samples": [{"hash": "iW6zAPFHZmy", "uri": "/contracts/iW6zAPFHZmy#preprocessing", "label": "Grant Agreement", "score": 33.3703723569, "published": true}], "size": 1, "hash": "208a2aab5491d0727cd620d597283381", "id": 7}, {"snippet": "The simple protocol explained above uses the fact that Bob knows more about the value of \u2587\u2587\u2587\u2587\u2587 than \u2587\u2587\u2587 knows. In fact, one can show that a x y z PXYZ 11 1 1 1/4 Forget second bit H(X|Z) - H(X|Y) = 0 Send second bit u y z PUYZ\n1 1 0 1 4 1 1 1 1/4 x y z v PXYZV\n11 1 1 1 1 4 H(U|Z) - H(U|Y) = 1 H(X|ZV) - H(X|YV) = 1", "snippet_links": [{"key": "the-fact", "type": "clause", "offset": [41, 49]}, {"key": "the-value", "type": "clause", "offset": [76, 85]}], "samples": [{"hash": "2MvmPx2tdvD", "uri": "/contracts/2MvmPx2tdvD#preprocessing", "label": "Doctoral Thesis", "score": 23.372347707, "published": true}], "size": 1, "hash": "3f92bad7cd21674a582bf42c8ed4b0af", "id": 8}, {"snippet": "Preprocessing was performed in FMRIB\u2019s Software Library (FSL 5.0.9, \u2587\u2587\u2587\u2587\u2587\u2587\u2587\u2587\u2587 et al., 2012). 
The structural and functional MRI data were skull stripped. The functional data was registered to 2mm-MNI-standard space via the individual T1-weighted anatomical image (FLIRT). The functional data was motion corrected (MCFLIRT) and smoothed with a 6 mm Gaussian kernel. ICA-AROMA was used to filter out additional motion-related, physiologic, and scanner-induced noise while retaining the signal of interest (Pruim, \u2587\u2587\u2587\u2587\u2587\u2587, Buitelaar, et al., 2015; \u2587\u2587\u2587\u2587\u2587, \u2587\u2587\u2587\u2587\u2587\u2587, \u2587\u2587\u2587 \u2587\u2587\u2587\u2587\u2587, et al., 2015). White matter and cerebrospinal fluid signals were regressed out (Pruim, \u2587\u2587\u2587\u2587\u2587\u2587, \u2587\u2587\u2587 \u2587\u2587\u2587\u2587\u2587, et al., 2015; Varoquaux & \u2587\u2587\u2587\u2587\u2587\u2587\u2587\u2587, 2013). Lastly, a 128 s high-pass filter was applied to the data. To construct the functional RS connectome, we used the 264 regions of interest (ROIs) presented by \u2587\u2587\u2587\u2587\u2587 et al. (2011) which are based on a meta-analysis of resting state and task-based fMRI data (Figure 1A). These ROIs represent nodes of common networks such as the default mode network. Calculating the connectivity between all nodes allows us to include connectivity between nodes within the same network as well as connectivity between nodes of different networks. The ROIs were spheres with a radius of 5mm around the coordinates described by Power et al. (2011). For each participant, the signal within these spheres was averaged and normalized resulting in 264 time series. Functional connectivity was calculated by correlating each time series with every other time series resulting in a 264x264 correlation matrix and 34,716 unique connectivity estimates \u2013 representing the functional RS connectome (\u2587\u2587\u2587\u2587\u2587\u2587\u2587 et al., 2018; \u2587\u2587\u2587 et al., 2018). 
For further calculations the connectome was vectorized (i.e., transforming matrix into column vector; Figure 2A).", "snippet_links": [{"key": "functional-data", "type": "clause", "offset": [157, 172]}, {"key": "the-individual", "type": "clause", "offset": [218, 232]}, {"key": "the-data", "type": "clause", "offset": [767, 775]}, {"key": "to-construct", "type": "clause", "offset": [777, 789]}, {"key": "presented-by", "type": "definition", "offset": [863, 875]}, {"key": "based-on", "type": "clause", "offset": [906, 914]}, {"key": "calculating-the", "type": "clause", "offset": [1066, 1081]}, {"key": "each-participant", "type": "clause", "offset": [1350, 1366]}, {"key": "other-time", "type": "clause", "offset": [1540, 1550]}], "samples": [{"hash": "414eUOVxNZS", "uri": "/contracts/414eUOVxNZS#preprocessing", "label": "Doctoral Thesis", "score": 24.9685147159, "published": true}], "size": 1, "hash": "72df88df7c2b1e463b626f8cdb89a94f", "id": 9}, {"snippet": "Alphabet Size", "snippet_links": [], "samples": [{"hash": "2MvmPx2tdvD", "uri": "/contracts/2MvmPx2tdvD#preprocessing", "label": "Doctoral Thesis", "score": 23.372347707, "published": true}], "size": 1, "hash": "0ab77f35399ea5d2000e33a552f3ccb7", "id": 10}], "next_curs": "ClYSUGoVc35sYXdpbnNpZGVyY29udHJhY3RzcjILEhZDbGF1c2VTbmlwcGV0R3JvdXBfdjU2IhZwcmVwcm9jZXNzaW5nIzAwMDAwMDBhDKIBAmVuGAAgAA==", "clause": {"title": "Preprocessing", "children": [["labeling-mechanism", "Labeling mechanism"]], "size": 14, "parents": [["statistical-choices", "Statistical Choices"], ["satisfiability-and-completeness", "Satisfiability and completeness"], ["materials-and-methods", "Materials and methods"], ["causal-discovery-from-current-data", "CAUSAL DISCOVERY FROM CURRENT DATA"], ["building-the-generative-network", "Building the Generative Network"]], "id": "preprocessing", "related": [["subprocessing", "Subprocessing", "Subprocessing"], ["processing", "Processing", "Processing"], ["sub-processing", "Sub-Processing", "Sub-Processing"], 
["cross-connection", "Cross Connection", "Cross Connection"], ["details-of-the-processing", "Details of the Processing", "Details of the Processing"]], "related_snippets": [], "updated": "2025-07-07T12:37:48+00:00", "also_ask": [], "drafting_tip": "", "explanation": "The Preprocessing clause defines the procedures and requirements for preparing data or materials before they are used in a subsequent process or analysis. Typically, this clause outlines the specific steps, standards, or formats that must be followed to ensure consistency and quality, such as cleaning data, converting file types, or removing sensitive information. Its core practical function is to ensure that all inputs meet agreed-upon criteria, thereby reducing errors and inefficiencies in later stages of a project or workflow."}, "json": true, "cursor": ""}}