Law Insider API

Get direct access to your own Private Contract Repository data and the Law Insider SEC Repository. Run custom queries, extract data, perform analysis, and join with external data.

  • Direct SQL access to contracts and documents uploaded to your Private Contract Repository
  • 7.5M+ classified documents extracted from SEC filings
  • Millions of clauses, definitions, and semantic labels across thousands of document categories

Introduction

Automate data capture at scale to reduce document processing costs with Law Insider Private Contract Repositories and the Law Insider API.

Take advantage of the same Machine Learning and Natural Language Processing pipeline that powers LawInsider.com to draw new insights by directly accessing the data hidden in your documents at scale.

The API provides access to data from millions of documents in structured tables that you can query and analyze with standard SQL.

Use cases:

  • Use SQL to query the same SEC Repository that is published on LawInsider.com or upload your own documents by creating a Private Contract Repository.
  • Automatically organize documents into one of thousands of categories.
  • Compare clauses and definitions across all of your agreements for consistency.
  • Drive innovation with the unique and proprietary data extracted from your own documents.
  • Merge documents from multiple sources into one canonical repository, or create many repositories for different groupings of documents.
  • Join document data with other analytics data.
  • Search clause snippets, paragraphs, or the full document text.

Data Processing

Document processing begins once documents have been uploaded to a Private Contract Repository. Law Insider uses Machine Learning, OCR, and Natural Language Processing to extract data from the documents and add structure to it. Queryable structured tables are created automatically, with no configuration or setup required.

The following diagram is a high-level depiction of the Law Insider data processing and publishing pipeline:

[Diagram: high-level view of the Law Insider data processing and publishing pipeline]

Data Publishing

The Law Insider API uses Google Cloud Platform's BigQuery for secure data publishing.

BigQuery is Google's serverless data warehouse that is designed to help you turn big data into valuable business insights.

Access your data in BigQuery by using the Google Cloud console, by using the bq command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Node.js, Java, .NET, or Python.
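As an illustration of the client-library route, the Python sketch below runs a category-count query against a linked dataset with the google-cloud-bigquery package. It assumes the package is installed, that you have already authenticated (for example with Application Default Credentials), and that the linked dataset is named `sec` in your project. `category_count_sql` and `run_query` are hypothetical helper names, not part of any Law Insider API.

```python
"""Sketch: query a Law Insider linked dataset with the BigQuery Python client."""


def category_count_sql(dataset: str = "sec") -> str:
    """Build a contract-count-by-category query (hypothetical helper)."""
    return (
        "SELECT doc_category, COUNT(*) AS doc_category_count\n"
        f"FROM `{dataset}.documents`\n"
        "WHERE doc_is_contract = TRUE\n"
        "GROUP BY doc_category\n"
        "ORDER BY doc_category_count DESC"
    )


def run_query(sql: str, project_id: str) -> list[dict]:
    """Run `sql` billed to `project_id` and return each result row as a dict."""
    # Imported lazily so the SQL builder above works without the package installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    # Row objects expose .items(), which gives (column name, value) pairs.
    return [dict(row.items()) for row in client.query(sql).result()]
```

For example, `run_query(category_count_sql(), "my-gcp-project")` returns one dict per category, where "my-gcp-project" stands in for your own Cloud project ID. Because the dataset name has no project prefix, BigQuery resolves it in the client's default project, which is where the linked dataset lives.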

The following diagram shows adding a linked dataset to your project and querying it.

[Diagram: adding a linked dataset to your project and querying it]

Table Schema

A sample of the documents table described below, containing 1,000 documents (one per row), is available for review as a Connected Sheet.

An asterisk (*) marks fields that are generated with Machine Learning.

Fields that begin with sec_ will only be populated when querying the SEC Repository. All other fields exist for both the SEC Repository and Private Contract Repositories.

Column name | Data type | Description
----------- | --------- | -----------
repository_id | STRING | Repository ID, generated when a Private Contract Repository is created on lawinsider.com.
doc_id | STRING | Unique hash assigned to the document; used in lawinsider.com URLs.
doc_category* | STRING | Machine learning classified document category, e.g. Employment Agreement.
doc_is_contract* | BOOLEAN | Machine learning classified flag set to TRUE when the document is a contract or agreement. For the highest quality, use this field together with the doc_category and clauses fields.
doc_filename | STRING | Original document filename at time of processing, e.g. my_agreement.pdf.
doc_language | STRING | Machine learning classified language the document is written in, e.g. en, fr.
doc_source_url | STRING | (Optional) URL of the document.
doc_head | STRING | The first 124 characters of the document.
doc_body | STRING | Full text of the document as HTML.
definitions* | RECORD (REPEATED) | Title and text of each defined term, numbered by order of occurrence within the document.
clauses* | RECORD (REPEATED) | Title and snippet of each clause, numbered by order of occurrence within the document.
paragraphs | RECORD (REPEATED) | Full document text split into paragraphs, one per newline, numbered by order of occurrence.
sec_exhibit_id | STRING | SEC exhibit ID, e.g. EX-10.
sec_filing_date | STRING | Date the filing was submitted to the SEC.
sec_filing_id | STRING | SEC filing ID.
sec_filing_type | STRING | SEC form type of the filing, e.g. 10-K.
sec_company_cik | INTEGER | Central Index Key (CIK), a ten-digit number the SEC assigns to each entity that submits filings.
sec_company_name | STRING | Name of the filing entity, as entered by the filer.
sec_company_sic | INTEGER | Standard Industrial Classification (SIC) code that appears in the company's disseminated EDGAR filings and indicates the company's type of business.
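Because sec_ columns are only populated for the SEC Repository, a client can leave them out when querying a Private Contract Repository. The Python sketch below builds such a SELECT from the column names in the table above; `select_documents_sql` is a hypothetical helper, and the repository ID passed as `dataset` is assumed to match the linked dataset name in your project.

```python
# Column names of the documents table, in schema order.
DOCUMENT_COLUMNS = [
    "repository_id", "doc_id", "doc_category", "doc_is_contract", "doc_filename",
    "doc_language", "doc_source_url", "doc_head", "doc_body",
    "definitions", "clauses", "paragraphs",
    "sec_exhibit_id", "sec_filing_date", "sec_filing_id", "sec_filing_type",
    "sec_company_cik", "sec_company_name", "sec_company_sic",
]


def select_documents_sql(dataset: str, include_sec_fields: bool) -> str:
    """Build a SELECT over `<dataset>.documents` (hypothetical helper).

    When include_sec_fields is False, every column starting with "sec_" is
    dropped, since those columns are empty outside the SEC Repository.
    """
    columns = [
        name for name in DOCUMENT_COLUMNS
        if include_sec_fields or not name.startswith("sec_")
    ]
    return f"SELECT {', '.join(columns)} FROM `{dataset}.documents`"
```

For a Private Contract Repository you would call `select_documents_sql("your_repository_id", include_sec_fields=False)`.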

Query Examples

To get you started, SQL examples are provided below. These examples will work with both the SEC Repository and with a Private Contract Repository.

The first example counts contracts by document category. These are the same categories used for Law Insider Samples.

Aggregate by Category Count

SELECT
  doc_category,
  COUNT(doc_category) AS doc_category_count
FROM sec.documents
WHERE doc_is_contract = TRUE
GROUP BY doc_category
ORDER BY doc_category_count DESC

(Connected Sheets query results)

List clauses that appear in sales agreements

SELECT
  doc_id AS doc_id,
  clauses.id AS clause_id,
  clauses.title.text AS clause_title,
  clauses.snippet.text AS clause_snippet,
  doc_category AS doc_category,
FROM
  sec.documents,
  UNNEST (clauses) AS clauses
WHERE TRUE
  AND clauses.title.text IS NOT NULL
  AND clauses.title.text != ''
  AND doc_is_contract = TRUE
  AND LOWER(doc_category) LIKE '%sales agreement%'
ORDER BY doc_id ASC, clause_id ASC

(Connected Sheets query results)

clause_id is a sequence number that starts at 1 in each document, so within the same document a clause with a higher clause_id appears later than a clause with a lower one.
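If you fetch whole rows instead of unnesting in SQL, the same ordering can be recovered client-side. The Python sketch below assumes each element of the clauses column arrives as a dict with id, title.text, and snippet.text, which is inferred from the field paths used in the query above and is how the BigQuery Python client represents nested records; `ordered_clauses` is a hypothetical helper.

```python
from typing import Any, Iterable


def ordered_clauses(clauses: Iterable[dict[str, Any]]) -> list[tuple[int, str, str]]:
    """Return (clause_id, title, snippet) tuples in document order.

    Mirrors the UNNEST query above: clauses with a missing or empty title are
    skipped, and sorting by clause_id restores the order in which the clauses
    appear in the document.
    """
    result = []
    for clause in clauses:
        title = (clause.get("title") or {}).get("text")
        if not title:
            continue  # same filter as "IS NOT NULL AND != ''" in the SQL
        snippet = (clause.get("snippet") or {}).get("text") or ""
        result.append((clause["id"], title, snippet))
    return sorted(result, key=lambda item: item[0])
```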

Find different definitions for the same terms across 100s or 1000s of documents.

Uncover risk hidden among multiple versions of a critical term.


SELECT
  definitions.title.text AS definition_title,
  definitions.snippet.text AS definition_snippet,
  COUNT(*) AS definition_count
FROM
  sec.documents,
  UNNEST (definitions) AS definitions
WHERE TRUE
  AND definitions.title.text IS NOT NULL
  AND definitions.title.text != ''
GROUP BY 1, 2
ORDER BY definition_title, definition_count DESC

(Connected Sheets query results)
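To narrow the results of the definitions query to terms that actually have competing versions, the rows can be grouped by term on the client. This Python sketch assumes rows shaped as (definition_title, definition_snippet, definition_count) tuples; `conflicting_definitions` is a hypothetical helper, and normalizing titles by case and whitespace is a design choice of this sketch, not something the API does.

```python
from collections import defaultdict


def conflicting_definitions(rows, min_variants=2):
    """Map each normalized term to its distinct definitions, most common first.

    rows: iterable of (definition_title, definition_snippet, definition_count).
    Only terms with at least `min_variants` distinct snippets are returned.
    """
    variants = defaultdict(dict)
    for title, snippet, count in rows:
        # Treat "Affiliate" and " affiliate " as the same term.
        term = " ".join(title.split()).lower()
        variants[term][snippet] = variants[term].get(snippet, 0) + count
    return {
        term: sorted(snippets.items(), key=lambda kv: -kv[1])
        for term, snippets in variants.items()
        if len(snippets) >= min_variants
    }
```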

Generate a URL for each document hosted on lawinsider.com.

SELECT
  CONCAT("https://www.lawinsider.com/", repository_id, "/", doc_id) AS url,
  doc_head,
  doc_category,
  sec_company_name,
FROM
  {repository_id}.documents;

(Connected Sheets query results)

SEC Repository

In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anyone to view and use through its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009.

Every day, Law Insider downloads new documents from the SEC and processes them into the SEC Repository. The structure Law Insider adds to these documents and contracts creates new opportunities to uncover insights and perform novel analysis.

BigQuery also hosts a public SEC dataset (bigquery-public-data.sec_quarterly_financials) focused on financial reporting details. Use the sec_company_cik column to join data from the SEC Repository to a company's financial reporting data.

If you use the python-edgar library (https://pypi.org/project/python-edgar/), you will find join keys for each row in the EDGAR master index it downloads.
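The Python sketch below shows how those join keys could be pulled out of master index rows without depending on python-edgar's own API. It assumes EDGAR's pipe-delimited layout (CIK|Company Name|Form Type|Date Filed|Filename), and it assumes that sec_filing_id holds the dashed accession number found in the Filename column, which is inferred from the sec.gov URL example further down (that query strips the dashes from the filing ID). `parse_master_index` is a hypothetical helper.

```python
import posixpath
from typing import Iterable, Iterator


def parse_master_index(lines: Iterable[str]) -> Iterator[dict]:
    """Yield join keys from EDGAR master index rows.

    Assumed layout: CIK|Company Name|Form Type|Date Filed|Filename, where
    Filename looks like edgar/data/<cik>/<accession-number>.txt.
    """
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 5 or not parts[0].strip().isdigit():
            continue  # preamble text, the column-name row, or the dashed separator
        filename = parts[-1].strip()
        yield {
            "sec_company_cik": int(parts[0]),
            # Rejoin in case a company name itself contains "|".
            "company_name": "|".join(parts[1:-3]).strip(),
            "form_type": parts[-3].strip(),
            "date_filed": parts[-2].strip(),
            # Assumption: matches sec_filing_id (dashed accession number).
            "sec_filing_id": posixpath.splitext(posixpath.basename(filename))[0],
        }
```

Rows produced this way can be loaded into a BigQuery table and joined to the documents table on sec_company_cik and sec_filing_id.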

Select contracts from Chevron's 10-K filings.

SELECT
  CONCAT("https://www.lawinsider.com/", repository_id, "/", doc_id) AS url,
  doc_category,
FROM sec.documents
WHERE doc_is_contract=TRUE
  AND sec_filing_type = '10-K'
  AND LOWER(sec_company_name) LIKE '%chevron%'
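When the company and form type come from user input, BigQuery's named query parameters avoid pasting values into the SQL string. This Python sketch parameterizes the query above with google-cloud-bigquery's `QueryJobConfig` and `ScalarQueryParameter`; the helper names `contracts_by_company_and_form` and `run_parameterized` are hypothetical, and the dataset is assumed to be linked as `sec`.

```python
def contracts_by_company_and_form(company_substring: str, form_type: str):
    """Return (sql, params) for the query above using named parameters."""
    sql = (
        "SELECT\n"
        "  CONCAT('https://www.lawinsider.com/', repository_id, '/', doc_id) AS url,\n"
        "  doc_category\n"
        "FROM `sec.documents`\n"
        "WHERE doc_is_contract = TRUE\n"
        "  AND sec_filing_type = @form_type\n"
        "  AND LOWER(sec_company_name) LIKE @company_pattern"
    )
    params = {
        "form_type": form_type,
        "company_pattern": f"%{company_substring.lower()}%",
    }
    return sql, params


def run_parameterized(sql: str, params: dict, project_id: str) -> list:
    """Execute `sql`, binding every entry of `params` as a STRING parameter."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, "STRING", value)
            for name, value in params.items()
        ]
    )
    client = bigquery.Client(project=project_id)
    return list(client.query(sql, job_config=job_config).result())
```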

Join Law Insider SEC Repository with the BigQuery SEC Public Dataset

WITH
  li_docs AS (SELECT * FROM sec.documents)
SELECT
  sq.*,
  li_docs.*,
FROM
  `bigquery-public-data.sec_quarterly_financials.submission` sq
JOIN
  li_docs
ON
  li_docs.sec_company_cik=sq.central_index_key;

Get the Standard Industrial Classification (SIC) title for each company

WITH
  li_docs AS (SELECT * FROM sec.documents)
SELECT
  sc.industry_title,
  li_docs.sec_company_name,
  li_docs.doc_head,
FROM
  `bigquery-public-data.sec_quarterly_financials.sic_codes` sc
JOIN
  li_docs
ON
  CAST(li_docs.sec_company_sic AS STRING)=sc.sic_code;

(Connected Sheets query results)

Create a URL for each document hosted at sec.gov.

SELECT
  CONCAT("https://www.sec.gov/Archives/edgar/data/", CAST(sec_company_cik AS STRING), "/", REPLACE(sec_filing_id, "-", ""), "/", doc_filename) AS url,
  doc_head,
  doc_category,
  sec_company_name,
FROM
  sec.documents

(Connected Sheets query results)
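When post-processing result rows outside BigQuery, the two URL patterns used in this section can be rebuilt in Python. This sketch simply mirrors the CONCAT expressions in the queries for lawinsider.com and sec.gov URLs; `lawinsider_url` and `sec_gov_url` are hypothetical helper names.

```python
def lawinsider_url(repository_id: str, doc_id: str) -> str:
    """Mirror of CONCAT("https://www.lawinsider.com/", repository_id, "/", doc_id)."""
    return f"https://www.lawinsider.com/{repository_id}/{doc_id}"


def sec_gov_url(company_cik: int, filing_id: str, doc_filename: str) -> str:
    """Mirror of the sec.gov CONCAT: the filing ID is used without its dashes."""
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{company_cik}/{filing_id.replace('-', '')}/{doc_filename}"
    )
```

The CIK is formatted as a plain integer, which matches what CAST(sec_company_cik AS STRING) produces in the SQL version.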

Get Started

To run your first query, follow these steps:

  1. Create a
  2. Create a Private Contract Repository

    Note that the repository ID is used to create a unique reserved URL for your repository. It will also be used as the linked dataset name in BigQuery.

  3. Upload documents to the new Private Contract Repository from Google Drive, your local device, or Dropbox.

    Once you have selected the documents to upload, you are ready to continue.
  4. Wait for the "Uploads to xxxx repository have been processed" email from Law Insider.

    After the documents have been processed, the lawinsider.com UI shows aggregate counts for the repository next to each section.
  5. Switch to the Google Cloud console to set up BigQuery.

    In the Google Cloud console, open the BigQuery page. You can also open it by entering the following URL in your browser:

    https://console.cloud.google.com/bigquery
  6. Authenticate with your Google Account, or create one if you have not already.
  7. If you agree to the terms of service, accept them.
  8. Follow the prompts to create a Google Cloud project. A Cloud project is required.
  9. After you create a Cloud project, request access from Law Insider. You will be sent a link to the listing to subscribe to.
  10. Open Analytics Hub from the side panel of the BigQuery console.
  11. Enable the Analytics Hub API in your project.
  12. To query the SEC Repository, open the Private Dataset Listing for the SEC Repository by entering the following URL in your browser, then click the ADD DATASET TO PROJECT button:

    https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/100730745850/locations/us/dataExchanges/sec/listings/sec

  13. Adding the dataset to your project creates a linked dataset in your project that references the shared dataset Law Insider publishes data to. When documents in the Private Contract Repository are updated, Law Insider publishes an update to the shared dataset that is linked in your project.
  14. The linked dataset now appears in BigQuery.
  15. Start querying the documents table.
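Before running the query examples, you may want to confirm that the linked dataset is reachable and that its documents table has the expected columns. This Python sketch uses the BigQuery client's `get_table` and walks the returned schema, including nested RECORD fields. It assumes the linked dataset is named after your repository ID (or `sec` for the SEC Repository); `describe_schema` and `show_documents_schema` are hypothetical helper names.

```python
def describe_schema(fields, indent: int = 0) -> list[str]:
    """Return "name TYPE MODE" lines for SchemaField-like objects.

    Nested RECORD fields are listed underneath their parent, indented.
    """
    lines = []
    for field in fields:
        lines.append(f"{'  ' * indent}{field.name} {field.field_type} {field.mode}")
        lines.extend(describe_schema(getattr(field, "fields", ()) or (), indent + 1))
    return lines


def show_documents_schema(project_id: str, dataset: str) -> None:
    """Print the row count and schema of <project>.<dataset>.documents."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    table = client.get_table(f"{project_id}.{dataset}.documents")
    print(f"{table.num_rows} rows")
    print("\n".join(describe_schema(table.schema)))
```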

Extract

In addition to querying the repository data directly in BigQuery, you can export it to Google Cloud Storage in an open format such as Avro, CSV, JSON, or Parquet. There are many ways to extract data from BigQuery and load it into a connected database or convert it to an open file format; see the BigQuery extract documentation.
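A minimal export can be sketched with the Python client's `extract_table` and `ExtractJobConfig`. Keep in mind that BigQuery cannot export nested or repeated columns to CSV, so for the documents table (which has RECORD columns) Avro, JSON, or Parquet is the safer choice, and exports larger than 1 GB need a wildcard URI such as `gs://bucket/docs-*.parquet`. Whether a linked dataset may be exported at all can also depend on the listing's settings. The bucket path and the helpers `destination_format_for` and `export_documents` are hypothetical.

```python
# Extension -> BigQuery extract destination format name.
_FORMATS = {
    ".avro": "AVRO",
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",
    ".parquet": "PARQUET",
}


def destination_format_for(uri: str) -> str:
    """Pick an extract format from the destination file extension."""
    for ext, fmt in _FORMATS.items():
        if uri.lower().endswith(ext):
            return fmt
    raise ValueError(f"unsupported extension in {uri!r}")


def export_documents(project_id: str, dataset: str, uri: str) -> None:
    """Export <project>.<dataset>.documents to Cloud Storage and wait for the job."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.ExtractJobConfig(destination_format=destination_format_for(uri))
    job = client.extract_table(
        f"{project_id}.{dataset}.documents", uri, job_config=job_config
    )
    job.result()  # blocks until the export finishes or raises on failure
```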

Request API Access

API access is FREE for Private Contract Repositories and requires a repository with at least 1,000 uploaded documents.

To request API access to the Law Insider SEC Repository, email sales@lawinsider.com.
