Flock Safety FOIA Data

Posted 2025-07-28

This post is a high-level overview of the methods and dataset; we will follow up on the details over time.

Introduction

The hard-working folks at MuckRock have made several requests for data related to the use of Flock Safety systems around the United States. In this post we discuss how we aggregated these data and what we found when we read through the records.

Our code is available in our Gitlab repository. The aggregated data are presented in our dashboard for interactive exploration.

Our data analysis process is described in the sections that follow.

Building the dataset

MuckRock provides an API for getting data from their site, so we start by fetching all documents related to FOIA requests from their site. These can be downloaded as a single archive, or you can retrieve an updated version yourself using python foia.py pull.
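The pull loop can be sketched as follows. This is not the actual `foia.py` implementation; the endpoint URL and the `results`/`next` pagination shape are assumptions based on MuckRock's public API v1, and the demo uses a stubbed fetcher so no network access is needed:

```python
"""Sketch of paginated fetching from the MuckRock API (assumed shape)."""
from typing import Callable, Iterator

API_ROOT = "https://www.muckrock.com/api_v1/foia/"  # assumed endpoint

def iter_pages(fetch: Callable[[str], dict], url: str = API_ROOT) -> Iterator[dict]:
    """Yield every record across all pages, following `next` links."""
    while url:
        page = fetch(url)       # fetch() returns the decoded JSON for one page
        yield from page["results"]
        url = page.get("next")  # None on the last page ends the loop

# Demo with a stubbed fetcher; real code would wrap e.g. an HTTP GET + JSON decode.
stub = {
    API_ROOT: {"results": [{"id": 1}, {"id": 2}], "next": "page2"},
    "page2": {"results": [{"id": 3}], "next": None},
}
print([r["id"] for r in iter_pages(stub.__getitem__)])  # [1, 2, 3]
```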

The dataset consists of two primary tables of interest: requests made by a local law enforcement agency, and requests made by anyone in the Flock Safety network that included data collected by that agency. These are provided as files named *Organization_Audit* and *Network_Audit* respectively, in a mixture of CSV, XLSX, and CSV/XLSX inside ZIP archives. Each file covers requests from a given time span, typically around a month.

Some of the FOIA requests include a file *Shared_Networks* which explicitly lists the networks the local organization has opted to share its data with. We mostly ignore this table for now, since it is largely covered by the actual requests made for data.

Each table includes columns such as the Org Name, Reason, License Plate, networks_searched, and devices_searched.

Most of the user-provided data is free-form text input, so we will need to clean it before analysis.

Not all of the tables provided include all of the columns listed, but generally they include everything except the Prompt and Moderation columns.

Data cleaning

There are two columns we care about which are not normalized: Reason and License Plate. Cleaning the license plate column is a matter of stripping whitespace and applying .upper(). The Reason column takes a bit more work.

Coding the user-provided Reason

The user-provided Reason column is not null, but does not necessarily provide consistent or useful data. Some samples of that data:
+
++
,
,,
,,,
,.
-
--
---
----
...
10-18 Hall of fame
1019
1019105
101915
10192
101921025
101932
1019369
101949
101977
101-9884
10199
101cam4test
101 chamberas
101 chambers
101 Chambers Rd
10-1 cpd
10-1 CPD
10-1 ev 25123124415
...
Federal Fugitive Investigation
federal homicide investigation-281D-CG-6786880
FEDERAL INDICTMENT INVESTIGATION
Federal Investgiation
federal investiation
federal investig
federal investigation
federal Investigation
Federal investigation
Federal Investigation
federal narcotics investigation
Fed Fugitive Investigation
FED FUGITIVE INVESTIGATION
fed inv
Fed Invest
fed investigaiton
fed investigation
Fed investigation
Fed Investigation
FED INVESTIGATION
...
intellgence invetigation
intelligence investigation
Intelligence Investigation
Intelligence Investigations (CIUA/RTCC Only)
INTEMIDATION INVEST
INTEMIDATION INVEST.
INTEMIDATION INVESTIGATION
interdiction inv
interdiction INV
interdiction invest
Interdiction invest
INTERDICTION INVEST
interdiction investigation
interdiction/investigation
Interdiction investigation
Interdiction/investigation
INTERDICTION INVESTIGATION
INTERDICTION/INVESTIGATION
interdiction/investigations
INTERDICTION/INVESTIGATIVE
interdiction invs
interdict weapon smuggler investigation
interd inv
Internal Affairs Investigation
Internal Inv
Internal Inv.
Internal invest.
internal investigation
Internal Investigation
Internal Investigation
internal investigation 902
internal investigation - city vehicle

So our task here is to do the following:

  1. Perform basic cleaning by stripping whitespace, trailing punctuation, etc.
  2. Create a map from each reason to a code that groups similar reasons
  3. Use that map to create a "reason_coded" column which attempts to characterize the raw "Reason" column

To do this, we load our dataset into OpenRefine and cluster the reasons (Column > Edit Cells > Cluster and edit...). This involves a number of judgment calls, so we store the list of mappings in our repository and update it as needed. In the actual SQL database the coding is done through a materialized view that adds a "reason_coded" column, so we can always compare each reason with its coded value directly.

One key decision: in general we favor grouping requests into 50-100 shared categories to simplify assessment.

Identifying the state where the organization is located

We also create a "state" column by matching capitalized two-character codes in the Org Name to state names, and full-text state names to their character codes. During this we also make sure that organizations like "HIDTA" do not map to the incorrect value; see "state.py" if you want to learn more.

Results

Size of the network

By taking the max of the networks_searched and devices_searched columns, we find there must be at least 7,212 distinct networks and 92,502 distinct cameras installed.
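The reasoning is that any single search reports how many networks and devices it touched, so the column maxima are lower bounds on what exists. A minimal sketch with made-up rows (the per-row figures below are illustrative, not real audit data):

```python
# Lower-bound the network size from audit rows: the largest value any one
# search reports is the minimum number of networks/devices that must exist.
rows = [  # illustrative rows, not real audit data
    {"networks_searched": 5, "devices_searched": 120},
    {"networks_searched": 7212, "devices_searched": 90000},
    {"networks_searched": 800, "devices_searched": 92502},
]
min_networks = max(r["networks_searched"] for r in rows)
min_devices = max(r["devices_searched"] for r in rows)
print(min_networks, min_devices)  # 7212 92502
```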

Ratio of organization to national search

We find 56,230 searches in the organization audits and 29.3 million in the network audits.

In other words, for every one search performed by a local law enforcement agency, their data is queried roughly 521 times (29.3M / 56,230 ≈ 521).

Reasons for searches

"Investigation" is given 29% of the time, and "None" another 1.6%. Stolen vehicles and narcotics investigations are the next most popular.

Ultimately we find that a large percentage of cases are grouped easily into a broad category like the ones listed above, but there is a massive long tail of cases which are annotated with a case number, a specific reason for the search, or something else equivalently precise. The spirit of this column is to have a clear reason for why the search was performed, and it seems that this is followed in perhaps 25% of cases. A cursory examination shows that some organizations are more likely to consistently define their search reasons than others.

That said, since we are looking at searches which hit the national system, by definition we are seeing the least precise and most expansive searches. So we cannot really conclude much about the organizations that are not directly audited in these requests, other than to note that they collectively produce tens of millions of national requests per year, mostly with minimal justification.

Additionally, many of the reasons given involve cooperation ("assist/assistance") with a federal agency. It would be worthwhile to look further and determine whether this is a legitimate use of the database.

Organizations

The full list of organizations found in these FOIA requests is available in our dashboard.

A few things crop up that deserve some attention. First, there is a category "Deactivated Users" where the names of the searchers appear in other organizations. This suggests that Flock is not providing an auditable dataset representing values as they existed at the time, but instead dynamically modifying the results as the users change.

This is unacceptable in a regulated environment, and we need to understand this category better.

Downloads and further reading

