Flock Safety FOIA Data

Posted 2025-07-28

This post is a high-level overview of the methods and dataset; we will follow up on the details over time.

Introduction

The hard-working folks at MuckRock have made several requests for data related to the use of Flock Safety systems around the United States. In this post we discuss how we aggregated these data and what we found when we read through the records.

Our code is available in our Gitlab repository. The aggregated data are presented in our dashboard for interactive exploration.

Our data analysis process is described in the sections that follow.

Building the dataset

MuckRock provides an API for getting data from their site, so we start by fetching all documents related to FOIA requests from their site. These can be downloaded as a single archive, or you can retrieve an updated version yourself using python foia.py pull.
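The pull loop can be sketched as follows. This is not the actual `foia.py` implementation; the endpoint URL and the `results`/`next` pagination shape are assumptions based on MuckRock's public API v1, and the demo uses a stubbed fetcher so no network access is needed:

```python
"""Sketch of paginated fetching from the MuckRock API (assumed shape)."""
from typing import Callable, Iterator

API_ROOT = "https://www.muckrock.com/api_v1/foia/"  # assumed endpoint

def iter_pages(fetch: Callable[[str], dict], url: str = API_ROOT) -> Iterator[dict]:
    """Yield every record across all pages, following `next` links."""
    while url:
        page = fetch(url)       # fetch() returns the decoded JSON for one page
        yield from page["results"]
        url = page.get("next")  # None on the last page ends the loop

# Demo with a stubbed fetcher; real code would wrap e.g. an HTTP GET + JSON decode.
stub = {
    API_ROOT: {"results": [{"id": 1}, {"id": 2}], "next": "page2"},
    "page2": {"results": [{"id": 3}], "next": None},
}
print([r["id"] for r in iter_pages(stub.__getitem__)])  # [1, 2, 3]
```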

The dataset consists of two primary tables of interest: requests made by a local law enforcement agency, and requests made by anyone in the Flock Safety network that included data collected by that agency. These are provided as files named *Organization_Audit* and *Network_Audit* respectively, in a mixture of CSV, XLSX, and CSV/XLSX inside ZIP archives. Each file covers requests from a given time span, typically around a month.

Some of the FOIA requests include a file *Shared_Networks* which explicitly lists the networks the local organization has opted to share its data with. We mostly ignore this table for now, since it is largely covered by the actual requests made for data.

Each table includes columns such as the Org Name, Reason, License Plate, networks_searched, and devices_searched.

Most of the user-provided data is free-form text input, so we will need to clean it before analysis.

Not all of the tables provided include all of the columns listed, but generally they include everything except the Prompt and Moderation columns.

Data cleaning

There are two columns we care about which are not normalized: Reason and License Plate. Cleaning the license plate column is a matter of stripping whitespace and applying .upper(). The Reason column takes a bit more work.

Coding the user-provided Reason

The user-provided Reason column is not null, but does not necessarily provide consistent or useful data. Some samples of that data:
+
++
,
,,
,,,
,.
-
--
---
----
...
10-18 Hall of fame
1019
1019105
101915
10192
101921025
101932
1019369
101949
101977
101-9884
10199
101cam4test
101 chamberas
101 chambers
101 Chambers Rd
10-1 cpd
10-1 CPD
10-1 ev 25123124415
...
Federal Fugitive Investigation
federal homicide investigation-281D-CG-6786880
FEDERAL INDICTMENT INVESTIGATION
Federal Investgiation
federal investiation
federal investig
federal investigation
federal Investigation
Federal investigation
Federal Investigation
federal narcotics investigation
Fed Fugitive Investigation
FED FUGITIVE INVESTIGATION
fed inv
Fed Invest
fed investigaiton
fed investigation
Fed investigation
Fed Investigation
FED INVESTIGATION
...
intellgence invetigation
intelligence investigation
Intelligence Investigation
Intelligence Investigations (CIUA/RTCC Only)
INTEMIDATION INVEST
INTEMIDATION INVEST.
INTEMIDATION INVESTIGATION
interdiction inv
interdiction INV
interdiction invest
Interdiction invest
INTERDICTION INVEST
interdiction investigation
interdiction/investigation
Interdiction investigation
Interdiction/investigation
INTERDICTION INVESTIGATION
INTERDICTION/INVESTIGATION
interdiction/investigations
INTERDICTION/INVESTIGATIVE
interdiction invs
interdict weapon smuggler investigation
interd inv
Internal Affairs Investigation
Internal Inv
Internal Inv.
Internal invest.
internal investigation
Internal Investigation
Internal Investigation
internal investigation 902
internal investigation - city vehicle

So our task here is to do the following:

  1. Perform basic cleaning by stripping whitespace, trailing punctuation, etc.
  2. Create a map from each reason to a code that groups similar reasons
  3. Use that map to create a "reason_coded" column which attempts to characterize the raw "Reason" column

To do this, we load our dataset into OpenRefine and cluster the reasons (Column > Edit Cells > Cluster and edit...). This involves a number of judgment calls, so we store the list of mappings in our repository and update it as needed. In the actual SQL database the coding is done through a materialized view that adds a "reason_coded" column, so we can always compare each reason with its coded value directly.

One key decision: in general we favor grouping requests into 50-100 shared categories to simplify assessment.

Identifying the state where the organization is located

We also create a "state" column by matching capitalized two-character codes in the Org Name to state names, and full-text state names to their character codes. During this we also make sure that organizations like "HIDTA" do not map to the incorrect value; see "state.py" if you want to learn more.

Results

Size of the network

By taking the max of the networks_searched and devices_searched columns, we find there must be at least 7,212 distinct networks and 92,502 distinct cameras installed.
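The reasoning is that any single search reports how many networks and devices it touched, so the column maxima are lower bounds on what exists. A minimal sketch with made-up rows (the per-row figures below are illustrative, not real audit data):

```python
# Lower-bound the network size from audit rows: the largest value any one
# search reports is the minimum number of networks/devices that must exist.
rows = [  # illustrative rows, not real audit data
    {"networks_searched": 5, "devices_searched": 120},
    {"networks_searched": 7212, "devices_searched": 90000},
    {"networks_searched": 800, "devices_searched": 92502},
]
min_networks = max(r["networks_searched"] for r in rows)
min_devices = max(r["devices_searched"] for r in rows)
print(min_networks, min_devices)  # 7212 92502
```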

Ratio of organization to national search

We find 56,230 searches in the organization audits and 29.3 million in the network audits.

In other words, for every one search performed by a local law enforcement agency, their data is queried roughly 521 times (29.3M / 56,230 ≈ 521).

Reasons for searches

"Investigation" is given 29% of the time, and "None" another 1.6%. Stolen vehicles and narcotics investigations are the next most popular.

Ultimately we find that a large percentage of cases are grouped easily into a broad category like the ones listed above, but there is a massive long tail of cases which are annotated with a case number, a specific reason for the search, or something else equivalently precise. The spirit of this column is to have a clear reason for why the search was performed, and it seems that this is followed in perhaps 25% of cases. A cursory examination shows that some organizations are more likely to consistently define their search reasons than others.

That said, since we are looking at searches which hit the national system, by definition we are seeing the least precise and most expansive searches. So we cannot really conclude much about the organizations that are not directly audited in these requests, other than to note that they collectively produce tens of millions of national requests per year, mostly with minimal justification.

Additionally, many of the reasons given involve cooperation ("assist/assistance") with a federal agency. It would be worthwhile to look further and determine whether this is a legitimate use of the database.

Organizations

The full list of organizations found in these FOIA requests is available in our dashboard.

A few things crop up that deserve some attention. First, there is a category "Deactivated Users" where the names of the searchers appear in other organizations. This suggests that Flock is not providing an auditable dataset representing values as they existed at the time, but instead dynamically modifying the results as the users change.

This is unacceptable in a regulated environment, and we need to understand this category better.

Downloads and further reading

