# Advan Data Dictionaries

Documentation for Advan's mobility and transaction data products.

---

# Weekly Patterns+ Data Dictionary

**Version:** v2.8

## Overview

Weekly Patterns+ data provides the same foot traffic data insights as Monthly Patterns+ on a weekly basis, tracking data from Monday to the end of day on Sunday each week.

Weekly Patterns+ is available starting from **January 1st, 2018**. Data is delivered three days after the week ends (on Wednesday), providing more frequent actionable data.

**Note:** Weekly Patterns+ includes both US and Canada foot traffic data.

---

## Schema

**File Names:** `patterns_plus_*.csv.gz`

| Column Name | Description | Type | Example |
|-------------|-------------|------|---------|
| `id_store` | Unique ID tied to this point of interest ("POI"). This ID is *guaranteed to be persistent* across the lifespan of the company, excluding M&A events. See `persistent_id_store` for M&A handling. | Integer | `6255608` |
| `ticker` | The exchange-traded ticker, if available, or a custom ticker for private companies and subsidiaries. | String | `COST` |
| `persistent_id` | Unique, human-readable identifier *guaranteed to be persistent even across M&A events* that represents this specific company. | String | `A1_UPS` |
| `persistent_id_store` | Persistent ID tied to this POI. *Guaranteed to be persistent* across the lifespan of the location, including M&A events. | Integer | `6255608` |
| `footprint_id` | Unique identifier for the physical polygon of the store location. | Integer | `1234567891234567891` |
| `is_distributor` | Whether this is a company owned location (null) or a distribution location for the company's products (1). | Integer | `null` |
| `location_name` | The name of the point of interest. | String | `Salinas Valley Ford Lincoln` |
| `street_address` | Street address of the point of interest. | String | `1100 Auto Center Circle` |
| `city` | The city of the point of interest. | String | `Irvine` |
| `region` | The state, province or county of the point of interest. | String | `CA` |
| `postal_code` | The postal code of the point of interest. | String | `92602` |
| `iso_country_code` | The two-letter ISO 3166-1 alpha-2 country code. | String | `US` |
| `brand` | Company DBA ("doing business as") name if explicitly identified. | String | `UPS` |
| `open_date` | The date the store location opened. 1970-01-01 denotes locations opened before 1/1/2010. | YYYY-MM-DD | `1970-01-01` |
| `close_date` | The date the store location closed. 2038-01-01 denotes locations that are still open. | YYYY-MM-DD | `2038-01-01` |
| `longitude` | Longitude for POI's location. | Float | `-112.4556` |
| `latitude` | Latitude for POI's location. | Float | `38.3441` |
| `naics_code` | North American Industry Classification System Code. | String | `722511` |
| `top_category` | 4 digit NAICS description. | String | `Restaurants and Other Eating Places` |
| `sub_category` | 6 digit NAICS description. | String | `Full-Service Restaurants` |
| `poi_cbg` | The census block group (U.S.) or dissemination area (Canada) the POI is located within. | String | `560610112022` |
| `msa_code` | 5-digit Metropolitan Statistical Area. | String | `22300` |
| `date_range_start` | Start time for measurement period in ISO 8601 format (local time). The start time will be 12 a.m. Monday in local time. | String | `2020-03-02T00:00:00-06:00` |
| `date_range_end` | End time for measurement period in ISO 8601 format. The end time will be the following Monday at 12 a.m. local time. | String | `2020-03-09T00:00:00-06:00` |
| `visit_counts` | Number of estimated visits (sum of each day's unique visitors across all days in the date range) to this POI. | Integer | `1542` |
| `visitor_counts` | Number of unique estimated visitors to this POI during the date range. | Integer | `1221` |
| `visits_by_day` | The number of estimated visits to the POI each day (local time), Monday through Sunday. | JSON [Integer] | `[33, 22, 33, 22, 33, 22, 22]` |
| `visits_by_each_hour` | The number of estimated visits to the POI for each of the 168 hours of the week, starting at midnight on date_range_start. | JSON [Integer] | `[33, 22, 33, 22, 33, 22, 22, 21, 23...]` |
| `visitor_home_cbgs` | Estimated visitors from each census block group (U.S.) or dissemination area (Canada) based on home location. **Trade Area** | JSON {String: Int} | `{"360610112021": 603}` |
| `visitor_home_aggregation` | Estimated visitors from each census tract (U.S.) or aggregate dissemination area (Canada) based on home location. **Trade Area** | JSON {String: Int} | `{"17031440300": 1005}` |
| `visitor_daytime_cbgs` | Estimated visitors based on primary daytime location on weekdays (M-F) between 8am-6pm. **Trade Area** | JSON {String: Int} | `{"360610112030": 9872}` |
| `visitor_country_of_origin` | Estimated visitors from each country based on visitor's home country code. | JSON {String: Int} | `{"US": 98, "CA": 12}` |
| `distance_from_home` | Median distance from home travelled by visitors (of visitors whose home we have identified) in meters. | Integer | `1211` |
| `median_dwell` | Median minimum dwell time in minutes. | Double | `5` |
| `bucketed_dwell_times` | Distribution of visit dwell times by pre-specified buckets (key is range in minutes, value is estimated visits). | JSON {String: Int} | `{"<5": 40, "5-10": 22}` |
| `related_same_day_brand` | Other brands visitors went to on the same day as the visit to this POI. Top 20. | JSON {String: Int} | `{"mcdonalds": 7}` |
| `related_same_week_brand` | Other brands visitors went to in the same week as the visit to this POI. Top 20. | JSON {String: Int} | `{"target": 3}` |
| `device_type` | Estimated visitors using Android vs. iOS. | JSON {String: Int} | `{"android": 6, "ios": 8}` |
| `carrier_name` | Estimated visitors by wireless carrier of the device. **Premium** | JSON {String: Int} | `{"Verizon": 342, "T-Mobile": 288}` |
| `Reserved1-5` | Reserved for backwards compatibility with legacy feeds. Currently set to NULL. | Float | `NULL` |
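
The `open_date` and `close_date` sentinels in the schema above are easy to mistake for real dates. The following is a minimal Python sketch of one way to handle them. The helper name `is_open_on` is hypothetical, and treating `close_date` as the first day the location is no longer open is an assumption, not documented behavior.

```python
from datetime import date
from typing import Optional

# Sentinel values documented in the schema table.
OPENED_BEFORE_2010 = date(1970, 1, 1)  # real open date is unknown, but before 1/1/2010
STILL_OPEN = date(2038, 1, 1)          # location has not closed


def is_open_on(open_date: str, close_date: str, day: str) -> Optional[bool]:
    """Return True/False if the status on `day` is known, or None if it is unknown.

    All arguments are YYYY-MM-DD strings.
    """
    opened = date.fromisoformat(open_date)
    closed = date.fromisoformat(close_date)
    when = date.fromisoformat(day)

    # Assumption: the location is treated as closed on close_date itself.
    if when >= closed:
        return False
    # With the 1970-01-01 sentinel we only know the location opened before 2010.
    if opened == OPENED_BEFORE_2010 and when < date(2010, 1, 1):
        return None
    return opened <= when
```

A result of `None` signals that the sentinel hides the real opening date, so the status before 2010 cannot be determined from this column alone.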

> **Privacy Notice:** We do not report data if fewer than 2 visitors are observed from that group. If there are between 2 and 4 visitors, this is reported as 4. These numbers are before scaling to estimated visitors; the equivalent number of estimated visitors will be higher.

> **Trade Area Note:** For trade area computations we utilize additional panels, so in some cases the sum of visitors by CBG may be higher than `visitor_counts`.
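
The rule in the privacy notice above can be written as a small function. This is only an illustrative sketch of the stated thresholds applied to a raw (pre-scaling) count; the function name `reported_raw_count` is hypothetical, and the scaling to estimated visitors is not shown because it is not described here.

```python
from typing import Optional


def reported_raw_count(observed: int) -> Optional[int]:
    """Apply the documented privacy thresholds to a raw visitor count."""
    if observed < 2:
        return None  # fewer than 2 visitors: the group is not reported
    if observed <= 4:
        return 4     # 2 to 4 visitors are reported as 4
    return observed  # 5 or more visitors are reported unchanged
```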

---

## SUCCESS and HISTORY RESTATED Flags

The following empty files are placed in the bucket(s) periodically:

- `_SUCCESS_PPLUS_WEEKLY_YYYYMMDD`: placed in the associated GCS or AWS bucket as soon as all the weekly files are ready.
- `_HISTORY_RESTATED_PPLUS_WEEKLY_YYYYMMDD`: placed in the associated GCS or AWS bucket whenever the historical data is restated, signaling the restatement is complete and ready to be loaded.
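
A loader can wait for the success flag before reading the weekly files. The sketch below assumes the caller has already listed the object names in the bucket (the listing step depends on the cloud SDK in use and is left out) and that the `YYYYMMDD` suffix parses as a calendar date. The function name is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Iterable, Optional

# Matches the weekly success flag only, not the history-restated flag.
FLAG_PATTERN = re.compile(r"_SUCCESS_PPLUS_WEEKLY_(\d{8})$")


def latest_success_date(object_names: Iterable[str]) -> Optional[date]:
    """Return the most recent date carried by a weekly success flag, if any."""
    dates = []
    for name in object_names:
        match = FLAG_PATTERN.search(name)
        if match:
            dates.append(datetime.strptime(match.group(1), "%Y%m%d").date())
    return max(dates, default=None)
```

The same approach works for `_HISTORY_RESTATED_PPLUS_WEEKLY_YYYYMMDD` by swapping the prefix in the pattern.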

---

## Key Concepts

### Visit Attribution

We compute visits/visitors inside a POI using the POI's geometry. We do not apply any dwell time or any concept of "stops"; we rely on the polygon for accuracy. Testing on 1,500 publicly traded tickers versus top line revenue and credit card transaction counts has shown that filtering for dwell time reduces the signal and makes the correlation/forecasting worse.

### Determining Home Location

We compute a device's home/work (night/day) location by computing the time the device spent in each building in the country, then taking the most frequented building.
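
As a rough illustration of the "most frequented building" idea, the sketch below sums time per building and picks the largest total. The input layout (pairs of building ID and minutes) and the function name are hypothetical. The production pipeline, including how night and day periods are separated, is not described here.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def most_frequented_building(dwells: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Pick the building with the largest total time from (building_id, minutes) pairs."""
    minutes_by_building = defaultdict(float)
    for building_id, minutes in dwells:
        minutes_by_building[building_id] += minutes
    if not minutes_by_building:
        return None  # no observations for this device
    return max(minutes_by_building, key=minutes_by_building.get)
```

Running it once on night-time observations and once on daytime observations would give a home candidate and a work candidate respectively, under the assumptions above.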

### Understanding Visitation and Trade Area Panels

For visitations we use a panel that is consistent across time and can be normalized, in order to generate accurate year-over-year growth metrics. For trade areas (`visitor_home_cbgs`, `visitor_home_aggregation`, `visitor_daytime_cbgs`, `visitor_country_of_origin`) we use every panel we receive for which we have explicit permission for trade area use. This is a much larger panel that intersects but does not fully contain the visitation panel. Because it is more volatile week-over-week, it should not be used for year-over-year comparisons. Advan recommends that users interpret these trade area values as ratios or scaled indicators rather than absolute numbers.
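
One way to follow the recommendation to read trade area values as ratios is to convert each JSON map into shares that sum to 1. This is a minimal sketch. The function name `home_area_shares` is hypothetical and the input is the raw JSON string from a trade area column.

```python
import json
from typing import Dict


def home_area_shares(trade_area_json: str) -> Dict[str, float]:
    """Convert a {area: visitors} JSON map into {area: share of listed visitors}."""
    counts = json.loads(trade_area_json)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {area: n / total for area, n in counts.items()}
```

Shares computed this way are relative to the areas listed in the column, so areas suppressed for privacy are not part of the denominator.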

### Backfills

Backfill is when we take our most recent locations (addresses + geofences) and run our visit attribution algorithm backward in time to generate a new history of "backfilled" Patterns+. Backfills are typically generated every time new Advan POIs are added (typically monthly, with the exception of August and December).

### Weekly vs Monthly Patterns+ Differences

- Each delivery covers one week (Monday to Sunday), available three days later on Wednesday.
- Includes `visits_by_each_hour` column for 168-hour detailed view of the week.
- Does not include `popularity_by_hour` or `popularity_by_day` (covered by `visits_by_each_hour`).
- Includes Canada foot traffic data (Monthly Patterns+ is US only).
- Uses `related_same_week_brand` instead of `related_same_month_brand`.

---

## Column Name Details

### `street_address`

Title-cased, consistent, and friendly for human reading. We implement multiple steps to clean, validate and standardize addresses.

### `region`

- US (`iso_country_code == US`): state or territory
- CA (`iso_country_code == CA`): Canadian Province or territory
- GB (`iso_country_code == GB`): United Kingdom county

### `postal_code`

- US: 5 digit zip code
- CA: 3-character Forward Sortation Area (FSA) + space + 3-character Local Delivery Unit (LDU)
- GB: British postal code

### `visits_by_day`

Array of visits on each day in the week, Monday through Sunday. Days are broken up based on local time.
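
Because the array starts on the Monday given by `date_range_start`, each element can be paired with its calendar date. The sketch below assumes both columns are available as strings, as in the schema examples. The function name is hypothetical.

```python
import json
from datetime import datetime, timedelta
from typing import Dict


def visits_by_date(date_range_start: str, visits_by_day_json: str) -> Dict[str, int]:
    """Map each local calendar date (YYYY-MM-DD) in the week to its visit count."""
    # fromisoformat keeps the local date and ignores nothing: the offset is parsed,
    # and .date() returns the local calendar day of the period start.
    start = datetime.fromisoformat(date_range_start).date()
    visits = json.loads(visits_by_day_json)
    return {
        (start + timedelta(days=offset)).isoformat(): count
        for offset, count in enumerate(visits)
    }
```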

### `visits_by_each_hour`

Array of visits for each hour that exists in the week (168 hours). The metric reflects the total duration of a visit, meaning the same device may be counted in multiple hours if the visit spans across them. Days are broken up based on local time.
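
The 168-element array is often easier to work with as a 7 x 24 grid indexed by day and hour. A minimal sketch, with a hypothetical function name, assuming the array is ordered from midnight on Monday as described:

```python
import json
from typing import List

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def hourly_grid(visits_by_each_hour_json: str) -> List[List[int]]:
    """Split the 168 hourly values into 7 rows (one per day) of 24 hours each."""
    hours = json.loads(visits_by_each_hour_json)
    if len(hours) != 168:
        raise ValueError(f"expected 168 hourly values, got {len(hours)}")
    return [hours[day * 24:(day + 1) * 24] for day in range(len(DAYS))]
```

With this layout, `grid[0][9]` is the 9am hour on Monday. Since a visit spanning several hours is counted in each of them, the grid total can exceed `visit_counts`.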

### `visitor_home_cbgs`

Home census block groups (U.S.) or dissemination areas (Canada) of the visitors. For each area, we show the number of associated *visitors* (not visits). If visits by home CBG are desired, multiply visitors from each CBG by the average visits per visitor (`visit_counts / visitor_counts`).
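
The conversion from visitors to approximate visits per home area can be sketched as follows. The function name is hypothetical; the arithmetic is the `visit_counts / visitor_counts` multiplier described above.

```python
import json
from typing import Dict


def estimated_visits_by_home_cbg(
    visitor_home_cbgs_json: str, visit_counts: int, visitor_counts: int
) -> Dict[str, float]:
    """Scale visitors per home area by the POI's average visits per visitor."""
    if visitor_counts <= 0:
        return {}
    visits_per_visitor = visit_counts / visitor_counts
    visitors = json.loads(visitor_home_cbgs_json)
    return {cbg: n * visits_per_visitor for cbg, n in visitors.items()}
```

This applies one POI-wide average to every area, so it cannot capture areas whose residents visit more or less often than average.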

### `distance_from_home`

Median haversine distance between the visitor's home geohash-7 and the POI location for each visit. If fewer than 5 visitors, the value will be null. Each visitor is counted equally (not adjusted for visits).
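
For reference, the haversine distance and the fewer-than-5 rule can be sketched as below. Several details are assumptions for illustration: home locations are passed as latitude/longitude pairs (for example, the center of each geohash-7 cell, with decoding left out), the Earth radius is the common 6,371 km mean value, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def median_distance_from_home(
    homes: Sequence[Tuple[float, float]], poi_lat: float, poi_lon: float
) -> Optional[float]:
    """Median home-to-POI distance, one entry per visitor; None below 5 visitors."""
    if len(homes) < 5:
        return None
    return median(haversine_m(lat, lon, poi_lat, poi_lon) for lat, lon in homes)
```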

### `median_dwell`

Median of the minimum dwell times calculated for each visit. Minimum dwell is the time between the first and last ping from a device during a visit. A minimum dwell of 0 is possible if we saw only 1 ping and determined the visit based on factors such as Wi-Fi.
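
The definition can be illustrated with a short sketch that takes the ping timestamps of each visit. The input layout and function names are hypothetical; only the first-to-last-ping rule and the median come from the description above.

```python
from datetime import datetime
from statistics import median
from typing import List, Sequence


def minimum_dwell_minutes(ping_times: Sequence[datetime]) -> float:
    """Minutes between the first and last ping of one visit (0 for a single ping)."""
    return (max(ping_times) - min(ping_times)).total_seconds() / 60


def median_minimum_dwell(visits: List[Sequence[datetime]]) -> float:
    """Median of the per-visit minimum dwell times, in minutes."""
    return median(minimum_dwell_minutes(pings) for pings in visits)
```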

### `bucketed_dwell_times`

Bins: `{ "<5", "5-10", "11-20", "21-60", "61-120", "121-240", ">240" }` (in minutes)
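
Assigning a dwell time to one of these bins can be sketched as below. The sketch assumes dwell times are whole minutes; how fractional values between bins (for example 10.5 minutes) are handled is not stated here, so that part is an assumption. The function name is hypothetical.

```python
# Upper bound (inclusive, in whole minutes) and label for each bounded bin.
_BOUNDED_BINS = [
    (10, "5-10"),
    (20, "11-20"),
    (60, "21-60"),
    (120, "61-120"),
    (240, "121-240"),
]


def dwell_bucket(minutes: int) -> str:
    """Return the bucket label for a dwell time given in whole minutes."""
    if minutes < 5:
        return "<5"
    for upper, label in _BOUNDED_BINS:
        if minutes <= upper:
            return label
    return ">240"
```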

### `related_same_day_brand` / `related_same_week_brand`

Brands that visitors also visited on the same day/week. The value is a simple percent of POI visitors that visited the other brand. Only the first 20 brands are returned.

---
---

# Neighborhood Patterns+ Data Dictionary

**Version:** v1.8

## Overview

Footfall data aggregated by census block group (CBG) in the U.S. and dissemination area (DA) in Canada. Ideal for **site-selection use cases** and understanding area activity patterns.

Observed data is normalized to estimate actual counts across the adult population. Available starting from **January 1st, 2019**.

---

## Schema

**File Names:** `neighborhood_patterns_plus*.csv.gz`

| Column Name | Description | Type | Example |
|-------------|-------------|------|---------|
| `area` | The designation of the area. Canadian DAs have a "CA:" prefix. | String | `131000000000` |
| `area_type` | The type of area (Census Block Group for both US and CA). | String | `Census Block Group` |
| `date_range_start` | Start of measurement period in ISO 8601 format (local time). | String | `2020-03-01T00:00:00-06:00` |
| `date_range_end` | End of measurement period in ISO 8601 format (local time). | String | `2020-04-01T00:00:00-06:00` |
| `day_counts` | Frequency of each day of the week in the date range. | JSON {String: Int} | `{"Monday": 4}` |
| `stop_counts` | Estimated stops of the adult population (min 1 minute duration). | Integer | `428840` |
| `device_counts` | Unique devices in the adult population that stopped in this area. | Integer | `109161` |
| `stops_by_day` | Number of stops each day over the covered time period. | JSON [Integer] | `[33, 22, 33...]` |
| `device_home_areas` | Devices that stopped in this area by home origin area. **Privacy Applied** | JSON {String: Int} | `{"360610112021": 603}` |
| `weekday_device_home_areas` | Devices that stopped Monday through Friday. **Privacy Applied** | JSON {String: Int} | `{"130890212162": 91}` |
| `weekend_device_home_areas` | Devices that stopped Saturday or Sunday. **Privacy Applied** | JSON {String: Int} | `{"130890212162": 91}` |
| `breakfast_device_home_areas` | Devices that stopped 6am - 10:59am. **Privacy Applied** | JSON {String: Int} | `{"130890212162": 91}` |
| `lunch_device_home_areas` | Devices that stopped 11am - 2:59pm. **Privacy Applied** | JSON {String: Int} | `{"130890212162": 91}` |
| `dinner_device_home_areas` | Devices that stopped 5pm - 8:59pm. **Privacy Applied** | JSON {String: Int} | `{"130890212162": 91}` |
| `nightlife_device_home_areas` | Devices that stopped 9pm - midnight. **Privacy Applied** | JSON {String: Int} | `{"130890212162": 91}` |
| `work_behavior_device_home_areas` | Work hours devices that dwelled 6+ hours (excludes home visits). **Privacy Applied** | JSON {String: Int} | `{"130890212162": 42}` |
| `distance_from_home` | Median distance from home in meters. | Integer | `29385` |
| `median_dwell` | Median dwell time in minutes (min 1 minute stops only). | Double | `5` |
| `top_same_day_brand` | Brands visited same day (top 20, percentage values). | JSON {String: Int} | `{"mcdonalds": 7}` |
| `popularity_by_hour_[day]` | 24-element array of stops for each hour on that day of week. **Privacy Applied** | Array | `[9877, 8233...]` |
| `device_type` | Devices using Android vs. iOS. **Privacy Applied** | JSON {String: Int} | `{"android": 6, "ios": 8}` |
| `region` | US state/territory or Canadian Province/territory. | String | `MD` |
| `y` | Year of measurement period (for easier filtering). | Integer | `2021` |
| `m` | Month of measurement period (for easier filtering). | Integer | `6` |

> **Differential Privacy Notice:** A random number (0-5 in US, 0-3 in Canada) is added/subtracted before adjusting to adult population. Single-device areas are not reported; 2-4 devices are reported as 4. Starting January 2023 (US), only 65th percentile of areas are included.

---

## Optional: Panel Overview Data

**File:** `home_panel_summary.csv`

---

## Key Concepts

### Visit Attribution

Uses POI geometry without dwell time filtering. Tested against 1,500 publicly traded tickers and credit card transaction counts.

### Determining Home Location

Computed by analyzing time spent in each building and selecting the most frequented.

### Stop Definition

A stop is defined as a device being present in a census block group for at least 1 minute. This captures meaningful activity while filtering out pass-through traffic.

### Time-Based Segmentation

The data provides multiple time-based views of activity:

- **Weekday vs Weekend:** Understand different patterns between work days and leisure days
- **Meal Times:** Breakfast (6am-10:59am), Lunch (11am-2:59pm), Dinner (5pm-8:59pm)
- **Nightlife:** Evening activity from 9pm to midnight
- **Work Behavior:** Identifies areas where devices dwell 6+ hours during work hours (excluding home visits)
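
The meal-time and nightlife windows listed above translate into a simple hour lookup. This sketch covers only those four windows and treats hours as local clock hours from 0 to 23. The names `DAYPARTS` and `daypart_for_hour` are hypothetical.

```python
from typing import Optional

# Local clock hours covered by each window, per the ranges listed above.
DAYPARTS = {
    "breakfast": range(6, 11),   # 6am - 10:59am
    "lunch": range(11, 15),      # 11am - 2:59pm
    "dinner": range(17, 21),     # 5pm - 8:59pm
    "nightlife": range(21, 24),  # 9pm - midnight
}


def daypart_for_hour(hour: int) -> Optional[str]:
    """Return the window containing `hour`, or None if the hour is in no window."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    for name, hours in DAYPARTS.items():
        if hour in hours:
            return name
    return None
```

Note that 3pm to 4:59pm and midnight to 5:59am fall outside all four windows, so stops in those hours do not appear in any of the meal-time or nightlife columns.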

### Privacy Protections

Multiple privacy measures are applied:

- Random noise (0-5 in US, 0-3 in Canada) added before scaling
- Single-device areas not reported
- 2-4 devices reported as 4
- Starting January 2023, only 65th percentile of areas included (US)

---

## Use Cases

### Site Selection

Neighborhood Patterns+ is ideal for understanding foot traffic patterns at the census block group level, helping businesses:

- Identify high-traffic areas for new locations
- Understand the trade areas of potential sites
- Analyze competition and complementary businesses nearby
- Evaluate daytime vs nighttime activity levels

### Market Analysis

- Compare activity levels across different neighborhoods
- Understand seasonal and day-of-week patterns
- Identify underserved areas with growth potential

---
---

# SpendView Data Dictionary

**Version:** v1.1

## Overview

Credit and debit card transaction data sourced from over **120 million cards**, representing an active user panel of about **40 million users** (15% of the US population).

### Validated Against Ground Truth

Our spend data closely tracks published company revenue. Figure 1 demonstrates this by comparing Advan SpendView data against Chipotle's reported financials on a year-over-year basis.

![Figure 1: Advan Chipotle Spend vs Chipotle published Revenue](CMG%20FQ%20yoy.png)

**Figure 1:** Advan Chipotle Spend vs Chipotle published Revenue on a year-over-year basis. The chart shows near-perfect correlation between Advan's estimated spend growth and Chipotle's actual reported revenue growth from March 2020 through September 2023, validating our data quality and methodology.

---

### Key Features

| Feature | Description |
|---------|-------------|
| **5,000+ Brands** | Large coverage, growing monthly |
| **Online & Brick-and-Mortar** | Differentiates transaction channels |
| **Stable Panel** | Largest available, consistent over time |
| **7+ Years History** | From 1/1/2019, including pre-Covid |

---

## Data by Brand, Period, and Customer Home Area

**File Names:** `brand_by_customer_[area_type]_by_period_YYYYMMDD_*.csv.gz`

| Field | Description | Type | Example |
|-------|-------------|------|---------|
| `period_type` | D (daily), W (weekly), M (monthly), Q (quarterly), Y (yearly), YTD, TTM. | String | `M` |
| `start_dt` | Period start date in YYYY-MM-DD format. | String | `2024-11-01` |
| `end_dt` | Period end date in YYYY-MM-DD format. This is exclusive: it is the day after the last day of the period. | String | `2024-12-01` |
| `brand_name` | The name of the brand. | String | `WEGMANS` |
| `[area_name]` | Customer residence area: zip, DMA, division, region, city, county, MSA, state_code. | String | `92341` |
| `channel` | A (ALL), O (ONLINE), or B (Brick and Mortar). | String | `ALL` |
| `spend` | Estimated amount spent. | Float | `$5,057,959.34` |
| `trans_count` | Estimated number of transactions ("card swipes"). | Integer | `74,749` |
| `customers` | Estimated number of customers. | Integer | `65,684` |
| `spend_N` | Spend by basket size bucket. N: 6, 10, 14, 16, 25, 50, 75, 100, 150, 200, 250, 500, 1000, 10000. | Float | `$53,335.67` |
| `trans_count_N` | Transactions by basket size bucket. Same buckets as spend_N. | Integer | `1,234` |

### Basket Size Buckets

The `_N` suffix indicates the upper bound of the basket size bucket:
- `spend_6`: Transactions up to $6
- `spend_10`: Transactions $6.01 to $10
- `spend_14`: Transactions $10.01 to $14
- ... and so on up to `spend_10000`
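
Finding the bucket column for a given transaction amount is a lookup against the sorted upper bounds. The sketch below assumes each upper bound is inclusive, as the ranges above suggest. How amounts above $10,000 are handled is not stated, so the sketch returns `None` for them. The function name is hypothetical.

```python
from bisect import bisect_left
from typing import Optional

# Upper bounds (in dollars) of the documented basket size buckets.
BUCKET_UPPER_BOUNDS = [6, 10, 14, 16, 25, 50, 75, 100, 150, 200, 250, 500, 1000, 10000]


def spend_bucket_column(amount: float) -> Optional[str]:
    """Return the `spend_N` column whose bucket contains a transaction amount."""
    # bisect_left finds the first bound that is >= amount, so bounds are inclusive.
    index = bisect_left(BUCKET_UPPER_BOUNDS, amount)
    if index == len(BUCKET_UPPER_BOUNDS):
        return None  # above the largest bucket; handling is not documented
    return f"spend_{BUCKET_UPPER_BOUNDS[index]}"
```

The matching transaction-count column follows the same suffix, for example `trans_count_10` alongside `spend_10`.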

> **Important:** Area Level Data is NOT Additive. Data is removed for privacy if fewer than 5 actual cards were used. Use data aggregated to your desired level.

---

## Data by Sector (Separate License)

Sector data by NAICS and SIC codes. Same structure as brand data with `sector_type` and `sector` fields instead of `brand_name`.

**File Names:** `sector_by_customer_[area_type]_by_period_YYYYMMDD_*.csv.gz`

| Field | Description | Type | Example |
|-------|-------------|------|---------|
| `sector_type` | Type of sector classification (NAICS or SIC). | String | `NAICS` |
| `sector` | The sector code. | String | `722511` |
| *(other fields)* | Same as brand data (period_type, start_dt, end_dt, area, channel, spend, trans_count, customers, etc.) | - | - |

---

## Ranking Data (Separate License)

Ranking of brands within sectors by spend, transactions, customers, basket size, and frequency.

**File Names:** `rank_by_customer_[area_type]_by_period_YYYYMMDD_*.csv.gz`

| Field | Description | Type |
|-------|-------------|------|
| `spend` | Ranking of amount spent (1 = highest) | Integer |
| `trans_count` | Ranking of transaction count | Integer |
| `customers` | Ranking of customer count | Integer |
| `avg_basket` | Ranking of average basket size | Integer |
| `frequency` | Ranking of spend frequency in period | Integer |
| `sector_cnt` | Number of brands in this sector | Integer |

### Additional Fields in Ranking Data

The ranking data also includes:
- All identifier fields from brand data (period_type, start_dt, end_dt, brand_name, area, channel)
- Sector classification fields (sector_type, sector)

---

## Production Data Delivery

Files are produced weekly (Wednesdays) and deposited in AWS:

```
s3://advan-spendview-brand-customer-period-us-east1/
s3://advan-spendview-sector-customer-period-us-east1/
s3://advan-spendview-rank-customer-period-us-east1/
```

### Success Flag

`_SUCCESS_WEEKLY_YYYYMMDD` file indicates all period files are ready for that delivery.

---

## Period Types Explained

| Period Type | Description | Example |
|-------------|-------------|---------|
| `D` | Daily | Single day |
| `W` | Weekly | Monday to Sunday |
| `M` | Monthly | Calendar month |
| `Q` | Quarterly | Calendar quarter |
| `Y` | Yearly | Calendar year |
| `YTD` | Year-to-Date | Jan 1 to current date |
| `TTM` | Trailing Twelve Months | Rolling 12-month period |
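
Because `end_dt` is the day after the last day of the period, the period length is a plain date difference with no off-by-one adjustment. A short sketch with hypothetical helper names:

```python
from datetime import date, timedelta


def period_days(start_dt: str, end_dt: str) -> int:
    """Number of days in a period whose end_dt is exclusive (YYYY-MM-DD strings)."""
    return (date.fromisoformat(end_dt) - date.fromisoformat(start_dt)).days


def last_day_of_period(end_dt: str) -> str:
    """Last calendar day actually covered by the period."""
    return (date.fromisoformat(end_dt) - timedelta(days=1)).isoformat()
```

Dividing `spend` by `period_days(start_dt, end_dt)` gives a per-day figure that is comparable across months of different lengths.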

---

## Channel Definitions

| Channel | Code | Description |
|---------|------|-------------|
| All | `A` | Combined online and brick-and-mortar transactions |
| Online | `O` | E-commerce and digital transactions |
| Brick and Mortar | `B` | In-store, physical location transactions |

---

## Geographic Area Types

Data is available aggregated by various geographic levels:

| Area Type | Description |
|-----------|-------------|
| `zip` | 5-digit ZIP code |
| `city` | City name |
| `county` | County name |
| `state_code` | 2-letter state code |
| `MSA` | Metropolitan Statistical Area |
| `DMA` | Designated Market Area |
| `division` | Census division |
| `region` | Census region |

---

## Privacy Considerations

- Data is removed if fewer than 5 actual cards were used in a particular area/brand/period combination
- Area-level data should not be summed across areas to get totals (use the appropriate aggregation level instead)
- All values are estimates scaled from the panel to represent total market activity

---
---

## Contact

For questions about any of these data products, contact: **support@advan.us**

Website: [www.advanresearch.com](https://www.advanresearch.com)
