Data Feeds

One of the most important considerations in working with an RTB bidder is designing the data pipeline by which auction, bid, and win events are processed and analyzed.

Unlike with a typical DSP, the Beeswax Bidder-as-a-Service technology gives you access to the full, unfiltered stream of RTB events, in much the same way you would have if you were building a bidder from scratch.

Streaming vs Batch

The first consideration in designing your data pipeline is whether you prefer to receive the data in batches or as a continuous stream. For very high-volume data, such as auction logs, Beeswax supports batch delivery only. For win logs (impressions), Beeswax supports both methods, and each has pros and cons:

Pipeline Method	Description	Pros	Cons
Batch	Hourly or daily files of data placed in an S3 bucket.	Fairly easy to ingest, fault tolerant	Data is delayed before it can be used. Many files may be written per hour.
Stream	Near real-time data in JSON or protobuf format sent over HTTP or to AWS Kinesis	Use data as fast as you can process it	Higher cost and complexity to support data ingestion

🚧
Data De-Duplication
Beeswax Data Infrastructure uses an "at least once delivery" design pattern to ensure all events are eventually delivered to customers. In certain scenarios this may mean that duplicate records are sent to customers in logs.
We always recommend de-duplicating your log-level data on auction_id, or on conversion_id in the case of conversion logs.
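
As an illustration of the recommendation above, here is a minimal Python sketch of key-based de-duplication. It assumes each log record has already been parsed into a dict; the `dedupe` helper and the `bid_price` field in the sample data are hypothetical and not part of any Beeswax API. An in-memory set is only suitable for a single batch; a larger pipeline would more likely de-duplicate in its data warehouse or a keyed store.

```python
from typing import Dict, Iterable, Iterator


def dedupe(rows: Iterable[Dict[str, str]], key: str = "auction_id") -> Iterator[Dict[str, str]]:
    """Yield each row the first time its key value is seen; drop later repeats."""
    seen = set()
    for row in rows:
        value = row[key]
        if value in seen:
            continue  # a redelivered copy of an event we already kept
        seen.add(value)
        yield row


# Hypothetical sample records, for illustration only.
sample = [
    {"auction_id": "a1", "bid_price": "1.20"},
    {"auction_id": "a2", "bid_price": "0.80"},
    {"auction_id": "a1", "bid_price": "1.20"},  # duplicate delivery
]
unique_rows = list(dedupe(sample))
```

For conversion logs, the same helper could be called with `key="conversion_id"`.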

Data Definitions

Column definitions, protobuf mappings, field lists, and a data dictionary can be found in this publicly accessible directory on GitHub: Beeswax Log File Header Definitions.

Beeswax makes multiple types of data available from Stinger, as described in the chart below. Depending on your use case, you may need some or all of this data. Because some of this data can be quite large, additional fees may apply (contact your Account Manager for more information).

A more comprehensive description of these log types and their implementation details can be found in this publicly accessible README on GitHub: Beeswax Log Summary.

Data Type	Description	Batch Field Manifest	Column Definitions and Protobuf Mapping
Auctions	The auction request from the exchange, normalized to OpenRTB fields.	auction_log_headers.csv
Bids	The bids returned from the Bidding Agent to the exchange, whether the auction was won or not.	bid_log_headers.csv
Conversions	The conversions recorded by Beeswax	conversion_log_headers.csv
Attributed Conversions	The conversions recorded by Beeswax, attributed back to an auction	attributed_conversion_log_headers.csv
IP Attributed Conversions	The IP conversions recorded by Beeswax, attributed back to an auction	attributed_ip_conversion_log_headers.csv
Losses	Loss logs provided by a limited number of exchanges (Google)	bid_response_feedback_logs.csv
Wins	The winning auctions (impressions), clicks, and events (video plays, etc)	win_log_headers.csv	ad_log.proto
Segments	First-party segments available on the auction	segment_log_headers.csv
Ghost Wins	The predicted winning auctions (Ghost Impressions) that have resulted from a Ghost Bid	ghost_win_log_headers.csv
Ghost Attributed Conversions	The conversion events that have been attributed back to a Ghost Impression	ghost_attributed_conversion_log_headers.csv
Ghost IP Attributed Conversions	The IP conversion events that have been attributed back to a Ghost Impression	ghost_attributed_ip_conversion_log_headers.csv

Schema Changes

Our rapid development cycle and the needs of our customers mean we often add new fields with little forewarning. In order to deliver value as quickly as possible, we do not release log-level changes on a set release schedule and new fields may be added at any time.

As a result, we recommend that your ingestion pipelines be set up to handle the addition of new columns at any time, to avoid disruption of service.

That said, we will make all efforts to inform customers of any breaking changes to the logic of existing fields. Similarly, the ordinal positions of fields and header names will not change. Deprecated fields will not be removed in order to preserve column positions.
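
One way to tolerate new columns is to map values to names by position and ignore anything beyond the columns you already know about. The sketch below assumes that, because existing ordinal positions are stable, new columns appear after the existing ones, and that your pipeline keeps its own copy of the header list (for example, loaded from one of the header manifests). The `rows_as_dicts` helper and the column names in the example are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def rows_as_dicts(lines: Iterable[str], known_header: List[str]) -> Iterator[Dict[str, str]]:
    """Pair each value with the column name at the same position."""
    for values in csv.reader(lines):
        # zip() stops at the shorter sequence, so trailing columns that are
        # newer than known_header are skipped instead of raising an error.
        yield dict(zip(known_header, values))


# Hypothetical header and data lines, for illustration only.
known_header = ["auction_id", "campaign_id"]
lines = [
    "a1,10",        # row written before a new column was added
    "a2,20,extra",  # row written after a new trailing column appeared
]
parsed = list(rows_as_dicts(lines, known_header))
```

Both rows parse into dicts with the same two keys, so downstream code is unaffected until you choose to add the new column to `known_header`.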

Our documentation of new fields is usually updated on the day of release, and within 2-3 business days at most.

CSV Formatting Notes

When emitting logs in .csv file format, Beeswax escapes certain special characters. While we mostly follow the RFC 4180 standard for CSV files, there are some small deviations from the specification. Most notably, in Win, Attributed Conversion, and Loss Logs we use \ as the escape character for embedded double quote and comma characters (in contrast to RFC 4180, which escapes a double quote with an additional double quote character). Most standard CSV parsers allow the escape character to be configured.

Additionally, Win Logs, Attributed Conversion Logs, and Loss Logs always enclose fields in double quotes, while all other log types enclose a field in double quotes only when its value contains a comma.
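
As a sketch of how a standard parser can be configured for the backslash escaping described above, the following uses Python's built-in `csv` module with `escapechar` set to `\` and `doublequote` disabled. The sample line is made up for illustration and does not reflect the real column layout of any Beeswax log.

```python
import csv
import io

# Hypothetical fully quoted line using backslash escapes for an embedded
# double quote and an embedded comma.
raw = r'"a1","He said \"hi\"","x\,y"' + "\n"

reader = csv.reader(
    io.StringIO(raw),
    quotechar='"',
    escapechar="\\",    # treat backslash as the escape character
    doublequote=False,  # do not interpret "" as an escaped quote
)
rows = list(reader)
```

Here `rows[0]` contains three fields, with the backslashes removed and the embedded quote and comma preserved.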
