Bid Selection Under Uncertainty – An Explanation

Engineering Blog

David Tolpin
7 min read


The purpose of managing bids is to maintain optimum balance between profit and traffic volume. Achieving this balance is the pain of traffic acquisition managers everywhere.

But how does one know how much to bid? How does one tell the AI algorithm how to manage each campaign, especially considering the multitude of factors which affect calculating the optimal bid?

Here’s how PubPlus came up with an automated solution to that exact question:

When we bid on impressions, we want to be as close as possible to a pre-set ROI value, while avoiding bids which lead to a loss. Let us assume that the ROI value is given and reflects the bidder’s preferences about the volume and profitability of the campaign — we will not try to optimize it here.

Bidding background

The basic formula for computing the bid is:

$$\widehat {bid} = \frac {RPM} {1000 \cdot ROI} PPS$$

We’ll denote \(\alpha = \frac {PPS} {1000}\), \( \rho = ROI\) and for the rest we’ll write

$$\widehat {bid} = \frac \alpha \rho \cdot RPM$$

However, this formula would be correct if we knew RPM exactly.

Bidding under uncertainty

In practice, we predict RPM based on past history and other variables, and get an uncertain estimate of RPM, represented by a distribution. As a simple safety measure, we can just bid below the value given by the above formula by a certain margin.

Alternatively, we can use a model of bidding under uncertainty based on belief distribution of the RPM and the utility function of a bid relative to the RPM. We choose the shape of the utility function to reflect the importance of staying close to the chosen ROI, as well as our reluctance to occasionally lose money.

When the RPM value is not known exactly, we predict a distribution instead of the exact value. Let \(f\) and \(cf\) be the probability density and the cumulative probability of RPM’s distribution. When we assume that RPM is distributed normally, we predict the sufficient statistics of the distribution, that is, the mean \(\mu\) and the standard deviation \(\sigma\).

Our model of uncertainty

To choose the best bid under uncertainty about the value of RPM, we need to define the utility u(bid|RPM) of a bid as a function of RPM. Then, the best bid \(bid^∗\) is one that maximizes the expected utility \(eu\).

$$bid^* = \arg \max_{bid} \int_{-\infty}^\infty u(bid|x) f(x) dx$$

We choose a simple form of \(u(\cdot)\) that reflects the conflicting objective of adhering to the ROI and avoiding losing bids. We specify the utility as a sum of two terms: the intrinsic utility \(v\) of being close to the ROI and the penalty \(w\) of bidding at loss. We choose:

• a Gaussian kernel of width \(\zeta\) for intrinsic utility;
• a constant penalty \(C\) for losing bids.

$$u(bid) = v(bid)\;\mathbf{if}\; bid \le \alpha \cdot RPM$$
$$\quad\quad\quad= v(bid) – w(bid) \;\mathbf{otherwise}$$
$$v(bid) = \exp\left(- \frac {(bid – \frac \alpha \rho \cdot RPM) ^ 2} {2 \zeta^2}\right)$$
$$w(bid) = C$$

Other shapes can be chosen for \(v\) and \(w\). Those above are both simple to understand and easy to manipulate with mathematics, so we’ll stick to them here. The expected utility is thus:

$$\mathrm{eu}(bid) = \int_{-\infty}^\infty v(bid|x) f(x) dx – C \cdot cf\left(\frac {bid} \alpha\right)$$

For normally distributed RPM and the chosen form of \(u\), this becomes:

$$\mathrm{eu}(bid) = \frac {\rho \sigma \zeta \exp\left( – \frac {(\rho \cdot bid – \alpha\mu)^2} {2(\rho^2\zeta^2 + \alpha^2\sigma^2)}\right)} {\sqrt{\rho^2\zeta^2 + \alpha^2 \sigma^2}} – \frac C 2 \left(1 + \mathrm{erf}\left(\frac {(\frac {bid} \alpha – \mu)} {\sqrt 2\sigma}\right) \right)$$

Maximizing the above expression gives the best bid. If the best bid is less than or equal to zero, then there is no safe bid, and we should either rethink our utility function or refrain from bidding until we know better.

Making it work in practice

To find the best bid online efficiently, we take the derivative of the expected utility and find the bid value which takes the derivative to zero. The formula for the derivative is brought here for convenience:

$$\frac {d\,\mathrm{eu}(bid)} {d \, bid} = \frac {\rho^2(\rho \cdot bid – \alpha\mu)\sigma\zeta \exp \left(- \frac {(\rho \cdot bid – \alpha\mu)^2} {2(\rho^2\zeta^2 + \alpha^2\sigma^2)}\right)} {(\rho^2\zeta^2 + \alpha^2\sigma^2)^{\frac 3 2}} – \frac C {\sqrt {2\pi} \sigma} \exp\left(- \frac {(\frac {bid} \alpha – \mu)^2} {2\sigma^2} \right)$$

Here are some values of the best bid for different penalties and uncertanties.

Setting \(\zeta = 0.1\), \(\alpha = 0.7\), \(\mu = 1\) we get

\(\sigma\) \(C\) bid
0.5 1 0.43
0.2 10 0.57
0.2 1 0.66
0.2 0 0.70

As one can see, the higher the uncertainty (\(\sigma\)) or penalty for losing money (\(C\)) the lower the recommended bid.

David Tolpin
As the Head of our Data Science department, David likes to say he does two things well - write cool programs that work, from design to support, and conduct cutting-edge AI research, from idea to publication.