# Dynamic Pivot Mechanism [Bergemann and Välimäki]

In this post, I’m going to discuss the paper on Dynamic Pivot Mechanism, by Dirk Bergemann and Juuso Välimäki, Econometrica, Vol. 78, No. 2 (March, 2010), 771–789. The focus of this post will be more on understanding the mathematical details associated with the definitions and proofs of the mechanism. This short report summarizes the following: the motivation and setting of the problem, the details and the conclusion. Please feel free to submit comments pointing out errors, questions or adding some details. I will skip the efficient exit and diverse preferences section and related results for making the post short and detailed. I’ll try to understand those parts and summarize my understandings in a later post. Nevertheless, you are encouraged to put in comments on those parts.

This setting is a dynamic repeated auction of a single indivisible item in which each bidder learns over time her true valuation of the object. The setting is suitable for online ad auctions or dynamic decision making where one needs to select a single agent or set of agents for a task and make payment.

1. The Model

The environment consists of private and independent values in a discrete-time, infinite-horizon model. The agent set is ${\{1, \dots, I\} \triangleq [I]}$, the time instant ${t \in \mathbb{N}}$. Utility of agent is determined by the action ${a_t \in A}$ taken by the social planner. Private types of the agent ${i}$ at time ${t}$ is ${\theta_{i,t} \in \Theta}$. For the details of the notations, please see beginning of section 2 of the paper. The same notations will be followed throughout this post. Note that the utility of the agent in such a dynamic setting does not only depend on the current allocation, but the entire sequence of allocation in the future. I will refer this sequence of allocation as a policy, a term borrowed from the MDP literature.

2. Assumptions

• Infinite discounted payoff with discount factor ${\delta \in (0,1)}$.
• Markovian evolution of the types.
• Types independent across the agents.

3. Definitions

One of the main motivation of writing this blog is to properly understand the expectation terms in the paper. In many places the conditioning random variables are not very clear since the authors have suppressed them. Let us define the policy ${\pi_t}$ as the sequence of actions starting from time ${t}$, i.e., ${\pi_t = (a_t, a_{t+1}, \dots)}$. Denoting the values of each agent by ${v_i}$, the social welfare can be written as,

$\displaystyle \begin{array}{rcl} \lefteqn{W(\theta_t)} \\ &=& \max_{\pi_t} \mathbb{E}_{\pi_t | \theta_t} \left[ \sum_{s = t}^{\infty} \delta^{s-t} \sum_{i = 1}^I v_i(a_s, \theta_{i,s})\right] \\ &=& \max_{\pi_t} \mathbb{E}_{\pi_t | \theta_t} \left[ \sum_{i = 1}^I v_i(a_t, \theta_{i,t}) + \delta \sum_{s = t+1}^{\infty} \delta^{s-t-1} \sum_{i = 1}^I v_i(a_s, \theta_{i,s})\right] \\ &=& \max_{\pi_t} \left[ \mathbb{E}_{a_t | \theta_t} \left(\sum_{i = 1}^I v_i(a_t, \theta_{i,t})\right) + \delta \mathbb{E}_{\pi_t | \theta_t} \left(\sum_{s = t+1}^{\infty} \delta^{s-t-1} \sum_{i = 1}^I v_i(a_s, \theta_{i,s})\right)\right] \end{array}$

Now, let us focus on the second term above. The expection is over the distribution ${p(\pi_t | \theta_t)}$, which can be written as,

$\displaystyle \begin{array}{rcl} p(\pi_t | \theta_t) &=& p(a_t, \pi_{t+1} | \theta_t) \\ &=& p(a_t | \theta_t) p(\pi_{t+1} | a_t, \theta_t) \\ &=& p(a_t | \theta_t) \sum_{\theta_{t+1}} p(\theta_{t+1}, \pi_{t+1} | a_t, \theta_t) \\ &=& p(a_t | \theta_t) \sum_{\theta_{t+1}} p(\theta_{t+1} | a_t, \theta_t) p(\pi_{t+1} | \theta_{t+1}, a_t, \theta_t) \\ &=& p(a_t | \theta_t) \sum_{\theta_{t+1}} p(\theta_{t+1} | a_t, \theta_t) p(\pi_{t+1} | \theta_{t+1}) \quad \mbox{Markovian} \\ \end{array}$

Hence, we can write,

$\displaystyle \begin{array}{rcl} \lefteqn{W(\theta_t)} \\ &=& \max_{\pi_t} \left[ \mathbb{E}_{a_t | \theta_t} \sum_{i = 1}^I v_i(a_t, \theta_{i,t}) + \delta \mathbb{E}_{a_t | \theta_t} \mathbb{E}_{\theta_{t+1} | a_t, \theta_t} \mathbb{E}_{\pi_{t+1} | \theta_{t+1}} \sum_{s = t+1}^{\infty} \delta^{s-t-1} \sum_{i = 1}^I v_i(a_s, \theta_{i,s})\right] \\ &=& \max_{a_t} \mathbb{E}_{a_t | \theta_t} \left[ \sum_{i = 1}^I v_i(a_t, \theta_{i,t}) + \delta \mathbb{E}_{\theta_{t+1} | a_t, \theta_t} \underbrace{\max_{\pi_{t+1}}\left(\mathbb{E}_{\pi_{t+1} | \theta_{t+1}} \sum_{s = t+1}^{\infty} \delta^{s-t-1} \sum_{i = 1}^I v_i(a_s, \theta_{i,s}) \right)}_{W(\theta_{t+1})} \right] \\ &=& \max_{a_t} \mathbb{E}_{a_t | \theta_t} \left[ \sum_{i = 1}^I v_i(a_t, \theta_{i,t}) + \delta \mathbb{E}_{\theta_{t+1} | a_t, \theta_t} W(\theta_{t+1}) \right] \end{array}$

Now we understand the equations better. The allocation ${a_t}$ that maximizes the above quantity is called the efficient allocation and is denoted by ${a^*(\theta_t)}$. Similarly, ${W_{-i}(\theta_t)}$ has been defined excluding agent ${i}$ and running the efficient allocation. The marginal contribution is defined as,

$\displaystyle M_i(\theta_t) = W(\theta_t) - W_{-i}(\theta_t)$

With these definitions we can now focus on the Dynamic Pivot Mechanism. Note that, the marginal contribution can be written as the following.

$\displaystyle M_i(\theta_t) = m_i(\theta_t) + \delta \mathbb{E}_{\theta_{t+1} | a^*_t, \theta_t} [M_i(\theta_{t+1})]$

where,

$\displaystyle \begin{array}{rcl} \lefteqn{m_i(\theta_t)} \\ &=& W(\theta_t) - W_{-i}(\theta_t) - \delta \mathbb{E}_{\theta_{t+1} | a^*_t, \theta_t} \left(W(\theta_{t+1}) - W_{-i}(\theta_{t+1}) \right) \\ &=& \sum_{i = 1}^I v_i(a^*_t, \theta_{i,t}) \\ && - \sum_{j \neq i} v_j(a^*_{-i,t}, \theta_{j,t}) - \delta \mathbb{E}_{\theta_{t+1} | a^*_{-i,t}, \theta_t} W_{-i}(\theta_{t+1}) \\ && + \delta \mathbb{E}_{\theta_{t+1} | a^*_t, \theta_t} W_{-i}(\theta_{t+1}) \\ &=& \sum_{i = 1}^I v_i(a^*_t, \theta_{i,t}) - \sum_{j \neq i} v_j(a^*_{-i,t}, \theta_{j,t}) \\ && + \delta \left[ \mathbb{E}_{\theta_{t+1} | a^*_t, \theta_t} W_{-i}(\theta_{t+1}) - \mathbb{E}_{\theta_{t+1} | a^*_{-i,t}, \theta_t} W_{-i}(\theta_{t+1}) \right] \end{array}$

The rest of the calculations in section 4 follow easily from the definitions of ${a^*(\theta_t)}$ till we reach the theorem. The allocation ${a^*(\theta_t)}$ and the payment ${p_i^*(\theta_t) \triangleq v_i(a^*(\theta_t), \theta_{i,s}) - m_i(\theta_t)}$ together consists the dynamic pivot mechanism. It can be easily seen that,

$\displaystyle \begin{array}{rcl} \lefteqn{p_i^*(\theta_t) = \sum_{j \neq i} [v_j(a^*(\theta_{-i,t}), \theta_{j,t}) - v_j(a^*(\theta_t), \theta_{j,t})]} \\ && + \delta \left[ W_{-i}(\theta_{t+1} | a^*(\theta_{-i,t}), \theta_{t}) - W_{-i}(\theta_{t+1} | a^*(\theta_{t}), \theta_{t}) \right] \end{array}$

4. Results

Theorem 1 (Dynamic Pivot Mechanism) The dynamic pivot mechanism is ex-post incentive compatible and individually rational.

Proof: The payment is carefully chosen, such that the utility to agent ${i}$ turns out to be the marginal contribution. So, it remains to be shown that the marginal contribution is maximized at the true value of ${\theta}$. Also, notice that by the definition of the marginal contribution, it is non-negative, hence the mechanism is individually rational. The utility of player ${i}$ consists of her (valuation – payment) at instant ${t}$ + the expected discounted marginal future contribution. Mathematically,

$\displaystyle \begin{array}{rcl} u_i(a^*(\theta_t), p_i^*(\theta_t), \theta_{i,t}) &=& v_i(a^*(\theta_t), \theta_{i,s}) - p_i^*(\theta_t) + \delta \mathbb{E}_{\theta_{t+1} | a^*(\theta_t), \theta_t} [M_i(\theta_{t+1})] \\ &=& W(\theta_t) - W_{-i}(\theta_t) \end{array}$

For ex-post incentive compatibility, we need to show the following ${\forall \ r_{i,t}}$,

$\displaystyle \begin{array}{rcl} && u_i(a^*(\theta_t), p_i^*(\theta_t), \theta_{i,t}) \geq u_i(a^*(r_{i,t}, \theta_{-i,t}), p_i^*(r_{i,t}, \theta_{-i,t}), \theta_{i,t}) \\ &\Leftrightarrow& \\ \lefteqn{W(\theta_t) - W_{-i}(\theta_t)} \\ &\geq& v_i(a^*(r_{i,t}, \theta_{-i,t}), \theta_{i,t}) - p_i^*(r_{i,t}, \theta_{-i,t}) + \delta \mathbb{E}_{\theta_{t+1} | a^*(r_{i,t}, \theta_{-i,t}), \theta_t} \left(W(\theta_{t+1}) - W_{-i}(\theta_{t+1}) \right) \\ &\Leftrightarrow& \\ \lefteqn{W(\theta_t) - W_{-i}(\theta_t)} \\ &\geq& \sum_{j = 1}^I v_j(a^*(r_{i,t}, \theta_{-i,t}), \theta_{j,t}) + \delta \mathbb{E}_{\theta_{t+1} | a^*(r_{i,t}, \theta_{-i,t}), \theta_t} [W(\theta_{t+1})] - W_{-i}(\theta_{t}) \\ &\Leftrightarrow& \\ \lefteqn{W(\theta_t)} \\ &\geq& \sum_{j = 1}^I v_j(a^*(r_{i,t}, \theta_{-i,t}), \theta_{j,t}) + \delta \mathbb{E}_{\theta_{t+1} | a^*(r_{i,t}, \theta_{-i,t}), \theta_t} [W(\theta_{t+1})] \end{array}$

Which is true by choice of ${a^*(\theta_t)}$. Where the third inequality comes by plugging in the expression of ${p_i^*(r_{i,t}, \theta_{-i,t})}$. Remember,

$\displaystyle \begin{array}{rcl} \lefteqn{p_i^*(\theta_t) = \sum_{j \neq i} [v_j(a^*(\theta_{-i,t}), \theta_{j,t}) - v_j(a^*(\theta_t), \theta_{j,t})]} \\ && + \delta \left[ W_{-i}(\theta_{t+1} | a^*(\theta_{-i,t}), \theta_{t}) - W_{-i}(\theta_{t+1} | a^*(\theta_{t}), \theta_{t}) \right] \\ \mbox{Hence,}&& \\ \lefteqn{p_i^*(r_{i,t}, \theta_{-i,t}) = \sum_{j \neq i} [v_j(a^*(\theta_{-i,t}), \theta_{j,t}) - v_j(a^*(r_{i,t}, \theta_{-i,t}), \theta_{j,t})]} \\ && + \delta \left[ W_{-i}(\theta_{t+1} | a^*(\theta_{-i,t}), \theta_{t}) - W_{-i}(\theta_{t+1} | a^*(r_{i,t}, \theta_{-i,t}), \theta_{t}) \right] \end{array}$

Hence, ex-post incentive compatibility result follows. Q.E.D.

5. Conclusion

This paper is a major contribution to dynamic mechanism design. Unlike the static mechanisms, here we need to account for the infinite discounted payoff for each agent. This paper shows one scheme such that the participation constraints are met and it is best response for the agents to report their true types at each instant. However, no comment has been made to the computational issues of the payment. This is important since at each stage of the game, one needs to solve an MDP to find the optimal ${a^*}$.

Advertisements