Summer Program August 6-17, 2012: Computational Advertising

In cooperation with the Society for Industrial and Applied Mathematics (SIAM) and its Activity Group on Data Mining and Analytics (SIAG/DMA).


E-commerce is pioneering new kinds of advertising and new kinds of customer service. These innovations pose novel challenges in statistics and mathematics. For example:

* A search engine company might sign a contract to show an online advertisement to 2 million males aged 20 to 40 in California in November, and another contract to show a different ad to 4 million males aged 20 to 60 in Southern California during November and December. The company needs to decide how to divide up users' demographic characteristics for sale, how to price different combinations of characteristics, and how a contract's price should depend on the time remaining before delivery. A demographically specific contract in the distant future may preclude many other advertisement sales, and its price should reflect that; in contrast, a contract for advertising tomorrow is almost pure revenue, provided that the demographic breakouts are available.

* Advertisers bid against each other for keywords. The statistical characteristics of these high-speed auctions are poorly understood, and there are strategies for handicapping bidders to achieve economically desirable outcomes or to improve mechanism design.

* Recommender systems are a key component of search engine advertising technology. All major recommender systems are proprietary (the Netflix Prize competition is a narrow exception), but the statistical strategies behind these complex data mining problems are an important opportunity for open research.

The mathematical challenges that arise in computational advertising include massive, high-speed linear programming, better agent-based models of auction dynamics, and the computational finance behind dynamic management of the sales portfolio. The statistical challenges include modeling and forecasting user trends, prediction methodology for recommender systems, and modeling revenue streams.


This two-week program ran from August 6 to August 17, 2012. The first week was held at the Radisson RTP in Research Triangle Park, NC, close to SAMSI. The first three days were devoted to technical presentations by leading researchers and industry experts, to bring everyone up to speed on current methodology. On the fourth day, the participants self-organized into working groups, each of which addressed one of the key problem areas. Participants could join more than one group, and the organizers tried to arrange the working group schedules to facilitate that. The second week was spent at SAMSI headquarters in Research Triangle Park.

The working groups analyzed real-world datasets provided by the corporate participants. The datasets made available included:

Recommender systems data: (1) Yahoo! Music data; (2) Yahoo! Search Marketing advertiser bid-impression-click data; (3) Yahoo! Search Marketing advertiser bidding data; (4) Yahoo! Front Page Today Module user click log data.

Web Search: (1) Learning to Rank competition data.

Advertising: (1) Click logs from a real-world system (to be released; tentative at this point).

E-commerce: (1) Epinions data for studying user trust and product ratings.