Data Confidentiality Mid-Year Workshop
A Working Group in National Defense and Homeland Security at SAMSI

March 13, 2006

Hosted by

National Center for Health Statistics
Hyattsville, MD


Program

Monday, March 13, 2006

8:45 AM Welcome and Introductions
9:00 AM Session I: Mathematical tools for SDL
9:00 Mathematical Models for Data Confidentiality Problems
Lawrence H. Cox, NCHS
10:00 Break
10:30 Algebraic Statistics for Statistical Disclosure Limitation in Contingency Tables
Aleksandra B. Slavkovic, Penn State
11:15 A Stochastic Process Approach to the Analysis of Swapping for Categorical Variables
Lisa Denogean, SAMSI
11:45 Discussion, led by James Lynch, USC and SAMSI
12:15 PM Lunch
1:30 PM Session II: Inference and Quality
1:30 Effects of Rounding on Data Quality
Jay J. Kim, NCHS
2:15 Combinations of SDC methods for continuous microdata
Anna Oganian, NISS
2:45 New measures of data utility
Mi-Ja Woo, NISS
3:15 Break
3:45 Secure Statistics Software for Horizontally Partitioned Data
Francisco Vera, NISS and SAMSI
4:15 The importance of assessing risk-utility tradeoffs for SDL
Discussion, led by Jerome Reiter, Duke
4:45 PM Adjourn

Contact & Directions

Those interested in attending should contact one of the hosts:

Myron Katzoff
Lawrence H. Cox
Joe F. Gonzalez

Participants will need to fax their lunch selection to SAMSI (Attn: Nicole Scott) at 919-685-9360.

The meeting will be held at the headquarters for the National Center for Health Statistics:

National Center for Health Statistics
Metro IV Building
3311 Toledo Road
Hyattsville, Maryland 20782

The general information telephone number is (301) 458-4000.

Metro Bus and Rail Directions
Information about Metrobus and Metrorail service is available on the Washington Metropolitan Area Transit Authority (WMATA) website.

From Dulles International Airport
Take the Dulles Airport Access Road to exit 9 (I-495 toward Frederick/Bethesda). Continue on I-495 toward Baltimore for about 18 miles and take exit 28B (New Hampshire Avenue/Takoma Park). At the second traffic light, turn left onto Adelphi Road. Continue on Adelphi Road to the seventh traffic light and turn right onto Toledo Road. The Metro IV Building will be one block on your left, just past the parking garage.

From National Airport
Follow the signs marked "Washington" to exit the airport onto the George Washington Parkway. Stay on the Parkway for about 12 miles and take the exit marked Maryland/I-495. Continue on I-495 and take exit 28B (New Hampshire Avenue/Takoma Park). At the second traffic light, turn left onto Adelphi Road. Continue on Adelphi Road to the seventh traffic light and turn right onto Toledo Road. The Metro IV Building will be one block on your left, just past the parking garage.

From Baltimore-Washington International Airport
Take Interstate 195 to the Baltimore-Washington Parkway (Route 295) south toward Washington. Take the exit marked Riverdale/Hyattsville/New Carrollton (about 20 miles). Turn right at the first traffic light (Riverdale Road/Route 410). At the sixth traffic light, turn right onto Adelphi Road. At the first intersection, turn left onto Toledo Road. The Metro IV Building will be one block on your left, just past the parking garage.

Visitor Parking
Visitor parking is available in Parking Garage A, adjacent to the Metro IV Building. Enter at the visitor parking entrance on Toledo Road. Current rates are $2.00 for the first hour and $1.00 for each additional hour, with a maximum of $7.00 per day.

Security System
The National Center for Health Statistics occupies the entire Metro IV Building. All visitors must pass through a security check, sign in with the security personnel, and obtain a visitor's pass. Security personnel will then contact your NCHS point of contact, who will escort you to your final destination.

Hotel Information

The Inn and Conference Center (managed by Marriott Hotels)
University of Maryland University College
3501 University Blvd E
Adelphi, MD
1 mile north of NCHS and very nice!
(301) 985-7300
Fax (301) 985-7517
www.idcide.com/hotels/md/marriott-inn-conference-center-adelphi.htm

Courtyard Greenbelt
6301 Golden Triangle Drive
Greenbelt, MD 20770
(301) 441-3311
Fax (301) 441-4978

Residence Inn Greenbelt
6320 Golden Triangle Drive
Greenbelt, MD 20770
(301) 982-1600
Fax (301) 982-6494

Courtyard New Carrollton Landover
8330 Corporate Drive
Landover, MD 20785
(301) 577-3373
Fax (301) 577-1780

Courtyard Silver Spring - Downtown
8506 Fenton Street
Silver Spring, MD 20910
(301) 589-4899
Fax (301) 589-4898

Marriott Courtyard Silver Spring
12521 Prosperity Drive
Silver Spring, MD 20904
(800) 228-9290

Residence Inn Marriott
12000 Plum Orchard Drive
Silver Spring, MD 20904
(301) 572-2322

Fairfield Inn Capital Beltway Marriott
4050 Powder Mill Rd
Beltsville, MD 20705
(800) 228-9290

Sheraton College Park
4095 Powder Mill Rd
Beltsville, MD 20705
(888) 625-5144

Hotels in College Park, Maryland, at $50-$99 per night and above
http://maryland-hotels-x.com/50-99.99.html

Hotels in College Park, MD
http://maryland.hotelsonline.bz/collegepark.html

Marriott Hotel in Greenbelt, MD
http://www.idcide.com/hotels/md/marriott-greenbelt.htm

Comfort Inn in College Park, MD
http://www.idcide.com/hotels/md/comfort-inn-college-park.htm

Abstracts
Mathematical Models for Data Confidentiality Problems
Lawrence H. Cox, Ph.D.
National Center for Health Statistics

I will summarize mathematical models for data rounding, data perturbation, complementary cell suppression, and disclosure audit in two-way tables. A related method, quality-preserving controlled tabular adjustment, will not be discussed because of time constraints. I will illustrate how these problems can be solved efficiently (to optimality) for two-way tables and related classes of tables, but are extremely difficult or impossible to solve for other classes.
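
For a two-way table whose only published information is its row totals r_i, column totals c_j, and grand total n, a disclosure audit asks how tightly each interior cell x_ij is determined. The classical Fréchet bounds answer this: max(0, r_i + c_j - n) <= x_ij <= min(r_i, c_j), and for two-way tables these bounds are known to be sharp. The sketch below is a minimal illustration under the assumption that no interior cells are published or suppressed; auditing a table with suppressed cells generally requires solving a pair of linear programs per cell and is not shown. The function name `frechet_bounds` and the example margins are hypothetical, not taken from the talk.

```python
from typing import List, Tuple

def frechet_bounds(row_totals: List[int], col_totals: List[int]) -> List[List[Tuple[int, int]]]:
    """Return (lower, upper) bounds for every cell of a two-way table whose
    only published information is its row and column totals."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must have the same grand total")
    bounds = []
    for r in row_totals:
        row = []
        for c in col_totals:
            # Outside column c there is room for at most n - c units,
            # so the row must place at least r - (n - c) of its count here.
            lower = max(0, r + c - n)
            # A cell can never exceed either of its margins.
            upper = min(r, c)
            row.append((lower, upper))
        bounds.append(row)
    return bounds

# Hypothetical margins for a 2 x 3 table with grand total 10.
for row in frechet_bounds([7, 3], [5, 4, 1]):
    print(row)
# [(2, 5), (1, 4), (0, 1)]
# [(0, 3), (0, 3), (0, 1)]
```

In an audit, a cell whose interval is too narrow relative to a protection requirement would be flagged; the requirement itself is agency policy and is not modeled here.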

A Stochastic Process Approach to the Analysis of Swapping for Categorical Variables
Lisa Denogean
Statistical and Applied Mathematical Sciences Institute

Data swapping can be used by government agencies to protect the confidentiality of publicly released data files. We study the stochastic process generated by data swapping applied to a data file of categorical variables. The purpose is to understand the effect of swapping and to help data owners determine which variables to swap and how much swapping to perform. We discuss various utility measures and introduce the idea of measuring distance from the limit rather than from the original file. In addition, we introduce a new type of swapping that we argue is superior to current methods.
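
A minimal sketch of one simple swapping chain, assumed here for illustration and not necessarily the process studied in the talk: at each step two records are chosen uniformly at random and their values of one categorical variable are exchanged. The sequence of files forms a Markov chain in which the one-way counts of the swapped variable never change, while its association with the other variables is gradually broken. Because a record may be paired with itself, the chain is aperiodic, and its limit is a uniformly random rearrangement of the swapped column across records, which is one concrete example of a limit from which distance could be measured. The function `run_swaps` and the toy file are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]

def run_swaps(data: List[Record], var: str, steps: int, seed: int = 0) -> List[Record]:
    """Apply `steps` random swaps of `var` to a copy of the file.

    Each step picks two record indices uniformly at random (possibly the same
    one, which leaves the file unchanged) and exchanges their values of `var`.
    """
    rng = random.Random(seed)
    swapped = [dict(rec) for rec in data]  # leave the original file untouched
    n = len(swapped)
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        swapped[i][var], swapped[j][var] = swapped[j][var], swapped[i][var]
    return swapped

# Hypothetical four-record file with two categorical variables.
original = [
    {"region": "N", "income": "low"},
    {"region": "N", "income": "low"},
    {"region": "S", "income": "high"},
    {"region": "S", "income": "high"},
]
masked = run_swaps(original, "income", steps=10, seed=1)
print(Counter(r["income"] for r in masked))                 # always matches the original counts
print(Counter((r["region"], r["income"]) for r in masked))  # the region-by-income table can change
```

Choosing which variable to swap and how many steps to run corresponds to the "which variables" and "how much" questions in the abstract; utility would then be assessed on the resulting joint table.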

Effects of Rounding on Data Quality
Jay J. Kim
National Center for Health Statistics

Integer data such as frequency counts may be rounded to an integer base for several purposes, including disclosure limitation. Similarly, it is sometimes necessary to round noninteger data to integers (viz., base 1 rounding) for statistical purposes, e.g., rounding expected sample counts (noninteger) to actual sample counts (integer). We evaluate the effects of four rounding methods on data quality and utility in two ways: (1) bias and variance (the increase in total mean square error) and (2) effects on the underlying distribution of the data. The four rounding rules are conventional rounding, modified conventional rounding, zero-restricted 50/50 rounding, and unbiased rounding.
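
A minimal sketch of three of the four rules, as they are commonly defined, for rounding a count x to base b with remainder r = x mod b. Conventional rounding goes to the nearest multiple of b (the tie rule for r = b/2 is an assumption here). Zero-restricted 50/50 rounding leaves multiples of b unchanged and otherwise rounds down or up with probability 1/2. Unbiased rounding rounds up with probability r/b, so its expected value is exactly x. Modified conventional rounding is not sketched because the abstract does not define it. The function names are hypothetical.

```python
import random

def conventional_round(x: int, b: int) -> int:
    """Round x to the nearest multiple of b; a remainder of exactly b/2
    rounds up (an assumed tie rule)."""
    q, r = divmod(x, b)
    return (q + 1) * b if 2 * r >= b else q * b

def zero_restricted_5050_round(x: int, b: int, rng: random.Random) -> int:
    """Leave multiples of b unchanged; otherwise round down or up with
    probability 1/2 each."""
    q, r = divmod(x, b)
    if r == 0:
        return x
    return (q + 1) * b if rng.random() < 0.5 else q * b

def unbiased_round(x: int, b: int, rng: random.Random) -> int:
    """Round up with probability r/b and down otherwise, so E[result] = x.
    When r == 0 the value is never rounded up, so multiples of b are kept."""
    q, r = divmod(x, b)
    return (q + 1) * b if rng.random() < r / b else q * b

# Compare average rounded values for x = 7 in base 5 (hypothetical example).
rng = random.Random(42)
x, b, trials = 7, 5, 100_000
mean_unbiased = sum(unbiased_round(x, b, rng) for _ in range(trials)) / trials
mean_5050 = sum(zero_restricted_5050_round(x, b, rng) for _ in range(trials)) / trials
print(conventional_round(x, b))  # 5
print(round(mean_unbiased, 2), round(mean_5050, 2))
```

For x = 7 and b = 5, conventional rounding always gives 5 (bias -2), the unbiased rule should average close to 7, and the 50/50 rule close to 7.5 (bias +0.5); the random rules trade that bias for added variance, which is the mean-square-error tradeoff the abstract describes.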

Combinations of SDC methods for continuous microdata
Anna Oganian, NISS

Algebraic Statistics for Statistical Disclosure Limitation in Contingency Tables
Aleksandra B. Slavkovic
Penn State

Categorical data collected by federal agencies and nongovernmental survey organizations are often summarized in tabular form. Releasing partial information is of public utility and typically involves reporting collections of marginal and conditional tables. Given this information, tools from algebraic geometry can be used to characterize discrete distributions for contingency tables and to assess disclosure risk. Algebraic statistics applies polynomial algebra and algebraic geometry to statistical inference. We demonstrate how these tools are used to represent tables of counts and to describe the locus (T) of all possible tables under the given constraints. We discuss some practical implications of using algebraic statistics for data privacy and confidentiality problems.
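
For two-way tables with fixed row and column totals, the basic moves that add +1 to cells (i, j) and (k, l) and -1 to cells (i, l) and (k, j), or the reverse, are known to form a Markov basis: every table with the same margins can be reached from any other by these moves without leaving the nonnegative integers. A minimal sketch, assuming only the two sets of one-way margins are released, uses these moves to list every table consistent with the margins, a small concrete instance of the set of possible tables. Higher-way tables and conditional releases need more complicated moves (computed with algebraic tools) and are not handled here. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import List, Set, Tuple

Table = Tuple[Tuple[int, ...], ...]

def northwest_corner(rows: List[int], cols: List[int]) -> Table:
    """Build one nonnegative integer table with the given margins."""
    r, c = list(rows), list(cols)
    t = [[0] * len(c) for _ in r]
    i = j = 0
    while i < len(r) and j < len(c):
        v = min(r[i], c[j])
        t[i][j] = v
        r[i] -= v
        c[j] -= v
        if r[i] == 0:
            i += 1
        else:
            j += 1
    return tuple(tuple(row) for row in t)

def tables_with_margins(rows: List[int], cols: List[int]) -> Set[Table]:
    """All nonnegative integer tables with the given margins, found by a
    breadth-first walk over the basic +1/-1 moves."""
    if sum(rows) != sum(cols):
        raise ValueError("margins must share a grand total")
    start = northwest_corner(rows, cols)
    seen = {start}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        for i, k in combinations(range(len(rows)), 2):
            for j, l in combinations(range(len(cols)), 2):
                for s in (1, -1):
                    new = [list(row) for row in t]
                    new[i][j] += s  # each move leaves every row and column sum unchanged
                    new[k][l] += s
                    new[i][l] -= s
                    new[k][j] -= s
                    if min(new[i][j], new[k][l], new[i][l], new[k][j]) < 0:
                        continue
                    key = tuple(tuple(row) for row in new)
                    if key not in seen:
                        seen.add(key)
                        queue.append(key)
    return seen

print(len(tables_with_margins([3, 2], [2, 3])))  # 3
```

Reading off the smallest and largest value each cell takes across the listed tables gives its exact range, and a fiber containing very few tables signals that the released margins reveal a great deal about the underlying counts.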

Secure Statistics Software for Horizontally Partitioned Data
Francisco Vera
National Institute of Statistical Sciences
Statistical and Applied Mathematical Sciences Institute

There are several methods for secure computation of statistics on distributed data. Software to perform some of these computations is under development at the National Institute of Statistical Sciences. This talk gives a snapshot of some of its current and planned capabilities.
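
One standard building block for secure computation on horizontally partitioned data (each party holds different records on the same variables) is secure summation. The sketch below simulates the classical ring protocol in a single process; it is not the NISS software described in the talk, and it assumes parties are semi-honest and do not collude. Party 0 masks its value with a random R modulo a large m, each party adds its own value to the running total it receives, and party 0 removes R at the end. Because R is uniform, every intermediate total a party sees is uniformly distributed modulo m and reveals nothing about the other values. The function name and modulus are hypothetical choices.

```python
import secrets
from typing import List

def secure_sum(local_values: List[int], modulus: int = 2**64) -> int:
    """Simulate the ring-based secure summation protocol.

    local_values[p] is party p's private nonnegative integer; the true total
    must be smaller than `modulus` for the result to be exact.
    """
    r = secrets.randbelow(modulus)             # party 0's secret mask
    running = (r + local_values[0]) % modulus  # party 0 sends this to party 1
    for v in local_values[1:]:
        running = (running + v) % modulus      # each party adds its value and passes it on
    return (running - r) % modulus             # party 0 removes the mask

# Hypothetical local counts held by three agencies.
print(secure_sum([120, 45, 300]))  # 465
```

With only two parties the final sum itself reveals the other party's value, so the protocol is meaningful only with three or more. Statistics that are sums over records, such as the entries of X^T X and X^T y for a linear regression, can be combined this way one entry at a time.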

New measures of data utility
Mi-Ja Woo, NISS

When data are released to the public, it is important to find a data alteration method that provides strong confidentiality protection while preserving satisfactory data quality. I will focus on methods for measuring data quality when the distribution of the data is unknown. We treat data utility as a problem of evaluating the similarity between the structure of the original data and that of the masked data. The utility measures we present are rooted in cumulative distribution function, clustering, and propensity score approaches. For distributions that depart from normality, simulations over a wide variety of data structures show how these measures can be used to evaluate disclosure limitation procedures.
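
A minimal one-dimensional sketch of a CDF-based utility measure, assuming the measure compares the empirical CDFs of the original and masked values at every pooled data point and reports the maximum absolute difference and the mean squared difference. The talk's actual measures may be defined differently and also cover multivariate data; the function names and the simulated data are hypothetical. No normality assumption is needed, which is why the example uses skewed lognormal data.

```python
import random
from bisect import bisect_right
from typing import List, Tuple

def ecdf(sorted_values: List[float], t: float) -> float:
    """Empirical CDF: fraction of sorted_values that are <= t."""
    return bisect_right(sorted_values, t) / len(sorted_values)

def cdf_utility(original: List[float], masked: List[float]) -> Tuple[float, float]:
    """Compare the empirical CDFs of original and masked data at every pooled
    data point. Returns (max absolute difference, mean squared difference);
    values near 0 mean the masked data reproduce the original distribution."""
    x, y = sorted(original), sorted(masked)
    diffs = [ecdf(x, t) - ecdf(y, t) for t in x + y]
    return max(abs(d) for d in diffs), sum(d * d for d in diffs) / len(diffs)

# Skewed (lognormal) data, since the abstract targets non-normal distributions.
rng = random.Random(7)
original = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
lightly_masked = [v + rng.gauss(0.0, 0.05) for v in original]  # small additive noise
heavily_masked = [v + 1.0 for v in original]                   # systematic shift
print(cdf_utility(original, lightly_masked))
print(cdf_utility(original, heavily_masked))
```

The lightly masked file should score much closer to zero than the shifted one; in practice such scores would be weighed against a disclosure risk measure for the same masking method.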
