August 2006

Contents:

1. Introduction

2. Compiling

3. Input

4. Output

5. Subroutines

1. Introduction

SIS_CP1.c, SIS_CP2.c and Volume.c implement the sequential importance

sampling (SIS) algorithms for sampling two-way zero-one or contingency

tables with fixed marginal sums and a given set of structural zeros. The

samples generated by these algorithms can be used to make conditional

inference on zero-one and contingency tables, e.g., computing p-values of

exact tests. More details of these algorithms can be found in the paper

“Conditional Inference on Tables with Structural Zeros.”

SIS_CP1.c and SIS_CP2.c are for zero-one tables with fixed marginal sums

and structural zeros. SIS_CP2.c requires that there is at most one

structural zero in each row and column, and it always generates valid

tables. SIS_CP1.c works for any pattern of structural zeros, but it may

generate some invalid tables. Volume.c is for contingency tables with

fixed marginal sums and structural zeros. It requires that there is at

most one structural zero in each column, and it always generates valid

tables.

2. Compiling

The algorithms are written in the C language. They estimate the p-values

of exact tests as well as the total number of tables satisfying the

constraints. We can use the command

gcc -o executable_file_name c_code_file_name -lm

to compile the code and then run the executable file. For example,

SIS_CP1.c can be compiled by using the command

gcc -o SIS_CP1.exe SIS_CP1.c -lm

3. Input

When the code is executed, the user is prompted to input the number

of rows and columns in the table, how many estimates the user needs

(denoted by M), and how many samples each estimate should be based on

(denoted by N). Then the user will be asked to enter the data file name.

Examples of data files are given at this web page: the file finch.dat is

for SIS_CP1.c, manager.dat is for SIS_CP2.c, and monkey.dat is for

Volume.c. The file monkey.dat contains not only the observed table (as in

finch.dat and manager.dat), but also a matrix to denote the positions of

structural zeros and the maximum likelihood estimates of the expected

number of observations in each cell.

Different problems have different patterns of structural zeros and

different test statistics. It is not possible to include all cases in the

code. SIS_CP1.c is written to implement the test for the ecological

example in the paper. SIS_CP2.c is written to implement the test of

mutuality in social networks. It assumes that the structural zeros are on

the diagonal of a square matrix. Volume.c is written to implement the

conditional volume test for contingency tables. The code can be easily

modified to handle other kinds of structural zeros and other test

statistics.

4. Output

There is one output file, result.txt. The results are given in matrix

format. Each row has three (for SIS_CP2.c and Volume.c) or four (for

SIS_CP1.c) numbers. The first number is the estimated total number of

tables. The second number is the estimated p-value. The third number is

the estimated cv^2 (square of the coefficient of variation) of the

importance weights. The fourth number (for SIS_CP1.c only) is the total

number of bad tables generated by the program. The estimates in each row

are based on N samples. The estimation is repeated M times, one row per

repetition. Here M and N are the parameters that the user specified. Based on

the M independent estimates of the p-value, the final estimate of the

p-value and its standard error can be computed.

5. Subroutines

The main function of the code calls seven subroutines:

(1). ReadInfo: This subroutine asks the user to input a few parameter

values related to the table and the algorithm.

(2). ReadTable: This subroutine reads in the observed table.

(3). The next subroutine computes the test statistic value for the

observed table (sSquare0Bar for SIS_CP1, reciprocal0 for SIS_CP2, and

chiSquare0 for Volume.c).

(4). InitialSetting: This subroutine sets up the initial values before

generating tables with given constraints.

(5). GenerateATable: This subroutine generates a table with given

constraints. The importance weight is stored in “currentSample”. This is

the most important subroutine and it implements the SIS algorithms.

(6). UpdatePValue: This subroutine updates the p-value estimate after a

new table is generated.

(7). PrintOutput: This subroutine prints the results to a file called

“result.txt”.
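
The bookkeeping behind subroutines (5) and (6) and the three output
numbers can be sketched with the standard importance-sampling formulas.
This is an assumption about what the programs compute, not a copy of
their code: the Accumulator type and the acc_* functions are hypothetical,
the weight w of a table is assumed to be the reciprocal of its sampling
probability, and a larger test statistic is assumed to mean a more extreme
table. How invalid tables from SIS_CP1.c are weighted is not stated in
this document; a common convention is weight 0 while still counting the
draw.

```c
#include <assert.h>

/* Running sums over the tables sampled so far (hypothetical type). */
typedef struct {
    long   n;        /* number of tables sampled                      */
    double sum_w;    /* sum of importance weights                     */
    double sum_w2;   /* sum of squared importance weights             */
    double sum_hit;  /* sum of weights of tables with stat >= stat0   */
} Accumulator;

/* Record one sampled table with weight w and test statistic stat;
   stat0 is the statistic of the observed table. */
void acc_add(Accumulator *a, double w, double stat, double stat0)
{
    a->n++;
    a->sum_w += w;
    a->sum_w2 += w * w;
    if (stat >= stat0)
        a->sum_hit += w;
}

/* Estimated total number of tables: the average weight. */
double acc_count(const Accumulator *a)
{
    assert(a->n > 0);
    return a->sum_w / a->n;
}

/* Estimated p-value: weighted fraction of tables at least as extreme. */
double acc_pvalue(const Accumulator *a)
{
    assert(a->sum_w > 0.0);
    return a->sum_hit / a->sum_w;
}

/* Estimated cv^2: sample variance of the weights over squared mean. */
double acc_cv2(const Accumulator *a)
{
    double mean, var;

    assert(a->n > 1 && a->sum_w > 0.0);
    mean = a->sum_w / a->n;
    var = (a->sum_w2 - a->n * mean * mean) / (a->n - 1);
    return var / (mean * mean);
}
```

An Accumulator initialised to all zeros would be updated once per
generated table and read out after N tables to produce one output row.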

Users can modify the subroutines ReadInfo, ReadTable and PrintOutput

if they prefer a different input and output format. The third subroutine,

which computes the test statistic value, and the subroutine UpdatePValue can be

modified if a different test statistic is used or the p-value is computed

in a different way. In the major subroutine GenerateATable, two small

functions RandomReal and RandomInteger (defined at the beginning of the

code) are called to generate a random real number or a random integer

between a and b. Users can replace these two functions by other random

number generators.
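
As an illustration of what such replacements might look like, here is a
minimal sketch built on the standard C library function rand(). The
signatures are assumptions (the actual ones in the code may differ), as is
the choice of a half-open interval [a, b) for the real number and the
inclusive range a, ..., b for the integer.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform real number in [a, b). Assumed signature. */
double RandomReal(double a, double b)
{
    assert(a < b);
    /* rand() / (RAND_MAX + 1) lies in [0, 1) */
    return a + (b - a) * (rand() / ((double)RAND_MAX + 1.0));
}

/* Integer in {a, ..., b}, approximately uniform. Assumed signature. */
int RandomInteger(int a, int b)
{
    double u;

    assert(a <= b);
    u = rand() / ((double)RAND_MAX + 1.0);        /* u in [0, 1) */
    return a + (int)(((double)b - a + 1.0) * u);  /* floor of scaled u */
}
```

Calling srand once at program start fixes the seed. Because rand() has
limited quality and range on some platforms, a stronger generator can be
substituted behind the same two functions without touching
GenerateATable.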