August 2006

Contents:
1. Introduction
2. Compiling
3. Input
4. Output
5. Subroutines

1. Introduction

SIS_CP1.c, SIS_CP2.c and Volume.c implement the sequential importance
sampling (SIS) algorithms for sampling two-way zero-one or contingency
tables with fixed marginal sums and a given set of structural zeros. The
samples generated by these algorithms can be used to make conditional
inference on zero-one and contingency tables, e.g., computing p-values of
exact tests. More details of these algorithms can be found in the paper
“Conditional Inference on Tables with Structural Zeros.”

SIS_CP1.c and SIS_CP2.c are for zero-one tables with fixed marginal sums
and structural zeros. SIS_CP2.c requires that there is at most one
structural zero in each row and column, and it always generates valid
tables. SIS_CP1.c works for any pattern of structural zeros, but it may
generate some invalid tables. Volume.c is for contingency tables with
fixed marginal sums and structural zeros. It requires that there is at
most one structural zero in each column, and it always generates valid
tables.

2. Compiling

The algorithms are written in the C language. They estimate the p-values
of exact tests as well as the total number of tables satisfying the
constraints. The code can be compiled with the command

gcc -o executable_file_name c_code_file_name -lm

and the resulting executable file can then be run. For example,
SIS_CP1.c can be compiled by using the command

gcc -o SIS_CP1.exe SIS_CP1.c -lm

3. Input

When the code is executed, the user is prompted to input the number
of rows and columns in the table, how many estimates are needed
(denoted by M), and how many samples each estimate should be based on
(denoted by N). Then the user is asked to enter the data file name.
Examples of data files are given at this web page: the file finch.dat is
for SIS_CP1.c, manager.dat is for SIS_CP2.c, and monkey.dat is for
Volume.c. The file monkey.dat contains not only the observed table (as in
finch.dat and manager.dat), but also a matrix to denote the positions of
structural zeros and the maximum likelihood estimates of the expected
number of observations in each cell.

Different problems have different patterns of structural zeros and
different test statistics. It is not possible to include all cases in the
code. SIS_CP1.c is written to implement the test for the ecological
example in the paper. SIS_CP2.c is written to implement the test of
mutuality in social networks. It assumes that the structural zeros are on
the diagonal of a square matrix. Volume.c is written to implement the
conditional volume test for contingency tables. The code can be easily
modified to handle other kinds of structural zeros and other test
statistics.

4. Output

There is one output file, result.txt. The results are given in matrix
format. Each row has three (for SIS_CP2.c and Volume.c) or four (for
SIS_CP1.c) numbers. The first number is the estimated total number of
tables. The second number is the estimated p-value. The third number is
the estimated cv^2 (square of the coefficient of variation) of the
importance weights. The fourth number (for SIS_CP1.c only) is the total
number of bad tables generated by the program. The estimates in each row
are based on N samples. The estimates are repeated M times, displayed in M
rows. Here M and N are the parameters that the user specified. Based on
the M independent estimates of the p-value, the final estimate of the
p-value and its standard error can be computed.

5. Subroutines

The main function of the code calls seven subroutines:

(1). ReadInfo: This subroutine asks the user to input a few parameter
values related to the table and the algorithm.

(2). ReadTable: This subroutine reads in the observed table.

(3). The next subroutine computes the test statistic value for the
observed table (sSquare0Bar for SIS_CP1.c, reciprocal0 for SIS_CP2.c, and
chiSquare0 for Volume.c).

(4). InitialSetting: This subroutine sets up the initial values before
generating tables with given constraints.

(5). GenerateATable: This subroutine generates a table with given
constraints. The importance weight is stored in “currentSample”. This is
the most important subroutine and it implements the SIS algorithms.

(6). UpdatePValue: This subroutine updates the p-value estimation after a
new table is generated.

(7). PrintOutput: This subroutine prints the results to a file called
“result.txt”.
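
To make the roles of subroutines (5) and (6) concrete, the sketch below
shows how the three numbers in each output row could be obtained from N
importance weights using the standard importance sampling formulas. It
is not taken from the distributed code, and everything in it is an
assumption made for illustration: the names SisSummary and
SummarizeSamples, the convention that each weight is the reciprocal of
the probability with which the table was sampled (with weight 0 for an
invalid table), and the convention that the p-value counts tables whose
statistic is at least as large as the observed one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for one row of estimates. */
typedef struct {
    double totalTables; /* estimated number of tables */
    double pValue;      /* estimated p-value */
    double cvSquared;   /* estimated cv^2 of the importance weights */
} SisSummary;

/* weight[i]: importance weight of the i-th sampled table (assumed
              to be 1/q for a valid table and 0 for an invalid one)
   stat[i]:   test statistic of the i-th sampled table
   stat0:     test statistic of the observed table */
SisSummary SummarizeSamples(const double *weight, const double *stat,
                            size_t n, double stat0)
{
    assert(n > 1);
    double sumW = 0.0, sumW2 = 0.0, sumWTail = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = weight[i];
        sumW += w;
        sumW2 += w * w;
        if (stat[i] >= stat0) {
            sumWTail += w; /* weighted count of tables in the tail */
        }
    }
    assert(sumW > 0.0); /* at least one valid table is needed */

    double meanW = sumW / (double)n;
    double varW = (sumW2 - (double)n * meanW * meanW) / (double)(n - 1);

    SisSummary out;
    out.totalTables = meanW;           /* average weight */
    out.pValue = sumWTail / sumW;      /* weighted tail proportion */
    out.cvSquared = varW / (meanW * meanW);
    return out;
}
```

A cv^2 close to zero means the weights are nearly equal, that is, the
sampling distribution is close to uniform over the valid tables.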

Users can modify the subroutines ReadInfo, ReadTable and PrintOutput
if they prefer a different input or output format. The third subroutine,
which computes the test statistic value, and the subroutine UpdatePValue
can be modified if a different test statistic is used or the p-value is
computed in a different way. In the major subroutine GenerateATable, two
small functions RandomReal and RandomInteger (defined at the beginning of
the code) are called to generate a random real number or a random integer
between a and b. Users can replace these two functions with other random
number generators.
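
As an illustration of such a replacement, here is a minimal sketch
built on the standard C function rand(). The signatures are assumed,
not copied from the distributed code: RandomReal(a, b) is taken to
return a double in [a, b), and RandomInteger(a, b) an int in a, ..., b
with both endpoints included. Check the definitions at the beginning of
the code before substituting, since the original conventions may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform real number in [a, b) (assumed signature). */
double RandomReal(double a, double b)
{
    assert(a <= b);
    /* Dividing by RAND_MAX + 1 keeps u strictly below 1. */
    double u = (double)rand() / ((double)RAND_MAX + 1.0);
    return a + (b - a) * u;
}

/* Uniform integer in a, a+1, ..., b (assumed signature). Rejection
   sampling avoids the bias of the usual rand() % span shortcut. */
int RandomInteger(int a, int b)
{
    assert(a <= b);
    unsigned long span = (unsigned long)(b - a) + 1UL;
    unsigned long total = (unsigned long)RAND_MAX + 1UL;
    assert(span <= total);

    unsigned long buckets = total / span; /* rand() values per outcome */
    unsigned long limit = buckets * span; /* values >= limit are rejected */
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);
    return a + (int)(r / buckets);
}
```

Call srand once at the start of the program to set the seed. For
long runs, a generator with a longer period than rand() may be
preferable; only the two function bodies need to change.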