/* 
* File:    INSTALL.txt
* CVS:     $Id: INSTALL.txt,v 1.37.2.2 2009/12/05 17:52:34 terpstra Exp $
* Author:  Kevin London
*          london@cs.utk.edu
* Mods:    Dan Terpstra
*          terpstra@cs.utk.edu
* Mods:    Philip Mucci
*          mucci@cs.utk.edu
* Mods:    Harald Servat
*          redcrash@gmail.com
* Mods:    <your name here>
*          <your email address>
*/

*****************************************************************************
HOW TO INSTALL PAPI ONTO YOUR SYSTEM
*****************************************************************************

On some of the systems that PAPI supports, you can install PAPI right 
out of the box without any additional setup. Others require drivers or 
patches to be installed first.

The general installation steps are below, but first find your particular 
Operating System's section for any additional steps that may be necessary.
NOTE: the configure and make files are located in the papi/src directory.

General Installation

1.	% ./configure
	% make

2.	Check for errors. 

	a) Run a simple test case: (This will run ctests/zero)

	% make test

	If you get good counts, you can optionally run all the test programs
	with the included test harness. This will run the tests in quiet mode, 
	which will print PASSED, FAILED, or SKIPPED. Tests are SKIPPED if the
	functionality being tested is not supported by that platform.

	% make fulltest (This will run ./run_tests.sh)

	To run the tests in verbose mode:

	% ./run_tests.sh -v

3.	Create a PAPI binary distribution or install PAPI directly.

	a) To install PAPI libraries and header files from the build tree:

	% make install

	b) To install PAPI test programs from the build tree:

	% make install-tests

	c) To create a binary kit, papi-<arch>.tgz:

	% make dist
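
The test harness from step 2 prints PASSED, FAILED, or SKIPPED for each test.
A long run is easier to review as a tally. The following is a minimal sketch,
assuming the harness output has been captured to a file; the function name
and the log file name are illustrative and not part of PAPI.

```shell
# Count lines containing PASSED, FAILED and SKIPPED in harness output on stdin.
# summarize_papi_tests is a hypothetical helper, not shipped with PAPI.
summarize_papi_tests() {
    awk '
        /PASSED/  { passed++ }
        /FAILED/  { failed++ }
        /SKIPPED/ { skipped++ }
        END { printf "passed=%d failed=%d skipped=%d\n", passed, failed, skipped }
    '
}

# Usage from papi/src (tests.log is an arbitrary file name):
#   ./run_tests.sh > tests.log 2>&1
#   summarize_papi_tests < tests.log
```

A nonzero "failed" count is the cue to rerun with ./run_tests.sh -v.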

*****************************************************************************
MORE ABOUT CONFIGURE OPTIONS
*****************************************************************************

An extensive array of options is available from the configure command-line. 
These options select default directories, 32- or 64-bit library modes, driver 
versions, compiler settings and more. For complete details on the command-line 
options:
	% ./configure --help

*****************************************************************************
Operating System Specific Installation Steps (In Alphabetical Order by OS)
*****************************************************************************

AIX - IBM POWER4, POWER5, POWER6
*****************************************************************************
PAPI is supported on AIX 5.x for POWER4,5 and 6.
Use ./configure to select the desired make options for your system, 
specifying --with-bitmode=32 or --with-bitmode=64 to select the word length.
32 bits is the default.

1.	On AIX 5.x, the bos.pmapi is a product level fileset (part of the OS).
	However, it is not installed by default. Consult your sysadmin to 
	make sure it is installed. 
2.	Follow the general instructions for installing PAPI.

WARNING: PAPI requires XLC version 6 or greater.
Your version can be determined by running 'lslpp -a -l | grep -i xlc'.

BG/L 
*****************************************************************************
BG/L is a cross-compiled environment. The machine on which PAPI is compiled
is not the machine on which PAPI runs. To compile PAPI on BG/L, specify the 
BG/L environment as shown below:

	% ./configure --with-OS=bgl
	% make

The testing targets in the make file will not work in the BG/L environment.
Since BG/L supports multiple queueing systems, you must manually execute
programs in the ctests and ftests directories to check for successful
library creation. You can also manually edit the run_tests.sh script to
automate testing for your installation. 

WARNING: ./configure might fail if the cross compiler is not in your path.
	 If that is the case, just add it to your path 
	 and everything should work.

BG/P 
*****************************************************************************
BG/P is not supported in PAPI 3.7.0. It requires the PAPI 3.9.0 or greater 
release and a patch available from IBM.

Catamount - Cray XT3/XT4 Opteron
*****************************************************************************
The Cray XT3/4 is a cross-compiled environment. In this case, configure
can automatically detect the Catamount operating system and make appropriate
default assignments. Alternatively, you can specify the OS to configure.
Please note that although Catamount contains the necessary PerfCtr patches
to access the hardware counters, a customized copy of the Perfctr 2.5.4 source
is necessary to compile and link PAPI. Contact Cray for a copy of these sources.

Before running configure to create the make file that supports a Catamount
build of PAPI and perfctr-2.5.4, execute the following module commands:

% module purge
% module load Base-opts PrgEnv-gnu
% module swap gcc gcc-catamount/3.3
% module swap xt-mpt xt-mpt/$XTOS_VERSION
% module unload xtpe-target-cnl
% module load xtpe-target-catamount

Execute configure

% configure --with-perfmon=2.2 --prefix=<install-dir> --with-OS=catamount

The testing targets in the make file will not work in the Catamount 
environment. It is necessary to log into an interactive session and run the 
tests manually through the job submission system. For example, instead of:
	% make test
use:
	% yod -sz 1 ctests/zero
and instead of:
	% make fulltest
use:
	% ./run_cat_tests.sh

PLATFORM NOTES:
PAPI on Catamount currently supports single event overflow. It does not 
support multiple event overflow or profiling. Many of the overflow and
profile tests in the PAPI distribution WILL SEGFAULT in this release.

CLE - Cray XT3/XT4/XT5 Opteron
*****************************************************************************
The Cray XT3/4/5 is a cross-compiled environment. You must specify the
perfmon version to configure as shown below.

Before running configure to create the makefile that supports a Cray XT CLE
build of PAPI, execute the following module commands:
    % module purge
    % module load Base-opts
    % module load pgi
Note: do not load the programming environment module (e.g. PrgEnv-pgi) 
but the compiler module (e.g. pgi) as shown above.

Check CLE compute nodes for the version of perfmon2 that it supports:
    % aprun -b -a xt cat /sys/kernel/perfmon/version

and use this version when configuring PAPI:
    % configure --with-perfmon=2.3 --prefix=<install-dir>

Example: ORNL's Jaguar system
To build PAPI for the compute nodes on ORNL's Jaguar system, you should use
the Cray wrapper 'cc'. The wrappers will call the appropriate compiler which
will use the appropriate header files and link against the appropriate
libraries.
    % configure --with-perfmon=2.3 --prefix=<install-dir> CC=cc

The testing targets in the makefile will not work in the XT CLE environment.
It is necessary to log into an interactive session and run the tests
manually through the job submission system. For example, instead of:
	% make test
use:
	% aprun -n1 ctests/zero
and instead of:
	% make fulltest
use:
	% ./run_cat_tests.sh
after substituting "aprun -n1" for "yod -sz 1" in run_cat_tests.sh.
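
The substitution can be done with sed rather than by hand. This is a minimal
sketch, assuming run_cat_tests.sh contains the literal launcher string
"yod -sz 1"; the function name and the output file name run_cle_tests.sh are
illustrative and not part of PAPI.

```shell
# Print a copy of a Catamount test script with the yod launcher replaced by aprun.
# yod_to_aprun is a hypothetical helper; $1 is the script to convert.
yod_to_aprun() {
    sed 's/yod -sz 1/aprun -n1/g' "$1"
}

# Usage (run_cle_tests.sh is an arbitrary name for the converted copy):
#   yod_to_aprun run_cat_tests.sh > run_cle_tests.sh
#   chmod +x run_cle_tests.sh
```

Writing to a separate file leaves the original script intact for Catamount.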

CLE - Cray X2
*****************************************************************************
The Cray X2 is a cross-compiled environment. You must specify the OS and
perfmon version to configure as shown below.

Before running configure to create the make file that supports a Cray X2 CLE
build of PAPI, execute the following module commands:

% module purge
% module load PrgEnv-x2

Check CLE compute nodes for the version of perfmon2 that it supports:

% aprun -b -a x2 cat /sys/kernel/perfmon/version

and use this version when configuring PAPI:

% configure --with-perfmon=2.3 --prefix=<install-dir> --with-OS=CLE

The testing targets in the make file will not work in the X2 CLE environment.
It is necessary to log into an interactive session and run the tests
manually through the job submission system. For example, instead of:
	% make test
use:
	% aprun -n1 ctests/zero
and instead of:
	% make fulltest
use:
	% ./run_cat_tests.sh
after substituting "aprun -n1" for "yod -sz 1" in run_cat_tests.sh.

FreeBSD - i386 & amd64
*****************************************************************************
PAPI requires FreeBSD 6 or higher to work.

The kernel needs some modifications to give PAPI access to the performance 
monitoring counters. Add "options HWPMC_HOOKS" and "device hwpmc" to the
kernel configuration file. On i386 systems, also add "device apic".
(See hwpmc(4) for more information, and NOTE 1 below for the supported
hardware.)

After this step, just recompile the kernel and boot it.
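
Before recompiling, it can help to confirm that the configuration file
contains the lines named above. The following is a rough sketch; the function
name is illustrative, and the kernel configuration path in the usage comment
is an assumption about where your configuration file lives.

```shell
# Report which hwpmc-related lines are missing from a kernel configuration file.
# check_hwpmc_config is a hypothetical helper.
# $1: path to the kernel configuration file; $2: architecture (i386 or amd64).
check_hwpmc_config() {
    conf=$1
    arch=$2
    missing=0
    grep -Eq '^[[:space:]]*options[[:space:]]+HWPMC_HOOKS' "$conf" ||
        { echo "missing: options HWPMC_HOOKS"; missing=1; }
    grep -Eq '^[[:space:]]*device[[:space:]]+hwpmc' "$conf" ||
        { echo "missing: device hwpmc"; missing=1; }
    if [ "$arch" = "i386" ]; then
        # "device apic" is only called for on i386 systems.
        grep -Eq '^[[:space:]]*device[[:space:]]+apic' "$conf" ||
            { echo "missing: device apic"; missing=1; }
    fi
    return "$missing"
}

# Usage (MYKERNEL is a placeholder for your configuration file name):
#   check_hwpmc_config /usr/src/sys/i386/conf/MYKERNEL i386
```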

FreeBSD 7 (or greater) does not ship with a Fortran compiler. To compile the
Fortran tests you will need to install a Fortran compiler first (e.g.
from /usr/ports/lang/gcc42), and set the F77 environment variable to the
compiler you want to use (e.g. gfortran42). 

Fortran compilers may issue errors due to "Integer too big for its kind *".
If so, add a compiler option to the FFLAGS environment variable that makes
int*8 the default (in gfortran42 it is -fdefault-integer-8).
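
In a Bourne-style shell, the two environment settings described above might
look like this. The compiler name and flag are the gfortran42 examples from
the text; substitute whatever your installed Fortran compiler requires.

```shell
# Select the Fortran compiler used for the PAPI Fortran tests.
F77=gfortran42

# Append the 8-byte default integer option, keeping any flags already set.
FFLAGS="${FFLAGS:+$FFLAGS }-fdefault-integer-8"

# Export both so that configure and make can see them.
export F77 FFLAGS
```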

Follow the "General Installation" steps.

NOTE 1: 
--
HWPMC driver supports the following processors: Intel Pentium 2,
Intel Pentium Pro, Intel Pentium 3, Intel Pentium M, Intel Celeron,
Intel Pentium 4, AMD K7 (AMD Athlon) and AMD K8 (AMD Athlon64 / Opteron).

FreeBSD 8 also adds support for Core/Core2/Atom processors. There is also a
patch for FreeBSD 7/7.1 in http://wiki.freebsd.org/PmcTools

IRIX - MIPS
*****************************************************************************
No additional steps are required to install PAPI on IRIX. Follow the 
general installation guide, and everything should work.

Linux - IBM Cell
*****************************************************************************
PAPI on Cell requires the perfmon2 patch and libpfm library available from
sourceforge <http://sourceforge.net/projects/perfmon2>. If the kernel is
properly patched, PAPI should configure and build as described under General
Installation, above. Support for Cell is experimental and limited to basic
counting and native event support, with a limited number of PRESET events. 

Linux - Itanium I, II, Montecito, Montvale
*****************************************************************************
PAPI on Itanium Linux links to the perfmon library. The library version and 
the Itanium version are automatically determined by configure.
If you wish to override the defaults, a number of pfm options are available
to configure. Use:
	% ./configure --help
to learn more about these options.

Follow the general installation instructions to complete your installation.

PERFMON2:
The Itanium Linux kernel comes preconfigured with the perfmon driver. If you
choose, you can patch your kernel with the newer perfmon2 driver, found at
http://sourceforge.net/projects/perfmon2. The standard installation procedure
applies in this case; configure and make will recognize the proper driver.
Since this software is still evolving, not all features are guaranteed to work.

PLATFORM NOTES:
The earprofile test fails under perfmon for Itanium I and II. It has been
reconfigured to work on the perfmon2 interface.

Linux - MIPS/MIPS64
****************************************************************************

If the kernel is patched with Perfmon2, you're all set to go. 

Linux - PPC64 (POWER4, 5, 6, 7 and PowerPC970)
****************************************************************************
Linux/PPC64 kernels prior to 2.6.31 must be patched with either Perfctr or
Perfmon and recompiled to support access to hardware counters. Kernels 
including 2.6.31 and following include built-in support for the perf_counters
or PCL (Performance Counters for Linux) interface. PAPI supports all three.

PCL:
If you are running Linux kernel 2.6.31 or later, your system has built-in 
support for hardware counters through the perf_counters interface. No kernel 
patch is required. The current version of configure for PAPI does not
automatically detect the presence of perf_counters, so you need to specify
	% ./configure --with-pcl=yes
on the configure command line. This interface has not been fully tested, but
has been verified to work on basic counting operations. It should be considered
a technical pre-release.

PERFCTR:
The required patches and complete installation instructions for Perfctr
are provided in the papi/src/perfctr-2.7.x directory. PPC64 is the ONLY 
platform that REQUIRES use of PerfCtr 2.7.x.

*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*

WARNING: You should always use a PerfCtr distribution that has been distributed
with a version of PAPI or your build will fail. The reason for this is that
PAPI builds a shared library of the Perfctr runtime, on which libpapi.so
depends. PAPI also depends on the .a file, which it decomposes into component
object files and includes in the libpapi.a file for convenience. If you
install a new perfctr, even a shared library, YOU MUST REBUILD PAPI to get
a proper, working libpapi.a.

There are several options in configure to allow you to specify your perfctr 
version and location. Use:
	% ./configure --help
to learn more about these options.

Follow the general installation instructions to complete your installation.

PERFMON2:
Make sure that your Linux/PPC64 kernel is patched with the newest perfmon2
driver found at http://sourceforge.net/projects/perfmon2. The standard 
installation procedure applies here.

For 32-bit builds using the internal libpfm-3.y library:
Follow the standard configure and make procedure in this documentation.

For 64-bit builds using the internal libpfm-3.y library:
First, configure with the following command:

	CFLAGS="-m64" FFLAGS="-m64" LDFLAGS="-m64" \
	./configure \
	--libdir=/usr/local/lib64

Next, build papi using the following command, and then follow the normal
installation procedure.

	make BITMODE=64

For 64-bit builds using an externally built 64-bit libpfm library:
First, build a 64-bit libpfm by issuing the following from the latest libpfm
project code available at: http://sourceforge.net/projects/perfmon2.

	make BITMODE=64 install

Next, configure with the following command, and then follow the normal
build and installation procedure.

	CFLAGS="-m64" FFLAGS="-m64" LDFLAGS="-m64" \
	./configure \
	--libdir=/usr/local/lib64 \
	--with-pfm-prefix=/usr/local \
	--with-pfm-libdir=/usr/local/lib64

Note: BITMODE is ignored by PAPI's Makefile, and is only used by
libpfm's Makefile.

Linux/x86 - Intel Pentium, Core, Core2, Atom, i7 and AMD Athlon, Opteron
*****************************************************************************
Linux/x86 kernels prior to 2.6.31 must be patched with either Perfctr or
Perfmon and recompiled to support access to hardware counters. Linux kernel
2.6.31 includes built-in support for the perf_counter interface, also called
PCL (Performance Counters for Linux). This interface was renamed to Perf_event
in kernel 2.6.32 and above. PAPI supports all of these interfaces.
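
The version boundaries above can be expressed as a small shell function. This
is an illustrative sketch only: the function name is not part of PAPI, and it
assumes a kernel release string of the form major.minor.patch with an optional
"-suffix" and purely numeric components.

```shell
# Print the counter interface that the text above associates with a kernel release.
# papi_counter_interface is a hypothetical helper; $1 is a string such as "2.6.31".
papi_counter_interface() {
    release=${1%%-*}            # drop any "-5-amd64" style suffix
    old_ifs=$IFS
    IFS=.
    set -- $release             # split on dots into $1 $2 $3
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    code=$(( major * 10000 + minor * 100 + patch ))
    if [ "$code" -ge 20632 ]; then
        echo "perf_event"
    elif [ "$code" -eq 20631 ]; then
        echo "perf_counter"
    else
        echo "perfctr-or-perfmon2-patch"
    fi
}

# Usage:
#   papi_counter_interface "$(uname -r)"
```

The result only suggests which subsection below applies; configure still makes
the actual decision.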

PCL:
If you are running Linux kernel 2.6.31 or later, your system has built-in 
support for hardware counters through the perf_counter interface. No kernel 
patch is required. The current version of configure for PAPI attempts to 
automatically detect the presence of perf_counter, but if it fails, you may
need to specify
	% ./configure --with-pcl=yes
on the configure command line. Additionally, if the perf_counter.h file is 
not found in an expected location you may need to specify a path with:
	% ./configure  --with-pcl-incdir=<path>
NOTE: The PERFCTR patch (see below) can be applied to a 2.6.3x kernel with 
perf_counter enabled. In this case, configure will attempt to build PAPI 
for PERFCTR instead of PCL. If you want to build for PCL, follow the above
instructions to force a PCL build.

PERF_EVENT:
For Linux kernel 2.6.32 and higher, the name of the built-in interface to
performance counters was renamed from perf_counter to perf_event. In this
case, PAPI will automatically look for perf_event.h instead of perf_counter.h.
All other instructions shown above for perf_counter still apply.

PERFMON2:
Make sure that your Linux/x86 kernel is patched with the newest perfmon2
driver found at http://sourceforge.net/projects/perfmon2. The standard 
installation procedure applies here.

PERFCTR:
The required perfctr patches and complete installation instructions are 
provided in the papi/src/perfctr-2.6.x directory. Please see the INSTALL file
in that directory.

Do not forget, you also need to build your kernel with APIC support in order
for hardware overflow to work. This is very important for accurate statistical
profiling ala gprof via the hardware counters.

So, when you configure your kernel to build with PERFCTR as above, make
sure you turn on APIC support in the "Processor type and features" section.

You can verify the APIC is working after rebooting with the new kernel
by running the 'perfex -i' command found in the perfctr/examples/perfex
directory.

PAPI on x86 assumes PerfCtr 2.6.x. You can change this with the 
'--with-perfctr=<5,6,7>' option to ./configure. THIS IS NOT ADVISED.
PerfCtr version 2.5.4 is specific for Cray XT3, and PerfCtr 2.7.x is
supported only for PPC64.
NOTE: THE VERSIONS OF PERFCTR DO NOT CORRESPOND TO LINUX KERNEL VERSIONS.

*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*

WARNING: You should always use a PerfCtr distribution that has been distributed
with a version of PAPI or your build may fail. Newer versions with backward
compatibility may also work. PAPI builds a shared library of the Perfctr 
runtime, on which libpapi.so depends. PAPI also depends on the .a file, 
which it decomposes into component object files and includes in the libpapi.a 
file for convenience. If you install a new PerfCtr, even a shared library, 
YOU MUST REBUILD PAPI to get a proper, working libpapi.a. 

There are several options in configure to allow you to specify your perfctr 
version and location. Use:
	% ./configure --help
to learn more about these options.

Follow the general installation instructions to complete your installation.

*- IF PERFCTR IS INSTALLED BUT PAPI FAILS TO INITIALIZE -*

You may be running udev, which is not smart enough to know the permissions of 
dynamically created devices. To fix this, find your udev/devices directory, 
often /lib/udev/devices or /etc/udev/devices and perform the following actions:

 mknod perfctr c 10 182
 chmod 644 perfctr

On Ubuntu 6.06 (and probably other Debian-based distros), add a line to 
/etc/udev/rules.d/40-permissions.rules like this:

KERNEL=="perfctr", MODE="0666"

On SuSE, you may need to add something like the following to
/etc/udev/rules.d/50-udev-default.rules
(SuSE does not have the 40-permissions.rules file):

# cpu devices
KERNEL=="cpu[0-9]*",            NAME="cpu/%n/cpuid"
KERNEL=="msr[0-9]*",            NAME="cpu/%n/msr"
KERNEL=="microcode",            NAME="cpu/microcode", MODE="0600"
KERNEL=="perfctr",              NAME="perfctr", MODE="0644"

These lines tell udev to always create the device file with the appropriate permissions.
Use 'perfex -i' from the perfctr distribution to test this fix.
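
If perfex is not at hand, a quick look at the device node itself can show
whether the permissions took effect. This is a minimal sketch; the function
name is illustrative, and the default path /dev/perfctr is an assumption about
where udev creates the node on your system.

```shell
# Return 0 if the perfctr device node is a character device readable by this user.
# check_perfctr_device is a hypothetical helper; $1 overrides the assumed path.
check_perfctr_device() {
    dev=${1:-/dev/perfctr}
    if [ ! -c "$dev" ]; then
        echo "$dev: not a character device (is the perfctr driver loaded?)" >&2
        return 1
    fi
    if [ ! -r "$dev" ]; then
        echo "$dev: exists but is not readable; check the udev rule" >&2
        return 2
    fi
    echo "$dev: ok"
}

# Usage:
#   check_perfctr_device
```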

Linux - SiCortex
*****************************************************************************

Assuming the default 64 bit build:

When compiling on the nodes:

     ./configure \
     --with-pfm-libdir=/usr/lib64 \
     --with-pfm-incdir=/usr/include

When cross-compiling on the front end:

     If /etc/sicortex-release is < V2.3 and r81:

     ./configure \
     --with-pfm-libdir=/opt/sicortex/rootfs/default/usr/lib64 \
     --with-pfm-incdir=/opt/sicortex/rootfs/default/usr/include \
     --with-arch=mips64 --host=x86_64 --build=mips64 \
     --with-perfmon=2.4 --with-virtualtimer=clock_thread_cputime_id \
     --with-tls --with-ffsll --with-pfm-events=static \
     --with-walltimer=gettimeofday 

     If you have a machine that is more recent than V2.3 and R80, then change
     the last line to:

     --with-walltimer=cycle

Solaris 8 - UltraSPARC
*****************************************************************************
The only requirement for Solaris is that you must be running version 2.8 or 
newer.  As long as that requirement is met, no additional steps are required 
to install PAPI and you can follow the general installation guide.

Solaris 10 - UltraSPARC T2/Niagara 2
*****************************************************************************
PAPI supports the Niagara 2 on Solaris 10 as of PAPI 3.7.0. The substrate
supports basic operations such as adding and reading events, as well as the
advanced features of multiplexing (see below), overflow handling and profiling.
The implementation for Solaris 10 is based on libcpc 2, which offers access
to the underlying performance counters. Performance counters for the
UltraSPARC architecture are described in the UltraSPARC architecture manual
in general with detailed descriptions in the actual processor manual. In
case of this substrate the documentation for performance counters can be
found at:

 - http://www.opensparc.net/publications/specifications/

To install PAPI on this platform, make sure the packages SUNWcpc and
SUNWcpcu are installed. Sun Studio 12 was used for compilation while the
substrate was being developed. GNU GCC has not been tested and would require
modifying the makefiles Makefile.solaris-niagara2 (32 bit) and
Makefile.solaris-niagara2-64bit (64 bit).

The steps required for installation are as follows:

	./configure --with-bitmode=[32|64] --prefix=/is/optional
	
		If no --with-bitmode parameter is present a default of
		32 bit is assumed.

		If no --prefix is used, a default of /usr/local is assumed.

	make
	make install

If you want to link your application against your installation you should
make sure to include at least the following linker options:

	-lpapi -lcpc
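
As an illustration, the linker options can be combined with the install prefix
chosen at configure time. This sketch assumes headers under <prefix>/include
and libraries under <prefix>/lib; the function name and myapp.c are
placeholders, and a 64-bit install may use a different library directory.

```shell
# Print compiler and linker flags for a PAPI install under the given prefix.
# papi_niagara2_flags is a hypothetical helper; the include/ and lib/ layout
# below is an assumption about the install tree.
papi_niagara2_flags() {
    prefix=${1:-/usr/local}     # /usr/local is the default prefix named above
    echo "-I$prefix/include -L$prefix/lib -lpapi -lcpc"
}

# Usage (myapp.c is a placeholder source file):
#   cc -o myapp myapp.c $(papi_niagara2_flags /opt/papi)
```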

PLEASE NOTE: This is the first revision of Niagara 2/libcpc 2/Solaris 10
support and needs further testing! Contributions, especially for the preset
definitions, would be greatly appreciated.

MULTIPLEXING: As the Niagara 2 offers no native event to count the cycles
elapsed, a "synthetic event" was created offering access to the cycle count.
This event is not as accurate as the native events, and it should not be
used for anything other than multiplexing mode, which needs the cycle
count in order to work. Therefore multiplexing and the preset PAPI_TOT_CYC
should only be used with caution. BEWARE OF WRONG COUNTER RESULTS!

HINT: In case of a data race, Instr_FGU_arithmetic might not count all 
events, so the actual results might differ from the theoretical ones. 
The missing events might be caused by consistency or coherency 
protocols reverting the operations as they overlap. The behavior was observed 
with several threads accessing the same data set and performing the same 
floating-point instructions (OpenMP parallel region with data shared on heap, 
no synchronization or locking). 

You should encounter this problem only in faulty programs with shared memory 
parallelization (compare single threaded vs. multi threaded event count) - a 
correct implementation should give you exact results. Other native events 
related to instructions (Instr_*) might be influenced by this behavior.

Windows XP/2000/Server 2003 - Intel Pentium III or AMD Athlon / Opteron
*****************************************************************************
PAPI 3.7 for Windows runs on most modern processors 
(Intel Pentium 3 or better) and most 32-bit versions of Windows 
(2000 or better).

PLEASE NOTE: Windows provides no way to do process level counts,
so a system with little background noise is recommended.
64-bit (x64 versions of Windows) is not supported at this time.

See win2k/README.txt for detailed build instructions.

PAPI 3.5 for Windows runs on Pentium III, Athlon and Opteron, for 
Windows 2000, Windows XP and Windows Server 2003. 

