Chapter 12. Known Issues

Table of Contents

12.1. General Issues
/tmp on worker nodes
PoD on AFS
WARNING: File /afs/.../pod-worker is not readable by condor
It seems I run always X slaves, but I requested Y.
gLite environment issue at CERN's LSF
12.2. Condor Issues
Condor and AFS
12.3. Grid Issues
ClassAds and Namespace
GLOBUS Libs Relocation
GridSite headers missing
globus_config.h is missing

12.1. General Issues

/tmp on worker nodes

The /tmp directory on remote worker nodes must be readable and writable, because PROOF and ROOT write to it. I have redirected every temporary file I could to the PoD working directory, but ROOT/PROOF still writes some files to /tmp, including the socket files of proof/xrootd.
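
A quick way to verify this requirement on a worker node is to create, write, and read back a probe file. The following is only a minimal sketch: the function name check_tmp_rw and the probe file name are hypothetical and not part of PoD.

```shell
# Return 0 if the given directory (default: /tmp) allows creating,
# writing, and reading back a file; return 1 otherwise.
# NOTE: check_tmp_rw is an illustrative helper, not a PoD command.
check_tmp_rw() {
    local dir="${1:-/tmp}"
    local probe content
    # mktemp fails if the directory is missing or not writable.
    probe="$(mktemp "${dir}/pod_probe.XXXXXX" 2>/dev/null)" || return 1
    echo "ok" > "${probe}" || { rm -f "${probe}"; return 1; }
    content="$(cat "${probe}" 2>/dev/null)"
    rm -f "${probe}"
    [ "${content}" = "ok" ]
}
```

For example, `check_tmp_rw /tmp || echo "/tmp is not usable" >&2` reports a problem before any PoD worker is started.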

PoD on AFS

Since AFS doesn't support pipes, you need to change the PoD server working directory in the PoD user defaults configuration so that it no longer resides on AFS. Something like this should work:

[server]
#
# PoD working directory
#
work_dir=/tmp/manafov/
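
Before settling on a new working directory, it may help to confirm that the candidate location really supports pipes. The sketch below simply tries to create a named pipe there; the helper name supports_fifo is hypothetical and not part of PoD.

```shell
# Return 0 if a named pipe (FIFO) can be created in the given directory.
# NOTE: supports_fifo is an illustrative helper, not a PoD command.
supports_fifo() {
    local dir="$1"
    local fifo
    [ -d "${dir}" ] || return 1
    # Use the shell's PID to make a name collision unlikely.
    fifo="${dir}/pod_fifo_probe.$$"
    if mkfifo "${fifo}" 2>/dev/null; then
        rm -f "${fifo}"
        return 0
    fi
    return 1
}
```

For example, `supports_fifo /tmp/manafov && echo "usable as work_dir"` checks the directory used in the configuration above.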

WARNING: File /afs/.../pod-worker is not readable by condor

See the section called “Condor and AFS”

It seems I run always X slaves, but I requested Y.

PoD sets up workers on the remote nodes and, when the PoD packet-forwarding connection is used, makes the PROOF master think that all of its workers are on the localhost. PoD actually hides the remote PROOF workers from the PROOF server and acts as a "proxy" between them. Because the default value of PROOF_MaxSlavesPerNode is 2, only 2 slaves get packets: since the PROOF server sees all slaves on the localhost, the other Y-2 workers get none.

For more information, see the PROOF Wiki:

PROOF_MaxSlavesPerNode
Type: int
Description: Parameter for the packetizers. Limit the number of slaves accessing data on any single node.
Default Value:
In TPacketizer the default value is 4.
In TPacketizerAdaptive and TPacketizerProgressive it is 2.
Note

According to another source, the default number of workers reading remotely from one file node (worker machine) is not 2 but the number of CPU cores of the master node.

To resolve this issue, change one parameter of your PROOF session (100 is only an example):

proof->SetParameter( "PROOF_MaxSlavesPerNode", (Long_t)100 );

Hopefully, in the future it will be possible to set this through the XROOTD configuration file, and PoD will manage it for you automatically.

gLite environment issue at CERN's LSF

If PoD doesn't work out of the box at CERN on LSF and PoD jobs fail to start PoD workers, then you are most probably facing the so-called "gLite environment" issue. Check the logs of your PoD jobs: if you see something like the following in the std_XXX.out of a job, then xproofd will most probably fail to start:

LD_LIBRARY_PATH=/tmp/PoD_cgmwU22625:/opt/d-cache/dcap/lib:/opt/d-cache/dcap/lib64:/opt/glite/lib:/opt/glite/lib64:/opt/globus/lib:\
/opt/lcg/lib:/opt/lcg/lib64:/usr/lib64:/afs/cern.ch/user/m/mbellomo/PoD/3.6/lib:/afs/cern.ch/sw/lcg/external/qt/4.4.2/x86_64-slc5-gcc43-opt/lib:\
/afs/cern.ch/sw/lcg/external/Boost/1.47.0_python2.6/x86_64-slc5-gcc43-opt//lib:/afs/cern.ch/sw/lcg/app/releases/ROOT/5.30.00/x86_64-slc5-gcc43-opt/root/lib:\
/afs/cern.ch/sw/lcg/contrib/gcc/4.3.5/x86_64-slc5-gcc34-opt/lib64:/afs/cern.ch/sw/lcg/contrib/mpfr/2.3.1/x86_64-slc5-gcc34-opt/lib:\
/afs/cern.ch/sw/lcg/contrib/gmp/4.2.2/x86_64-slc5-gcc34-opt/lib:/opt/classads/lib64/:/opt/c-ares/lib/

Note "glite" in the paths.
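
To spot such entries without reading the whole variable by eye, the path list can be split and filtered. This is a minimal sketch; the function name list_glite_entries is hypothetical, and matching on the substring "glite" is an assumption based on the paths shown above.

```shell
# Print every entry of a colon-separated path list that contains "glite"
# (case-insensitive). Prints nothing if there is no such entry.
# NOTE: list_glite_entries is an illustrative helper, not a PoD command.
list_glite_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -i 'glite'
}
```

For example, `list_glite_entries "$LD_LIBRARY_PATH"` inside a job prints the suspicious directories, one per line.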

As a workaround, you can try the mechanism from Section 5.2, “User's environment on workers”. The file described there needs just two lines:

#! /usr/bin/env bash
export LD_LIBRARY_PATH=/afs/cern.ch/sw/lcg/external/qt/4.4.2/x86_64-slc5-gcc43-opt/lib:\
/afs/cern.ch/sw/lcg/external/Boost/1.44.0_python2.6/x86_64-slc5-gcc43-opt//lib:/afs/cern.ch/sw/lcg/app/releases/ROOT/5.30.00/x86_64-slc5-gcc43-opt/root/lib:\
/afs/cern.ch/sw/lcg/contrib/gcc/4.3.5/x86_64-slc5-gcc34-opt/lib64:/afs/cern.ch/sw/lcg/contrib/mpfr/2.3.1/x86_64-slc5-gcc34-opt/lib:\
/afs/cern.ch/sw/lcg/contrib/gmp/4.2.2/x86_64-slc5-gcc34-opt/lib:/afs/cern.ch/alice/library/afs_volumes/vol12/geant3/lib/tgt_linuxx8664gcc:\
/afs/cern.ch/alice/library/afs_volumes/vol12/AliRoot/lib/tgt_linuxx8664gcc

or any suitable LD_LIBRARY_PATH you like, as long as it contains no gLite libs.

Another solution is to find out what injects the gLite environment into your LSF jobs and get rid of it.
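
Instead of writing the path by hand, one could also derive it from the existing value by dropping the gLite entries. The following is only a sketch under the assumption that every offending directory contains "glite" in its name; the function name strip_glite_entries is hypothetical, and other directories (for example under /opt/globus or /opt/lcg) are left untouched.

```shell
# Print the given colon-separated path list with every entry that contains
# "glite" (case-insensitive) removed. Empty entries are dropped as well.
# NOTE: strip_glite_entries is an illustrative helper, not a PoD command.
strip_glite_entries() {
    printf '%s\n' "$1" \
        | tr ':' '\n' \
        | grep -v -i 'glite' \
        | grep -v '^$' \
        | paste -s -d ':' -
}
```

A possible use in the environment file is `export LD_LIBRARY_PATH="$(strip_glite_entries "$LD_LIBRARY_PATH")"`.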

12.2. Condor Issues

Condor and AFS

If your home directory is on AFS, then you need to give Condor permission to access some of your PoD directories. Namely, Condor needs full access to the following folders: $HOME/.PoD/wrk and $HOME/.PoD/log. The latter is the default path for PoD logs. If you changed this path in the PoD user defaults settings and the new log directory is also on AFS, then you need to open it accordingly.

You can do that by issuing the following commands:

fs setacl -dir $HOME/.PoD/wrk -acl system:anyuser rlidwk
fs setacl -dir $HOME/.PoD/log -acl system:anyuser rlidwk
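
If the log directory was moved, the same command has to be issued for the new location as well. The sketch below only prints the commands (a dry run) for the two default folders plus any extra directories passed as arguments, so they can be reviewed before being executed; the function name pod_acl_commands is hypothetical.

```shell
# Print the "fs setacl" command for the two default PoD folders and for
# every additional directory given as an argument. Nothing is executed.
# NOTE: pod_acl_commands is an illustrative helper, not a PoD command.
pod_acl_commands() {
    local dir
    for dir in "$HOME/.PoD/wrk" "$HOME/.PoD/log" "$@"; do
        printf 'fs setacl -dir %s -acl system:anyuser rlidwk\n' "${dir}"
    done
}
```

For example, `pod_acl_commands /afs/.../my_pod_logs` lists three commands; piping the output to `sh` would run them.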

12.3. Grid Issues

ClassAds and Namespace

You may want to compile ClassAds with namespace support: the gLite UI ships ClassAds compiled without namespace support, yet some gLite API libraries (WMSUI, for example) require ClassAds with namespace support. This issue prevents GAW from being built properly.

Download classads-0.9.9, then build and install it:

tar -xzvf classads-0.9.9.tar.gz
cd classads-0.9.9
./configure --enable-namespace
make
make install

Note that some Linux distributions provide a ClassAds package. For example, on Fedora 9:

yum install classads
yum install classads-devel

GLOBUS Libs Relocation

If you have the gLite UI installed from a relocatable tarball, you may hit this gLite bug, which shows up as the following (or similar) error messages while compiling the GAW library:

grep: /opt/globus/lib/libglobus_ftp_control_gcc32dbg.la: No such file or directory
/bin/sed: can't read /opt/globus/lib/libglobus_ftp_control_gcc32dbg.la: No such file or directory
libtool: link: `/opt/globus/lib/libglobus_ftp_control_gcc32dbg.la' is not a valid libtool archive

One solution is simply to copy the Globus libs to /opt/:

cd $GLOBUS_LOCATION
mkdir -p /opt/globus
cp -rv lib /opt/globus/

GridSite headers missing

If you have the gLite UI installed from a relocatable tarball, you may hit this gLite bug. It prevents GAW from being built properly.

One solution is simply to obtain these headers from another source.

globus_config.h is missing

For a long time, the gLite UI_TAR installation (and, I suspect, gLite UI as well) has been missing "globus_config.h" (see CERN Savannah - gLite Bug #31180). The gLite API references this file but does not provide it. Users therefore have to find it elsewhere so that GAW can use the gLite API.
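
A copy of the header may already exist somewhere on the machine, for instance in another Globus installation. The sketch below searches a list of directories for it; the function name find_globus_config is hypothetical, and which directories are worth searching is an assumption that depends on your site.

```shell
# Print the path of every globus_config.h found under the given directories.
# Directories that do not exist are skipped silently.
# NOTE: find_globus_config is an illustrative helper, not a gLite or PoD command.
find_globus_config() {
    local root
    for root in "$@"; do
        if [ -d "${root}" ]; then
            find "${root}" -type f -name globus_config.h 2>/dev/null
        fi
    done
    return 0
}
```

For example, `find_globus_config "$GLOBUS_LOCATION" /opt/globus /usr/include` checks a few likely places.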