dkrotx-prg: October 2011

Thursday, October 27, 2011

rsync & rsync via SSH

As you know, rsync may work with it's native rsyncd server and via SSH. If source or destination contains '::' this addresses rsync' mountpoint. Otherwise it works via SSH using cryptographic channel.
Anyway, we use both of them. And there are many cases then you have expressions like:

#!/usr/bin/env bash

URL1=xxx.yyyy.com::point/file1.txt # native rsync
URL2=xxx.yyyy.com:some_folder/file2.txt # rsync-ssh
#... 
rsync --contimeout 10 --timeout 40 $URL1 file1.txt
rsync -e ssh "-o BatchMode=yes -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=10 -o ServerAliveInterval=40" $URL2 dir/

# same things many times

As you can see, we have to use different syntax for native- and ssh-rsync: --contimeout is available only for native case. Really bad news, especially if I want to omit 'rsync' and it's options each time. Should I know what type of URI I'll got in specific call?!. No! It's easy to rewrite it as follows:

SSH_OPTS="-o BatchMode=yes -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=10 -o ServerAliveInterval=40"

# select rsync or rsync-ssh syntax automatically
function RSYNC
{
  local looks_like_rsync=

  for a in "$@"
  do
    if [[ $a =~ :: ]]; then
      looks_like_rsync=yes
      break
    fi
  done

  if [[ -n $looks_like_rsync ]]; then
    rsync --contimeout=10 --timeout=40 -tv "$@"
    return $?
  fi

  rsync -tv -e "ssh $SSH_OPTS" "$@"
}

Thursday, October 20, 2011

population count (POPCNT)

I'm wondering, some people don't know about primitive bits technique. For example, trying to count number of '1' bits in int may be done as follows:

int get_nbits(int x) {
  int cnt = 0;
  while(x) {
    cnt += x & 1;
    x >>= 1;
  }
  return cnt;
}

It's rough and not efficient. Any ways to make it faster? Couple of them (and, yea - give more of 'em in comments).

boost 1)

Look here: we have a number which is a power of 2. Such numbers has a great attribute: if you perform '&' with number-1, you'll always get a zero. This is because you always have "ones" in place you get zero in power-of-two.

Look at this: number 16 and 16-1=15:

  10000
& 01111
-------
      0

So what does it means? You always eliminate the lower 'one' bit. And example above may be easily rewritten as:

int count_bits(int x) {
  int cnt = 0;
  while (x) {
    x &= (x-1);
    cnt++; 
  }
  return cnt;
}

So, now we will perform no more than "number-of-ones" loops. Ok, but still doesn't efficient.

boost 2)

Why we should count a bits using N loops; we have to precompute 'em instead! So, use precomputed table for bytes, we may rewrite our example as follows:

static unsigned char nbits[256] = {
  /*   0 -  15 */ 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 
  /*  16 -  31 */ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
  /*  32 -  47 */ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
  /*  48 -  63 */ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
  /*  64 -  79 */ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
  /*  80 -  95 */ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
  /*  96 - 111 */ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
  /* 112 - 127 */ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
  /* 128 - 143 */ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 
  /* 144 - 159 */ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
  /* 160 - 175 */ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
  /* 176 - 191 */ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
  /* 192 - 207 */ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 
  /* 208 - 223 */ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
  /* 224 - 239 */ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 
  /* 240 - 255 */ 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
};

int count_bits(unsigned x)
{
  return nbits[x & 0xFFU] + nbits[(x >> 8) & 0xFFU] + nbits[(x >> 16) & 0xFFU] + nbits[x >> 24]
}

boost 3) Using POPCNT from Intel SSE4.2 command set

You may find brief description and links here.

boost 3.2

GCC has set of built-in commands which expands to CPU-instructions if present, or use software (but efficient) workarounds. As final example, I prefer to use this code:

//
// Uses SSE4 'POPCNT' instruction if present, or gcc-stub like 'popcount'
//
template int popcnt_modern(T);
template<> int popcnt_modern(unsigned x) { return __builtin_popcount(x); }
template<> int popcnt_modern(unsigned long x) { return __builtin_popcountl(x); }
template<> int popcnt_modern(unsigned long long x) { return __builtin_popcountll(x); }

Although It seems what gcc generates inefficient code unless your CPU provide SSE4.2. Hard to provide numbers here, but trust - code with byte-table is slightly faster.

Tuesday, October 18, 2011

eval is your friend

I like to use expressions like {1..100} in bash - it makes tedious FORs much more easier. But you can't express this using variables. Even via readonly variables.So, you can not easily write this:

readonly START_ID=200
readonly END_ID=250

for i in {{1..100},{$START_ID..$END_ID}}; do
   # do something using $i
done

But (thank God) here is pretty workaround: eval. So, construction above may be rewritten as

for i in $( eval echo {{1..100},{$START_ID..$END_ID}} ); do
   # ...
done

dkrotx-prg