PERL

Q1: Consider the following program:

-- code --
#!/usr/local/bin/perl5.003 -w
{
    my($string) = "brad<hello>3hello";
    $string =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/;
    if( defined($1) ) {
        print "$1\n";
    } else {
        print "not found\n";
    }
}
-- end code --

Explain what the regular expression is trying to match.

A1: The regular expression attempts to match between 2 and 4 non-digit characters, then anything between a pair of angle brackets, then an optional digit, and finally the captured word from between the angle brackets again. Thus the following inputs would produce these outputs:

brad<foo>3bar  = not found
brad<foo>3foo  = foo
brad<foo>foo   = foo
brd<foo>4foo   = foo
brd<foo>44foo  = not found
br<4foo>44foo  = 4foo

Q2: If you were writing a Perl program as a prototype for a program you would like eventually to write in C (not C++), would this alter how you write the Perl prototype? In what ways?

A2: To do this I would write the program in a strictly non-object-oriented style, making sure that I explicitly put in 'destructors' where necessary (by setting a variable to undef) even though Perl does not require them. I would try to keep types as strict as possible, and where they were converted (the number 2 to the string "2", for example) this would be explicitly documented. Perl idiosyncrasies and special features such as foreach loops and the unless statement would only cause complications when porting from Perl to C. Things like regular expressions and CPAN modules should be avoided unless a satisfactory equivalent is available in C through a similar interface.

Q3: What does 'my' do? Is it the same as 'local'?

A3: To quote from the Perl documentation:

`local($x)' saves away the old value of the global variable `$x', and assigns a new value for the duration of the subroutine, which is visible in other functions called from that subroutine. This is done at run-time, so is called dynamic scoping. local() always affects global variables, also called package variables or dynamic variables.
`my($x)' creates a new variable that is only visible in the current subroutine. This is done at compile-time, so is called lexical or static scoping. my() always affects private variables, also called lexical variables or (improperly) static(ly scoped) variables.

What this means is that with local, variables are propagated to successive function calls. So ...

-- code --
$variable = 'foo';

sub dynamic {
    local $variable = 'bar';
    print_variable();
}

sub static {
    my $variable = 'bar';
    print_variable();
}

sub print_variable {
    print "variable is $variable\n";
}

print_variable();
static();
dynamic();
-- end code --

prints out

foo
foo
bar

Q4: Given a string $text containing multiple lines of text, how do you strip all the HTML tags?

A4: The very short answer is:

$text =~ s/<.*?>//gs;

a longer answer, taken from the perldoc, is

$text =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

however, for various reasons (CDATA sections, Javascript, etc.) this won't work in all cases. A better solution would be to use the CPAN module HTML::Parser, which has years of experience and bug fixing behind it. There's no point reinventing the wheel when the time could be spent more productively implementing other features. Using this, the way to do it would be:

my $p = HTML::Parser->new(api_version => 3,
                          text_h => [ sub { print shift }, "dtext" ]);
$p->parse_file($file);

Q5: Given a string $text, write a regular expression to strip the white space from the beginning and end of the string

A5:

$text =~ s/^\s*(.*?)\s*$/$1/s;  # the /s at the end is in case there are
                                # newlines in the string

another method might be

$text =~ s/(^\s*|\s*$)//sg;

------------------------------------------------
C / C++

Q6: What is the significance of a const variable?

A6: It is set at declaration and cannot subsequently be changed.

const int FOO = 42;  /* by convention constants are named in UPPER CASE */
FOO = 13;            /* Error !
*/

const came into C with the ANSI (C89) standard, so on very old compilers it may be better to define constants using CPP macros such as

#define FOO 42

Q7: A structure has an integer, a pointer and a char, how big is it?

A7: This is a lot trickier than it looks. The following program shows three ways of measuring it.

-- code --
#include <stdio.h>

struct foo {
    int i;
    char c;
    char * p;
} __attribute__ ((packed)) foo_bar = { 1, 'c', "for want of something else" };

typedef struct foo foo_t;

int main (void)
{
    printf("%d == %d == %d\n",
           (int) sizeof(foo_bar),
           (int) sizeof(foo_t),
           (int) (sizeof(int) + sizeof(char) + sizeof(void *)));
    return 0;
}
-- end code --

The reason why it is tricky is that, without the __attribute__ ((packed)) declaration, the three answers will not come out the same. This is because there is a run-time penalty for accessing 32-bit ints which aren't aligned on a dword boundary, so GCC pads the structure for speed; the attribute prevents this. Declaring the whole struct as packed won't work in C++ - instead each member of the struct has to be declared packed. A similar effect, without the added syntax, can be achieved by using the command-line option -fpack-struct on GCC > 2.7.0 (although there were bugs in 2.95.1 and 2.95.2 that affected this).

Q8: What's wrong with the following code?

char a[256];
unsigned char x;
for( x = 0; x < 256; x++ )
    a[x] = x;

A8: An unsigned char can only hold the values 0 to 255, so after 255 x wraps back round to 0 and the condition x < 256 is always true: the loop never terminates. The loop index should be a plain int instead.

Q9: Write a program that reverses the order of the words in a string.

A9:

-- code --
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Reverse the order of words in a string */
int main (int argc, char ** argv)
{
    /* The input and the result */
    char * string, * result;

    /* tmp variable */
    char * tmp;

    /* If we've been given an argument, then use that. Otherwise use the
     * default. We have to strdup it otherwise newer compilers
     * automatically make it const.
     * This could be fixed by using the -fwritable-strings flag. */
    if (argc > 1)
        string = strdup(argv[1]);
    else
        string = strdup("The cow jumped over the moon");

    /* set up a buffer large enough to accommodate the result
     * (same length as the input, plus a trailing space and the nul) */
    if (!(result = (char *) malloc (strlen(string) + 2))) {
        fprintf(stderr, "Failed to malloc buffer\n");
        return -1;
    }

    /* and set the null character so strcat works */
    *result = '\0';

    /*
     * go backwards through the string getting everything from the end
     * to the last space
     */
    while ((tmp = strrchr(string, ' '))) {
        /* concatenate the last word of the sentence onto the result,
         * followed by a space. On the first run result becomes "moon " */
        strcat(result, tmp + 1);
        strcat(result, " ");

        /* set the end of string to be where the last space was. On the
         * first run, string should now be "The cow jumped over" */
        *tmp = '\0';
    }

    /* Now stick the last bit of string (in this case "The") on the end */
    strcat(result, string);

    /* print out the result */
    printf ("%s\n", result);
    return 0;
}
-- end code --

Q10: What is a static member of a class (also called a class variable)? What can it be used for and how do you define it?

A10: A static member is a variable that is part of a class but not part of any object of that class. It is therefore like a global variable confined to the namespace of the class. There is only one copy of a static member, rather than one copy per object. This means that it is good for saving space, useful for defaults and very handy for implementing patterns like singletons.

To define one you use the syntax

class Foo {
public:
    static int bar;
    Foo ();
};

int Foo::bar = 0;  /* the definition goes in one source file */

To retrieve or set a public static member use the syntax

Foo::bar = 42;

Static members can also be manipulated through static methods, which can be called without an instance:

static void set_bar(int num);  /* declared inside the class */
Foo::set_bar(42);

Q11: How do you use exception handling in C++? Briefly explain in an example how you define exception handling and what statements are used.
A11: Exception handling in C++ is done using the syntax

try {
    // do something
} catch (const Error& e) {
    // handle the error
}

Essentially exceptions are a way of propagating errors up the call chain without having to pass around references to error variables. The exception handler can rethrow the exception if it decides that it cannot handle it. So, for example:

class Error {
    const char * err;
public:
    Error(const char * err_) : err(err_) {}
    virtual void print() const { cerr << err; }
};

void foo(int n)
{
    if (n > 9)
        throw Error("too large");
}

try {
    foo (11);
} catch (const Error& e) {
    e.print();
}

would print the string "too large" to stderr.

Q12: Write a class template for a stack of elements of arbitrary type. Include the functions "push", "pop", "size" and the constructor as well as the destructor.

A12:

-- code --
// stack.h
#include <iostream>
using namespace std;

template <class T> class Stack {
    struct Link {
        Link * previous;
        Link * next;
        T value;
        Link (Link * p, Link * n, const T& v)
            : previous(p), next(n), value(v) {}
    };

    Link * head;
    Link * tail;

public:
    Stack();
    ~Stack();
    void push(const T&);
    T pop();
    int size();
    void printAll();
};

// constructor - the stack starts out empty
template <class T> Stack<T>::Stack()
{
    head = 0;
    tail = 0;
}

// destructor
template <class T> Stack<T>::~Stack()
{
    // get the first element
    Link * l = head;

    // iterate through the list
    while (l) {
        // store temporarily
        Link * tmp = l;
        // get the next item
        l = tmp->next;
        // delete this one
        delete tmp;
    }

    // set the list to null
    head = 0;
    tail = 0;
}

// push an element onto the stack
template <class T> void Stack<T>::push(const T& value)
{
    // if the list is empty then ...
    if (!head) {
        // create a new head
        head = new Link(0, 0, value);
        // and set the tail to be the head
        tail = head;
        return;
    }

    // otherwise create a new element and link it to the tail
    tail->next = new Link (tail, 0, value);
    // and then set the tail to be the last element
    tail = tail->next;
}

// pop an element off the stack
template <class T> T Stack<T>::pop()
{
    // if there's nothing on the stack then ...
    if (!tail)
        // ... return nothing.
        // This should probably throw an exception instead,
        // as a default value could give confusing results
        // if T is int.
        return T();

    // ... otherwise get the last element
    Link * tmp = tail;

    // set the tail to be the element before it
    tail = tmp->previous;

    // and chop off the end of the list; if the stack is now
    // empty the head must be cleared too (and dereferencing a
    // null tail avoided)
    if (tail)
        tail->next = 0;
    else
        head = 0;

    // save the value, free the link, and return the value
    T value = tmp->value;
    delete tmp;
    return value;
}

// get the size of the stack
template <class T> int Stack<T>::size()
{
    // initialise the counter
    int i = 0;

    // get the first element
    Link * l = head;

    // iterate through
    while (l) {
        // increase the count
        i++;
        // get the next element
        l = l->next;
    }

    // return the count
    return i;
}

// print every element in the list
template <class T> void Stack<T>::printAll()
{
    // the same traversal as the size method,
    // but printing each value rather than
    // just counting
    int i = 0;
    Link * l = head;
    while (l) {
        i++;
        cout << i << ": " << l->value << endl;
        l = l->next;
    }
}
-- end code --

------------------------------------------------
WEB

Q13: What's the difference between the GET and POST methods?

A13: The GET method passes CGI parameters by way of the query string. As such, certain characters must be escaped (a space, for example, is escaped to %20). According to the HTTP specification the GET method is idempotent: the side effects of several identical GET requests are the same as those of one. What this means in practice is that browsers and proxies may cache the response to a GET request, so it isn't an ideal method if you want to log or store the results of every request.

POST data is passed to the CGI program via the STDIN file handle (rather than via the QUERY_STRING environment variable as with GET). Because of this it is hidden from the user and is much more suitable for things like passing large values (such as files or many, many variables). It is not limited in the size of data you can send, which some GET implementations are. One advantage of GET requests is that it is possible to save the current state of the request as a single URL, which isn't possible with POST.
Q14: I want to provide a personalised web page, which presents different data to different users - what URL/CGI techniques can I use to identify each user? What are the advantages and disadvantages of each method?

A14: IP address - by assuming each user has a unique IP address, you can track a user through the site and store customisations and other data between and during sessions. However, there is no guarantee that a user always comes from the same IP address, or that an IP address belongs to only one user (proxies and dial-up pools break both assumptions).

SessionID - by appending a unique session ID (perhaps generated by MD5-hashing some data such as the user's IP address, the time, the process number and a random number) to requests, the user can be tracked. This can either be done via GET requests, which produces ugly URLs, or via POST requests, which means every single link must be a form, or by some combination. An alternative would be to set the session ID in a cookie, but not all browsers support cookies.

Username and password - this works in combination with one of the previous techniques (usually the session ID): a user logs in and a new session ID is generated. This session ID should expire after a while, to prevent users mistakenly handing out URLs which automatically allow other people to log in as them.

In practice there is no foolproof way of providing a personalised web page. The general technique is to request that the user logs in; the HTTP server then sets a cookie (if possible, and if desired by the user) with their username and password, or some sort of authentication token, in it so that they are automatically logged in when they return (or jump straight into the middle of the site). Then they are tracked using session IDs, either in cookies where possible or via CGI parameters in hidden form fields or encoded in URLs. The session ID usually expires based on other parameters, such as time, access from another IP address, or HTTP Referer headers.
By doing this, almost all surfers will be able to have a personalised experience even if they refuse cookies.

Q15: How can a document specify that it should not be cached by a client or proxy server?

A15: There are several ways to do this, and a combination is usually best to work around bugs and 'features' in various browsers. These boil down to two basic methods - putting tags in the HTML and sending certain HTTP headers back.

The relevant HTTP headers are:

Pragma: no-cache - although most caches do not honour this header.
Expires - setting a date in the 'past' will, theoretically, prevent a document from being cached.
Cache-Control - this is new to HTTP 1.1 but is very flexible and powerful.
Last-Modified - by setting this in the past, some proxies will not cache a document.
ETag - this HTTP 1.1 feature is like a checksum generated by the HTTP server to help caches decide whether to cache or not.

In HTML one can achieve the same effect by using a META tag in the HEAD element, in the form

<meta http-equiv="name" content="content">

where the name and content are the same as the HTTP headers above. In this case the *browser* may not cache the page. However, this is extremely unlikely to affect proxies, since they do not parse the HTML. It also doesn't help non-HTML documents.

A third method is to generate a unique URL by putting a random number in it, either as a CGI parameter:

http://foo.com/nevercache.html?948798739875389753987539875

which should have no effect on the page actually being displayed. However, some caches may (wrongly) ignore any CGI parameters, so using a rewrite engine like Apache's mod_rewrite, this could be rewritten as

http://foo.com/93429487293847234/nevercache.html

(where the rewrite engine will strip out the random-number part). The random number would have to be inserted into every link by way of a Server Side Include, CGI, or another rewrite engine (one that parses the HTML and inserts a new random ID into the appropriate links every time the page is displayed).
It should be noted that there is no guarantee that pages will not be cached, since the browser or proxy could be a very naive implementation or, at the other end of the scale, something 'smart' that attempts to do clever tricks anyway. With this in mind, a programmer relying on no caching would be wise to have mechanisms in place to check that this is indeed the case.

Q16: What do the numbers 200, 302 and 404 mean to you?

A16:
200 - OK
302 - FOUND (i.e. the document has moved to a new location, but this new location is liable to change frequently. Used by CGI scripts that send a Location: header)
404 - NOT FOUND

Q17: A user complains "http://www.yahoo.de is really slow". How would you attempt to debug this?

A17: It would depend on the technical competence of the user, but the aim is to identify the source of the problem. The obvious first thing to check is whether the site itself is slow. The way to check this is to see whether it is slow for you, especially from a remote site (check using a web browser on a remote machine with X tunnelled over ssh, or some other technique), preferably with no caching whatsoever.

If it's not slow for you, then check with the user whether it is slow *all* the time or just sometimes. If it's sometimes, you could check whether that's because they're doing a specific query (which should then be investigated using a profiler by a developer), or possibly because they're being redirected to a specific machine by the load balancer, or because a machine is overloaded.

If it's slow all the time, you need to check whether all other sites are also slow or whether it's just yahoo.de. If it's all of them, then checks on the performance of the user's machine and connection (a 486DX on a 9600 bps modem line, say) would be in order. If it's just yahoo.de, you could see whether certain features such as images, Java or Flash are affecting it.
A final thing would be to step them through a traceroute of a request to see where the bottleneck is - for example, a peering arrangement may have failed, and that may need to be investigated.

Q18: Roughly sketch out a design for an HTTP server. Keep it really simple: just the ability to respond to GET requests and to handle multiple connections simultaneously.

A18: The server either sits as a daemon listening on a particular port or is activated via inetd. If it is a daemon, then as each request comes in a child is forked off to handle it, which gives multiple simultaneous connections.

To handle a GET request, the URL is parsed and everything after an initial question mark (?) is placed in the environment variable QUERY_STRING. Then the document requested is examined; various security techniques, such as making the URL absolute and checking that it does not point outside a particular directory structure, can be used.

Then check whether the document should be executed (either explicitly, through configuration files and handlers, or by checking whether the executable permission bit has been set). If it should, it is executed using the shell and the output collected; if there is no output then the appropriate header is sent back, otherwise the output is returned to the client.

If it isn't to be executed, then the document itself is served. If the server does not have permission to read the file then the appropriate response is sent back (401 or 403); otherwise an HTTP 200 (OK) header is sent and the document is sent after it.

After any of these, the child exits gracefully, cleaning up after itself. Techniques such as logging, authorization and redirection could also be built in.

Q19: Imagine you have all the resources necessary: what would you do in order to hack Yahoo!? By "hack" we mean somehow change http://www.yahoo.de

A19: The short answer is - get a job there.
Or somehow socially engineer my way into the building and leave some device to do the nefarious deed for me (a micro-PC such as a Cappuccino, an iPAQ or a Zaurus would be perfect; or even a Sega Dreamcast, though I wouldn't waste a broadband adaptor on it because they're hard to get hold of). Having local access makes things a lot easier. You can packet-sniff a (non-switched) network, walk up to the machines, ask people to type in passwords on your machines whilst a keyboard sniffer is running, or even be given the root password if you're a trusted techie. Then you just leak that to somebody else. If you're not fussed about getting caught, then you walk up to the servers (if you have physical access to them) and trash them in some way - either by simply turning them off or by more extreme methods such as hitting them with a fire axe.

Remotely it is more difficult. And very tedious. The obvious method would be to run scans on all the externally visible machines on the yahoo.de network and build up a map of the topology, with the OS, open ports and services of each machine profiled. To hide oneself, this could be done through anonymising tools such as IP spoofing, an anonymiser, a cheap dialup, hijacking somebody else's bandwidth (at a cyber cafe, or by finding an open wireless node) or getting a shell account on another box.

Once one had a map, one could then try every available remote exploit (whilst remaining hidden via the techniques above) to gain access to one of the machines, and then try to work one's way to the web servers. Since yahoo.de probably has load-balanced, replicated web servers with backups, this would be hard, since a simultaneous attack on all the servers would be necessary to effect a change.

One idea might be to construct an Outlook virus or Word macro virus that installs a keyboard sniffer on a machine and then mails the logs to an anonymous account every day.
Or one that tries exploits on routers in the office, in the hope that the leased line will go down and the web servers will be unavailable to the outside world. A final thing would be to go through the site meticulously, attempting likely exploits via the CGI scripts (reading the password file, executing an arbitrary script, etc.).

Essentially there are a thousand techniques to try, most of which have probably already been tried, and virtually none of which are likely to work. In the end the results don't warrant the effort expended.

------------------------------------------------
UNIX

Q20: Write a command to recursively search all HTML files below the current directory for the string "ABCDE"

A20:

find ./ -name '*.htm*' -exec grep -q ABCDE {} \; -print
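The command can be checked against a throwaway directory tree (the file names below are made up purely for demonstration): grep -q quietly tests each candidate file, and find's -print only fires for the files where grep succeeded.

```shell
# Build a throwaway tree (hypothetical file names, for demonstration only).
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'xx ABCDE yy\n' > "$dir/sub/a.html"
printf 'no match here\n' > "$dir/b.htm"

# The command from the answer: for every file matching '*.htm*',
# run grep -q (quiet, exit status only); -print runs only when
# the preceding -exec succeeded, i.e. the file contained ABCDE.
found=$(cd "$dir" && find ./ -name '*.htm*' -exec grep -q ABCDE {} \; -print)

echo "$found"   # -> ./sub/a.html
rm -r "$dir"
```

With GNU grep, "grep -rl --include='*.htm*' ABCDE ." does the same job in a single process, though --include is a GNU extension and not portable.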