Author Topic: analysing URL access by date/time/duration  (Read 4757 times)

analysing URL access by date/time/duration
« on: 20 May, 2008, 11:01:18 am »
The world is full of a lot of clever people.  We are a simple SME running a basic peer-to-peer MS network: 35 PCs, no dedicated servers, but 3 'big' PCs used as shared devices; nothing as sophisticated as Exchange or similar.  We access the world via a Vigor 2600 router to a BT circuit looked after by our ISP.

I would like to be able to record/analyse external URLs accessed during the working day. Date/Time/Duration by URL would be nice.

I am aware of potential Employee Privacy problems so want to steer clear of User info if possible.

Is there a simple piece of software that will run on a central PC and capture this data? There is no usable capture on the router.

Re: analysing URL access by date/time/duration
« Reply #1 on: 20 May, 2008, 11:04:55 am »
Without doing anything expensive, the only way you are going to get this kind of data is to force everyone through some kind of choke point where you can look at the sessions. That means either a proxy server or a firewall, and then you can monitor its logs.
I think you'll find it's a bit more complicated than that.

Re: analysing URL access by date/time/duration
« Reply #2 on: 20 May, 2008, 11:08:26 am »
Even then, would they be able to get the duration of the user's use of the URL? It's a request-response protocol, isn't it, so the page is delivered to the browser. Full stop. The pipework will not even know if the browser has been closed, will it?

Re: analysing URL access by date/time/duration
« Reply #3 on: 20 May, 2008, 11:11:48 am »
With an HTTP 1.1 session the connection stays open, I think. It usually gets closed by closing the browser, by browsing to a new page, or, if left idle, by the server after a set amount of time.
I think you'll find it's a bit more complicated than that.

Re: analysing URL access by date/time/duration
« Reply #4 on: 20 May, 2008, 11:15:43 am »
Drop the concept of "duration". You can't measure it. There's no way to know what the user was actually doing between the page loading and the next page loading. Or if they've got more than one browser window open at once.

As for logging the URLs:-

Install a proxy server and block all outgoing HTTP traffic at the firewall except from the proxy server. Inform the employees about this (they'll realise something is up when you make them use a proxy) and that whilst a certain amount of personal web use is allowed, taking the piss with it is not.
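For illustration only: if the choke point were a Linux box rather than the Vigor's own firewall (an assumption, not something the router gives you), the blocking rules might look something like the sketch below, with 192.168.0.10 standing in for a hypothetical proxy address:

# let the proxy itself out on port 80, drop web traffic from everything else
iptables -A FORWARD -s 192.168.0.10 -p tcp --dport 80 -j ACCEPT
iptables -A FORWARD -p tcp --dport 80 -j DROP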

Otherwise:-

The router does have some logging functionality via syslog, but from a quick glance at the manuals I can't tell whether you'd be able to set the firewall to allow (but log) outbound HTTP access (it does claim the firewall can do stateful inspection).

It doesn't look like you can set the router to mirror all traffic to one of the other ports on its internal switch, so you could put a hub in between the top-level switch and the router, and then connect a PC to the hub to sniff all of the traffic. You could then use any number of network monitoring tools based on libpcap to rip out the IP/MAC addresses and URLs.
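As a rough sketch of that sniffing approach (assuming the monitoring PC runs Linux with tcpdump installed, and that eth0 is the interface plugged into the hub), something like this would pull the request lines and Host headers out of plain HTTP traffic:

# print GET/POST lines and Host: headers from web traffic seen on the hub
tcpdump -i eth0 -l -A -s 0 'tcp dst port 80' | grep -E '^(GET|POST|Host:)'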

Other questions that make it more difficult:

Do you use DHCP? (This just means you need to record MAC addresses and IP Addresses).
Does anyone use the Wireless network on the router itself?
"Yes please" said Squirrel "biscuits are our favourite things."

Re: analysing URL access by date/time/duration
« Reply #5 on: 20 May, 2008, 11:19:09 am »
You could install IPCop with one of its proxy addons on an old PC and use that between the router and your network as an extra firewall and logging proxy. It's free - have a play with it.
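If you go that route, the proxy addon's access log is the thing to mine; on a squid-based setup it usually ends up somewhere like /var/log/squid/access.log (the exact path depends on the addon, so treat that as an assumption). A quick-and-dirty top twenty of requested URLs, using squid's native log format where the URL is the seventh field:

awk '{print $7}' /var/log/squid/access.log | sort | uniq -c | sort -rn | head -20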
I think you'll find it's a bit more complicated than that.

Re: analysing URL access by date/time/duration
« Reply #6 on: 20 May, 2008, 12:08:04 pm »
Thanks guys.

Since running a couple of IBM mid-range mainframes with old-fashioned stuff like VM, VSE, CICS etc. I've become lazy.  I bought the second genuine IBM PC sold in the UK, with twin 5.25" floppies, mono character output and the original dBase dot prompt, worked through tiers of PC-DOS to a very early Windows installed on a WANG PC, and had a brief flirtation with Linux about 5 years ago. Since then I've stuck firmly to MS software - less than 3 years to retirement and I fancy a quiet life.

However, it looks like a Linux box is about to be set up, even if only as a personal plaything for when yacf is a bit slow.

Re: analysing URL access by date/time/duration
« Reply #7 on: 20 May, 2008, 12:24:23 pm »
If you go for something like IPCop you don't need to know any Linux. It installs from a CD with just a couple of questions to give it an IP address etc., and after that it behaves like an appliance and you manage the whole thing through a web-based GUI.
I think you'll find it's a bit more complicated than that.

sas

  • Penguin power
Re: analysing URL access by date/time/duration
« Reply #8 on: 20 May, 2008, 01:42:52 pm »
It's impossible to measure duration. On Firefox I've got about 20 tabs open so you can't rely on a change of page closing the connection, and in addition some sites such as GMail will continually request data in the background.

It should be possible to set up a transparent proxy, so everything gets redirected via the proxy server without reconfiguring every computer.
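As a rough idea of how the transparent bit is usually wired up (assuming a Linux box running squid between the LAN and the router; the interface name and addresses here are made up for the example):

# silently divert outbound web traffic from the LAN into squid
iptables -t nat -A PREROUTING -i eth1 -s 192.168.0.0/24 -p tcp --dport 80 -j REDIRECT --to-ports 3128

and in squid.conf (squid 2.6-era syntax):

http_port 3128 transparent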
I am nothing and should be everything

Re: analysing URL access by date/time/duration
« Reply #9 on: 20 May, 2008, 01:44:52 pm »
If you go for something like IPCop you don't need to know any Linux. It installs from a CD with just a couple of questions to give it an IP address etc., and after that it behaves like an appliance and you manage the whole thing through a web-based GUI.

Cheers

iso downloaded, cd burned, brain dead, sun is out, trees are still, tyres are @ 110 psi, off for a ride, back tomorrow.


rogerzilla

  • When n+1 gets out of hand
Re: analysing URL access by date/time/duration
« Reply #10 on: 20 May, 2008, 01:47:32 pm »
As an administrator of a time-wasting internet forum, I feel obliged to say that I disagree with this  ;)
Hard work sometimes pays off in the end, but laziness ALWAYS pays off NOW.

Re: analysing URL access by date/time/duration
« Reply #11 on: 20 May, 2008, 01:51:43 pm »
Rest assured that your particular time-wasting forum would NEVER be banned from any network that I am responsible for. In fact I'm considering preloading it into the favourites on my ghost master.

Re: analysing URL access by date/time/duration
« Reply #12 on: 20 May, 2008, 01:52:39 pm »
It's impossible to measure duration. On Firefox I've got about 20 tabs open so you can't rely on a change of page closing the connection, and in addition some sites such as GMail will continually request data in the background.
Oh, you can definitely measure TCP session duration; it's just quite often meaningless to do so.
I think you'll find it's a bit more complicated than that.

Re: analysing URL access by date/time/duration
« Reply #13 on: 20 May, 2008, 01:55:18 pm »
As an administrator of a time-wasting internet forum, I feel obliged to say that I disagree with this  ;)

As a former administrator of firewalls and proxy servers for a software company, I can tell you it's not the people browsing cycling forums at work that are "a problem".

Back when our dev network wasn't on a switched network it was quite shocking how much traffic appeared at 6.30pm when no-one was in the office. You can do all the obfuscation you want, script things to run when everyone has gone home, rename the wget binary, pipe URLs in via a file, even write your own program to do it, but it was quite scary watching the output of:-

snoop | grep -i xxx

scroll by *quickly* on a terminal window (we're talking more than 40 URLs a second!).

If only they had clubbed together, there was a lot of duplicate downloading going on. :)

We only ever took action when they overstepped the mark w.r.t. content, or their performance/output was suffering. Take away all external access and they just get bored and tired of the job.
"Yes please" said Squirrel "biscuits are our favourite things."

pdm

  • Sheffield hills? Nah... Just potholes.
Re: analysing URL access by date/time/duration
« Reply #14 on: 20 May, 2008, 02:34:15 pm »
How about setting up a transparent squid proxy through which all traffic must go (very easy using squid and a Shorewall firewall) and then analysing the log file (very slightly more tricky; I use awstats for basic stats) on a small, cheap Linux- or BSD-based server box set up between your network and the BT switch....

I do simple stats on my logfile - see www. meiring. org. uk / awstats.pl?config=squid.meiring.org.uk (I am not embedding the url for obvious reasons)
Doing more stats is simply a matter of a few more perl scripts....
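For anyone who'd rather not set up awstats, a rough one-liner over squid's native access.log gives a feel for where the bandwidth goes (this assumes the standard native log format, where field 5 is the byte count and field 7 is the URL):

# total bytes by destination host, biggest first
awk '{split($7, u, "/"); bytes[u[3]] += $5} END {for (h in bytes) print bytes[h], h}' /var/log/squid/access.log | sort -rn | head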

andygates

  • Peroxide Viking
Re: analysing URL access by date/time/duration
« Reply #15 on: 20 May, 2008, 03:10:29 pm »
We only ever took action when they overstepped the mark w.r.t. content, or their performance/output was suffering. Take away all external access and they just get bored and tired of the job.

This would be my approach too.  Users slacking is a management issue not a technology issue - their bosses should just crack the whip a bit more. 

Webmail for example polls every few minutes.  My boss just came in asking if it was suspicious that a user was accessing gmail 24/7.  "No," say I, "he just leaves it logged on."

Users accessing Naughty Stuff is what proxy logs are for; maintaining a blacklist (of banned sites) is a chore though.  Our lot search periodically for naughty words, but that is limited to the administrator's imagination and our guy would never pick up on asphyx zentai felching. 

Bandwidth is a technical issue.  If the company has ample and the users aren't taking the piss, leave 'em to it.  One user pulling down SP3 can throw the stats totally out of whack anyway.  If you do have limited bandwidth, start by blocking streaming stuff - youtube for starters - but be aware of the possible morale hit when your staff can't get their funny kitten movies.
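If a squid proxy does end up in the picture (as suggested earlier in the thread), the blacklist and the streaming block are both just ACLs in squid.conf; the file path and domains below are only examples, not a recommendation:

# banned sites listed one per line in the file below
acl banned dstdomain "/etc/squid/banned_sites"
acl streaming dstdomain .youtube.com
http_access deny banned
http_access deny streaming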
It takes blood and guts to be this cool but I'm still just a cliché.
OpenStreetMap UK & IRL Streetmap & Topo: ravenfamily.org/andyg/maps updates weekly.

Re: analysing URL access by date/time/duration
« Reply #16 on: 20 May, 2008, 05:26:26 pm »
The way to deal with bandwidth is to implement QoS. Make sure that business-specific traffic can grab 95% of the bandwidth when it needs it, but releases it when it doesn't. This keeps everyone happy.
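One way to do that on a Linux box sitting in front of the router (a sketch only; the interface, rates and the 'business' destination subnet are all assumptions for the example):

# 2mbit link: business traffic can take ~95%, everything else shares the rest,
# and either class can borrow up to the full rate when the other is quiet
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:1 htb rate 2mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 1900kbit ceil 2mbit prio 1
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 100kbit ceil 2mbit prio 2
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst 10.0.0.0/8 flowid 1:10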
I think you'll find it's a bit more complicated than that.

Re: analysing URL access by date/time/duration
« Reply #17 on: 20 May, 2008, 06:09:34 pm »
I'm not sure what you mean by sessions in the Web context. There aren't any. What you get is a page request, which is broken down into requests for each of the page components (images, etc.) Once the request has been served, that's the end of the interaction between server and client. Even if the browser is closed down, the server can't tell.

The user may then be reading the page, having a coffee, off in a meeting somewhere or, indeed, reading a different page in another tab or browser.

At some point the user clicks on a link or types an URL and requests another page from another, or the same, server, but that's not part of any session.

Which, of course, is what you have to work around to create a Web "log-in". Normally logging in means opening a session, but there aren't any, so you have to mimic it, which means, directly or indirectly, passing the authorisation (username/password/whatever) back with every request.

In Web analytics, it's usual to assume that a sequence of page requests to the same server, each separated by no more than x minutes, is a session, which really just means a sequence of related behaviours. x is chosen arbitrarily.
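By way of illustration, that rule of thumb is easy to apply to a proxy log. This is only a sketch: it assumes squid's native log format (Unix timestamp in field 1, client address in field 3, URL in field 7) and picks an arbitrary x of 30 minutes:

# 'sessions' per client and site: a new session starts after a 30-minute gap
awk '{split($7, u, "/"); key = $3 " " u[3]
      if ($1 - last[key] > 1800) sessions[key]++
      last[key] = $1}
     END {for (k in sessions) print sessions[k], k}' /var/log/squid/access.log | sort -rn | head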

Re: analysing URL access by date/time/duration
« Reply #18 on: 20 May, 2008, 09:05:03 pm »
That was true of HTTP 1.0, which was truly stateless: every object downloaded off a page required a separate TCP session. HTTP 1.1 instead uses a persistent session, so all the elements that make up a page can be downloaded over a single TCP session, which is then usually closed after a period of inactivity (normally 30 seconds). There is even a keep-alive mechanism built into HTTP 1.1 that can keep the session open when it's idle.
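A quick way to see that behaviour from the command line (this assumes curl is installed, and the exact wording of its verbose output varies between versions) is to ask for two objects from the same site in one go and watch it report that the second request re-used the first connection:

curl -sv -o /dev/null -o /dev/null http://www.example.com/ http://www.example.com/ 2>&1 | grep -iE 'connected to|re-us'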
I think you'll find it's a bit more complicated than that.