Squid Configuration Sample

[ OpenBSD 5.0, Squid 2.7 ]

Scenario:

A private school I work with has just received a DSL connection to the local ISP. Before releasing the Internet connection, the administrators have requirements (policies) within the school that they wish to be implemented as part of the connection.

The computer department has come to the realisation that a Block by Default approach is not conducive to optimal educational use of the Internet, but there is still a need to police and monitor the school's policies.

The chosen solution is two-fold. (1) Physical supervision of Internet Access computers is mandatory and must be combined with user education and training. (2) Software blocking will be both informative and as comprehensive as possible.

Software monitoring and restriction is where squid plays a significant role. Squid's Access Control Lists (ACLs) provide a very flexible environment for supporting organisational policies.

Details:

School Policies: The school has some standards of certain types of material it does not want students to access through the Internet (specifically pornography.) As a consequence of that requirement, the school also does not want students using 'chat' environments or public web hosted email services (eg. hotmail)

Network Policies: The DSL connection is 64K but the ISP has a very poor connection to the backbone (remember we're calling from Tonga) so there is a significant concern about bandwidth utilisation. The less unnecessary stuff going up and down the 'pipe' the better for us.

As a consequence of the bandwidth problem, and the need to keep the students focussed on academically oriented pursuits, the network administrators want to ban a number of entertainment sites, primarily to minimise bandwidth use and secondarily to keep students off time wasters.

Advertisers are problematic bandwidth consumers, so these will also be blocked where possible.

Network Configuration:

The school operates 3 subnets with differing authorisation levels. Through some magic, we would like to provide special access privileges for system administrators:

Segment         Purpose
2 class-rooms   Controlled, timed access with potential limits to 'net access during class times. subnet_lab1, subnet_lab2
1 pub access    Public Access for school community. This will include machines available to school administrators and general staff for accessing the network and 'NET. subnet_pub
1 admin         Administrators with freer access to the 'NET, who will probably need to be password authenticated.

Authentication is the simplest solution for providing system administrators with greater access to the Internet. To simplify this example, I will discuss authentication in the more detailed revision of this example.

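The 7 stages we will cover to get our squid configuration working are:

  • Specifying the Port to Listen On
  • Specifying which network IPs we will support in squid
  • Specifying Time intervals we will support
  • Specifying Organisational Policies (Restricted Sites)
  • Specifying Informative Messages relevant to Organisational Policies
  • Configuring Access to the Cache
  • Let's Go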

Specifying the Port to Listen On

Edit the file: /etc/squid/squid.conf

Now the scenario is out of the way, let's get down to configuring our squid cache/proxy.

The control of external access to the local LAN should be managed by the firewall.

To be safer (or am I just pedantic) I set the below restriction on where the squid server is listening.

# http_port 3128
http_port internal_nic1:3128
http_port internal_nic2:3128

Normally squid starts up and listens to 3128 on all network devices. The above just ensures that it is listening on port 3128 only for the internal network. Our firewall can further block port 3128 requests from coming through from the outside (but our ACLs should be handling any further problems.)

Specifying which network IPs we will support in squid

Next I set up my Access Control Lists (ACLs) defining the range of machines I have on the Internal Network.

# Networks allowed to use this Cache
acl subnet_lab1     src ip-address_lab1/netmask
acl subnet_lab2     src ip-address_lab2/netmask
acl subnet_pub      src ip-address_pub/netmask
acl all             src 0.0.0.0/0.0.0.0
acl dst_all         dst 0.0.0.0/0.0.0.0

I choose to list the subnets separately (all non-routable IPs) as we have some policies for Internet access that can be managed using the subnet information. The acls "all" and "dst_all" refer to any communications with all available Internet IP addresses. The "all" acl refers to the "source" or 'client' IP address wanting to use the cache. The "dst_all" acl refers to the "destination" or URL host being requested.

Specifying Time intervals we will support

Related to the subnet information are certain time periods during which we want to disable specific subnets, so I have to set up ACLs for those as well.

# After Hours Settings
acl TIMEafterhoursMORN time MTWHF 00:00-08:00
acl TIMEafterhoursAFT  time MTWHF 16:30-24:00
acl TIMEsatMORN        time A 00:00-07:00
acl TIMEsatAFT         time A 17:00-24:00
acl TIMEsundALLDAY     time S 00:00-24:00

Our sample Network Policy will provide different service levels dependent on the time of day (e.g. allow access after hours to different services blocked during business hours.)

Squid TIME acls cannot wrap from one day to the next, so to get from 4:30 in the afternoon until 8:00 the next morning, we have to actually specify one acl for 4:30 to midnight and another acl for midnight to 8 in the morning.
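The split can be sanity-checked outside squid. The following is a minimal shell sketch, not squid's own matching code: the helper names in_window and after_hours are hypothetical, and treating both ends of a range as inclusive is an assumption made purely for illustration. It shows that a single range written as 16:30-08:00 can never match in this model, while the two halves together cover the after-hours period.

```sh
#!/bin/sh
# Toy model of a same-day time range; not Squid's implementation.

# in_window HH:MM START END -> succeeds if START <= HH:MM <= END
in_window() {
    now=$(echo "$1" | tr -d ':')
    start=$(echo "$2" | tr -d ':')
    end=$(echo "$3" | tr -d ':')
    [ "$now" -ge "$start" ] && [ "$now" -le "$end" ]
}

# after_hours HH:MM -> succeeds if either half of the split range matches
after_hours() {
    in_window "$1" 00:00 08:00 || in_window "$1" 16:30 24:00
}

for t in 07:15 12:00 21:45; do
    if after_hours "$t"; then
        echo "$t after hours"
    else
        echo "$t business hours"
    fi
done

# A range that tries to wrap past midnight matches nothing in this model
if in_window 21:45 16:30 08:00; then
    echo "wrapped range matched"
else
    echo "wrapped range never matches"
fi
```

The sketch ignores the day-of-week codes; in the real acls the morning half of an overnight period belongs to the following day.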

Specifying Organisational Policies (Restricted Sites)

A number of organisational policies require that we restrict use of the Internet and for that we have collected a list of urls and domains from the Internet. We are storing these urls in text files related to the categorisation we have chosen (eg. entertainment, porn, etc.)

# Regular Expression Review of URLs, and Destination Domains

# The first list are sites known to be wrongly blocked by the later list
acl unblock_porn        url_regex -i "/etc/squid/unblock_porn.txt"

# The following are the sites restricted by organisational policy
acl block_advertisers   url_regex -i "/etc/squid/block_advertisers.txt"
acl block_entertainment url_regex -i "/etc/squid/block_entertainment.txt"
acl block_webmail       url_regex -i "/etc/squid/block_webmail.txt"
acl block_porn          url_regex -i "/etc/squid/block_porn.txt"

We create ACLs for each category, and we store the text files in the /etc/squid directory. Each text file lists, on separate lines, the words or phrases we wish to block access to (such as domain addresses.)
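To get a feel for how such a list behaves before handing it to squid, the patterns can be tried with grep. This is only a rough sketch: grep -Ei is used as an approximation of squid's case-insensitive regular expression matching, and the patterns, URLs and the check_url helper are invented for illustration rather than taken from our real lists.

```sh
#!/bin/sh
# Build a throwaway pattern file and try it against sample URLs.
# The patterns, URLs and the check_url helper are invented for illustration.
tmp=$(mktemp -d) || exit 1

# One case-insensitive regular expression per line, as in block_webmail.txt
cat > "$tmp/block_webmail.txt" <<'EOF'
hotmail\.
webmail\.example\.
EOF

# check_url URL -> prints DENY if any pattern matches the URL, else ALLOW
check_url() {
    if printf '%s\n' "$1" | grep -Eiq -f "$tmp/block_webmail.txt"; then
        echo DENY
    else
        echo ALLOW
    fi
}

r1=$(check_url 'http://www.HOTMAIL.example/inbox')
r2=$(check_url 'http://library.example/catalogue')
echo "webmail URL: $r1"
echo "library URL: $r2"

rm -rf "$tmp"
```

Running candidate URLs through a check like this is a cheap way to spot patterns that are too greedy before students start complaining.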

Specifying Informative Messages relevant to Organisational Policies

Location: /usr/local/share/squid/errors

# TAG: deny_info
# Usage: deny_info err_page_name acl
#
#Default:
# none
deny_info CUSTOM_ERRS_ADVERTISERS   block_advertisers
deny_info CUSTOM_ERRS_ENTERTAINMENT block_entertainment
deny_info CUSTOM_ERRS_PORN          block_porn
deny_info CUSTOM_ERRS_WEBMAIL       block_webmail

We have created customised error messages for the different areas to which our organisational policy restricts access. The error messages are text files using the naming convention used by the squid error messages. We store the files in /usr/local/share/squid/errors (standard configuration in the squid-2.3 OpenBSD port.)

Note: to beautify our error messages (i.e. add graphics & style sheet) we have created an alias directory in our Apache website to store these extra files. Squid will throw the custom messages at the user's browser, but all other access has to come from the local website.

Configuring Access to the Cache

The final major thing is to set up our rules for accessing the cache.

# TAG: http_access
# Allowing or Denying access based on defined access lists
#
# Access to the HTTP port:
# http_access allow|deny [!]aclname ...

The standard format, as shown above, is http_access followed by either allow or deny and then a list of your aclnames (with an optional ! at the beginning to negate the aclname.) Note that aclnames are "ANDed" together.
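Because the order of the http_access lines matters as much as their content, it can help to play with the logic outside squid. The following shell sketch is a toy model and not squid's implementation: the evaluate function is hypothetical, the caller supplies the names of the acls that would match a request, and falling back to deny when no line matches is a simplification of this model. It tries the rules top to bottom, ANDs the acls on each line, and lets the first fully matching line decide.

```sh
#!/bin/sh
# Toy model of http_access evaluation; not Squid's implementation.
# Rules are tried top to bottom, the ACLs on one line are ANDed, and the
# first line whose ACLs all match decides the outcome.

# evaluate "acl1 acl2 ..." < rules
#   $1    = names of the ACLs that match the request (hypothetical input)
#   stdin = rules written as "allow|deny [!]aclname ..."
evaluate() {
    matched=" $1 "
    while read -r action acls; do
        ok=yes
        for acl in $acls; do
            case $acl in
                '!'*)
                    # Negated ACL: the line fails if the ACL matches
                    name=${acl#!}
                    case $matched in *" $name "*) ok=no ;; esac
                    ;;
                *)
                    # Plain ACL: the line fails if the ACL does not match
                    case $matched in *" $acl "*) ;; *) ok=no ;; esac
                    ;;
            esac
        done
        if [ "$ok" = yes ]; then
            echo "$action"
            return 0
        fi
    done
    # Simplification: this toy model denies when no line matches
    echo deny
}

rules='allow unblock_porn
deny block_porn
allow subnet_pub !TIMEsundALLDAY
deny all'

# A public terminal on a weekday, on a Sunday, and requesting a blocked site
r1=$(printf '%s\n' "$rules" | evaluate "subnet_pub all")
r2=$(printf '%s\n' "$rules" | evaluate "subnet_pub TIMEsundALLDAY all")
r3=$(printf '%s\n' "$rules" | evaluate "subnet_pub block_porn all")
echo "weekday: $r1, Sunday: $r2, blocked site: $r3"
```

Swapping two lines in the rules variable and re-running is a quick way to see how an early allow can shadow a later deny, and the other way around.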

There are a number of standard security configurations already in squid.conf. I've left them standing and added the things specific to our scenario.

Restricting Access to External Sites - relevant to organisational policies

# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
# http ACCESS PRIVILEGES
# --> URLs to Unblock
http_access allow unblock_porn

# --> Domains & URLS to block
http_access deny block_advertisers
http_access deny block_entertainment
http_access deny block_porn
http_access deny block_webmail

Our first action is to block those sites which are restricted by our organisational policies.

Allowing Specified networks access to the cache

Specifying access to cache from LAN machines

# --> Subnet Access to the NET

http_access allow localhost
http_access allow subnet_lab1
http_access allow subnet_lab2

In this example, we allow the lab subnets to use the cache. The extended example further below also requires these clients to be authenticated; if you are not using authentication, just leave the "authenticated" acl out.

Restricting Internal Access - relevant to organisational policies

Because we are not ready for prime-time, we denied Internet access to the public access machines. 1st they are two buildings away and we cannot supervise them at the moment, and 2nd we haven't gone through our education program for staff use.

# --> Subnet Access to the NET

http_access deny  subnet_pub
# During initial phase, keep subnet_pub off the air
#
# After testing, the below script should be used
# --> Format, deny 1st and then allow later
http_access deny subnet_pub TIMEafterhoursMORN
http_access deny subnet_pub TIMEafterhoursAFT
http_access deny subnet_pub TIMEsatMORN
http_access deny subnet_pub TIMEsatAFT
http_access deny subnet_pub TIMEsundALLDAY
# http_access allow subnet_pub authenticated

Because of the same problems of supervising the public access terminals described above, we have included time-based limiting. Once we are certain our system is better configured for public access, we can enable access from the public terminals within specified hours.

Ignoring the cache when requesting from Local Area Network

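Next, we want squid to leave requests for the internal Local Area Network sites alone. The lines below use always_direct, which tells squid to forward matching requests straight to the origin server rather than through a neighbour cache. Note that the acls used here are src acls, so they match requests coming from the LAN clients, and that always_direct on its own does not stop squid from storing a response.

# always go direct for requests from the LAN subnets
# never_direct (commented out below) would send all other requests through a neighbour cache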
always_direct allow localhost 
always_direct allow subnet_lab1 
always_direct allow subnet_lab2 
#never_direct allow all

Our local website doesn't need to be cached. Some of my friends think they get better performance (even for internal clients) by caching the local web server. Parts of our sites are static pages (straight html, images, and pdfs) but our new section is based on PHP so we will just avoid any further complications with our cache by not caching it.

Let's Go.

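The final part is to state explicitly that we want to be able to access the rest of the world, and to deny access to the cache from anyone we have not specifically allowed. Bear in mind that dst_all matches every destination, so the allow line below admits any request that has not been denied by an earlier rule, and the closing deny all is a safety net rather than an active filter.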

# And finally deny all other access to this proxy
http_access allow dst_all
http_access deny  all

Extending the Sample Configuration

This section further extends the previous example, but with more specifics. Partially as an aid to anyone wishing further examples, but primarily to document our network.

The portions of the example we will extend and add to are:

Authenticating Users

To maximise the potential for user conformance, while providing a more flexible user environment, we have chosen to use User Authentication. The most flexible for our configuration is the MSNT authentication module, which is configured as below. (More details on installing it are listed further below.)

All the clients are authenticated on an MS Windows NT Domain before they can use the network, so our choice was simplified.

After installing and testing the msntauth module, we configure the authentication by including the following directives in the /etc/squid/squid.conf file

Edit the file /etc/squid/squid.conf:

authenticate_program /usr/local/bin/msntauth
authenticate_children   15 
authenticate_ttl       900 seconds
authenticate_ip_ttl     60 seconds
# authenticate_ip_ttl_is_strict on

We specify the Authentication program and some important parameters.

In our environment we will let the authentication remain active for 15 minutes after the last authentication (900 seconds). To annoy people who wish to share their passwords (this should be more restrictive than it is) we require a user's authentication to be tied to an IP address. If requests with the same login arrive from two IP addresses within 60 seconds, both users will be denied access and be required to re-authenticate.

If we were really pedantic about password use (which may be relevant in our context) we could force authentication to remain with the originating authenticator until expiry. Specifically this prevents the user using two terminals.

Under our organisational policy we set up authentication so that (a) only those designated for Internet access can access the external web, and (b) our log files can show access patterns to the Internet by user. Note that this approach may be considered draconian by others; whether you want to use authentication depends on the type of site you are running and its purpose.

For authentication to be useful, we next have to specify an acl.

# Authentication
acl authenticated  proxy_auth REQUIRED
acl users_sysadmin proxy_auth  AdminID1 AdminID2

We want authentication of all users before they access the Internet (for this we will use 'authenticated') and we want to provide special privileges to System Administrators (for this we will use 'users_sysadmin'.)

The AdminID1, AdminID2 are users on the server that will provide the authentication (in our case on our Windows NT Domain.)

Specifying Organisational Policies (Restricted Sites)

# Regular Expression Review of URLs, and Destination Domains
acl unblock_pornURL        url_regex    -i "/etc/squid/unblock_pornURL.txt"
acl unblock_domainDOM      dstdom_regex -i "/etc/squid/unblock_domainDOM.txt"
acl unblock_stuffURL       url_regex    -i "/etc/squid/unblock_stuffURL.txt"
acl block_pornURL          url_regex    -i "/etc/squid/block_pornURL.txt"
acl block_pornDOM          dstdom_regex -i "/etc/squid/block_pornDOM.txt"
acl block_advertisersURL   url_regex    -i "/etc/squid/block_advertisersURL.txt"
acl block_advertisersDOM   dstdom_regex -i "/etc/squid/block_advertisersDOM.txt"
acl block_entertainmentURL url_regex    -i "/etc/squid/block_entertainmentURL.txt"
acl block_entertainmentDOM dstdom_regex -i "/etc/squid/block_entertainmentDOM.txt"
acl block_anonymizersDOM   dstdom_regex -i "/etc/squid/block_anonymizersDOM.txt"
acl block_webhostURL       url_regex    -i "/etc/squid/block_webhostURL.txt"
acl block_webhostDOM       dstdom_regex -i "/etc/squid/block_webhostDOM.txt"
acl block_badlangURL       url_regex    -i "/etc/squid/block_badlangURL.txt"
acl block_piratesURL       url_regex    -i "/etc/squid/block_piratesURL.txt"
acl block_piratesDOM       dstdom_regex -i "/etc/squid/block_piratesDOM.txt"

We drastically change our blocking scheme by using three separate methods of analysing a URL before we decide whether it should be allowed or blocked. In our previous example we only used the full URL (url_regex). In this example, we use url_regex, which analyses the full URL, and dstdom_regex, which analyses only the host (domain) information of the URL.

This distinction is very important when we want to use a catch word like "quake" to block access to game sites that host quake tournaments. When we were blocking "quake" in the URL, students were unable to do research on Earthquakes as our URL based block prevented access.

By using dstdom_regex we can restrict the match for quake to the host name rather than the whole URL (which still blocks Earthquake.com etc.) By further refining our regular expression for quake, we can specify \.quake\. or ^quake\. to block only sites with quake as a complete part of the host name (allowing earthquake, deadquake, aquake), or only domain names where quake. is at the very beginning, while still allowing quaken etc. The dots are escaped because an unescaped dot in a regular expression matches any character.
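The difference between matching the whole URL and matching only the host name is easy to try out with grep. This is a minimal sketch under stated assumptions: grep -Ei stands in for squid's case-insensitive regular expression matching, the combined pattern (^|\.)quake\. is our own shorthand for the two refinements, and all host names and URLs are made-up test data.

```sh
#!/bin/sh
# Approximate Squid's -i regex ACLs with grep -Ei.
# Host names and URLs are made-up test data.

urls='http://www.quake.example/tournament
http://earthquake.example/research
http://news.example/earthquake-report.html'

hosts='www.quake.example
quake.example
earthquake.example
quaken.example'

# url_regex style: "quake" anywhere in the full URL (counts matching lines)
url_hits=$(printf '%s\n' "$urls" | grep -Eic 'quake')

# dstdom_regex style: "quake" only as a complete host label
dom_hits=$(printf '%s\n' "$hosts" | grep -Ei '(^|\.)quake\.')

echo "URLs matched by plain quake: $url_hits of 3"
echo "Hosts matched by the refined pattern:"
echo "$dom_hits"
```

The plain pattern catches the earthquake research pages along with the game site, while the refined host pattern leaves earthquake.example and quaken.example alone.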

acl block_filesURLPATH     urlpath_regex -i "/etc/squid/block_filesURLPATH.txt"

A further improvement in selectivity with the URL is urlpath_regex, which only looks at the "path" portion of the URL. We will use the path-only portion to pick out the file transfers, audio and video that we do not want.
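A path-only match can be approximated by stripping the scheme and host from a URL before testing it. The sketch below is illustrative only: the url_path and is_download helpers, the list of file extensions and the URLs are all assumptions, and grep -Ei again stands in for squid's case-insensitive matching.

```sh
#!/bin/sh
# Approximate urlpath_regex: strip scheme and host, then match the rest.
# The URLs, the extension list and both helpers are illustrative assumptions.

# url_path URL -> prints the URL with "scheme://host" removed
url_path() {
    printf '%s\n' "$1" | sed 's#^[a-z]*://[^/]*##'
}

# is_download URL -> succeeds if the path ends in a listed file extension
is_download() {
    url_path "$1" | grep -Eiq '\.(mp3|avi|exe|zip)$'
}

for u in http://mp3.example/index.html http://files.example/music/song.MP3; do
    if is_download "$u"; then
        echo "DENY  $u"
    else
        echo "ALLOW $u"
    fi
done
```

A host that merely has mp3 in its name is left alone, because only the path is inspected.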

Of course Squid 2.5 (and possibly 2.4) supports acls for mime-types, but I'm trying to get this stuff working 1st.

The next acl we configure is to specify the maximum number of connections we want users to be doing. This is mostly relevant to the power users, who inexplicably consume significant bandwidth by running multiple browsers.

acl MaxCONNECTIONS        maxconn 5

Since this is the 1st time we're doing this, we will set a reasonable number initially and then change things along the way.

Note from the FAQ:

Note, the maxconn ACL type is kind of tricky because it uses 
less-than comparison. The ACL is a match when the number of 
established connections is greater than the value you specify.

Specifying Informative Messages relevant to Organisational Policies

deny_info CUSTOM_ERRS_ADVERTISERSurl   block_advertisersURL
deny_info CUSTOM_ERRS_ADVERTISERSdom   block_advertisersDOM
deny_info CUSTOM_ERRS_ANONYMIZERSdom   block_anonymizersDOM
deny_info CUSTOM_ERRS_BADLANGurl       block_badlangURL
deny_info CUSTOM_ERRS_ENTERTAINMENTurl block_entertainmentURL
deny_info CUSTOM_ERRS_ENTERTAINMENTdom block_entertainmentDOM
deny_info CUSTOM_ERRS_FILESurlpath     block_filesURLPATH
deny_info CUSTOM_ERRS_PIRATESurl       block_piratesURL
deny_info CUSTOM_ERRS_PIRATESdom       block_piratesDOM
deny_info CUSTOM_ERRS_PORNurl          block_pornURL
deny_info CUSTOM_ERRS_PORNdom          block_pornDOM
deny_info CUSTOM_ERRS_WEBHOSTurl       block_webhostURL
deny_info CUSTOM_ERRS_WEBHOSTdom       block_webhostDOM
deny_info CUSTOM_ERRS_MaxCONNECTIONS MaxCONNECTIONS

Our Custom Error Messages have also evolved to inform users which part of the URL they requested caused the 'connection failure.'

We deem that this is more helpful to clients and will maximise our ability to analyse whether the ruleset is accurate/effective.

Configuring Access to the Cache

Restricting Access to External Sites - relevant to organisational policies

# --> Domains & URLS to block
http_access deny block_pornURL
http_access deny block_pornDOM
http_access deny block_advertisersURL
http_access deny block_advertisersDOM
http_access deny block_entertainmentURL
http_access deny block_entertainmentDOM
http_access deny block_anonymizersDOM

Our access configuration remains largely the same; we're just using more acls.

##
## SPECIAL PRIVILEGE SECTION FOR ADMINISTRATORS
## 
http_access allow users_sysadmin dst_all

One change we implement is to allow administrators greater freedom to the Internet, restricting their access only to sites specifically limited by the network policy and organisational policy.

users_sysadmin is a proxy authentication acl, so this allow sequence will only be made available if the client user can authenticate as one of the users listed with users_sysadmin (in our example: AdminID1 and AdminID2.)

http_access deny block_webhostURL
http_access deny block_webhostDOM
http_access deny block_badlangURL
http_access deny block_piratesURL
http_access deny block_piratesDOM

We now restrict external access via the domain portion of the URL, giving us greater freedom to use words that would otherwise cause significant problems if matched against the complete URL. We can also provide a limited set of users with extra privileges, independent of the machines they are using.

http_access allow block_filesURLPATH authenticated TIMEafterhoursMORN !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEafterhoursAFT !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEsatMORN !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEsatAFT !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEsundALLDAY !MaxCONNECTIONS
http_access deny  block_filesURLPATH

With file restrictions we choose to deny access to download files during peak use periods. Here we specifically allow file downloads to authenticated users after hours, provided the user has not exceeded the allowed maximum number of connections.

Otherwise, we will block file downloads.

Allowing Specified networks access to the cache

# --> Subnet Access to the NET
http_access allow localhost
http_access allow subnet_lab1 authenticated !MaxCONNECTIONS
http_access allow subnet_lab2 authenticated !MaxCONNECTIONS

The subnets not only have to be correct to allow access to the cache, the clients also have to be authenticated and must not have more than MaxCONNECTIONS connections open (5 in our initial estimation.) To gain access to the cache, the client must

  • be in a valid ip-address (subnet_lab1 or subnet_lab2) AND
  • be an authenticated user (userid, password) AND
  • not have more than the MaxCONNECTIONS

Restricting Internal Access - relevant to organisational policies

http_access deny subnet_pub TIMEafterhoursMORN
http_access deny subnet_pub TIMEafterhoursAFT
http_access deny subnet_pub TIMEsatMORN
http_access deny subnet_pub TIMEsatAFT
http_access deny subnet_pub TIMEsundALLDAY
# http_access allow subnet_pub authenticated !MaxCONNECTIONS

There is minimal change in the time restriction. We have only included authentication and maxconn requirements to the commented access specifications.

Let's Go

http_access allow dst_all authenticated !MaxCONNECTIONS
http_access deny all

In our final line we have required authentication on going out from the cache to the rest of the world, just in case we've made some fundamentally stupid mistake somewhere else in our configuration.

Managing the Log Files

Edit the /etc/daily.local file and add the following lines:

if [ -x /usr/local/bin/squid -a -f /var/squid/logs/squid.pid ]; then
     /usr/local/bin/squid -k rotate
fi

Other Miscellaneous Issues

Squid's DNS Startup Test

We get very poor service from our ISP, and one serious problem when we were configuring our server was squid not being able to resolve DNS names. Failing to find the DNS entries for netscape.com, internic.net, nlanr.net, microsoft.com, the squid server will just hang around and then eventually quit.

# TAG: dns_testnames
# The DNS tests exit as soon as the first site is successfully looked up
#
# This test can be disabled with the -D command line option.
#
#Default:
# dns_testnames netscape.com internic.net nlanr.net microsoft.com
dns_testnames mydomain.com

To solve the startup problem (because our ISP will regularly have problems with their DNS server) we set the dns test to look for our host details, which is configured in our internal DNS Server.

Debugging your Configuration

# TAG: debug_options
# Logging options are set as section,level where each source file
# is assigned a unique section. Lower levels result in less
# output, Full debugging (level 9) can result in a very large
# log file, so be careful. The magic word "ALL" sets debugging
# levels for all sections. We recommend normally running with
# "ALL,1".
#
#Default:
# debug_options ALL,1
debug_options ALL,1 32,2

I was having a number of problems with squid while playing around with the configuration file (especially when trying to get authentication working) and because of the problems we were having with our ISP connection failures. Squid can log more information in the /var/squid/logs/cache.log file. By increasing the amount of information that is placed in there I had a much better understanding of when squid was failing.

Squid User and Group

Another problem I was having in updating and downgrading squid (I was originally attempting to use LDAP authentication in squid to synchronise accounts between Samba, Squid, & Windows 2000) is the fact that the source distribution will use nobody, but the OpenBSD ports use www:www.

# TAG: cache_effective_user
# TAG: cache_effective_group
#
# NOTE: OpenBSD ports packages use uid:gid www:www
# = To make sure uid:gid squid:squid works
# = You need to make sure the user/group exists
# = AND to chown -R www:www the /var/squid directories (if need be)
# 
#Default:
# cache_effective_user nobody
cache_effective_user  www
cache_effective_group www

While shifting between port and source I was continually having problems with the source not being able to use the directories created by the OpenBSD port. It took a while (dumb admin I am) to figure out that uid:gid were different between the different compilations. Sometimes I would remember the ./configure directive, sometimes I'd forget.

Authentication - the MSNT module

[source: msntauth-v2.0 http://stellarx.tripod.com]

The authentication module works pretty well, with little user involvement. Instructions are well documented in the accompanying README.html file.

The only customisation that was required was changing the default directory settings.

Edit File: confload.c (reference is out of date in the readme file)

#define CONFIGFILE        "/usr/local/squid/etc/msntauth.conf" 
#define DENYUSERSDEFAULT  "/usr/local/squid/etc/denyusers"
#define ALLOWUSERSDEFAULT "/usr/local/squid/etc/allowusers"

Change the settings to what is the general directory structure for OpenBSD

#define CONFIGFILE        "/etc/squid/msntauth.conf" /* Path to configuration file */
#define DENYUSERSDEFAULT  "/etc/squid/denyusers"
#define ALLOWUSERSDEFAULT "/etc/squid/allowusers"

Edit the Makefile to specify the directories where you wish the bin files to be located. (no autoconfig yet.)

Copy the sample msntauth.conf file from the source directory to the directory specified above (/etc/squid.) Edit the file to specify your Domain authentication configuration.

Touch the two user list files:

touch /etc/squid/denyusers
touch /etc/squid/allowusers

Test that the authentication module is functioning correctly by manually executing it at the command prompt. Refer to the readme.html for further instructions on testing.

Content Filtering

If you think that filtering through the use of squid by URL or IP is draconian, some people actually have the need to filter even by the content of pages delivered.

For HTTP traffic, a proxy filtering solution is DansGuardian at www.dansguardian.org.

Transparent Proxy

If you want to use transparent proxying with squid-authentication, don't. Read the FAQ and source for further details.

FTP Proxy

Use ftp-proxy

SOCKS 5 Proxy

dante in ports

Cache Utilisation Analysis Tools

webalizer - (package)
squidclients - http://www.cineca.it/~nico/squidclients.thml

Human readable reports on cache utilisation, or network utilisation is always good for something. A few of the tools that we have come across for generating automatic reports on the cache use include: calamaris, webalizer, squidclients, and sqmgrlog (renamed as sarg).

What does the log file record.

Calamaris

[Ref: http://calamaris.cord.de/]

Calamaris can generate a quick and neatly formatted report from the access files.

Calamaris interesting options:-
-a all (equivalent to: -d 20 -P 60 -r 1 -s -t 20)
-d n show n top-level and n second level destinations
-P n show throughput data for every n minutes
-r n show n requesters
-s   show verbose status reports
-t n show n content-type, n extensions, and requested protocols


Output Format
-m mailformat
-w web HTML format

Sample usage:

#!/bin/sh
# Shell Script Used to generate log analysis reports from squid logs
# using calamaris
#
cd /var/squid/logs
gunzip access*.gz
cat access.log access.log.0 access.log.1 access.log.2 \
    access.log.3 access.log.4 access.log.5 access.log.6 \
    | calamaris -a -w > squidreport.html

gzip access.log.*

# cat squidreport.html | mail -s "calamaris weekly report" somebody

Assumptions in the script are:

  • calamaris has been manually installed into /usr/local/bin
  • squid access log files are located at /var/squid/logs
  • log files are rotated for 7 days (0 ~ 6)

sqmgrlog, sarg

[Ref: Google / Bing it]

Sarg is a Squid Analysis Report Generator that allows you to view "where" your users are going on the Internet. Sarg generates reports in HTML, with many fields, like: users, IP addresses, bytes, sites and times.

This is what we actually use, and it was so easy to follow the instructions I can't remember how it was done.