Linux records system events in log files, which lets you diagnose what is happening on a server. Logging is crucial for system administrators because it shows almost every action performed on the server. For security, logs help determine which users logged in and what actions they performed. Kernel activity is logged as well, which helps when analyzing system performance.

Linux systems keep their logs in a dedicated directory, usually /var/log. Many distributions handle logging with rsyslog, an improved version of the older syslog service (which still runs on some Linux systems). Its configuration file is /etc/rsyslog.conf and the service is called rsyslog.service. Programs send log entries to rsyslog, which matches them against the rules in its configuration and writes each entry to the appropriate file in the log directory.

The /var/log directory contains the various log files on your system, for example:

$ ls -lht /var/log
-rw-r-----  1 syslog    adm             2.5M Nov 29 11:31 auth.log 
-rw-rw----  1 root      utmp            3.4M Nov 29 11:31 btmp 
-rw-r-----  1 syslog    adm             3.1M Nov 29 11:31 syslog 
-rw-rw-r--  1 root      utmp            286K Nov 29 11:26 lastlog 
-rw-rw-r--  1 root      utmp             21K Nov 29 11:26 wtmp 
-rw-r--r--  1 root      root             93K Nov 29 07:00 dpkg.log 
drwxr-xr-x  2 root      root            4.0K Nov 29 07:00 apt 
-rw-r-----  1 syslog    adm              92K Nov 29 07:00 kern.log 
drwxr-xr-x  2 root      root            4.0K Nov 29 00:25 containers 
drwxr-xr-x  5 root      root            4.0K Nov 29 00:25 pods 
-rw-r-----  1 syslog    adm              28K Nov 29 00:21 ufw.log 
drwxr-xr-x  3 root      root            4.0K Nov 29 00:20 calico 
-rw-r--r--  1 root      adm              49K Nov 29 00:08 dmesg 
-rw-r-----  1 syslog    adm             3.1M Nov 29 00:00 syslog.1 
-rw-r-----  1 syslog    adm             978K Nov 29 00:00 auth.log.1 
-rw-r-----  1 syslog    adm             152K Nov 28 23:59 kern.log.1 
-rw-r-----  1 syslog    adm              86K Nov 28 23:59 ufw.log.1 
-rw-r--r--  1 root      adm              49K Nov 28 23:40 dmesg.0 
-rw-r--r--  1 root      root             280 Nov 28 22:50 alternatives.log 
drwxr-x---  2 root      adm             4.0K Nov 28 22:18 unattended-upgrades 
-rw-r--r--  1 root      adm              15K Nov 28 22:18 dmesg.1.gz 
-rw-r-----  1 syslog    adm             123K Nov 28 22:18 kern.log.2.gz 
-rw-r-----  1 syslog    adm             151K Nov 28 22:18 syslog.2.gz 
-rw-r-----  1 syslog    adm              81K Nov 28 22:18 ufw.log.2.gz 
-rw-r-----  1 syslog    adm              13K May  8  2020 auth.log.2.gz 
-rw-rw----  1 root      utmp             64K May  8  2020 btmp.1 
-rw-r--r--  1 root      adm              14K May  7  2020 dmesg.2.gz 
-rw-r--r--  1 root      root            299K May  7  2020 dpkg.log.1 
-rw-r--r--  1 root      root             32K May  7  2020 faillog 
drwxr-xr-x  2 landscape landscape       4.0K May  7  2020 landscape 
-rw-r--r--  1 root      root            4.9K May  7  2020 cloud-init-output.log 
-rw-r--r--  1 syslog    adm              85K May  7  2020 cloud-init.log 
-rw-r--r--  1 root      adm              14K May  7  2020 dmesg.3.gz 
drwxr-sr-x+ 3 root      systemd-journal 4.0K May  7  2020 journal 
drwxr-xr-x  3 root      root            4.0K May  7  2020 installer 
-rw-r--r--  1 root      root             20K Apr 23  2020 alternatives.log.1 
-rw-r--r--  1 root      root            102K Apr 23  2020 bootstrap.log 
-rw-------  1 root      root               0 Apr 23  2020 ubuntu-advantage.log 
drwx------  2 root      root            4.0K Apr 23  2020 private 
drwxr-xr-x  2 root      root            4.0K Apr  8  2020 dist-upgrade 
drwxr-xr-x  2 ntp       ntp             4.0K Apr  2  2020 ntpstats

The most important log files are:

  • /var/log/syslog and /var/log/messages – store all global system activity, including startup messages. Debian-based systems use /var/log/syslog, while Red Hat-based systems use /var/log/messages.
  • /var/log/auth.log and /var/log/secure – keep security-related activity such as root user actions, logins and output from pluggable authentication modules (PAM). Debian-based systems use /var/log/auth.log, while Red Hat-based systems use /var/log/secure.
  • /var/log/kern.log – keeps kernel events, errors and warnings; useful when troubleshooting custom kernels.
  • /var/log/apache2 (/var/log/httpd on Red Hat-based systems) and /var/log/nginx – if you run an Apache or Nginx web server, its logs are stored in these directories.
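Because the file names differ between distribution families, a script that needs these logs can check which variant exists. The sketch below is an illustration only: first_log is a hypothetical helper, not a standard tool, and it takes the log directory as an argument so it can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any distribution): print the first of the
# given file names that exists in a log directory; return 1 if none does.
first_log() {
  local dir="$1"
  shift
  local name
  for name in "$@"; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Main system log: syslog on Debian/Ubuntu, messages on Red Hat-based systems.
first_log /var/log syslog messages || echo "no system log found" >&2
# Security log: auth.log on Debian/Ubuntu, secure on Red Hat-based systems.
first_log /var/log auth.log secure || echo "no auth log found" >&2
```

The order of the names decides which file wins if, unusually, both exist.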

Analyzing Linux Web Server Logs with GoAccess

Knowing where logs are located is not enough. Logs matter because they help system administrators understand what is happening on a Linux system and troubleshoot problems, so administrators need to be able to view and analyze them and draw useful information from them. In this guide we look at GoAccess, a tool that is very helpful for analyzing web server logs.

What is GoAccess and Why use it?

GoAccess is an open source, real-time web log analyzer with an interactive viewer that runs in the terminal. It quickly analyzes access logs and provides valuable HTTP statistics.

It was designed as a fast, terminal-based analyzer for viewing and analyzing web statistics in real time without needing a browser. Besides the terminal output, GoAccess can generate complete, self-contained reports in HTML, JSON and CSV, and the HTML report can be updated in real time.

Features of GoAccess

Some of the most interesting features of GoAccess include:

  • Completely real-time – All metrics are updated every 200 ms in the terminal output and every second in the HTML output.
  • Minimal configuration – Run it against your access log, choose the log format, and GoAccess parses the log and shows the stats.
  • Application response time – GoAccess tracks application response time, which is useful for identifying pages that slow down your site.
  • Nearly all web log formats – GoAccess accepts any custom log format string; predefined formats include Apache, Nginx, Amazon S3, Elastic Load Balancing and CloudFront, among others.
  • Incremental log processing – GoAccess can process logs incrementally using its on-disk persistence option.
  • One dependency – GoAccess is written in C and only requires ncurses.
  • Metrics per virtual host – If you have more than one virtual host, GoAccess shows which host consumes most of the web server's resources.
  • Customizable color scheme – You can tailor GoAccess to your favorite colors, either in the terminal or by applying a stylesheet to the HTML output.
  • Large dataset support – GoAccess can parse large logs with good memory usage and performance, and its storage supports on-disk persistence.
  • Docker support – GoAccess can run in a Docker container, using volume mappings and an edited GoAccess configuration file.
  • Visitor metrics – GoAccess reports hits, visitors, bandwidth and the slowest-running requests by hour or by date.

GoAccess Supported Log Formats

GoAccess supports nearly all web log formats, allowing any custom log format string. Some of the predefined options include:

  • Amazon CloudFront (Download Distribution)
  • Amazon Simple Storage Service (S3)
  • AWS Elastic Load Balancing
  • Combined Log Format (XLF/ELF): Apache and Nginx
  • Common Log Format (CLF): Apache
  • Google Cloud Storage
  • Apache virtual hosts
  • Squid Native Format
  • W3C format (IIS)

Installing GoAccess on Linux and macOS

First, install the required dependencies (the -dev/-devel packages are needed if you build GoAccess from source):

# On Ubuntu/Debian
sudo apt-get update
sudo apt-get install libncursesw5-dev libglib2.0-dev libgeoip-dev libtokyocabinet-dev

# On Fedora 21 and CentOS 7
sudo yum install ncurses-devel

# On Fedora 22 or later and CentOS 8
sudo dnf install ncurses-devel

# On macOS
brew install ncurses

You can also install GoAccess with your distribution's package manager:

# On Ubuntu/Debian
sudo apt-get install goaccess

# Or add the official GoAccess repository on Ubuntu/Debian to make sure you install the latest version:
sudo apt-get install apt-transport-https
echo "deb https://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list
wget -O - https://deb.goaccess.io/gnugpg.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/goaccess.gpg add -
sudo apt-get update
sudo apt-get install goaccess

# On Fedora
sudo yum install goaccess

# On Arch Linux
sudo pacman -S goaccess

# On macOS (Homebrew)
brew install goaccess

Alternatively, build GoAccess from a release tarball. Download, extract and compile it as below:

wget https://tar.goaccess.io/goaccess-1.4.2.tar.gz
tar -xzvf goaccess-1.4.2.tar.gz
cd goaccess-1.4.2/
./configure --enable-utf8 --enable-geoip=legacy
make
sudo make install

Using GoAccess Web Log Analyzer

For this demo, I have a simple site running on Apache2 on Ubuntu 20.04.

The GoAccess configuration file is /etc/goaccess/goaccess.conf (the location may differ if you built GoAccess from source). Open it in your preferred editor and set the date, time and log formats to match your logs, for example:

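# Date format for the Apache log formats below.
#
date-format %d/%b/%Y
#
# Time format for the Apache log formats below.
#
time-format %H:%M:%S
#
# NCSA Combined Log Format
#
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
#
# NCSA Combined Log Format with Virtual Host
# (uncomment this one instead of the line above if your log includes the virtual host)
#
#log-format %^:%^ %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
#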
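To see which parts of a log line the log-format specifiers refer to, the fields of one Combined Log Format line can be pulled apart with awk. This is a sketch for illustration only: the log line is made up, parse_combined is a hypothetical helper, and splitting on double quotes assumes no double quotes appear inside the quoted fields.

```shell
#!/usr/bin/env bash
# A made-up NCSA Combined Log Format line, for illustration only.
line='203.0.113.7 - - [29/Nov/2020:16:29:01 +0000] "GET /index.html HTTP/1.1" 200 3477 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/83.0"'

# Hypothetical helper. Splitting on double quotes gives:
#   $1 = host, identity, user and [date:time zone]
#   $2 = request (%r), $3 = status and bytes, $4 = referer (%R), $6 = user agent (%u)
parse_combined() {
  awk -F'"' '{
    split($1, pre, " ")    # pre[1] = host (%h), pre[4] = "[29/Nov/2020:16:29:01"
    split($3, post, " ")   # post[1] = status (%s), post[2] = bytes (%b)
    print "host="    pre[1]
    print "date="    substr(pre[4], 2, 11)   # matches date-format %d/%b/%Y
    print "time="    substr(pre[4], 14, 8)   # matches time-format %H:%M:%S
    print "request=" $2
    print "status="  post[1]
    print "bytes="   post[2]
    print "referer=" $4
    print "agent="   $6
  }'
}

printf '%s\n' "$line" | parse_combined
```

The fixed substr offsets work because Apache writes the timestamp with a zero-padded day and a fixed width.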

To analyze the Apache access log with GoAccess, change to the Apache log directory:

cd /var/log/apache2

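Then run GoAccess against the access log (as root or as a member of the adm group, since the Apache logs are readable only by root and adm):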

goaccess /var/log/apache2/access.log

The output is an interactive, real-time log viewer and analyzer in the terminal.

GoAccess Reports

To generate reports in HTML, CSV and JSON formats, run the following commands:

goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a > report.html

goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T --no-csv-summary -o csv > report2.csv

goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -d -o json > report2.json

Notice that I also pass an explicit time format (--time-format=%T), because GoAccess can otherwise throw an incompatible time format error. Confirm that the reports were created in the directory you are working from:

$ ls -l /var/log/apache2/
total 1064
-rw-r----- 1 root adm    6141 Nov 29 16:29 access.log
-rw-r----- 1 root adm    1638 Nov 29 14:43 error.log
-rw-r----- 1 root adm       0 Nov 29 12:55 other_vhosts_access.log
-rw-r--r-- 1 root root   3934 Nov 29 15:19 report2.csv
-rw-r--r-- 1 root root  14460 Nov 29 15:18 report2.json
-rw-r--r-- 1 root root 349181 Nov 29 15:17 report.csv
-rw-r--r-- 1 root root 349698 Nov 29 15:56 report.html
-rw-r--r-- 1 root root 349181 Nov 29 15:16 report.json

You can also monitor a live log by piping ‘tail -f’ into GoAccess; the trailing - tells GoAccess to read from standard input:

tail -f access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -

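You can also filter the stream with ‘grep’ first. The --line-buffered option makes grep pass on each matching line immediately instead of buffering its output while writing to a pipe: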

tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -
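Before piping a filtered stream into GoAccess, it can help to check what the filter keeps. The sketch below writes three made-up log lines to a temporary file and applies the same case-insensitive grep; because the input is a finite file rather than a live tail -f stream, --line-buffered is not needed here.

```shell
#!/usr/bin/env bash
log=$(mktemp)
trap 'rm -f "$log"' EXIT

# Made-up sample entries; only the user agents matter for this filter.
cat > "$log" <<'EOF'
203.0.113.7 - - [29/Nov/2020:16:29:01 +0000] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0"
198.51.100.4 - - [29/Nov/2020:16:30:12 +0000] "GET /about.html HTTP/1.1" 200 1024 "-" "curl/7.68.0"
192.0.2.55 - - [29/Nov/2020:16:31:40 +0000] "GET /index.php HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0"
EOF

# Same case-insensitive filter as in the tail -f pipeline.
grep -i 'firefox' "$log"

# Number of lines GoAccess would receive from this sample.
grep -ci 'firefox' "$log"
```

Note that grep matches anywhere in the line, so a referer URL containing "firefox" would also be kept.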

GoAccess Multiple Files

You can parse multiple files at once, and combine them with data read from standard input (-):

goaccess access.log access.log.1
cat access.log.2 | goaccess access.log access.log.1 -
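Older rotated logs are usually gzip-compressed (for example access.log.2.gz on Debian and Ubuntu), and a common way to include them is to decompress them into GoAccess's standard input. With GNU gzip, zcat -f decompresses .gz files and copies uncompressed files through unchanged, which turns a whole rotation set into one stream. The sketch below builds made-up files in a temporary directory and prints that stream, oldest first.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Made-up rotation set: an older gzip-compressed log, the previous log and the current one.
echo '192.0.2.1 - - [27/Nov/2020:10:00:00 +0000] "GET /old HTTP/1.1" 200 10 "-" "curl/7.68.0"' > "$dir/access.log.2"
gzip "$dir/access.log.2"   # replaces it with access.log.2.gz
echo '192.0.2.2 - - [28/Nov/2020:10:00:00 +0000] "GET /prev HTTP/1.1" 200 20 "-" "curl/7.68.0"' > "$dir/access.log.1"
echo '192.0.2.3 - - [29/Nov/2020:10:00:00 +0000] "GET /now HTTP/1.1" 200 30 "-" "curl/7.68.0"' > "$dir/access.log"

# zcat -f decompresses .gz files and passes plain files through unchanged.
zcat -f "$dir/access.log.2.gz" "$dir/access.log.1" "$dir/access.log"

# With GoAccess installed, the same stream could be analysed with:
#   zcat -f access.log.2.gz access.log.1 access.log | goaccess --log-format=COMBINED -
```

Listing the files oldest first keeps the combined stream in chronological order.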

Real-time HTML Outputs

GoAccess can also write an HTML report that updates in real time:

goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -o /var/www/html/report.html --real-time-html

To view this report, navigate to http://your_site/report.html.

The real-time HTML report receives its updates through a WebSocket server that GoAccess runs on port 7890 by default. You can pass a different port with --port; in either case, make sure the port is open in the server's firewall:

goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -o /var/www/html/report.html --real-time-html --port=<new-port>

GoAccess Filtering

To analyze only part of a web server log, filter it by date before passing it to GoAccess. For example, to parse everything from 29/Nov/2020 to the end of the log:

sed -n '/29\/Nov\/2020/,$ p' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -

Or parse a specific date range:

sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -
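The sed range prints from the first line matching the start pattern through the first later line matching the end pattern, so it assumes the log is in chronological order; if the end pattern never matches, sed prints to the end of the file. Apache writes zero-padded days (05/Nov/2010), so a bare 5\/Nov also matches 15/Nov and 25/Nov. The sketch below anchors both patterns on the opening [ and the leading zero, and runs them over a made-up log.

```shell
#!/usr/bin/env bash
log=$(mktemp)
trap 'rm -f "$log"' EXIT

# Made-up entries in chronological order.
cat > "$log" <<'EOF'
192.0.2.1 - - [04/Nov/2010:23:59:59 +0000] "GET /a HTTP/1.1" 200 10 "-" "-"
192.0.2.2 - - [05/Nov/2010:00:00:01 +0000] "GET /b HTTP/1.1" 200 10 "-" "-"
192.0.2.3 - - [15/Nov/2010:12:00:00 +0000] "GET /c HTTP/1.1" 200 10 "-" "-"
192.0.2.4 - - [05/Dec/2010:08:00:00 +0000] "GET /d HTTP/1.1" 200 10 "-" "-"
192.0.2.5 - - [06/Dec/2010:08:00:00 +0000] "GET /e HTTP/1.1" 200 10 "-" "-"
EOF

# Prints /b, /c and /d: from the first 05/Nov/2010 line up to and including
# the first 05/Dec/2010 line. "\[" matches the literal bracket before the day.
sed -n '/\[05\/Nov\/2010/,/\[05\/Dec\/2010/ p' "$log"
```

Because the range stops at the first 05/Dec/2010 line, any later entries from that day are left out; end the range on the following day if you need all of them.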

GoAccess Files and Status Codes

To parse only specific pages, e.g., page views of .html, .htm or .php files, match the request path:

awk '$7~/\.html|\.htm|\.php/' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -

Or to parse a specific status code, e.g., 500 (Internal Server Error):

awk '$9~/500/' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -
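The $7 and $9 field numbers in these filters only line up with the Common and Combined formats split on whitespace, where $7 is the request path and $9 the status code. The sketch below runs the page filter on a made-up sample and, as an alternative to the regex match /500/, uses numeric comparisons on $9, which also make it easy to select a whole class such as every 5xx error.

```shell
#!/usr/bin/env bash
log=$(mktemp)
trap 'rm -f "$log"' EXIT

# Made-up Combined Log Format sample.
cat > "$log" <<'EOF'
192.0.2.1 - - [29/Nov/2020:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.68.0"
192.0.2.2 - - [29/Nov/2020:10:00:05 +0000] "GET /app.php?id=7 HTTP/1.1" 500 0 "-" "curl/7.68.0"
192.0.2.3 - - [29/Nov/2020:10:00:09 +0000] "GET /logo.png HTTP/1.1" 200 5000 "-" "curl/7.68.0"
192.0.2.4 - - [29/Nov/2020:10:00:12 +0000] "GET /report.php HTTP/1.1" 503 0 "-" "curl/7.68.0"
EOF

# Page requests only (.html, .htm or .php in the path): /index.html, /app.php, /report.php.
awk '$7 ~ /\.(html?|php)/' "$log"

# Exactly status 500, as a numeric comparison.
awk '$9 == 500' "$log"

# Every 5xx server error (here 500 and 503). With GoAccess installed:
#   awk '$9 >= 500 && $9 < 600' access.log | goaccess --log-format=COMBINED -
awk '$9 >= 500 && $9 < 600' "$log"
```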

GoAccess is a powerful web log analyzer, and there is a lot more you can do with it. I hope this guide on using GoAccess to view and analyze Linux web server logs has been informative enough to get you started. Have fun!
