Most events on a Linux system are captured and stored in logs, which let you diagnose what is happening on the server. Logging is crucial for system administrators because it records almost every action performed on the server. For security purposes, logs help identify which users logged in and what they did. Kernel activity is also logged, which helps when analyzing system performance.
Linux systems have a dedicated directory for storing system logs, usually /var/log. Many of them use a logging facility called rsyslog, whose configuration file is /etc/rsyslog.conf and whose service is rsyslog.service (an improved version of syslog.service, which still runs on some Linux systems). Programs send log entries to rsyslog, which matches them against the rules in its configuration and writes them to the appropriate file in the log directory.
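To see which rules rsyslog will actually apply, it can help to strip the comments and blank lines out of its configuration. The sketch below assumes the default path /etc/rsyslog.conf; on some distributions most rules live in files under /etc/rsyslog.d/ instead. The helper name active_rules is made up for illustration and is not part of rsyslog.

```shell
# Print only the active (non-comment, non-blank) lines of a config file.
# active_rules is a hypothetical helper name, not an rsyslog command.
active_rules() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage (the path is the default mentioned above):
#   active_rules /etc/rsyslog.conf
```

A rule such as `auth,authpriv.*  /var/log/auth.log` in that output means that messages from the auth facilities end up in /var/log/auth.log.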
In the /var/log directory you should see the various log files from your system, as in the example below:
$ ls -lht /var/log
-rw-r----- 1 syslog adm 2.5M Nov 29 11:31 auth.log
-rw-rw---- 1 root utmp 3.4M Nov 29 11:31 btmp
-rw-r----- 1 syslog adm 3.1M Nov 29 11:31 syslog
-rw-rw-r-- 1 root utmp 286K Nov 29 11:26 lastlog
-rw-rw-r-- 1 root utmp 21K Nov 29 11:26 wtmp
-rw-r--r-- 1 root root 93K Nov 29 07:00 dpkg.log
drwxr-xr-x 2 root root 4.0K Nov 29 07:00 apt
-rw-r----- 1 syslog adm 92K Nov 29 07:00 kern.log
drwxr-xr-x 2 root root 4.0K Nov 29 00:25 containers
drwxr-xr-x 5 root root 4.0K Nov 29 00:25 pods
-rw-r----- 1 syslog adm 28K Nov 29 00:21 ufw.log
drwxr-xr-x 3 root root 4.0K Nov 29 00:20 calico
-rw-r--r-- 1 root adm 49K Nov 29 00:08 dmesg
-rw-r----- 1 syslog adm 3.1M Nov 29 00:00 syslog.1
-rw-r----- 1 syslog adm 978K Nov 29 00:00 auth.log.1
-rw-r----- 1 syslog adm 152K Nov 28 23:59 kern.log.1
-rw-r----- 1 syslog adm 86K Nov 28 23:59 ufw.log.1
-rw-r--r-- 1 root adm 49K Nov 28 23:40 dmesg.0
-rw-r--r-- 1 root root 280 Nov 28 22:50 alternatives.log
drwxr-x--- 2 root adm 4.0K Nov 28 22:18 unattended-upgrades
-rw-r--r-- 1 root adm 15K Nov 28 22:18 dmesg.1.gz
-rw-r----- 1 syslog adm 123K Nov 28 22:18 kern.log.2.gz
-rw-r----- 1 syslog adm 151K Nov 28 22:18 syslog.2.gz
-rw-r----- 1 syslog adm 81K Nov 28 22:18 ufw.log.2.gz
-rw-r----- 1 syslog adm 13K May 8 2020 auth.log.2.gz
-rw-rw---- 1 root utmp 64K May 8 2020 btmp.1
-rw-r--r-- 1 root adm 14K May 7 2020 dmesg.2.gz
-rw-r--r-- 1 root root 299K May 7 2020 dpkg.log.1
-rw-r--r-- 1 root root 32K May 7 2020 faillog
drwxr-xr-x 2 landscape landscape 4.0K May 7 2020 landscape
-rw-r--r-- 1 root root 4.9K May 7 2020 cloud-init-output.log
-rw-r--r-- 1 syslog adm 85K May 7 2020 cloud-init.log
-rw-r--r-- 1 root adm 14K May 7 2020 dmesg.3.gz
drwxr-sr-x+ 3 root systemd-journal 4.0K May 7 2020 journal
drwxr-xr-x 3 root root 4.0K May 7 2020 installer
-rw-r--r-- 1 root root 20K Apr 23 2020 alternatives.log.1
-rw-r--r-- 1 root root 102K Apr 23 2020 bootstrap.log
-rw------- 1 root root 0 Apr 23 2020 ubuntu-advantage.log
drwx------ 2 root root 4.0K Apr 23 2020 private
drwxr-xr-x 2 root root 4.0K Apr 8 2020 dist-upgrade
drwxr-xr-x 2 ntp ntp 4.0K Apr 2 2020 ntpstats
The most important log files are described below:
- /var/log/syslog and /var/log/messages – store all global system activity, including startup messages. Debian-based systems use /var/log/syslog while Red Hat-based systems use /var/log/messages.
- /var/log/auth.log and /var/log/secure – keep security-related activity such as root user actions, logins and output from pluggable authentication modules (PAM). Debian-based systems use /var/log/auth.log while Red Hat-based systems use /var/log/secure.
- /var/log/kern.log – keeps all kernel events, errors and warning logs, useful in troubleshooting custom kernels.
- /var/log/apache2 and /var/log/nginx – if you are running a web server with Apache2 or Nginx respectively, its logs are stored in these directories.
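Since the exact paths differ between distributions, a quick way to find out which of these logs exist on your machine is to test each one. This is only a sketch; check_logs is a hypothetical helper name, and the path list simply mixes the Debian-style and Red Hat-style locations mentioned above.

```shell
# Report which of the given log locations exist on this machine.
# check_logs is a hypothetical helper name used only for this example.
check_logs() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
        fi
    done
}

check_logs /var/log/syslog /var/log/messages \
           /var/log/auth.log /var/log/secure \
           /var/log/kern.log /var/log/apache2 /var/log/nginx
```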
Analyzing Linux Web Server Logs with GoAccess
Knowing where logs are located is not enough. Logs exist to help system admins understand what is happening on a Linux system and troubleshoot where needed, so admins should be able to view and analyze them and draw useful information from them. In this guide we are going to study a tool called GoAccess that is very helpful in analyzing web server logs.
What is GoAccess and Why use it?
GoAccess is an open source web log analyzer. It is real time and provides an interactive viewer on the terminal. It quickly analyzes and provides valuable HTTP statistics.
It was designed as a fast, terminal-based analyzer. Its main idea is to let you view and analyze web server statistics quickly and in real time without having to use your browser. Apart from the terminal output, GoAccess can generate a complete, self-contained, real-time HTML report, as well as JSON and CSV reports.
Features of GoAccess
Some of the most interesting features of GoAccess include:
- Completely real-time – All metrics are updated every 200ms on the terminal output and every 1s on html output.
- It requires very little configuration. You can run it against your access log and let GoAccess parse the log and output the stats.
- It tracks application response time, which is quite useful for identifying pages that slow down your site.
- Supports nearly all web log formats – It accepts any custom log format string, and the predefined formats include Apache, Nginx, Amazon S3, Elastic Load Balancing and CloudFront, among others.
- Incremental log processing – GoAccess is capable of processing logs incrementally using its on-disk persistence option.
- One dependency – GoAccess is written in C and it requires only ncurses installed.
- Metrics per Virtual Host – If you have more than one Virtual Host, GoAccess displays which host is consuming most web server resources.
- Color Scheme Customizable – You can tailor GoAccess to your favorite themes, either on the terminal or applying a stylesheet to HTML.
- Supports large datasets – It has the ability to parse large logs. It has good memory usage and good performance. Storage supports on-disk persistence.
- GoAccess supports Docker containers through the use of volume mapping and editing of GoAccess configuration file.
- GoAccess enables you to determine the number of hits, visitors, bandwidth, and metrics for the slowest running requests by hour or by date.
GoAccess Supported Log Formats
GoAccess supports nearly all web log formats, allowing any custom log format string. Some of the predefined options include:
- Amazon CloudFront (Download Distribution)
- Amazon Simple Storage Service (S3)
- AWS Elastic Load Balancing
- Combined Log Format (XLF/ELF) Apache | Nginx
- Common Log Format (CLF) Apache
- Google Cloud Storage
- Apache virtual hosts
- Squid Native Format
- W3C format (IIS)
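To make the Combined Log Format more concrete, the sketch below takes one made-up request line (not from a real server) and pulls out a few fields with awk. The same field positions are what a GoAccess format string describes with its specifiers: %h for the client, %d and %t for date and time, %r for the request, %s for the status, %b for the size, %R for the referrer and %u for the user agent, with %^ skipping a field.

```shell
# A made-up request in Combined Log Format (illustrative data only).
sample_line='203.0.113.7 - - [29/Nov/2020:16:29:01 +0000] "GET /index.html HTTP/1.1" 200 6141 "-" "Mozilla/5.0"'

# awk splits on whitespace, so the request path is field 7, the status
# code is field 9 and the response size is field 10.
printf '%s\n' "$sample_line" |
    awk '{ print "client=" $1, "path=" $7, "status=" $9, "bytes=" $10 }'
# prints: client=203.0.113.7 path=/index.html status=200 bytes=6141
```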
Installing GoAccess on Linux and macOS
First, install the required dependencies:
# On Ubuntu/Debian
sudo apt-get update
sudo apt-get install libncursesw5-dev libglib2.0-dev libgeoip-dev libtokyocabinet-dev
# On Fedora Linux 21 and CentOS 7
sudo yum install ncurses-devel
# On Fedora Linux 22 and CentOS 8
sudo dnf install ncurses-devel
# On macOS
brew install ncurses
You can easily install GoAccess with the package manager of your Linux distribution, as below:
# Ubuntu/Debian
sudo apt-get install goaccess
# Or add the official GoAccess repository on Ubuntu/Debian to ensure that you install the latest version
sudo apt-get install apt-transport-https
echo "deb https://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list
wget -O - https://deb.goaccess.io/gnugpg.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/goaccess.gpg add -
sudo apt-get update
sudo apt-get install goaccess
# On Fedora
sudo yum install goaccess
# On Arch Linux
sudo pacman -S goaccess
# On OS X / Homebrew
brew install goaccess
Alternatively, build GoAccess from a release tarball. Download, extract and compile GoAccess as below:
wget https://tar.goaccess.io/goaccess-1.4.2.tar.gz
tar -xzvf goaccess-1.4.2.tar.gz
cd goaccess-1.4.2/
./configure --enable-utf8 --enable-geoip=legacy
make
sudo make install
Using GoAccess Web Log Analyzer
In my demo, I have a simple site running on Apache2 on Ubuntu 20.04.
The GoAccess configuration file is located at /etc/goaccess/goaccess.conf. Open it with your preferred editor and set the date and log formats you need, as in the example below:
# of the Apache's log formats below.
# date-format %d/%b/%Y
#
# NCSA Combined Log Format
# log-format %h %^[%d:%^] "%r" %s %b "%R" "%u"
#
# NCSA Combined Log Format with Virtual Host
# log-format %^:%^ %h %^[%d:%^] "%r" %s %b "%R" "%u"
#
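If you would rather not edit the system-wide file, one alternative is to keep the same three settings in a file of your own and pass it to GoAccess with -p (--config-file). The sketch below is only an illustration: the file name goaccess-demo.conf and its location are arbitrary choices, and the format strings are the ones used on the command line later in this guide.

```shell
# Write a minimal config holding the time, date and log formats.
# The file name and location are arbitrary choices for this example.
conf=${TMPDIR:-/tmp}/goaccess-demo.conf
cat > "$conf" <<'EOF'
time-format %H:%M:%S
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
EOF

# Then point GoAccess at it instead of repeating the flags:
#   goaccess access.log -p "$conf"
```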
I want to analyze the access log in the Apache log directory with GoAccess. Change to the Apache log directory (/var/log/apache2 in this demo) and run goaccess against access.log.
The output is a real-time, interactive log viewer in the terminal.
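The exact commands for this step are not reproduced here, so the following is only a minimal sketch of what they might look like. It assumes the Debian/Ubuntu log directory /var/log/apache2 and the predefined COMBINED log format; the function name view_access_log is hypothetical.

```shell
# Open the interactive terminal dashboard for an Apache access log.
# Assumes the log lives in /var/log/apache2 unless a directory is given.
# view_access_log is a hypothetical helper name.
view_access_log() {
    log_dir=${1:-/var/log/apache2}
    cd "$log_dir" || return 1
    goaccess access.log --log-format=COMBINED
}

# Usage (reading the log may require root or membership of the adm group):
#   view_access_log
```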
To generate reports in other formats, run the commands below for HTML, CSV and JSON:
goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a > report.html
goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T --no-csv-summary -o csv > report2.csv
goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -d -o json > report2.json
Notice that I am passing a custom time format, because GoAccess sometimes throws incompatible time format errors otherwise. Confirm that the reports were created in the directory you are working from:
ls -l /var/log/apache2/
total 1064
-rw-r----- 1 root adm 6141 Nov 29 16:29 access.log
-rw-r----- 1 root adm 1638 Nov 29 14:43 error.log
-rw-r----- 1 root adm 0 Nov 29 12:55 other_vhosts_access.log
-rw-r--r-- 1 root root 3934 Nov 29 15:19 report2.csv
-rw-r--r-- 1 root root 14460 Nov 29 15:18 report2.json
-rw-r--r-- 1 root root 349181 Nov 29 15:17 report.csv
-rw-r--r-- 1 root root 349698 Nov 29 15:56 report.html
-rw-r--r-- 1 root root 349181 Nov 29 15:16 report.json
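Because the three report commands differ only in the output, they can be wrapped in a small loop. The sketch below is one possible way to do that, not the method used above: make_reports is a hypothetical helper, it uses the predefined COMBINED format rather than the explicit format strings, and it relies on GoAccess choosing the report type from the file extension passed to -o.

```shell
# Build HTML, JSON and CSV reports for one log file.
# make_reports is a hypothetical helper; arguments are the log file and
# the directory in which the reports should be written.
make_reports() {
    log=$1
    out_dir=$2
    for ext in html json csv; do
        goaccess "$log" --log-format=COMBINED -o "$out_dir/report.$ext" || return 1
    done
}

# Usage:
#   make_reports access.log /tmp
```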
You can also monitor logs for quick output using ‘tail -f’, as below:
tail -f access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -
And you can filter the output with the ‘grep’ command:
tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -
GoAccess Multiple Files
You can parse multiple files as below:
goaccess access.log access.log.1
cat access.log.2 | goaccess access.log access.log.1 -
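Older rotated logs are often gzip-compressed, like the .gz files in the /var/log listing earlier. A possible way to include them is to decompress them to standard output and pipe that into GoAccess alongside the uncompressed files. This sketch assumes the rotated files are named access.log.N.gz; analyze_all is a hypothetical helper name.

```shell
# Feed the current log, the first rotated log and all gzip-compressed
# older logs to GoAccess in one run.
# analyze_all is a hypothetical helper; its argument is the log directory.
analyze_all() {
    dir=$1
    gzip -dc "$dir"/access.log.*.gz |
        goaccess "$dir/access.log" "$dir/access.log.1" --log-format=COMBINED -
}

# Usage:
#   analyze_all /var/log/apache2
```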
Real-time HTML Outputs
GoAccess can output real-time data in the HTML report
goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -o /var/www/html/report.html --real-time-html
To view this report, navigate to http://your_site/report.html.
For the real-time report, GoAccess runs a WebSocket server that listens on port 7890 by default. You can pass a different port; make sure that the port is open in the server firewall:
goaccess access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -o /var/www/html/report.html --real-time-html --port=<new-port>
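How you open the port depends on your firewall. As an illustration only, the sketch below assumes ufw, which is the firewall front end on the Ubuntu demo server; open_ws_port is a hypothetical helper name.

```shell
# Allow the GoAccess WebSocket port through ufw.
# open_ws_port is a hypothetical helper; 7890 is the GoAccess default port.
open_ws_port() {
    port=${1:-7890}
    sudo ufw allow "$port/tcp"
}

# Usage:
#   open_ws_port          # opens 7890/tcp
#   open_ws_port 8080     # opens the port passed to --port
```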
You may want to restrict the analysis to entries from a certain date onward. Use the command below:
sed -n '/29\/Nov\/2020/,$ p' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -
Or parse a specific time frame
sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -
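It is worth checking what a sed range actually selects before piping it into GoAccess. The sketch below runs the same range expression on a tiny made-up log (the addresses and requests are invented for the example) and prints only the requested paths.

```shell
# Build a tiny made-up log to see which lines the sed range selects.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
198.51.100.1 - - [04/Nov/2010:10:00:00 +0000] "GET /a HTTP/1.1" 200 10 "-" "demo"
198.51.100.1 - - [05/Nov/2010:10:00:00 +0000] "GET /b HTTP/1.1" 200 10 "-" "demo"
198.51.100.1 - - [20/Nov/2010:10:00:00 +0000] "GET /c HTTP/1.1" 200 10 "-" "demo"
198.51.100.1 - - [05/Dec/2010:09:00:00 +0000] "GET /d HTTP/1.1" 200 10 "-" "demo"
198.51.100.1 - - [05/Dec/2010:10:00:00 +0000] "GET /e HTTP/1.1" 200 10 "-" "demo"
EOF

# Prints /b, /c and /d: the range closes on the FIRST line that matches
# the end pattern, so the later 05/Dec request for /e is left out.
sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' "$demo_log" | awk '{ print $7 }'
```

Two things follow from this. To include the whole final day you could end the range on the following day's date instead. Also, the patterns are not anchored, so 5\/Nov\/2010 would match 15/Nov/2010 and 25/Nov/2010 as well; writing the day with its leading zero (05\/Nov\/2010) avoids that.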
GoAccess Files, status codes and bots
To parse specific pages, e.g., page views, html, htm, php, etc. within a request:
awk '$7~/\.html|\.htm|\.php/' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -a -
Or to parse a specific status code, e.g., 500 (Internal Server Error):
awk '$9~/500/' access.log | goaccess --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^' --date-format=%d/%b/%Y --time-format=%T -
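Before filtering on one status code, it can be useful to see which codes occur in the log at all. The sketch below tallies field 9, the status code in Common and Combined Log Format, with awk; status_counts is a hypothetical helper name.

```shell
# Count requests per status code (field 9 in Common/Combined Log Format).
# status_counts is a hypothetical helper; its argument is the log file.
status_counts() {
    awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' "$1" | sort
}

# Usage:
#   status_counts access.log
```

Each output line is a status code followed by the number of requests that returned it, which makes it easy to decide which code is worth a closer look in GoAccess.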
GoAccess is a powerful web log analyzer and there is quite a lot that you can do with it. I hope this guide on how to use GoAccess to view and analyze Linux Web Server logs has been informative enough to get you working with GoAccess. Have fun!