
Testing Vector and ClickHouse

I have a basic Python program which logs to journald.

Using a collector, such as OpenTelemetry Collector or Vector, I want to get these Python logs into ClickHouse.

Following are notes on trying this out with a VM.

Installing Vector §

Following along with the official instructions.

Add the repo:

bash -c "$(curl -L https://setup.vector.dev)"

Install Vector:

sudo apt-get install vector

Start Vector:

sudo systemctl start vector

Check its status:

systemctl status vector

Installing ClickHouse §

Following along with the official instructions.

Add the repo:

sudo apt-get install -y apt-transport-https ca-certificates dirmngr
GNUPGHOME=$(mktemp -d)
sudo GNUPGHOME="$GNUPGHOME" gpg --no-default-keyring --keyring /usr/share/keyrings/clickhouse-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754
sudo rm -rf "$GNUPGHOME"
sudo chmod +r /usr/share/keyrings/clickhouse-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

Install the server and client:

sudo apt-get install -y clickhouse-server clickhouse-client

When prompted, leave the password blank (since we are just testing).

Ignore clickhouse-keeper for testing purposes, though for production it should ideally be used.

Start the server and check its status:

sudo systemctl start clickhouse-server
systemctl status clickhouse-server

Start the client:

clickhouse-client # or "clickhouse-client --password" if you set up a password.

This launches a CLI. We can run SELECT 1 to check it works:

$ clickhouse-client
ClickHouse client version 23.11.1.2711 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.11.1.

Warnings:
 * Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. Check /proc/sys/kernel/task_delayacct
 * Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.

sam-virtual-machine :) 
sam-virtual-machine :) SELECT 1

SELECT 1

Query id: 8ed747e7-4678-468e-9237-76d6c94f9376

┌─1─┐
│ 1 │
└───┘

1 row in set. Elapsed: 0.001 sec. 

sam-virtual-machine :) 

Configuring Vector §

By default, Vector reads the configuration file at /etc/vector/vector.yaml. Upon installation, this configuration file is filled with the following example setup:

#                                    __   __  __
#                                    \ \ / / / /
#                                     \ V / / /
#                                      \_/  \/
#
#                                    V E C T O R
#                                   Configuration
#
# ------------------------------------------------------------------------------
# Website: https://vector.dev
# Docs: https://vector.dev/docs
# Chat: https://chat.vector.dev
# ------------------------------------------------------------------------------

# Change this to use a non-default directory for Vector data storage:
# data_dir: "/var/lib/vector"

# Random Syslog-formatted logs
sources:
  dummy_logs:
    type: "demo_logs"
    format: "syslog"
    interval: 1

# Parse Syslog logs
# See the Vector Remap Language reference for more info: https://vrl.dev
transforms:
  parse_logs:
    type: "remap"
    inputs: ["dummy_logs"]
    source: |
      . = parse_syslog!(string!(.message))      

# Print parsed logs to stdout
sinks:
  print:
    type: "console"
    inputs: ["parse_logs"]
    encoding:
      codec: "json"

# Vector's GraphQL API (disabled by default)
# Uncomment to try it out with the `vector top` command or
# in your browser at http://localhost:8686
# api:
#   enabled: true
#   address: "127.0.0.1:8686"

This defines a dummy_logs source of type demo_logs, which generates example syslog-formatted logs.

Then, after a basic remap transform that parses the syslog lines, a print sink of type console is defined, which writes the events to stdout as JSON.

To view the most recent of these example logs in JSON format, I can do journalctl -u vector -o json-pretty:

{
        "_SYSTEMD_CGROUP" : "/system.slice/vector.service",
        "_STREAM_ID" : "7fdb5aae31cd4b2f8dc50861f842c15f",
        "_CMDLINE" : "/usr/bin/vector",
        "_COMM" : "vector",
        "_TRANSPORT" : "stdout",
        "_EXE" : "/usr/bin/vector",
        "_PID" : "3650",
        "_SYSTEMD_INVOCATION_ID" : "b43031f6aed041988be6770557354eda",
        "SYSLOG_IDENTIFIER" : "vector",
        "_SYSTEMD_UNIT" : "vector.service",
        "__CURSOR" : "s=6f5532dee89c46d49412e3998509ba23;i=32a8;b=481ac9fbbbbe4467832abd1d1bcf112f;m=14037d15a;t=60be95db9b652;x=dee5533644434954",
        "MESSAGE" : "{\"appname\":\"KarimMove\",\"facility\":\"authpriv\",\"hostname\":\"random.et\",\"message\":\"We're gonna need a bigger boat\",\"msgid\":\"ID127\",\"procid\":787,\"severity\":\"warning\",\"timestamp\":\"2023-12-07T10:59:47.994Z\",\"version\":2}",
        "_BOOT_ID" : "481ac9fbbbbe4467832abd1d1bcf112f",
        "_MACHINE_ID" : "f03827c65a884b6885793c418d4fb8ae",
        "_UID" : "999",
        "_SYSTEMD_SLICE" : "system.slice",
        "__REALTIME_TIMESTAMP" : "1701946788001362",
        "SYSLOG_FACILITY" : "3",
        "__MONOTONIC_TIMESTAMP" : "5372367194",
        "_HOSTNAME" : "sam-virtual-machine",
        "PRIORITY" : "6",
        "_CAP_EFFECTIVE" : "400",
        "_GID" : "999",
        "_SELINUX_CONTEXT" : "unconfined\n"
}

Let’s try and change the configuration to read journald logs and send them to the console.

First let’s backup the default file to vector.default.yaml:

sudo cp /etc/vector/vector{,.default}.yaml

Now we can edit vector.yaml as we please:

sudo nano /etc/vector/vector.yaml

# Source logs from journald
sources:
  journald_logs:
    type: "journald"

# Sink journald logs to console
sinks:
  print:
    type: "console"
    inputs: ["journald_logs"]
    encoding:
      codec: "json"

I immediately realised this was a bad idea.

Since the console output from Vector is being captured by journald, and Vector is sourcing logs from journald, this creates a loop: Vector will read logs from journald, write them to the console, systemd will capture the console output and write it to journald, which gets picked up by Vector.

I decided to try it anyway. I let it run for somewhere under a minute, then captured the logs:

Dec 07 11:18:07 sam-virtual-machine vector[11784]: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\">
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: {"PRIORITY":"6","SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"vector","_BOOT_ID":"481ac9fbbbbe446783>
Dec 07 11:18:07 sam-virtual-machine vector[11784]: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\>
Dec 07 11:18:07 sam-virtual-machine systemd[1]: Stopping Vector...

An interesting consequence of this loop is that the backslash escaping compounds: each pass through the loop escapes every existing backslash, leading to a horrendous amount of \\\\\\\\.... Note that due to the horizontal scrolling in journalctl, you can only see the beginning of each backslash run.
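Setting aside the quote-escaping, the core effect is easy to reproduce: each JSON-encoding pass turns every \ into \\, so n trips around the loop multiply the backslashes by 2^n. A minimal sketch:

```python
msg = "\\"  # one literal backslash
counts = []
for _ in range(5):
    # what JSON string-escaping does to each backslash on every pass
    msg = msg.replace("\\", "\\\\")
    counts.append(msg.count("\\"))
print(counts)  # [2, 4, 8, 16, 32]
```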


To use ClickHouse as a sink in Vector, we first need to ensure that ClickHouse has a database and a table with the correct schema. The ClickHouse example integration with Vector shows nginx logs being placed in a message string field, and then a “materialized view” being created that splits the message field into separate fields (RemoteAddr, Client, RemoteUser, etc.).

However, our journald logs are already structured, so how can we pass that into ClickHouse? Our journald logs appear to consist of a log field with a JSON body that consists of many fields. Should each of these be a field in the ClickHouse table?

Let’s manually create a vector_db database containing a journald_logs table with a single log String field:

$ clickhouse-client
ClickHouse client version 23.11.1.2711 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.11.1.

Warnings:
 * Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. Check /proc/sys/kernel/task_delayacct
 * Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.

sam-virtual-machine :) CREATE DATABASE IF NOT EXISTS vector_db

CREATE DATABASE IF NOT EXISTS vector_db

Query id: 0cebc227-89df-429b-a26a-d2bf2d4fa326

Ok.

0 rows in set. Elapsed: 0.013 sec. 

sam-virtual-machine :) CREATE TABLE IF NOT EXISTS vector_db.journald_logs (
    log String
)
ENGINE = MergeTree()
ORDER BY tuple()

CREATE TABLE IF NOT EXISTS vector_db.journald_logs
(
    `log` String
)
ENGINE = MergeTree
ORDER BY tuple()

Query id: c96a60e8-589b-4423-a8ab-497289788490

Ok.

0 rows in set. Elapsed: 0.004 sec. 

sam-virtual-machine :) 

Let’s check that worked:

sam-virtual-machine :) use vector_db;

USE vector_db

Query id: df5592da-4a5b-4ff5-b214-4da2568defa1

Ok.

0 rows in set. Elapsed: 0.001 sec. 

sam-virtual-machine :) show tables;

SHOW TABLES

Query id: cf49cf52-b023-405f-8eb1-c64d627b2526

┌─name──────────┐
│ journald_logs │
└───────────────┘

1 row in set. Elapsed: 0.002 sec. 

sam-virtual-machine :) select * from journald_logs

SELECT *
FROM journald_logs

Query id: 795ec50a-01a0-4013-b3c4-f43b5752aee2

Ok.

0 rows in set. Elapsed: 0.001 sec. 

Let’s configure Vector to send to ClickHouse:

# Source logs from journald
sources:
  journald_logs:
    type: "journald"

# Send logs to ClickHouse
sinks:
  clickhouse_logs:
    type: "clickhouse"
    inputs: ["journald_logs"]
    endpoint: "http://localhost:8123"
    database: "vector_db"
    table: "journald_logs"

Now we can start Vector again:

sudo systemctl start vector

Let’s see if the logs made it to ClickHouse:

sam-virtual-machine :) select count(*) from journald_logs;

SELECT count(*)
FROM journald_logs

Query id: c3d8f0cb-6f50-49a9-ab3e-db9ae0525333

┌─count()─┐
│      31 │
└─────────┘

1 row in set. Elapsed: 0.002 sec. 

sam-virtual-machine :) select * from journald_logs;

SELECT *
FROM journald_logs

Query id: 097be11b-45b7-43f1-b92d-ba014d95e418

┌─log─┐
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
│     │
└─────┘

31 rows in set. Elapsed: 0.001 sec. 

sam-virtual-machine :) 

We now have rows, but they are empty. Presumably this is because I got the field name wrong?

Let’s try renaming the log field to message, because that is what is used in the nginx example:

sam-virtual-machine :) alter table journald_logs rename column log to message

ALTER TABLE journald_logs
    RENAME COLUMN log TO message

Query id: dae295ba-af35-4e8b-afe0-d3cb7e7ada98

Ok.

0 rows in set. Elapsed: 0.007 sec. 

sam-virtual-machine :) 

Now after restarting Vector:

sam-virtual-machine :) select * from journald_logs

SELECT *
FROM journald_logs

Query id: b7adcd50-e016-4f99-9320-e5931555dbca

┌─message─┐
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
│         │
└─────────┘
┌─message─┐
│         │
└─────────┘
┌─message─┐
│         │
│         │
└─────────┘
┌─message───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│      sam : TTY=pts/0 ; PWD=/home/sam ; USER=root ; COMMAND=/usr/bin/systemctl stop vector                                                                                                     │
│ pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)                                                                                                                     │
│ Stopping Vector...                                                                                                                                                                            │
│ 2023-12-07T13:58:24.715772Z  INFO vector::signal: Signal received. signal="SIGTERM"                                                                                                           │
│ 2023-12-07T13:58:24.715915Z  INFO vector: Vector has stopped.                                                                                                                                 │
│ 2023-12-07T13:58:24.717718Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="clickhouse_logs" time_remaining="59 seconds left"          │
│ vector.service: Deactivated successfully.                                                                                                                                                     │
│ Stopped Vector.                                                                                                                                                                               │
│ pam_unix(sudo:session): session closed for user root                                                                                                                                          │
│ <info>  [1701957623.3249] dhcp4 (ens33): state changed new lease, address=192.168.159.133                                                                                                     │
│      sam : TTY=pts/0 ; PWD=/home/sam ; USER=root ; COMMAND=/usr/bin/systemctl restart vector                                                                                                  │
│ pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)                                                                                                                     │
│ Starting Vector...                                                                                                                                                                            │
│ √ Loaded ["/etc/vector/vector.yaml"]                                                                                                                                                          │
│ √ Component configuration                                                                                                                                                                     │
│ √ Health check "clickhouse_logs"                                                                                                                                                              │
│ ------------------------------------                                                                                                                                                          │
│                            Validated                                                                                                                                                          │
│ Started Vector.                                                                                                                                                                               │
│ 2023-12-07T14:02:26.883625Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info" │
│ 2023-12-07T14:02:26.884010Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]                                                                                             │
│ pam_unix(sudo:session): session closed for user root                                                                                                                                          │
│ 2023-12-07T14:02:26.927182Z  INFO vector::topology::running: Running healthchecks.                                                                                                            │
│ 2023-12-07T14:02:26.927249Z  INFO vector: Vector has started. debug="false" version="0.34.1" arch="x86_64" revision="86f1c22 2023-11-16 14:59:10.486846964"                                   │
│ 2023-12-07T14:02:26.927259Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.                                                 │
│ 2023-12-07T14:02:26.929504Z  INFO vector::topology::builder: Healthcheck passed.                                                                                                              │
│ 2023-12-07T14:02:26.930004Z  INFO source{component_kind="source" component_id=journald_logs component_type=journald}: vector::sources::journald: Starting journalctl.                         │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

61 rows in set. Elapsed: 0.004 sec. 

sam-virtual-machine :) 

That worked, sort of. It appears it takes the plain text output of journalctl rather than the structured output.

Getting the structured output §

Looking at the Vector journald source documentation, it appears that the message field is just the raw, plain-text message. We need to specify the other fields to get the structured data.
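As a reminder of what those structured fields look like, journalctl -o json prints one JSON object per line, with the structured fields sitting alongside MESSAGE. A small parsing sketch — the extract_fields helper and the chosen field list are mine, purely for illustration:

```python
import json

def extract_fields(journal_line, wanted=("MESSAGE", "MY_COUNTER", "_SYSTEMD_UNIT")):
    """Parse one line of `journalctl -o json` output, keeping selected fields."""
    entry = json.loads(journal_line)
    return {key: entry[key] for key in wanted if key in entry}

# A trimmed entry, in the shape journalctl -o json prints:
line = ('{"MESSAGE": "example message", "MY_COUNTER": "1701960160.25", '
        '"_SYSTEMD_UNIT": "foo.service", "_PID": "19190"}')
print(extract_fields(line))
```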

First, let’s enable log namespacing, since it sounds like this will ensure that our journald fields do not clash with the Vector fields, and it will become the default in future:

schema:
  log_namespace: true

Now let’s use our existing dummy Python systemd service, and make it log to journald:

service.py §
# ...

def main():
    logging_setup(renderer="journald")

    logger.info("starting up")
    try:
        while True:
            t = time.time()
            logger.info("example message", my_counter=t)
            time.sleep(1)
    except SystemExit:
        logger.info("shutting down")

Let’s configure Vector to only fetch journald logs from this service:

# Source logs from journald
sources:
  journald_logs:
    type: "journald"
    include_units: ["py_sysd_pkg.service"]

And let’s configure ClickHouse to add a MY_COUNTER field, so that it can receive the my_counter field which is logged in the Python service:

sam-virtual-machine :) alter table journald_logs add column MY_COUNTER UInt32;

ALTER TABLE journald_logs
    ADD COLUMN `MY_COUNTER` UInt32

Query id: 73271d77-dde5-47ae-ac12-06fdaa0ed03d

Ok.

0 rows in set. Elapsed: 0.003 sec. 

That’s not working.

Let’s disable log namespacing.

Still not working.

Let’s try outputting to console instead:

# Print parsed logs to stdout
sinks:
  print:
    type: "console"
    inputs: ["journald_logs"]
    encoding:
      codec: "json"

$ journalctl -u vector -f
...
Dec 07 14:42:40 sam-virtual-machine vector[20119]: {"CODE_FILE":"/home/sam/projects/P2515/use_systemd/py_sysd_pkg/journald_logger.py","CODE_FUNC":"msg","CODE_LINE":"6","LEVEL":"info","MY_COUNTER":"1701960160.2507374","SYSLOG_IDENTIFIER":"python","TIMESTAMP":"2023-12-07T14:42:40.250791Z","_BOOT_ID":"481ac9fbbbbe4467832abd1d1bcf112f","_CAP_EFFECTIVE":"1ffffffffff","_CMDLINE":"/home/sam/.cache/pypoetry/virtualenvs/py-sysd-pkg-0J3MkRb1-py3.10/bin/python /home/sam/.cache/pypoetry/virtualenvs/py-sysd-pkg-0J3MkRb1-py3.10/bin/service-run","_COMM":"service-run","_EXE":"/home/sam/.pyenv/versions/3.10.12/bin/python3.10","_GID":"0","_MACHINE_ID":"f03827c65a884b6885793c418d4fb8ae","_PID":"19190","_SELINUX_CONTEXT":"unconfined\n","_SOURCE_REALTIME_TIMESTAMP":"1701960160250948","_SYSTEMD_CGROUP":"/system.slice/foo.service","_SYSTEMD_INVOCATION_ID":"8c51f95ab14d47bdb024b00ca651c629","_SYSTEMD_SLICE":"system.slice","_SYSTEMD_UNIT":"foo.service","_TRANSPORT":"journal","_UID":"0","__MONOTONIC_TIMESTAMP":"18744616807","__REALTIME_TIMESTAMP":"1701960160250975","host":"sam-virtual-machine","message":"example message","source_type":"journald","timestamp":"2023-12-07T14:42:40.250948Z"}
Dec 07 14:42:41 sam-virtual-machine vector[20119]: {"CODE_FILE":"/home/sam/projects/P2515/use_systemd/py_sysd_pkg/journald_logger.py","CODE_FUNC":"msg","CODE_LINE":"6","LEVEL":"info","MY_COUNTER":"1701960161.2529314","SYSLOG_IDENTIFIER":"python","TIMESTAMP":"2023-12-07T14:42:41.252981Z","_BOOT_ID":"481ac9fbbbbe4467832abd1d1bcf112f","_CAP_EFFECTIVE":"1ffffffffff","_CMDLINE":"/home/sam/.cache/pypoetry/virtualenvs/py-sysd-pkg-0J3MkRb1-py3.10/bin/python /home/sam/.cache/pypoetry/virtualenvs/py-sysd-pkg-0J3MkRb1-py3.10/bin/service-run","_COMM":"service-run","_EXE":"/home/sam/.pyenv/versions/3.10.12/bin/python3.10","_GID":"0","_MACHINE_ID":"f03827c65a884b6885793c418d4fb8ae","_PID":"19190","_SELINUX_CONTEXT":"unconfined\n","_SOURCE_REALTIME_TIMESTAMP":"1701960161253126","_SYSTEMD_CGROUP":"/system.slice/foo.service","_SYSTEMD_INVOCATION_ID":"8c51f95ab14d47bdb024b00ca651c629","_SYSTEMD_SLICE":"system.slice","_SYSTEMD_UNIT":"foo.service","_TRANSPORT":"journal","_UID":"0","__MONOTONIC_TIMESTAMP":"18745618983","__REALTIME_TIMESTAMP":"1701960161253150","host":"sam-virtual-machine","message":"example message","source_type":"journald","timestamp":"2023-12-07T14:42:41.253126Z"}
Dec 07 14:42:42 sam-virtual-machine vector[20119]: {"CODE_FILE":"/home/sam/projects/P2515/use_systemd/py_sysd_pkg/journald_logger.py","CODE_FUNC":"msg","CODE_LINE":"6","LEVEL":"info","MY_COUNTER":"1701960162.2548127","SYSLOG_IDENTIFIER":"python","TIMESTAMP":"2023-12-07T14:42:42.254875Z","_BOOT_ID":"481ac9fbbbbe4467832abd1d1bcf112f","_CAP_EFFECTIVE":"1ffffffffff","_CMDLINE":"/home/sam/.cache/pypoetry/virtualenvs/py-sysd-pkg-0J3MkRb1-py3.10/bin/python /home/sam/.cache/pypoetry/virtualenvs/py-sysd-pkg-0J3MkRb1-py3.10/bin/service-run","_COMM":"service-run","_EXE":"/home/sam/.pyenv/versions/3.10.12/bin/python3.10","_GID":"0","_MACHINE_ID":"f03827c65a884b6885793c418d4fb8ae","_PID":"19190","_SELINUX_CONTEXT":"unconfined\n","_SOURCE_REALTIME_TIMESTAMP":"1701960162255066","_SYSTEMD_CGROUP":"/system.slice/foo.service","_SYSTEMD_INVOCATION_ID":"8c51f95ab14d47bdb024b00ca651c629","_SYSTEMD_SLICE":"system.slice","_SYSTEMD_UNIT":"foo.service","_TRANSPORT":"journal","_UID":"0","__MONOTONIC_TIMESTAMP":"18746620932","__REALTIME_TIMESTAMP":"1701960162255100","host":"sam-virtual-machine","message":"example message","source_type":"journald","timestamp":"2023-12-07T14:42:42.255066Z"}

Looks like it is coming out of the console correctly.

Let’s re-enable namespacing:

$ journalctl -u vector -f
...
Dec 07 14:44:43 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:44 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:45 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:46 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:47 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:48 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:49 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:50 sam-virtual-machine vector[20811]: "example message"
Dec 07 14:44:51 sam-virtual-machine vector[20811]: "example message"

Interesting: now only the message field is output to the console. Presumably, with log namespacing enabled, the journald fields move out of the event root into Vector’s event metadata, so the JSON codec serializes only the raw message.

Let’s disable namespacing.

We get these errors in the Vector journald log:

Dec 07 15:25:49 sam-virtual-machine vector[30421]: 2023-12-07T15:25:49.403924Z ERROR sink{component_kind="sink" component_id=clickhouse_logs component_type=clickhouse}:request{request_id=1}: vector::sinks::util::retries: Not retriable; dropping the request. reason="response status: 400 Bad Request" internal_log_rate_limit=true
Dec 07 15:25:49 sam-virtual-machine vector[30421]: 2023-12-07T15:25:49.403974Z ERROR sink{component_kind="sink" component_id=clickhouse_logs component_type=clickhouse}:request{request_id=1}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=None request_id=1 error_type="request_failed" stage="sending" internal_log_rate_limit=true
Dec 07 15:25:49 sam-virtual-machine vector[30421]: 2023-12-07T15:25:49.403987Z ERROR sink{component_kind="sink" component_id=clickhouse_logs component_type=clickhouse}:request{request_id=1}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Service call failed. No retries or retries exhausted." internal_log_rate_limit=true
Dec 07 15:25:50 sam-virtual-machine vector[30421]: 2023-12-07T15:25:50.426389Z ERROR sink{component_kind="sink" component_id=clickhouse_logs component_type=clickhouse}:request{request_id=2}: vector::sinks::util::retries: Internal log [Not retriable; dropping the request.] is being suppressed to avoid flooding.

Now let’s modify the MY_COUNTER column from UInt32 to String:

sam-virtual-machine :) describe journald_logs;

DESCRIBE TABLE journald_logs

Query id: bc762c83-ea61-48b4-9cec-073ad0b56eed

┌─name───────┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ message    │ String │              │                    │         │                  │                │
│ MY_COUNTER │ UInt32 │              │                    │         │                  │                │
└────────────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

2 rows in set. Elapsed: 0.002 sec. 

sam-virtual-machine :) alter table journald_logs modify column MY_COUNTER String;

ALTER TABLE journald_logs
    MODIFY COLUMN `MY_COUNTER` String

Query id: fd049e22-83ea-43a7-8574-da361c619bd6

Ok.

0 rows in set. Elapsed: 0.010 sec. 

sam-virtual-machine :) describe journald_logs;

DESCRIBE TABLE journald_logs

Query id: 1634572a-3439-4e8e-8b2b-5e1904deaa8a

┌─name───────┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ message    │ String │              │                    │         │                  │                │
│ MY_COUNTER │ String │              │                    │         │                  │                │
└────────────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

2 rows in set. Elapsed: 0.001 sec. 

That works:

sam-virtual-machine :) select * from journald_logs;

SELECT *
FROM journald_logs

Query id: 506ded28-44cf-4aef-bb9b-30f300305cba

┌─message─────────┬─MY_COUNTER─────────┐
│ example message │ 0                  │
│ example message │ 0                  │
│ example message │ 0                  │
│ example message │ 0                  │
│ example message │ 0                  │
...
│ example message │ 1701963307.1571245 │
│ example message │ 1701963308.1594532 │
│ example message │ 1701963309.161399  │
│ example message │ 1701963310.1629558 │
│ example message │ 1701963311.1647515 │
│ example message │ 1701963312.1659102 │
│ example message │ 1701963313.1681535 │
│ example message │ 1701963314.170539  │
│ example message │ 1701963315.172969  │
│ example message │ 1701963316.1759694 │
│ example message │ 1701963317.1779773 │
│ example message │ 1701963318.1795716 │
└─────────────────┴────────────────────┘

645 rows in set. Elapsed: 0.003 sec. 

So ClickHouse was rejecting the MY_COUNTER value because a fractional timestamp string cannot be parsed into a UInt32 column. It looks like I’ll need to create each field manually and make sure it has the right type.
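The rejection is easy to reproduce: the values journald hands over are strings holding fractional Unix timestamps, which fail an integer parse outright (the integer part, around 1.7 billion, would actually still fit under the UInt32 ceiling of 4294967295):

```python
value = "1701963307.1571245"  # MY_COUNTER as journald delivers it

try:
    parsed = int(value)  # roughly what filling a UInt32 column must do with the input
except ValueError:
    parsed = None

print(parsed)                    # None: not a valid integer literal
print(float(value) < 2**32 - 1)  # True: the magnitude alone would fit
```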

This was my final Vector configuration:

sources:
  journald_logs:
    type: "journald"
    include_units: ["py_sysd_pkg.service"]

sinks:
  clickhouse_logs:
    type: "clickhouse"
    inputs: ["journald_logs"]
    endpoint: "http://localhost:8123"
    database: "vector_db"
    table: "journald_logs"
    encoding:
      only_fields: ["message", "MY_COUNTER"]

OpenTelemetry Collector §

Next, let’s try OpenTelemetry Collector instead of Vector.

While OpenTelemetry Collector has a journald receiver, it is not yet documented on the website. However, the receiver has an extensive README.

Here is an example of exporting to ClickHouse from OpenTelemetry Collector. Note that it creates the database and table if they do not already exist.

Let’s install the collector:

sudo apt-get update
sudo apt-get -y install wget systemctl
wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.90.1/otelcol_0.90.1_linux_amd64.deb
sudo dpkg -i otelcol_0.90.1_linux_amd64.deb

And start it:

sudo systemctl start otelcol

Check it is running:

$ systemctl status otelcol
● otelcol.service - OpenTelemetry Collector
     Loaded: loaded (/lib/systemd/system/otelcol.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-12-07 15:41:22 GMT; 27s ago
   Main PID: 33819 (otelcol)
      Tasks: 7 (limit: 4578)
     Memory: 14.2M
        CPU: 105ms
     CGroup: /system.slice/otelcol.service
             └─33819 /usr/bin/otelcol --config=/etc/otelcol/config.yaml

Dec 07 15:41:44 sam-virtual-machine otelcol[33819]: Descriptor:
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]:      -> Name: up
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]:      -> Description: The scraping was successful
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]:      -> Unit:
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]:      -> DataType: Gauge
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]: NumberDataPoints #0
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]: StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]: Timestamp: 2023-12-07 15:41:44.366 +0000 UTC
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]: Value: 1.000000
Dec 07 15:41:44 sam-virtual-machine otelcol[33819]:         {"kind": "exporter", "data_type": "metrics", "name": "logging"}

Take a look at the default configuration:

less /etc/otelcol/config.yaml

# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attac>

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:

exporters:
  logging:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [logging]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [logging]

  extensions: [health_check, pprof, zpages]

Back up the default:

sudo cp /etc/otelcol/config{,.default}.yaml
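
The `{,.default}` part is Bash brace expansion, which turns the one argument into two; a quick demo of what the pattern expands to:

```shell
# Brace expansion happens in Bash before cp ever runs: "config{,.default}.yaml"
# becomes two words, so the backup command above is shorthand for
#   cp /etc/otelcol/config.yaml /etc/otelcol/config.default.yaml
bash -c 'echo config{,.default}.yaml'
```

The empty alternative produces the original name, and the `.default` alternative produces the backup name.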

Edit the config:

sudo nano /etc/otelcol/config.yaml

Let’s then try a simple configuration that reads logs for the py_sysd_pkg service from journald and emits them to the console:

receivers:
  journald:
    directory: /run/log/journal
    units:
      - py_sysd_pkg
    priority: info

exporters:
  logging:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [journald]
      exporters: [logging]

Using this configuration gives this error in journald:

Dec 07 15:54:04 sam-virtual-machine otelcol[35148]: Error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:
Dec 07 15:54:04 sam-virtual-machine otelcol[35148]: * error decoding 'receivers': unknown type: "journald" for id: "journald" (valid values: [hostmetrics jaeger kafka opencensus prometheus zipkin otlp])
Dec 07 15:54:04 sam-virtual-machine otelcol[35148]: 2023/12/07 15:54:04 collector server run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:
Dec 07 15:54:04 sam-virtual-machine otelcol[35148]: * error decoding 'receivers': unknown type: "journald" for id: "journald" (valid values: [hostmetrics jaeger kafka opencensus prometheus zipkin otlp])
Dec 07 15:54:04 sam-virtual-machine systemd[1]: otelcol.service: Main process exited, code=exited, status=1/FAILURE

It looks like this is because journald support is in the contrib repository, which is not included by default.

To use the contrib repository, we need to use the OpenTelemetry Collector Builder (ocb), which:

generates a custom OpenTelemetry Collector binary based on a given configuration

To install the builder, you can either download the builder binary, or install Go and compile it from source. I’m going to use the binary.

I’ll begin by clicking “Assets” on the releases page to open up the list of ocb* binaries, and downloading ocb_0.90.1_linux_amd64 for my Ubuntu 22.04 VM.

Then I’ll give it execute permissions and rename it to ocb to match the documentation:

$ chmod +x ocb_0.90.1_linux_amd64 
$ mv ocb_0.90.1_linux_amd64 ocb

Before running it, we need to create a configuration file. Let’s start with the default configuration file, and add support for ClickHouse and journald:

builder-config.yaml §
dist:
  module: go.opentelemetry.io/collector/cmd/otelcorecol
  name: otelcorecol
  description: Local OpenTelemetry Collector binary, testing only.
  version: 0.90.1-dev
  otelcol_version: 0.90.1

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.90.1
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/journaldreceiver v0.90.1
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.90.1
  - gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.90.1
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.90.1
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.90.1
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter v0.90.1
extensions:
  - gomod: go.opentelemetry.io/collector/extension/ballastextension v0.90.1
  - gomod: go.opentelemetry.io/collector/extension/zpagesextension v0.90.1
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.90.1
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.90.1
connectors:
  - gomod: go.opentelemetry.io/collector/connector/forwardconnector v0.90.1

replaces:
  - go.opentelemetry.io/collector => ../../
  - go.opentelemetry.io/collector/otelcol => ../../otelcol
  - go.opentelemetry.io/collector/component => ../../component
  - go.opentelemetry.io/collector/config/configauth => ../../config/configauth
  - go.opentelemetry.io/collector/config/configcompression => ../../config/configcompression
  - go.opentelemetry.io/collector/config/configgrpc => ../../config/configgrpc
  - go.opentelemetry.io/collector/config/confighttp => ../../config/confighttp
  - go.opentelemetry.io/collector/config/confignet => ../../config/confignet
  - go.opentelemetry.io/collector/config/configopaque => ../../config/configopaque
  - go.opentelemetry.io/collector/config/configtelemetry => ../../config/configtelemetry
  - go.opentelemetry.io/collector/config/configtls => ../../config/configtls
  - go.opentelemetry.io/collector/config/internal => ../../config/internal
  - go.opentelemetry.io/collector/confmap => ../../confmap
  - go.opentelemetry.io/collector/consumer => ../../consumer
  - go.opentelemetry.io/collector/connector => ../../connector
  - go.opentelemetry.io/collector/connector/forwardconnector => ../../connector/forwardconnector
  - go.opentelemetry.io/collector/exporter => ../../exporter
  - go.opentelemetry.io/collector/exporter/debugexporter => ../../exporter/debugexporter
  - go.opentelemetry.io/collector/exporter/loggingexporter => ../../exporter/loggingexporter
  - go.opentelemetry.io/collector/exporter/otlpexporter => ../../exporter/otlpexporter
  - go.opentelemetry.io/collector/exporter/otlphttpexporter => ../../exporter/otlphttpexporter
  - go.opentelemetry.io/collector/extension => ../../extension
  - go.opentelemetry.io/collector/extension/auth => ../../extension/auth
  - go.opentelemetry.io/collector/extension/ballastextension => ../../extension/ballastextension
  - go.opentelemetry.io/collector/extension/zpagesextension => ../../extension/zpagesextension
  - go.opentelemetry.io/collector/featuregate => ../../featuregate
  - go.opentelemetry.io/collector/pdata => ../../pdata
  - go.opentelemetry.io/collector/processor => ../../processor
  - go.opentelemetry.io/collector/receiver => ../../receiver
  - go.opentelemetry.io/collector/receiver/otlpreceiver => ../../receiver/otlpreceiver
  - go.opentelemetry.io/collector/processor/batchprocessor => ../../processor/batchprocessor
  - go.opentelemetry.io/collector/processor/memorylimiterprocessor => ../../processor/memorylimiterprocessor
  - go.opentelemetry.io/collector/semconv => ../../semconv
  - go.opentelemetry.io/collector/service => ../../service

Now let’s try and build:

$ ./ocb --config=builder-config.yaml
2023-12-07T16:37:17.616Z  INFO  internal/command.go:123 OpenTelemetry Collector Builder {"version": "0.90.1", "date": "2023-12-01T18:48:16Z"}
2023-12-07T16:37:17.617Z  INFO  internal/command.go:159 Using config file {"path": "builder-config.yaml"}
Error: go not found: go binary not found

Looks like we need Go:

$ sudo apt-get install golang

Failure:

$ ./ocb --config=builder-config.yaml --verbose
2023-12-07T16:54:33.726Z  INFO  internal/command.go:123 OpenTelemetry Collector Builder {"version": "0.90.1", "date": "2023-12-01T18:48:16Z"}
2023-12-07T16:54:33.727Z  INFO  internal/command.go:159 Using config file {"path": "builder-config.yaml"}
2023-12-07T16:54:33.727Z  INFO  builder/config.go:109 Using go  {"go-executable": "/usr/bin/go"}
2023-12-07T16:54:33.728Z  INFO  builder/main.go:91  Sources created {"path": "/tmp/otelcol-distribution220086697"}
2023-12-07T16:54:33.728Z  INFO  builder/main.go:25  Running go subcommand.  {"arguments": ["get", "cloud.google.com/go"]}
2023-12-07T16:54:33.737Z  INFO  zapio/writer.go:146 go: go.opentelemetry.io/collector/extension/ballastextension@v0.90.1 (replaced by ../../extension/ballastextension): reading /extension/ballastextension/go.mod: open /extension/ballastextension/go.mod: no such file or directory
Error: failed to go get: exit status 1

In an effort to fix the above error, I tried commenting out ballastextension, but then the same error occurred for another component. I disabled that one too, and the pattern continued. Every failing module is one that is “replaced by” a local path, which makes me suspicious of the replaces: section of the yaml file. So let’s try a simplified version of it:

dist:
  module: go.opentelemetry.io/collector/cmd/otelcorecol
  name: otelcorecol
  description: Local OpenTelemetry Collector binary, testing only.
  version: 0.90.1-dev
  otelcol_version: 0.90.1

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.90.1
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/journaldreceiver v0.90.1
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.90.1
  - gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.90.1
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.90.1
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.90.1
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter v0.90.1
extensions:
  - gomod: go.opentelemetry.io/collector/extension/ballastextension v0.90.1
  - gomod: go.opentelemetry.io/collector/extension/zpagesextension v0.90.1
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.90.1
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.90.1
connectors:
  - gomod: go.opentelemetry.io/collector/connector/forwardconnector v0.90.1

# All "replaces" are commented out.
# replaces:
# ...
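
Rather than deleting the lines by hand, one quick way to comment out the whole replaces: block is a sed range from the `replaces:` line to end of file. Here is a sketch on a miniature stand-in for the config (it assumes `replaces:` is the last top-level key, which it is in the default builder config):

```shell
# Build a miniature stand-in for builder-config.yaml, then comment out
# everything from the "replaces:" line through to the end of the file.
cat > /tmp/builder-config-demo.yaml <<'EOF'
dist:
  name: otelcorecol
replaces:
  - go.opentelemetry.io/collector => ../../
EOF
sed -i '/^replaces:/,$ s/^/# /' /tmp/builder-config-demo.yaml
cat /tmp/builder-config-demo.yaml
```

The `dist:` section is left untouched; only the `replaces:` key and everything under it gain a `# ` prefix.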

Now we get further:

$ ./ocb --config=builder-config.yaml --verbose
2023-12-07T16:57:27.071Z  INFO  internal/command.go:123 OpenTelemetry Collector Builder {"version": "0.90.1", "date": "2023-12-01T18:48:16Z"}
2023-12-07T16:57:27.072Z  INFO  internal/command.go:159 Using config file {"path": "builder-config.yaml"}
2023-12-07T16:57:27.072Z  INFO  builder/config.go:109 Using go  {"go-executable": "/usr/bin/go"}
2023-12-07T16:57:27.073Z  INFO  builder/main.go:91  Sources created {"path": "/tmp/otelcol-distribution3059435558"}
2023-12-07T16:57:27.073Z  INFO  builder/main.go:25  Running go subcommand.  {"arguments": ["get", "cloud.google.com/go"]}
2023-12-07T16:57:27.921Z  INFO  zapio/writer.go:146 go: downloading cloud.google.com/go v0.111.0
2023-12-07T16:57:28.060Z  INFO  zapio/writer.go:146 go: added cloud.google.com/go v0.111.0
2023-12-07T16:57:28.061Z  INFO  builder/main.go:25  Running go subcommand.  {"arguments": ["mod", "tidy", "-compat=1.20"]}
2023-12-07T16:57:28.067Z  INFO  zapio/writer.go:146 invalid value "1.20" for flag -compat: maximum supported Go version is 1.18
2023-12-07T16:57:28.067Z  INFO  zapio/writer.go:146 usage: go mod tidy [-e] [-v] [-go=version] [-compat=version]
2023-12-07T16:57:28.067Z  INFO  zapio/writer.go:146 Run 'go help mod tidy' for details.
Error: failed to update go.mod: exit status 2

But it seemingly fails because something requires Go 1.20, while we only have Go 1.18 installed:

$ go version
go version go1.18.1 linux/amd64
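
To check whether an installed Go meets a minimum version without eyeballing it, GNU `sort -V` (version sort) can compare the two strings. A small sketch, with `1.18.1` hard-coded from the `go version` output above (substitute your own):

```shell
required="1.20"
have="1.18.1"   # taken from `go version` above; substitute your own
# sort -V -C exits 0 only if its input is already in version order,
# i.e. only if "have" sorts at or after "required".
if printf '%s\n%s\n' "$required" "$have" | sort -V -C; then
  echo "Go $have is new enough"
else
  echo "Go $have is older than $required; an upgrade is needed"
fi
```

With the values above this reports that an upgrade is needed, which matches the `go mod tidy -compat=1.20` failure.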