
JDK Java path

Set Java path permanently

				
					vim /etc/profile

# add 
export JAVA_HOME=/usr/lib/jvm/jdk-11
export PATH=$PATH:$JAVA_HOME/bin
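
# apply in the current shell and verify (optional check, not part of the original notes)
source /etc/profile
echo $JAVA_HOME
java -version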
				
			

List installed JDKs and change the default as required

				
					archlinux-java status

# Available Java environments:
  # java-17-openjdk
  # java-21-jdk
  # jdk-11 (default)
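
# switch the default environment (a sketch: pick one of the names listed above)
sudo archlinux-java set java-17-openjdk
archlinux-java status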

				
			

Block websites

  1. Install Squid Proxy:

    First, install Squid Proxy server:

    sudo pacman -Syu squid
  2. Configure Squid Proxy:

    Edit the Squid configuration file to set up URL filtering:

    sudo nano /etc/squid/squid.conf

    Add or uncomment the following lines to enable URL filtering:

    acl block_sites dstdomain "/etc/squid/block_sites.acl"
    http_access deny block_sites

    Save the changes and exit the editor.

  3. Create the Blocklist:

    Create a file containing the domain patterns you want to block:

    sudo nano /etc/squid/block_sites.acl

    Add the patterns, each on a separate line (dstdomain matches domain suffixes rather than arbitrary keywords; a url_regex alternative for true keyword matching is sketched after this list):

    .adult
    .porn
    .explicit

    Save the changes and exit the editor.

  4. Update Permissions:

    Ensure that Squid has the necessary permissions to read the ACL file:

    sudo chown -R proxy:proxy /etc/squid/block_sites.acl
  5. Restart Squid:

    Restart the Squid service to apply the changes:

    sudo systemctl restart squid
  6. Test the Configuration:

    Test the configuration by attempting to access websites whose domains match the blocked patterns from a web browser. If the configuration is correct, access to these websites should be blocked.
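
Note that the dstdomain ACL above matches domain suffixes, not arbitrary keywords. To block any URL containing certain keywords, a url_regex ACL is one alternative; the sketch below uses an assumed file name (block_keywords.acl) that is not part of the steps above:

acl block_keywords url_regex -i "/etc/squid/block_keywords.acl"
http_access deny block_keywords

The file would then hold one pattern per line (for example adult or porn), matched case-insensitively anywhere in the request URL.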

To disable the blocking, simply comment out or remove the relevant lines in the Squid configuration file (/etc/squid/squid.conf), restart the Squid service, and delete the blocklist file (/etc/squid/block_sites.acl).
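
A minimal sketch of those disable steps, assuming the paths used above:

sudo nano /etc/squid/squid.conf        # comment out the acl and http_access lines
sudo squid -k parse                    # optional: confirm the config still parses
sudo systemctl restart squid
sudo rm /etc/squid/block_sites.acl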

docker Error response from daemon: Get “https://registry-1.docker.io/v2/”

				
					sudo systemctl status docker.service
sudo systemctl stop docker.socket
sudo systemctl daemon-reload
sudo systemctl restart docker.service
				
			
				
					> cat /etc/resolvconf.conf                            
# Configuration for resolvconf(8)
# See resolvconf.conf(5) for details

resolv_conf=/etc/resolv.conf
# If you run a local name server, you should uncomment the below line and
# configure your subscribers configuration files below.
name_servers=127.0.0.1
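
# Pull failures with "Get https://registry-1.docker.io/v2/" are often DNS related:
# with name_servers=127.0.0.1 the Docker daemon relies on the local resolver.
# One sketch (assumption: DNS is the cause) is to point Docker at public DNS:
> cat /etc/docker/daemon.json
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
> sudo systemctl restart docker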
				
			

Functional Interfaces

A quick tour of the core functional interfaces: Callable, Runnable, Supplier, Consumer, Function and BiFunction, each implemented with a lambda and invoked from main.

				
					package com.brains.reactivejavademo;

import java.time.LocalDateTime;
import java.util.concurrent.Callable;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class T1 {

    private static Callable<String> callable(){
        return () -> "Callable: hello";
    }
    private static Runnable runnable = () -> System.out.println("Runnable: This is runnable");
    private static Supplier<String> supplier = () -> "Supplier";
    private static Consumer<String> consumer(){
        return s -> System.out.println("Consumer : "+s);
    }
    private static Function<String, String> func(){
        return (s) -> s.concat(String.valueOf(s.length()));
    }
    private static BiFunction<String, String, String> bifunc(){
        return (s,r) -> s.concat(r);
    }

    public static void main(String[] args) throws Exception {

        // callable
        Callable<String> s = callable();
        System.out.println(s.call());
        //-- Callable: hello

        // runnable
        runnable.run();
        //-- Runnable: This is runnable

        String s1 = supplier.get();
        System.out.println(s1);
        //-- Supplier

        // consumer
        consumer().accept("abc");
        //-- Consumer : abc

        String function = func().apply("hello");
        System.out.println(function);
        //-- hello5

        String bifunction = bifunc().apply("helo", "world");
        System.out.println(bifunction);
        //-- heloworld
    }
}

				
			

Creating a Local Data Lakehouse using Spark/Minio/Dremio/Nessie

https://www.linkedin.com/pulse/creating-local-data-lakehouse-using-alex-merced/

Creating a Local Data Lakehouse using Spark/Minio/Dremio/Nessie

Alex Merced

Co-Author of “Apache Iceberg: The Definitive Guide” | Developer Advocate at Dremio (Data Lakehouse Evangelist) | Tech Content Creator

Data is becoming the cornerstone of modern businesses. As businesses scale, so does their data, and this leads to the need for efficient data storage, retrieval, and processing systems. This is where Data Lakehouses come into the picture.

What is a Data Lakehouse?

A Data Lakehouse combines the best of both data lakes and data warehouses. It provides data lakes’ raw storage capabilities and data warehouses’ structured querying capabilities. This hybrid model ensures flexibility in storing vast amounts of unstructured data, while also enabling SQL-based structured queries, ensuring that businesses can derive meaningful insights from their data.


We will be implementing a basic lakehouse locally on your laptop with the following components:

 

  • Apache Spark, which can be used for ingesting streaming or batch data into our lakehouse.
  • Minio, which acts as our data lake repository and provides the S3-compatible storage layer.
  • Apache Iceberg, as our table format, which allows query engines to plan smarter, faster queries on our data lakehouse data.
  • Nessie, our data catalog, which makes our datasets discoverable and accessible across our tools. Nessie also provides git-like features for data quality, experimentation and disaster recovery. (Get even more features, like an intuitive UI and automated table optimization, when you use Dremio Arctic, a cloud-managed Nessie-based catalog.)
  • Dremio, our query engine, which we can use to query our data performantly but also to document, organize and deliver our data to consumers such as data analysts doing ad-hoc analytics or building out BI dashboards.

 

Creating a Docker Compose File and Running It

Building a Data Lakehouse on your local machine requires orchestrating multiple services, and Docker Compose is a perfect tool for this task. In this section we will create a docker-compose.yml file to define and run the multi-container Docker application for our Data Lakehouse environment.

Steps:

 

  1. Install Docker: Ensure that you have Docker installed on your laptop. If not, you can download it from the official Docker website.
  2. Create the Docker Compose File: Navigate to the directory where you want to create your project and create a file named docker-compose.yml. Populate this file with the following content:

 

version: "3.9"

services:
  dremio:
    platform: linux/x86_64
    image: dremio/dremio-oss:latest
    ports:
      - 9047:9047
      - 31010:31010
      - 32010:32010
    container_name: dremio

  minioserver:
    image: minio/minio
    ports:
      - 9000:9000
      - 9001:9001
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    container_name: minio
    command: server /data --console-address ":9001"

  spark_notebook:
    image: alexmerced/spark33-notebook
    ports: 
      - 8888:8888
    env_file: .env
    container_name: notebook
  
  nessie:
    image: projectnessie/nessie
    container_name: nessie
    ports:
      - "19120:19120"

networks:
  default:
    name: iceberg_env
    driver: bridge 

Create Environment File:

Alongside your docker-compose.yml file, create an .env file to manage your environment variables. Start with the given template and fill in the details as you progress.

# Fill in Details

# AWS_REGION is used by Spark
AWS_REGION=us-east-1
# This must match if using minio
MINIO_REGION=us-east-1
# Used by pyIceberg
AWS_DEFAULT_REGION=us-east-1
# AWS Credentials (this can use minio credential, to be filled in later)
AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxx
# If using Minio, this should be the API address of Minio Server
AWS_S3_ENDPOINT=http://minioserver:9000
# Location where files will be written when creating new tables
WAREHOUSE=s3a://warehouse/
# URI of Nessie Catalog
NESSIE_URI=http://nessie:19120/api/v1

Run the Docker Compose:

Open four terminals for clearer log visualization; each service should have its own terminal. Ensure you are in the directory containing the docker-compose.yml file, and then run:

 

  • In the first terminal:

 

docker-compose up minioserver

Head over to localhost:9001 in your browser and log in with the credentials minioadmin/minioadmin. Once you're logged in, do the following:

 

  • create a bucket called “warehouse”
  • create an access key and copy the access key and secret key to your .env file
  • In the second terminal:

 

docker-compose up nessie

 

  • In the third terminal:

 

docker-compose up spark_notebook 

In the logs, when this container starts, look for output that looks like the following, and copy and paste the URL into your browser.

notebook  |  or http://127.0.0.1:8888/?token=9db2c8a4459b4aae3132dfabdf9bf4396393c608816743a9

 

  • In the fourth terminal:

 

docker-compose up dremio 

After completing these steps, you will have Dremio, Minio, and a Spark Notebook running in separate Docker containers on your laptop, ready for further Data Lakehouse operations!
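
A quick way to confirm all four containers are up before moving on (an optional check, not part of the original steps), run from the project directory:

docker-compose ps
# dremio (9047), minio (9000/9001), notebook (8888) and nessie (19120) should all show as Up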

Creating a Table in Spark

From the Jupyter notebook window you opened in the browser earlier, create a new notebook with the following code.

import pyspark
from pyspark.sql import SparkSession
import os


## DEFINE SENSITIVE VARIABLES
NESSIE_URI = os.environ.get("NESSIE_URI") ## Nessie Server URI
WAREHOUSE = os.environ.get("WAREHOUSE") ## BUCKET TO WRITE DATA TO
AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY_ID") ## AWS CREDENTIALS (matches the .env above)
AWS_SECRET_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY") ## AWS CREDENTIALS (matches the .env above)
AWS_S3_ENDPOINT= os.environ.get("AWS_S3_ENDPOINT") ## MINIO ENDPOINT


print(AWS_S3_ENDPOINT)
print(NESSIE_URI)
print(WAREHOUSE)


conf = (
    pyspark.SparkConf()
        .setAppName('app_name')
        .set('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.67.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
        .set('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions')
        .set('spark.sql.catalog.nessie', 'org.apache.iceberg.spark.SparkCatalog')
        .set('spark.sql.catalog.nessie.uri', NESSIE_URI)
        .set('spark.sql.catalog.nessie.ref', 'main')
        .set('spark.sql.catalog.nessie.authentication.type', 'NONE')
        .set('spark.sql.catalog.nessie.catalog-impl', 'org.apache.iceberg.nessie.NessieCatalog')
        .set('spark.sql.catalog.nessie.s3.endpoint', AWS_S3_ENDPOINT)
        .set('spark.sql.catalog.nessie.warehouse', WAREHOUSE)
        .set('spark.sql.catalog.nessie.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
        .set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
        .set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
)


## Start Spark Session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print("Spark Running")


## Create a Table
spark.sql("CREATE TABLE nessie.names (name STRING) USING iceberg;").show()


## Insert Some Data
spark.sql("INSERT INTO nessie.names VALUES ('Alex Merced'), ('Dipankar Mazumdar'), ('Jason Hughes')").show()


## Query the Data
spark.sql("SELECT * FROM nessie.names;").show()

Run the code; you can confirm the table was created by opening the Minio dashboard.

Querying the Data in Dremio

Now that the data has been created, a catalog like Nessie makes your Apache Iceberg tables easily visible from tool to tool. When we open Dremio, we can immediately query the data in our Nessie catalog.

 

  • head over to localhost:9047, where the Dremio web application resides
  • create your admin user account
  • once on the dashboard click on “add source” in the bottom left
  • select “nessie”

 

Fill out the first two tabs of the new source dialog as described below:

 

  • We use the docker network URL for our Nessie server since it’s on the same docker network as our Dremio container
  • None for authentication unless you set up your Nessie server to use tokens (the default is none)
  • The second tab lets us provide credentials for the storage the catalog tracks; here we use the access key and secret key we got from Minio
  • we set the fs.s3a.path.style.access, fs.s3a.endpoint and dremio.s3.compat connection properties in order to use S3-compatible storage solutions like Minio (example values are sketched after this list)
  • Since we are working with a local non-SSL network, turn off encryption on the connection.
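
The values below sketch those connection properties for this local Minio setup; they are assumed values matching the Minio service name and port in the docker-compose.yml above:

fs.s3a.path.style.access = true
fs.s3a.endpoint = minioserver:9000
dremio.s3.compat = true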

 

Now you can see our names table on the Dremio dashboard under our new connected source.


You can now click on it and run a query.


Keep in mind Dremio also has full DML support for Apache Iceberg, so you can run creates and inserts directly from Dremio as well.


So now you get the performance of Dremio and Apache Iceberg along with the git-like capabilities that the Nessie catalog brings, allowing you to isolate ingestion on branches, create zero-copy clones for experimentation, and roll back your catalog for disaster recovery, all on your data lakehouse.

Kafka on docker-compose

Kafka setup, create topic, send message and receive messages

https://www.conduktor.io/kafka/kafka-topics-cli-tutorial/

				
					
# --------------
# docker-compose.yml
# --------------
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 22181:2181
  
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 29092:29092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
# --------------

## Create topic
# -------------
kafka-topics --bootstrap-server localhost:29092 --topic firstTopi1 --create --partitions 3 --replication-factor 1
# Created topic firstTopi1.


## List topics
# -------------
kafka-topics --bootstrap-server localhost:29092 --list
# firstTopi1
# first_topic
# test_topic


## Describe topic
# -------------
kafka-topics --bootstrap-server localhost:29092 --describe --topic first_topic
# Topic: first_topic      TopicId: VwJGV8HeSXa1NjmZ5Hfvxw PartitionCount: 3       ReplicationFactor: 1    Configs: 
#         Topic: first_topic      Partition: 0    Leader: 1       Replicas: 1     Isr: 1
#         Topic: first_topic      Partition: 1    Leader: 1       Replicas: 1     Isr: 1
#         Topic: first_topic      Partition: 2    Leader: 1       Replicas: 1     Isr: 1

## Delete topic
# -------------
kafka-topics --bootstrap-server localhost:29092 --delete --topic first_topic


## Produce message
# -------------
kafka-console-producer --bootstrap-server localhost:29092 --topic test_topic
>Hello World
>abc

## Consume latest messages
# -------------
kafka-console-consumer --bootstrap-server localhost:29092 --topic test_topic
# -----
Hello World
abc

## Consume messages from beginning
# -------------
kafka-console-consumer --bootstrap-server localhost:29092 --topic test_topic --from-beginning
# -----
hi
Hello
Hi
How 
are
you
hi
hello
sdf
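
## Consume as part of a consumer group (a sketch; the group name my-group is
## an assumption, not from the linked tutorial)
# -------------
kafka-console-consumer --bootstrap-server localhost:29092 --topic test_topic --group my-group --from-beginning

## List and describe consumer groups
# -------------
kafka-consumer-groups --bootstrap-server localhost:29092 --list
kafka-consumer-groups --bootstrap-server localhost:29092 --describe --group my-group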
				
			

HDFS docker-compose

https://faun.pub/run-your-first-big-data-project-using-hadoop-and-docker-in-less-than-10-minutes-e1bbe2974ef3

				
					# kubernetes.txt
https://www.youtube.com/watch?v=o6bxo0Oeg6o&t=130s
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/
install-kubeadm/

Installing a container runtime
Install Docker Engine on Ubuntu
=============
1.Set up Docker's apt repository.

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

2. Install the Docker packages.

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

3. Verify that the Docker Engine installation is successful by running the hello-world image.

sudo docker run hello-world

4. Install cri-dockerd (CRI for Docker)
---------
4.1 Install Go
4.1.1 Download tarball 
wget https://go.dev/dl/go1.21.3.linux-amd64.tar.gz

4.1.2 untar
tar -C /usr/local -xzf go1.21.3.linux-amd64.tar.gz

4.1.3 export go path
echo 'export PATH=$PATH:/usr/local/go/bin' >>~/.profile
source ~/.profile 

4.1.4 To install, on a Linux system that uses systemd, and already has Docker Engine installed

# Clone cri-dockerd
git clone https://github.com/Mirantis/cri-dockerd.git

# with non-sudo
make cri-dockerd

# Run these commands as root

cd cri-dockerd
mkdir -p /usr/local/bin
install -o root -g root -m 0755 cri-dockerd /usr/local/bin/cri-dockerd
install packaging/systemd/* /etc/systemd/system
sed -i -e 's,/usr/bin/cri-dockerd,/usr/local/bin/cri-dockerd,' /etc/systemd/system/cri-docker.service
systemctl daemon-reload
systemctl enable --now cri-docker.socket



Installing kubeadm, kubelet and kubectl
=============

1. Update the apt package index and install packages needed to use the Kubernetes apt repository:

sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip that package
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

2. Download the public signing key for the Kubernetes package repositories. 

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg


3. Add the appropriate Kubernetes apt repository
# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

4. Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version:

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Creating a cluster with kubeadm
===============

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/cri-dockerd.sock


Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

sudo kubeadm join 10.131.187.176:6443 --token knefgg.pdafluamsim49olo --cri-socket=unix:///var/run/cri-dockerd.sock \
        --discovery-token-ca-cert-hash sha256:b058fc69cbec62d085bb38d84f0a89879cbe16068567f061a8fac84f87eab9aa
-----

kubectl get pods -A
NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE
kube-system   coredns-5dd5756b68-86d94               0/1     Pending   0          5m52s
kube-system   coredns-5dd5756b68-tq4v5               0/1     Pending   0          5m52s
kube-system   etcd-k8smaster-vm                      1/1     Running   0          6m5s
kube-system   kube-apiserver-k8smaster-vm            1/1     Running   0          6m8s
kube-system   kube-controller-manager-k8smaster-vm   1/1     Running   0          6m5s
kube-system   kube-proxy-kl7d8                       1/1     Running   0          5m52s
kube-system   kube-scheduler-k8smaster-vm            1/1     Running   0          6m5s

# Install flannel
https://github.com/flannel-io/flannel#deploying-flannel-manually
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Check pods after installing flannel
 kubectl get pods -A --watch
NAMESPACE      NAME                                   READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-7m5rq                  1/1     Running   0          54s
kube-system    coredns-5dd5756b68-86d94               1/1     Running   0          7m21s
kube-system    coredns-5dd5756b68-tq4v5               1/1     Running   0          7m21s
kube-system    etcd-k8smaster-vm                      1/1     Running   0          7m34s
kube-system    kube-apiserver-k8smaster-vm            1/1     Running   0          7m37s
kube-system    kube-controller-manager-k8smaster-vm   1/1     Running   0          7m34s
kube-system    kube-proxy-kl7d8                       1/1     Running   0          7m21s
kube-system    kube-scheduler-k8smaster-vm            1/1     Running   0          7m34s

				
			
				
					# Copy file to the hadoop container
docker cp kubernetes.txt namenode:/tmp/ 

# Get inside the hadoop container
docker exec -it namenode /bin/bash

# 1.Create the root directory for this project: 
hadoop fs -mkdir /tmp

# 2.Create the directory for the input files: 
hadoop fs -mkdir /tmp/Input

# 3.Copy the input files to the HDFS: 
hadoop fs -put /tmp/kubernetes.txt /tmp/Input

# You can open UI for HDFS at 
http://localhost:9870.
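
# 4.Verify the file landed in HDFS (optional check, not in the original steps):
hadoop fs -ls /tmp/Input
hadoop fs -cat /tmp/Input/kubernetes.txt | head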
				
			
				
					# spark-shell
scala> val text = sc.textFile("hdfs://localhost:9000/tmp/Input/kubernetes.txt")
text: org.apache.spark.rdd.RDD[String] = hdfs://localhost:9000/tmp/Input/kubernetes.txt MapPartitionsRDD[3] at textFile at <console>:23

scala> text.collect;
res1: Array[String] = Array(https://www.youtube.com/watch?v=o6bxo0Oeg6o&t=130s, https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/, install-kubeadm/, "", Installing a container runtime, Install Docker Engine on Ubuntu, =============, 1.Set up Docker's apt repository., "", # Add Docker's official GPG key:, sudo apt-get update, sudo apt-get install ca-certificates curl gnupg, sudo install -m 0755 -d /etc/apt/keyrings, curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg, sudo chmod a+r /etc/apt/keyrings/docker.gpg, "", # Add the repository to Apt sources:, echo \, "  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ...

scala> val counts = text.flatMap(line => line.split(" "))
counts: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at flatMap at <console>:23

scala> counts.collect;
res2: Array[String] = Array(https://www.youtube.com/watch?v=o6bxo0Oeg6o&t=130s, https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/, install-kubeadm/, "", Installing, a, container, runtime, Install, Docker, Engine, on, Ubuntu, =============, 1.Set, up, Docker's, apt, repository., "", #, Add, Docker's, official, GPG, key:, sudo, apt-get, update, sudo, apt-get, install, ca-certificates, curl, gnupg, sudo, install, -m, 0755, -d, /etc/apt/keyrings, curl, -fsSL, https://download.docker.com/linux/ubuntu/gpg, |, sudo, gpg, --dearmor, -o, /etc/apt/keyrings/docker.gpg, sudo, chmod, a+r, /etc/apt/keyrings/docker.gpg, "", #, Add, the, repository, to, Apt, sources:, echo, \, "", "", "deb, [arch="$(dpkg, --print-architecture)", signed-by=/etc/apt/keyrings...

scala> val mapf = counts.map(word => (word,1))
mapf: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[5] at map at <console>:23

scala> mapf.collect
res3: Array[(String, Int)] = Array((https://www.youtube.com/watch?v=o6bxo0Oeg6o&t=130s,1), (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/,1), (install-kubeadm/,1), ("",1), (Installing,1), (a,1), (container,1), (runtime,1), (Install,1), (Docker,1), (Engine,1), (on,1), (Ubuntu,1), (=============,1), (1.Set,1), (up,1), (Docker's,1), (apt,1), (repository.,1), ("",1), (#,1), (Add,1), (Docker's,1), (official,1), (GPG,1), (key:,1), (sudo,1), (apt-get,1), (update,1), (sudo,1), (apt-get,1), (install,1), (ca-certificates,1), (curl,1), (gnupg,1), (sudo,1), (install,1), (-m,1), (0755,1), (-d,1), (/etc/apt/keyrings,1), (curl,1), (-fsSL,1), (https://download.docker.com/linux/ubuntu/gpg,1), (|,1), (sudo,1), (gpg,1), (--dearmor,1), (-o,1), (/etc/apt/ke...

scala> val reducef = mapf.reduceByKey(_+_);
reducef: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:23

scala> reducef.collect
res4: Array[(String, Int)] = Array((package,4), (index,1), (cluster.,1), (kube-scheduler-k8smaster-vm,2), ("$(.,1), (-e,1), (/',1), (/etc/kubernetes/admin.conf,1), (/etc/os-release,1), (This,1), (repository.,1), ([signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg],1), (RESTARTS,2), (kube-flannel,1), (kube-apiserver-k8smaster-vm,2), (daemon-reload,1), (export,2), (gpg,3), (already,1), (any,2), (go,1), (make,1), (network,1), (Download,2), (git,1), (control-plane,1), (4.,2), (packaging/systemd/*,1), (-o,3), (are,1), ("kubectl,1), (2.,2), (sha256:b058fc69cbec62d085bb38d84f0a89879cbe16068567f061a8fac84f87eab9aa,1), ([podnetwork].yaml",1), (https://download.docker.com/linux/ubuntu/gpg,1), (STATUS,2), (kubelet,3), (overwrites,1), (commands,1), (can,3), (tee,2), (...


				
			

Docker MongoDB – csv import

Ref: https://www.mongodb.com/developer/products/mongodb/mongoimport-guide/
				
					# Docker compose for mongodb.
# Create a volume to place the csv file in it

version: "2.0"
services:
  mongodb:
    image: mongo:4.4.2
    restart: always
    mem_limit: 512m
    volumes:
      - ./mongodata:/data/db
    ports:
      - "27019:27017"
    command: mongod
    # environment:
    #   - MONGO_INITDB_ROOT_USERNAME=root
    #   - MONGO_INITDB_ROOT_PASSWORD=password
    healthcheck:
      test: "mongo --eval 'db.stats().ok'"
      interval: 5s
      timeout: 2s
      retries: 60
				
			
				
					# Connect to mongodb docker container
docker exec -it 79e80b878fbc bash

root@ > mongoimport \
   --collection='fields_option' \
   --file=/data/db/events.csv \
   --type=csv \
   --fields="timestamp","visitorid","event","itemid","transactionid"
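
# Verify the import from the mongo shell (a sketch; assumes mongoimport used
# the default "test" database since --db was not given)
root@ > mongo --quiet --eval 'db.fields_option.countDocuments()'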
				
			

Manjaro update failed due to clashes in dependencies

https://forum.manjaro.org/t/pamac-update-problem-jre-jdk/151478/6

The commands suggested under "Known information and solutions" in the forum announcements for this issue don't work; the working solutions are buried further down the linked topic.
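
One generic way to inspect and work through a jre/jdk clash like this (a sketch with hypothetical package names, not taken from the linked thread):

# list the Java packages currently installed
pacman -Qs openjdk
# run the full upgrade to see the exact conflict message
sudo pacman -Syu
# if a specific package is reported as conflicting, remove it and retry
# (jre-openjdk here is a placeholder for whatever package pacman names)
sudo pacman -Rdd jre-openjdk
sudo pacman -Syu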