Blog

  • hsfuck

    hsfuck



    A brainfuck compiler written in Haskell

    Tech stack

    • Languages: Haskell
    • Packages: Parsec

    Blog Post

    I wrote a blog post about this project.

    How to install and use

    You need to have Haskell and cabal installed, then run the following commands. To run the compiled output you need gcc for the C target and SPIM for the MIPS target.

    # clone the repo and move to it
    git clone https://github.com/tttardigrado/hsfuck
    cd hsfuck
    
    # build the project using cabal
    cabal build
    
    # optionally move the binary into another location with
    # cp ./path/to/binary .
    
    # run the compiler
    # (fst argument is compilation target mode. Either c or mips)
    # (snd argument is the path of the src file)
    # (trd argument is the path of the output file)
    ./hsfuck c test.bf test.c
    
    # compile and run the C code
    gcc test.c
    ./a.out

    Suggestion: Add the following snippets to your .bashrc

    # compile brainfuck to c and then to binary
    bfC()
    {
        ./hsfuck c "$1" /tmp/ccode.c
        gcc /tmp/ccode.c -o "$2"
    }
    # simulate as MIPS (using SPIM)
    bfMIPS()
    {
        ./hsfuck mips "$1" /tmp/mipscode.mips
        spim -file /tmp/mipscode.mips
    }
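
    With these functions defined (and the hsfuck binary in the current directory), usage looks roughly like this:

    # compile test.bf to a native binary named "test" and run it
    bfC test.bf test && ./test

    # translate test.bf to MIPS and simulate it with SPIM
    bfMIPS test.bf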

    Commands

    • + increment the value of the current cell
    • - decrement the value of the current cell
    • » right shift the value of the current cell
    • « left shift the value of the current cell
    • > move the tape one cell to the right
    • < move the tape one cell to the left
    • . print the value of the current cell as ASCII
    • , read the value of an ASCII character from stdin to the current cell
    • : print the value of the current cell as an integer
    • ; read an integer from stdin to the current cell
    • [c] execute c while the value of the cell is not zero
    • # print debug information
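
    As a small example of the extended integer I/O commands, the following program (assuming the semantics listed above) reads two integers from stdin, adds them, and prints the sum as an integer:

    # add.bf reads two integers and prints their sum:
    #   ;       read an integer into cell 0
    #   >;      read an integer into cell 1
    #   [-<+>]  move cell 1 into cell 0
    #   <:      print cell 0 as an integer
    echo ';>;[-<+>]<:' > add.bf

    # compile to C, then to a binary, and run it
    ./hsfuck c add.bf add.c
    gcc add.c -o add
    ./add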

    References

    TO DO:

    • 0 set the cell to 0
    • » and « -> right and left shifts
    • Add more print and read options (integer)
    • remove register
    • compile to MIPS
    • Add debug to MIPS target
    • Test MIPS and C output
    • Add compilation target flag
    • Add commands documentation
    • Add references
  • gsc-logger

    GSC Logger: A Tool To Log Google Search Console Data to BigQuery

    A Google App Engine Cron service for logging daily Google Search Console (GSC) Search Analytics data to BigQuery, for use in
    Google Data Studio or for separate analysis beyond the 3-month retention window.

    Configuration

    This script runs daily and pulls the data specified in the config.py file into BigQuery. There is little to configure without some programming experience.
    Generally, this script is designed to be set-it-and-forget-it: once it is deployed to App Engine, you should be able to add your service account
    email as a full user to any GSC property, and the Search Analytics data will be logged daily to BigQuery. By default the data is pulled from GSC with a 7-day lag
    to ensure the data is available.

    • Note: This script should be deployed on the Google account that has access to your GSC data, to ensure the data is available to Google Data Studio.
    • Note: This script has not been widely tested and is considered a POC. Use at your own risk!!!
    • Note: This script only works with Python 2.7, which is currently a restriction of GAE.

    More installation details are located here.
    Developed by the technical SEO agency Adapt Partners.

    Deploying

    The overview for configuring and running this sample is as follows:

    1. Prerequisites

    2. Clone this repository

    To clone the GitHub repository to your computer, run the following command:

    $ git clone https://github.com/jroakes/gsc-logger.git
    

    Change directories to the gsc-logger directory. The exact path
    depends on where you placed the directory when you cloned the sample files from
    GitHub.

    $ cd gsc-logger
    

    3. Create a Service Account

    1. Go to https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts and create a Service Account in your project.
    2. Download the json file.
    3. Upload it, replacing the file in the credentials directory.

    4. Deploy to App Engine

    1. Configure the gcloud command-line tool to use your project.
    $ gcloud config set project <your-project-id>
    
    2. Change directory to appengine/
    $ cd appengine/
    
    3. Install the Python dependencies
    $ pip install -t lib -r requirements.txt
    
    4. Create an App Engine App
    $ gcloud app create
    
    5. Deploy the application to App Engine.
    $ gcloud app deploy app.yaml cron.yaml index.yaml
    

    5. Verify your Cron Job

    Go to the Task Queue tab in AppEngine and
    click on Cron Jobs to verify that the daily cron is set up correctly. The job should have a Run Now button next to it.
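
    For reference, a GAE cron definition for a daily job looks roughly like the sketch below; the /cron URL is a placeholder, so use whatever handler path this project's cron.yaml actually defines.

    cron:
    - description: daily GSC export to BigQuery
      url: /cron
      schedule: every 24 hours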

    6. Verify App

    Once deployed, you should be able to load your GAE deployment url in a browser and see a screen that lists your service account email and also attached GSC sites. This screen will also list the last cron save date for each site
    that you have access to.

    License

    Licensed under the Apache License, Version 2.0 (the “License”);
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an “AS IS” BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.


  • eventarbiter


    eventarbiter


    Kubernetes emits events when something important happens internally.

    For example, when the CPU or memory pool the Kubernetes cluster provides cannot satisfy the request an application made, a FailedScheduling event is emitted, and the message contained in the event explains the reason for the FailedScheduling, e.g. pod (busybox-controller-jdaww) failed to fit in any node\nfit failure on node (192.168.0.2): Insufficient cpu\n or pod (busybox-controller-jdaww) failed to fit in any node\nfit failure on node (192.168.0.2): Insufficient memory\n.

    Also, if an application allocates so much memory that it exceeds the limit watermark, the kernel OOM killer steps in and kills processes. Under this circumstance, Kubernetes emits a SystemOOM event with a message like System OOM encountered.

    Note that we may use various monitoring stacks for Kubernetes and can send an alarm if, say, the average memory usage exceeds 80 percent of the limit over the past two minutes. However, if the memory allocation happens in a short burst, the monitor may not fire: usage spikes briefly, the container is killed and restarted, and usage looks normal again. Resource fragmentation also exists in a Kubernetes cluster: we may encounter a situation where the total remaining memory and CPU pool can satisfy an application's request, yet the scheduler cannot schedule the application instances. This happens because the remaining CPU and memory are split across the minion nodes and no single minion can provide enough of either for the application.

    Situations that cannot be handled by the monitoring stack can be handled through events. eventarbiter watches for events and filters out those indicating a bad status in the Kubernetes cluster.

    eventarbiter supports a callback when one of the watched events happens. eventarbiter does NOT send event alarms for you; you should do that yourself using the callback.

    Comparison


    There are already some projects that do something with Kubernetes events.

    • Heapster has a component called eventer. eventer can watch events for a Kubernetes cluster and supports ElasticSearch, InfluxDB or a log sink to store them. It is really useful for collecting and storing Kubernetes events: we can monitor what happens in the cluster without logging into each minion. eventarbiter also borrows the logic of watching Kubernetes events from eventer.
    • kubewatch can only watch Kubernetes events about the creation, update and deletion of Kubernetes objects, such as Pod and ReplicationController. kubewatch can also send an alarm through Slack. However, kubewatch is limited in the events it can watch and the alarm channels it supports. With eventarbiter’s callback sink, you can POST the event alarm to a transfer station, and after that you can do anything with it, such as sending it by email or through PagerDuty. It is under your control. 🙂

    Event Alarm Reason


    Event Description
    node_notready occurs when a minion (kubelet) node changes to NotReady
    node_notschedulable occurs when a minion (kubelet) node changes status to SchedulableDisabled
    node_systemoom occurs when an application is OOM killed on a minion (kubelet) node
    node_rebooted occurs when a minion (kubelet) node is restarted
    pod_backoff occurs when a container in a pod cannot be started normally. In our situation, this may be caused by the image not being pullable or the specified image not existing
    pod_failed occurs when a container in the pod cannot be started normally. In our situation, this may be caused by the image not being pullable or the specified image not existing
    pod_failedsync occurs when a container in the pod cannot be started normally. In our situation, this may be caused by the image not being pullable or the specified image not existing
    pod_failedscheduling occurs when an application cannot be scheduled in the cluster
    pod_unhealthy occurs when the pod health check fails
    npd_oomkilling occurs when OOM happens
    npd_taskhung occurs when a task hangs for /proc/sys/kernel/hung_task_timeout_secs (mainly used for docker ps hangs)

    Note


    • For more info about npd_oomkilling and npd_taskhung, you should deploy node-problem-detector in your Kubernetes cluster.

    Usage


    Just like eventer in the Heapster project, eventarbiter supports the source and sink command-line arguments.

    • Argument
      • source
      • sink argument, the usage is like the eventer sink. eventarbiter supports stdout and callback.
        • stdout logs the event alarm to stdout in JSON format.
        • callback is an HTTP API with the POST method enabled. The event alarm will be POSTed to the callback URL.
          • --sink=callback:CALLBACK_URL
          • CALLBACK_URL should return HTTP 200 or 201 for success. All other HTTP return status codes will be considered failures.
      • environment
        • a comma-separated list of key-value pairs set as the Environment map field in the event alarm object. This can be used as a context to pass whatever you want.
      • event_filter
        • Event alarm reasons specified in event_filter will be filtered out from eventarbiter.

    The normal commands to start an instance of eventarbiter are:

    • dev
      • eventarbiter -source='kubernetes:http://127.0.0.1:8080?inClusterConfig=false' -logtostderr=true -event_filter=pod_unhealthy -max_procs=3 -sink=stdout
    • production
      • eventarbiter -source='kubernetes:http://127.0.0.1:8080?inClusterConfig=false' -logtostderr=true -event_filter=pod_unhealthy -max_procs=3 -sink=callback:http://127.0.0.1:3086
      • There is also a faked HTTP service in script/dev listening on 3086 with a / endpoint (a minimal sketch of such a callback receiver follows).
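
    Below is a minimal sketch of a callback receiver, analogous to the faked service mentioned above. The port and the handling logic are assumptions, not part of eventarbiter itself; the alarm payload is simply logged.

    package main

    import (
        "io/ioutil"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            if r.Method != http.MethodPost {
                http.Error(w, "only POST is accepted", http.StatusMethodNotAllowed)
                return
            }
            // read the event alarm JSON posted by eventarbiter's callback sink
            body, err := ioutil.ReadAll(r.Body)
            if err != nil {
                http.Error(w, "read error", http.StatusBadRequest)
                return
            }
            log.Printf("received event alarm: %s", body)
            // eventarbiter treats HTTP 200 or 201 as success
            w.WriteHeader(http.StatusOK)
        })
        log.Fatal(http.ListenAndServe(":3086", nil))
    }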

    Build


    • make build
      • Note: eventarbiter requires Go 1.7
  • UsefulAppleScripts

    UsefulAppleScripts


    A collection of useful AppleScripts to make macOS life better.

    About

    Some of these specifically integrate with other software and others manipulate the OS.
    These are mostly created by cobbling together bits and pieces from other scripts and finding the thing that works best for my use case.
    Lots of these are quite simple but maybe they will solve a problem for someone else.

    Directory

    BarTender

    • DisplaysLaptopOnly.scpt determines if the monitor connected is a laptop display. This allows Bartender to know if a smaller display is detected. This interacts with Bartender’s new profile feature.
    • DisplaysNotLaptop.scpt is very similar to DisplaysLaptopOnly.scpt, but it determines whether a monitor that is not the laptop display is connected. Similarly, it works with Bartender Profiles.

    Bunch

    • I use QuitXcodeBunch with Bunch.app, from the excellent @ttscoff, to close Xcode when leaving my Code bunch. It makes sure to stop any running tasks so that Xcode quits properly.
    • GetKMVar is a simple script to get the variable from inside Keyboard Maestro and make it available to an applescript. It works in conjunction with other scripts as more of a building block.

    Email

    • EmailHi.scpt is heavily inspired by David Sparks’ blog and by various posts I read on the Automators and Mac Power Users forums.
      My version of the script includes both MS Outlook and Mail.app variations, as I work in both pieces of software and wanted the ability to get first names in both applications. I trigger it with a Keyboard Maestro text entry because I try to keep all the AppleScript triggers there, but you could also use TextExpander…

    URLs

    • SafariToFirefox.scpt opens the frontmost Safari tab in Firefox (a minimal sketch of the general approach appears after this list). Personally I trigger this with Keyboard Maestro using a string trigger.
    • SafariToDuckDuckGo.scpt opens the frontmost Safari tab in DuckDuckGo. Again I trigger it with Keyboard Maestro. You could modify this for any app you fancy. The main difference from the Firefox script is that DuckDuckGo has fewer weird tab-opening problems; hence there is a delay built into the Firefox script.
    • URLsToProfile.scpt uses Keyboard Maestro to open specified URLs in the Safari profile of your choosing.
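
    A minimal sketch of the “open the frontmost Safari tab elsewhere” approach, assuming Safari is frontmost and the target browser is installed (this is not the exact contents of SafariToFirefox.scpt, which also adds a delay):

    -- grab the URL of the frontmost Safari tab
    tell application "Safari"
        set theURL to URL of current tab of front window
    end tell
    -- hand it to another browser (swap Firefox for any app you fancy)
    do shell script "open -a Firefox " & quoted form of theURL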

    Thanks

    I’m not really amazing at any of this coding business, and I have only been able to work out these automations because of the excellent communities and software that others have made. I hope you find these useful.

  • autogluon-image-classification

    Deploy AutoGluon Image Classifier on SageMaker


    AutoGluon on SageMaker

    This repository is a getting-started, ready-to-use kit for deploying your own AutoML model with AutoGluon (MXNet) on SageMaker. With SageMaker, you can serve
    a real-time inference endpoint or run batch predictions with batch transform.

    Getting started

    Host the docker image on AWS ECR

    • You can train your model locally or on SageMaker. Your model is automatically saved to the SageMaker model directory, then packaged and uploaded to S3 by SageMaker.

    • Required packages are already included in the requirements.txt. We also defined the installation of some packages in the Dockerfile.

    • To get your model working, make the necessary code changes in the transformation function in the file /model/predictor.py.

    • Run /build_and_push.sh <image_name> to deploy the docker image to AWS Elastic Container Registry.

    Deploy your model in SageMaker

    I have included an example notebook that shows how to train locally and on a SageMaker ML instance.

    import boto3
    import sagemaker as sage
    from sagemaker import get_execution_role
    from sagemaker.predictor import csv_serializer
    
    image_tag = 'autogluon-image-classification' # use the <image_name> defined earlier
    sess = sage.Session()
    role = get_execution_role()
    account = sess.boto_session.client('sts').get_caller_identity()['Account']
    region = sess.boto_session.region_name
    image = f'{account}.dkr.ecr.{region}.amazonaws.com/{image_tag}:latest'
    
    training_data = 's3://autogluon/datasets/shopee-iet/data/train'
    test_data = 's3://autogluon/datasets/shopee-iet/data/test'
    
    artifacts = 's3://<your-bucket>/artifacts'
    sm_model = sage.estimator.Estimator(
        image,
        role,
        1,
        'ml.p2.xlarge', output_path=artifacts, sagemaker_session=sess
    )
    
    # Run the train program because it is expected
    sm_model.fit(
        {'training': training_data, 'testing': test_data}
    )
    
    # Deploy the model.
    predictor = sm_model.deploy(1, 'ml.m4.xlarge', serializer=csv_serializer)
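
    Continuing from the snippet above, a hedged sketch of how you might invoke and then tear down the endpoint (payload.csv is a hypothetical sample payload; its exact format depends on your transformation function in model/predictor.py):

    # send a sample payload to the deployed endpoint
    with open('payload.csv', 'rb') as f:
        result = predictor.predict(f.read())
    print(result)

    # delete the endpoint when done to avoid ongoing charges
    sess.delete_endpoint(predictor.endpoint)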

    More information

    SageMaker supports two execution modes: training where the algorithm uses input data to train a new model (we will not use this) and serving where the algorithm accepts HTTP requests and uses the previously trained model to do an inference.

    In order to build a production grade inference server into the container, we use the following stack to make the implementer’s job simple:

    1. nginx is a light-weight layer that handles the incoming HTTP requests and manages the I/O in and out of the container efficiently.
    2. gunicorn is a WSGI pre-forking worker server that runs multiple copies of your application and load balances between them.
    3. flask is a simple web framework used in the inference app that you write. It lets you respond to calls on the /ping and /invocations endpoints without having to write much code (a minimal sketch of this pattern follows).
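
    A minimal sketch of that /ping and /invocations pattern, for orientation only; the real inference logic lives in model/predictor.py and will differ:

    import flask

    app = flask.Flask(__name__)

    @app.route('/ping', methods=['GET'])
    def ping():
        # report container health; a real implementation would check that the model loads
        return flask.Response(response='\n', status=200, mimetype='application/json')

    @app.route('/invocations', methods=['POST'])
    def invocations():
        data = flask.request.data.decode('utf-8')
        # run inference here; this sketch just echoes the payload size
        return flask.Response(response=str(len(data)), status=200, mimetype='text/csv')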

    The Structure of the Sample Code

    The components are as follows:

    • Dockerfile: The Dockerfile describes how the image is built and what it contains. It is a recipe for your container and gives you tremendous flexibility to construct almost any execution environment you can imagine. Here, we use the Dockerfile to describe a pretty standard python science stack and the simple scripts that we’re going to add to it. See the Dockerfile reference for what’s possible here.

    • build_and_push.sh: The script to build the Docker image (using the Dockerfile above) and push it to the Amazon EC2 Container Registry (ECR) so that it can be deployed to SageMaker. Specify the name of the image as the argument to this script. The script will generate a full name for the repository in your account and your configured AWS region. If this ECR repository doesn’t exist, the script will create it.

    • model: The directory that contains the application to run in the container. See the next section for details about each of the files.

    • docker-test: A directory containing scripts and a setup for running a simple training and inference jobs locally so that you can test that everything is set up correctly. See below for details.

    The application run inside the container

    When SageMaker starts a container, it will invoke the container with an argument of either train or serve. We have set this container up so that the argument is treated as the command that the container executes. When training, it will run the train program included and, when serving, it will run the serve program.

    • train: We will only copy the model to /opt/ml/model.pkl so SageMaker will create an artifact.
    • serve: The wrapper that starts the inference server. In most cases, you can use this file as-is.
    • wsgi.py: The start up shell for the individual server workers. This only needs to be changed if you changed where predictor.py is located or is named.
    • predictor.py: The algorithm-specific inference server. This is the file that you modify with your own algorithm’s code.
    • nginx.conf: The configuration for the nginx master server that manages the multiple workers.

    Setup for local testing

    The subdirectory local-test contains scripts and sample data for running simple training and inference jobs locally against the built container image. When building your own algorithm, you’ll want to modify it appropriately.

    • train-local.sh: Instantiate the container configured for training.
    • serve-local.sh: Instantiate the container configured for serving.
    • predict.sh: Run predictions against a locally instantiated server.
    • test-dir: The directory that gets mounted into the container with test data mounted in all the places that match the container schema.
    • payload.csv: Sample data for used by predict.sh for testing the server.

    The directory tree mounted into the container

    The tree under test-dir is mounted into the container and mimics the directory structure that SageMaker would create for the running container during training or hosting.

    • input/config/hyperparameters.json: The hyperparameters for the training job.
    • input/data/training/leaf_train.csv: The training data.
    • model: The directory where the algorithm writes the model file.
    • output: The directory where the algorithm can write its success or failure file.

    Environment variables

    When you create an inference server, you can control some of Gunicorn’s options via environment variables. These
    can be supplied as part of the CreateModel API call.

    Parameter                Environment Variable              Default Value
    ---------                --------------------              -------------
    number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
    timeout                  MODEL_SERVER_TIMEOUT              60 seconds
    


  • DynamicLinq

    DynamicLinq

    Adds extensions to Linq to offer dynamic queryables.

    Roadmap

    Check the “Projects” section of GitHub to see what’s going on.

    https://github.com/PoweredSoft/DynamicLinq/projects/1

    Download

    • PoweredSoft.DynamicLinq: PM> Install-Package PoweredSoft.DynamicLinq
    • PoweredSoft.DynamicLinq.EntityFramework: PM> Install-Package PoweredSoft.DynamicLinq.EntityFramework
    • PoweredSoft.DynamicLinq.EntityFrameworkCore: PM> Install-Package PoweredSoft.DynamicLinq.EntityFrameworkCore

    Samples

    Complex Query

    query = query.Query(q =>
    {
        q.Compare("AuthorId", ConditionOperators.Equal, 1);
        q.And(sq =>
        {
            sq.Compare("Content", ConditionOperators.Equal, "World");
            sq.Or("Title", ConditionOperators.Contains, 3);
        });
    });

    Shortcuts

    Shortcuts let you avoid specifying the condition operator by having it handy in the method name.

    queryable.Query(t => t.Contains("FirstName", "Dav").OrContains("FirstName", "Jo"));

    You may visit this test for more examples: https://github.com/PoweredSoft/DynamicLinq/blob/master/PoweredSoft.DynamicLinq.Test/ShortcutTests.cs

    Simple Query

    query.Where("FirstName", ConditionOperators.Equal, "David");

    Grouping Support

    TestData.Sales
    	.AsQueryable()
    	.GroupBy(t => t.Path("ClientId"))
    	.Select(t =>
    	{
    	    t.Key("TheClientId", "ClientId");
    	    t.Count("Count");
    	    t.LongCount("LongCount");
    	    t.Sum("NetSales");
    	    t.Average("Tax", "TaxAverage");
    	    t.Aggregate("Tax", SelectTypes.Average, "TaxAverage2"); // Starting 1.0.5
    	    t.ToList("Sales");
    	});

    Is equivalent to

    TestSales
    	.GroupBy(t => new { t.ClientId })
    	.Select(t => new {
    	    TheClientId = t.Key.ClientId,
    	    Count = t.Count(),
    	    LongCount = t.LongCount(),
    	    NetSales = t.Sum(t2 => t2.NetSales),
    	    TaxAverage = t.Average(t2 => t2.Tax),
    	    TaxAverage2 = t.Average(t2 => t2.Tax),
    	    Sales = t.ToList()
    	});

    Empty Group By

    This is common to create aggregate totals.

    someQueryable.EmptyGroupBy(typeof(SomeClass));

    Is equivalent to

    someQueryableOfT.GroupBy(t => true);

    Count shortcut

    IQueryable someQueryable = <something>;
    someQueryable.Count(); 

    Is equivalent to

    IQueryable<T> someQueryableOfT = <something>;
    someQueryableOfT.Count(); 

    Select

    Note: PathToList has been renamed to just ToList; it seemed redundant. Sorry for the breaking change.

    var querySelect = query.Select(t =>
    {
        t.NullChecking(true); // not obligatory, but useful for in-memory queries
        t.ToList("Posts.Comments.CommentLikes", selectCollectionHandling: SelectCollectionHandling.Flatten);
        t.Path("FirstName");
        t.Path("LastName", "ChangePropertyNameOfLastName");
    });

    In Support

    You can filter with a list; this will generate a Contains with your list.

    var ageGroup = new List<int>() { 28, 27, 50 };
    Persons.AsQueryable().Query(t => t.In("Age", ageGroup));

    String Comparison Support

    Persons.AsQueryable().Query(t => t.Equal("FirstName", "DAVID", stringComparision: StringComparison.OrdinalIgnoreCase));

    You may visit this test for more examples: https://github.com/PoweredSoft/DynamicLinq/blob/master/PoweredSoft.DynamicLinq.Test/StringComparision.cs

    Simple Sorting

    query = query.OrderByDescending("AuthorId");
    query = query.ThenBy("Id");

    Collection Filtering

    You don’t have to worry about it. The library will do it for you.

    var query = authors.AsQueryable();
    query = query.Query(qb =>
    {
        qb.NullChecking();
    	// you can specify here which collection handling you wish to use; Any and All are supported for now.
        qb.And("Posts.Comments.Email", ConditionOperators.Equal, "john.doe@me.com", collectionHandling: QueryCollectionHandling.Any);
    });

    Null Checking is automatic (practical for in memory dynamic queries)

    var query = authors.AsQueryable();
    query = query.Query(qb =>
    {
        qb.NullChecking();
        qb.And("Posts.Comments.Email", ConditionOperators.Equal, "john.doe@me.com", collectionHandling: QueryCollectionHandling.Any);
    });

    Using Query Builder

    // subject.
    var posts = new List<Post>()
    {
        new Post { Id = 1, AuthorId = 1, Title = "Hello 1", Content = "World" },
        new Post { Id = 2, AuthorId = 1, Title = "Hello 2", Content = "World" },
        new Post { Id = 3, AuthorId = 2, Title = "Hello 3", Content = "World" },
    };
    
    // the query.
    var query = posts.AsQueryable();
    var queryBuilder = new QueryBuilder<Post>(query);
    
    queryBuilder.Compare("AuthorId", ConditionOperators.Equal, 1);
    queryBuilder.And(subQuery =>
    {
        subQuery.Compare("Content", ConditionOperators.Equal, "World");
        subQuery.Or("Title", ConditionOperators.Contains, 3);
    });
    
    query = queryBuilder.Build();

    Entity Framework

    Using PoweredSoft.DynamicLinq.EntityFramework adds a helper that allows you to do the following.

    var context = new <YOUR CONTEXT>();
    var queryable = context.Query(typeof(Author), q => q.Compare("FirstName", ConditionOperators.Equal, "David"));
    var result = queryable.ToListAsync().Result;
    var first = result.FirstOrDefault() as Author;
    Assert.AreEqual(first?.FirstName, "David");

    How it can be used in a web api

    I highly suggest looking @ https://github.com/poweredsoft/dynamicquery if you are interested in this sample.

    Sample how to use DynamicQuery with asp.net mvc core and EF Core: https://github.com/PoweredSoft/DynamicQueryAspNetCoreSample

    [HttpGet][Route("FindClients")]
    public IHttpActionResult FindClients(string filterField = null, string filterValue = null, 
    string sortProperty = "Id", int? page = null, int pageSize = 50)
    {
        var ctx = new MyDbContext();
        var query = ctx.Clients.AsQueryable();
    
        if (!string.IsNullOrEmpty(filterField) && !string.IsNullOrEmpty(filterValue))
    	query = query.Query(t => t.Contains(filterField, filterValue)).OrderBy(sortProperty);
    
        //  count.
        var clientCount = query.Count();
        int? pages = null;
    
        if (page.HasValue && pageSize > 0)
        {
    	if (clientCount == 0)
    	    pages = 0;
    	else
    	    pages = clientCount / pageSize + (clientCount % pageSize != 0 ? 1 : 0);
        }
    
        if (page.HasValue)
    	query = query.Skip((page.Value-1) * pageSize).Take(pageSize);
    
        var clients = query.ToList();
    
        return Ok(new
        {
    	total = clientCount,
    	pages = pages,
    	data = clients
        });
    }
  • NYC-TLC-Airflow-ETL

    NYC-TLC-ETL

    The NYC-TLC-AIRFLOW-ETL repository is a comprehensive solution, built with an Apache Airflow DAG running in Docker, that extracts High-Volume For-Hire Services (FHV) parquet files for the year 2022, provided by the New York City Taxi and Limousine Commission (TLC) and stored in an AWS bucket. The pipeline performs data transformation operations to cleanse and enrich the data before loading it into a Google BigQuery table. The purpose of this ETL pipeline is to prepare the data for consumption by a Looker Studio report, enabling detailed analysis and visualization of trips by Uber and Lyft.

    The DAG is triggered manually or via the API. However, the workflow can easily be modified to run on a defined schedule, for example after the AWS bucket receives the latest High Volume FHV trip parquet file.

    Features

    • Extraction of High-Volume FHV parquet files from the year 2022.
    • Data transformation and cleansing operations.
    • Loading of the transformed data into a Google BigQuery table.
    • Manual triggering of the DAG or API-based triggering.
    • Configurable scheduling to run the pipeline after the AWS bucket receives the latest High Volume FHV trip parquet file.

    Prerequisites

    • Docker installed on your local machine.
    • Google BigQuery project and credentials.
    • AWS S3 bucket credentials.

    Setup

    1. Configure the AWS S3 bucket connection

      In order to access the High-Volume FHV parquet files stored in the AWS S3 bucket, you need to configure the AWS S3 bucket connection in Apache Airflow. Follow the steps below:

      • Open the Airflow web interface.
      • Go to Admin > Connections.
      • Click on Create to create a new connection.
      • Set the Conn Id field to s3_conn.
      • Set the Connection Type to Amazon Web Services.
      • Fill in the AWS Access Key ID.
      • Fill in the AWS Secret Access Key.
    2. To load the transformed data into a Google BigQuery table, you must place the GOOGLE_APPLICATION_CREDENTIALS .json file in the “dags/” folder.

    3. Create BigQuery Table with Bigquery.ipynb.

    4. Set the Airflow variable

      The ETL pipeline requires an Airflow variable called HV_FHV_TABLE_ID, which is the ID of the BigQuery table where the transformed data will be loaded. Follow the steps below to set the variable:

      • Open the Airflow web interface.
      • Go to Admin > Variables.
      • Click on Create to create a new variable.
      • Set the Key field to HV_FHV_TABLE_ID.
      • Fill in the Value field with the ID of your BigQuery table.
      • Save the variable.
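
      Inside the DAG code, reading this variable is typically done with Airflow's Variable API, roughly like this (the variable name matches the one set above; the value is whatever table id you configured):

      from airflow.models import Variable

      # e.g. "my-project.my_dataset.hv_fhv_trips"
      hv_fhv_table_id = Variable.get("HV_FHV_TABLE_ID")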

    Run

    1. Configure files for the docker-compose. For more information.

      mkdir -p ./logs ./plugins ./config
      echo -e "AIRFLOW_UID=$(id -u)" > .env
    2. Create custom apache airflow image with gcloud API

      docker build -t exented:latest .
    3. Running Airflow

      docker compose up
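
    Once Airflow is up, the DAG can be triggered manually from the web UI or via the REST API. A hedged sketch using the stable REST API (the DAG id nyc_tlc_etl is a placeholder for whatever id is defined in the dags/ folder, and airflow:airflow are the default docker-compose credentials):

    curl -X POST "http://localhost:8080/api/v1/dags/nyc_tlc_etl/dagRuns" \
      -H "Content-Type: application/json" \
      --user "airflow:airflow" \
      -d '{"conf": {}}'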


  • box-of-life

    Box full of Life

    Modification of Ikea FREKVENS with Raspberry Pi Pico to play Conway’s Game of Life.

    Features

    • Start with a random pattern and play the Game of Life.
    • If a stable pattern or an oscillator with period 2 occurs, the game is restarted (see the sketch after this list).
    • Control:
      • Red button
        • short press – turn ON and cycle LED brightness
        • long press – turn OFF
      • Yellow button
        • short press – cycle speed
        • long press – restart life
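
    A minimal sketch (plain Python, not the repo's actual MicroPython code) of the Game of Life step and the restart rule described above; wrap-around edges are an assumption:

    def next_generation(grid):
        rows, cols = len(grid), len(grid[0])
        new = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # count live neighbours, wrapping around the edges
                n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
                new[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
        return new

    def should_restart(prev2, prev1, current):
        # restart on a stable pattern (repeats the last generation)
        # or a period-2 oscillator (repeats the one before it)
        return current == prev1 or current == prev2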

    Firmware

    FW is written in MicroPython.
    Use Thonny or mpremote
    to load the content of the src directory onto the Pico.

    cd src
    # copy everything (.) in to remote (:)
    mpremote cp -r . :
    # run main.py to see stdout
    mpremote run main.py

    Ikea FREKVENS HW Modification

    You need to disassemble the Ikea FREKVENS box, remove the original MCU board and connect the RPi Pico. Steps:

    1. Disassemble the box; there are some tutorials already, e.g. here or here
    2. Remove the original MCU (green) PCB and solder a connector in its place (or connect directly via wires according to the following table).
    3. (optional) Disassemble the power supply block and replace the AC output plug with a 3D-printed USB connector holder. The USB data pins are available on the back side of the RPi Pico as test points.

    Connection

    Board Pin/Wire RPi Pico PIN Note
    LED PCB 1 (Vcc) VSYS
    LED PCB 2 GPIO 4 En
    LED PCB 3 GPIO 3 Data
    LED PCB 4 GPIO 2 Clk
    LED PCB 5 GPIO 5 Latch
    LED PCB 6 (Gnd) GND
    Buttons Red wire GND
    Buttons Black wire GPIO 10 Yellow button
    Buttons White wire GPIO 11 Red button

    The connection between the power supply and the main PCB (4V and GND) stays the same.

    If the USB connection is used, one must de-solder the diode between VUSB and VSYS from the Pico PCB. (here’s why)

    Ideas for improvements

    • Add predefined startup patterns (e.g. a glider)
    • Performance improvements (use SPI or PIO for communication, speed up game generation computation)
