Category: Blog

  • robofinder

    Robofinder

    Robofinder is a powerful Python script designed to search for and retrieve historical robots.txt files from Archive.org for any given website. This tool is ideal for security researchers, web archivists, and penetration testers to uncover previously accessible paths or directories that were listed in a site’s robots.txt.

    Features

    • Fetch historical robots.txt files from Archive.org.
    • Extract and display old paths or directories that were once disallowed or listed.
    • Save results to a specified output file.
    • Silent Mode for unobtrusive execution.
    • Multi-threading support for faster processing.
    • Option to concatenate extracted paths with the base URL for easy access.
    • Debug mode for detailed execution logs.
    • Extract old parameters from robots.txt files.

    Installation

    Using pipx

    Install Robofinder quickly and securely using pipx:

    pipx install git+https://github.com/Spix0r/robofinder.git

    Manual Installation

    To install manually:

    git clone https://github.com/Spix0r/robofinder.git
    cd robofinder
    pip install -r requirements.txt

    Usage

    Basic Command

    If installed via pipx:

    robofinder -u https://example.com

    For manual installation:

    python3 robofinder.py -u https://example.com

    Options and Examples

    • Save output to a file:

      robofinder -u https://example.com -o results.txt
    • Silent Mode (minimal output to console):

      robofinder -u https://example.com -s
    • Concatenate paths with the base URL:

      robofinder -u https://example.com -c
    • Extract parameters:

      robofinder -u https://example.com -p
    • Enable Debug Mode:

      robofinder -u https://example.com --debug
    • Multi-threading (default: 10 threads):

      robofinder -u https://example.com -t 10

    Advanced Usage

    Combine options for tailored execution:

    robofinder -u https://example.com -t 10 -c -o results.txt -s

    Example Output

    Running Robofinder on example.com with 10 threads, in silent mode, and saving just the parameters to results.txt:

    robofinder -u https://example.com -t 10 -o results.txt -s -p

    Contributing

    Contributions are highly welcome! If you have ideas for new features, optimizations, or bug fixes, feel free to submit a Pull Request or open an issue on the GitHub repository.


    Visit original content creator repository

  • txtnish

    txtnish

    A twtxt client with minimal dependencies

    Unmaintained

    I haven’t used txtnish for a long time now, so it’s probably time to archive this project. Sorry.

    Synopsis

    $ txtnish follow bob http://example.com/twtxt.txt
    $ txtnish tweet 'Hello twtxt world'
    $ txtnish timeline

    Description

    txtnish is a client for twtxt, the decentralised, minimalist
    microblogging service for hackers.

    Instead of signing up at a closed and/or regulated microblogging platform,
    getting your status updates out with twtxt is as easy as putting them in a
    publicly accessible text file. The URL pointing to this file is your identity,
    your account. twtxt then tracks these text files, like a feedreader, and builds
    your unique timeline out of them, depending on which files you track. The
    format is simple, human readable, and integrates well with UNIX command line
    utilities.

    All subcommands of txtnish provide extensive help, so don’t hesitate
    to call them with the -h option.

    If you are a new user, there is a quickstart command that will ask you some
    questions and write a configuration file for you:

    $ txtnish quickstart

    Installation

    txtnish only depends on tools you normally find in a POSIX environment:
    awk, sort, cut and sh. There are only two exceptions: you need curl
    to download twtxt files and an xargs that supports parallel processing
    via -P. You can use an xargs without it, but then txtnish falls back to
    downloading one URL after another.

    Installation itself is as easy as it gets: just copy the script somewhere
    in your PATH.

    Subcommands

    tweet

    Appends a new tweet to your twtxt file. There are three different ways
    to input tweets: you can pipe them into tweet, pass them along as
    arguments, or call txtnish tweet without any arguments while it is not
    connected to a pipe, in which case it will open $EDITOR for you and
    post every line as a separate tweet.

    timeline

    Retrieves your personal timeline.

    publish

    Publishes your twtfile. This is especially helpful after you have changed
    your post_tweet_hook.

    follow

    Adds a new source to your followings.

    unfollow

    Removes an existing source from your followings.

    following

    Prints the list of the sources you’re following.

    reply

    Displays a commented-out version of your timeline in $EDITOR. Every
    line that is uncommented after you save and exit the editor will be
    tweeted.

    Search tweets

    You can provide a search expression to filter your timeline with the flag
    -S. The search expression is an awk conditional with four predefined
    variables:

    • msg: the message itself
    • url: the url of the twtfile
    • nick: the nick associated with the url
    • ts: the timestamp of the message

    Examples:

    txtnish timeline -S 'nick == "mdom" && msg ~ /#twtxt/'

    Configuration

    At startup txtnish checks whether ~/.config/txtnish/config exists and
    sources it if so. The configuration file must be a valid shell script.
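
    Since the file is sourced as a shell script, options are set as plain
    shell variable assignments. A minimal sketch of such a config file, using
    option names documented in the sections below (the values here are only
    illustrative):

    nick=alice
    twturl="https://example.com/twtxt.txt"
    limit=50              # show 50 tweets instead of the default 20
    sort_order=descending # newest tweets first
    use_color=1           # colorize nicks, timestamps, mentions and hashtags
    disclose_identity=0   # don't send nick/twturl with http requests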

    General

    add_metadata

    Add metadata to the twtxt file. Defaults to 0 (false).

    awk

    Path to the awk binary. Defaults to awk.

    sed

    Path to the sed binary. Defaults to sed.

    limit

    How many tweets should be shown in the timeline. Defaults to 20.

    formatter

    Defines which command is used to wrap each tweet to fit on the screen. It
    defaults to fold -s.

    sort_order

    How to sort tweets. This option can be either ascending or
    descending. ascending prints the oldest tweet first, descending the
    newest. This value can be overridden with the -d and -a flags.

    timeout

    Maximum time in seconds that each http connection can take. Defaults
    to zero.

    use_color

    Whether the output should be colorized with ANSI escape sequences. See the
    section COLORS on how to change the color settings. Defaults to 1.

    pager

    Which pager to use if use_pager is enabled. Defaults to less -R in order
    to display colors. The pager can be enabled or disabled with the -p and
    -P flags, respectively; use_pager itself defaults to 1.

    disclose_identity

    If set to 1, send your nick and twturl with every http request. This
    only makes sense if you also set twturl and nick. Defaults to 0.

    nick

    Your nick. This is used to collapse mentions of your twturl and is sent to
    all feeds you’re following if disclose_identity is set to 1.
    Defaults to the environment variable $USER.

    twturl

    The url of your feed. This is used to collapse mentions and is sent to
    all feeds you’re following if disclose_identity is set to 1.

    always_update

    Always update all feeds before showing tweets. If you set this variable
    to 0, you need to update manually with the update command.

    http_proxy

    Sets the proxy server to use for HTTP.

    https_proxy

    Sets the proxy server to use for HTTPS.

    sign_twtfile

    If set to 1, sign the twtfile with pgp. Defaults to 0.

    In case you are also overwriting the post_tweet_hook note that this
    will create a signed file in a temporary directory and change the value of
    twtfile accordingly. Your twtfile will not be changed!

    Signing your twtfile might break some twtxt clients as lines without
    a TAB are not allowed by a strict reading of the spec.

    check_signature

    Verify pgp signatures and show the result in the timeline if set to 1. Defaults to 0.

    sign_user

    Sets a local user other than the default to sign the twtfile. txtnish will
    print a message indicating that an override is in place.

    gpg_bin

    Sets a custom name for the gpg executable.

    ipfs_gateway

    When you subscribe to an ipns:// address, txtnish will call this gateway to get
    the user’s twtfile. Defaults to http://localhost:8080 and falls back to
    https://ipfs.io if txtnish can’t reach the gateway.

    Publish with scp

    scp_user

    Use the given username to connect to the remote server. Required to publish
    with scp.

    scp_host

    Copy twtfile to this host. Required to publish with scp.

    scp_remote_name

    Name of twtfile on remote host. Defaults to the basename of the twtfile.

    sftp_over_scp

    Use SFTP instead of SCP if set to 1.

    Publish with ftp

    ftp_user

    Use the given username to connect to the remote server. Required to publish
    with ftp.

    ftp_host

    Copy twtfile to this host. Required to publish with ftp.

    ftp_remote_name

    Name of twtfile on remote host. Defaults to the basename of the twtfile.

    Publish with IPFS

    ipfs_publish

    Publish the twtfile with ipfs if set to 1. Defaults to 0.

    You will need the ipfs tools and a running daemon to publish to ipfs.

    ipfs_wrap_with_dir

    Call ipfs add with --wrap-with-dir if set to 1. Defaults to 0.

    ipfs_recursive

    Call ipfs add with --recursive if set to 1. The complete directory of
    your twtfile will be published. Defaults to 0.

    Colors

    If use_color is set to 1, the nick, timestamp, mentions and hashtags
    will be colorized. txtnish recognizes black, red, green, yellow, blue,
    magenta, cyan and white. You can set the background color with the prefix
    on_.

    color_nick="yellow on_white"
    

    Additionally, a color definition can specify the attributes bold, bright,
    faint, italic, underline, blink and fastblink if your terminal supports
    them.

    color_nick="yellow on_white blink"
    

    The order of colors and attributes doesn’t matter and multiple attributes can
    be combined.

    txtnish uses the following defaults.

    color_nick=yellow
    color_time=blue
    color_mention=cyan
    color_hashtag=yellow

    Hooks

    To customize the behaviour of txtnish, the user can override the following hook functions.

    pre_tweet_hook

    This hook is called before a new tweet is appended to your twtfile. This can be
    useful if you’re using txtnish on multiple devices and want to update your
    local twtfile before appending to it. There’s a predefined function
    sync_twtfile that does exactly that.

    pre_tweet_hook () {
    	sync_twtfile
    }

    post_tweet_hook

    post_tweet_hook is called after txtnish has appended new tweets to your
    twtfile. It’s a good place to upload your file somewhere.

    post_tweet_hook () {
    	gist -u ID -f "$twtfile"
    }

    filter_tweets_hook

    See also

    twtxt, we-are-twtxt

    License

    Copyright 2017 Mario Domgoergen mario@domgoergen.com

    This program is free software: you can redistribute it and/or modify it under
    the terms of the GNU General Public License as published by the Free Software
    Foundation, either version 3 of the License, or (at your option) any later
    version.

    This program is distributed in the hope that it will be useful, but WITHOUT ANY
    WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE. See the GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along with
    this program. If not, see http://www.gnu.org/licenses/.

    Visit original content creator repository

  • remark-admonitions


    remark-admonitions

    A remark plugin for admonitions designed with Docusaurus v2 in mind.

    remark-admonitions is now included out-of-the-box with @docusaurus/preset-classic!

    example of admonitions

    Installation

    remark-admonitions is available on NPM.

    npm install remark-admonitions

    unified + remark

    If you’re using unified/remark, just pass the plugin to use().

    For example, this will compile input.md into output.html using remark, rehype, and remark-admonitions.

    const unified = require('unified')
    const markdown = require('remark-parse')
    // require the plugin
    const admonitions = require('remark-admonitions')
    const remark2rehype = require('remark-rehype')
    const doc = require('rehype-document')
    const format = require('rehype-format')
    const html = require('rehype-stringify')
    const vfile = require('to-vfile')
    const report = require('vfile-reporter')
    
    const options = {}
    
    unified()
      .use(markdown)
      // add it to unified
      .use(admonitions, options)
      .use(remark2rehype)
      .use(doc)
      .use(format)
      .use(html)
      .process(vfile.readSync('./input.md'), (error, result) => {
          console.error(report(error || result))
          if (result) {
            result.basename = "output.html"
            vfile.writeSync(result)
          }
      })

    Docusaurus v2

    @docusaurus/preset-classic includes remark-admonitions.

    If you aren’t using @docusaurus/preset-classic, remark-admonitions can still be used by passing it as a remark plugin to MDX.

    Usage

    Admonitions are a block element. The titles can include inline markdown and the body can include any block markdown except another admonition.

    The general syntax is

    :::keyword optional title
    some content
    :::

    For example,

    :::tip pro tip
    `remark-admonitions` is pretty great!
    :::

    The default keywords are important, tip, note, warning, and danger. Aliases for info => important, success => tip, secondary => note and danger => warning have been added for Infima compatibility.

    Options

    The plugin can be configured through the options object.

    Defaults

    const options = {
      customTypes: customTypes, // additional types of admonitions
      tag: string, // the tag to be used for creating admonitions (default ":::")
      icons: "svg"|"emoji"|"none", // the type of icons to use (default "svg")
      infima: boolean, // whether the classes for Infima alerts should be added to the markup
    }

    Custom Types

    The customTypes option can be used to add additional types of admonitions. You can set the svg and emoji icons as well as the keyword. You only have to include the svg/emoji fields if you are using them. The ifmClass is only necessary if the infima setting is true and the admonition should use the look of an existing Infima alert class.

    const customTypes = {
      [string: keyword]: {
        ifmClass: string,
        keyword: string,
        emoji: string,
        svg: string,
      } | string
    }

    For example, this will allow you to generate admonitions with the custom keyword.

    customTypes: {
      custom: {
        emoji: '💻',
        svg: '<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 16 16"><path fill-rule="evenodd" d="M15 2H1c-.55 0-1 .45-1 1v9c0 .55.45 1 1 1h5.34c-.25.61-.86 1.39-2.34 2h8c-1.48-.61-2.09-1.39-2.34-2H15c.55 0 1-.45 1-1V3c0-.55-.45-1-1-1zm0 9H1V3h14v8z"></path></svg>'
      }
    }

    To create an alias for an existing type, have the value be the keyword the alias should point to.

    customTypes: {
      alias: "custom"
    }

    The generated markup will include the class admonition-{keyword} for styling.

    If the infima option is true, the classes alert alert--{type} will be added to inherit the default Infima styling.

    Styling

    You’ll have to add styles for the admonitions. With Docusaurus, these can be added to custom.css.

    Infima (Docusaurus v2)

    The Infima theme (styles/infima.css) is used by @docusaurus/preset-classic.

    infima theme

    Classic (Docusaurus v1)

    The classic theme (styles/classic.css) replicates the look of remarkable-admonitions and Docusaurus v1.

    classic theme

    Credit

    Syntax and classic theme based on remarkable-admonitions.

    The SVG icons included are from GitHub Octicons.

    Visit original content creator repository
  • lapack-base-spttrf

    About stdlib…

    We believe in a future in which the web is a preferred environment for numerical computation. To help realize this future, we’ve built stdlib. stdlib is a standard library, with an emphasis on numerical and scientific computation, written in JavaScript (and C) for execution in browsers and in Node.js.

    The library is fully decomposable, being architected in such a way that you can swap out and mix and match APIs and functionality to cater to your exact preferences and use cases.

    When you use stdlib, you can be absolutely certain that you are using the most thorough, rigorous, well-written, studied, documented, tested, measured, and high-quality code out there.

    To join us in bringing numerical computing to the web, get started by checking us out on GitHub, and please consider financially supporting stdlib. We greatly appreciate your continued support!

    spttrf


    Compute the L * D * L^T factorization of a real symmetric positive definite tridiagonal matrix A.

    Installation

    npm install @stdlib/lapack-base-spttrf

    Alternatively,

    • To load the package in a website via a script tag without installation and bundlers, use the ES Module available on the esm branch (see README).
    • If you are using Deno, visit the deno branch (see README for usage instructions).
    • For use in Observable, or in browser/node environments, use the Universal Module Definition (UMD) build available on the umd branch (see README).

    The branches.md file summarizes the available branches and displays a diagram illustrating their relationships.

    To view installation and usage instructions specific to each branch build, be sure to explicitly navigate to the respective README files on each branch, as linked to above.

    Usage

    var spttrf = require( '@stdlib/lapack-base-spttrf' );

    spttrf( N, D, E )

    Computes the L * D * L^T factorization of a real symmetric positive definite tridiagonal matrix A.

    var Float32Array = require( '@stdlib/array-float32' );
    
    var D = new Float32Array( [ 4.0, 5.0, 6.0 ] );
    var E = new Float32Array( [ 1.0, 2.0 ] );
    
    spttrf( 3, D, E );
    // D => <Float32Array>[ 4, 4.75, ~5.15789 ]
    // E => <Float32Array>[ 0.25, ~0.4210 ]

    The function has the following parameters:

    • N: order of matrix A.
    • D: the N diagonal elements of A as a Float32Array.
    • E: the N-1 subdiagonal elements of A as a Float32Array.
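
    For reference, the factorization is computed in place by the standard
    recurrence: on exit, D holds the diagonal of the factor D and E holds the
    subdiagonal of the unit bidiagonal factor L. A sketch of the math,
    consistent with the example above:

    d_1 = D_1, \qquad l_i = E_i / d_i, \qquad d_{i+1} = D_{i+1} - l_i^2 \, d_i, \qquad i = 1, \ldots, N-1

    For D = [ 4.0, 5.0, 6.0 ] and E = [ 1.0, 2.0 ] this gives l_1 = 1/4 = 0.25,
    d_2 = 5 - 0.25^2 * 4 = 4.75, l_2 = 2/4.75 ≈ 0.42105, and
    d_3 = 6 - 0.42105^2 * 4.75 ≈ 5.15789, matching the arrays above.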

    Note that indexing is relative to the first index. To introduce an offset, use typed array views.

    var Float32Array = require( '@stdlib/array-float32' );
    
    // Initial arrays...
    var D0 = new Float32Array( [ 0.0, 4.0, 5.0, 6.0 ] );
    var E0 = new Float32Array( [ 0.0, 1.0, 2.0 ] );
    
    // Create offset views...
    var D1 = new Float32Array( D0.buffer, D0.BYTES_PER_ELEMENT*1 ); // start at 2nd element
    var E1 = new Float32Array( E0.buffer, E0.BYTES_PER_ELEMENT*1 ); // start at 2nd element
    
    spttrf( 3, D1, E1 );
    // D0 => <Float32Array>[ 0.0, 4.0, 4.75, ~5.15789 ]
    // E0 => <Float32Array>[ 0.0, 0.25, ~0.4210 ]

    spttrf.ndarray( N, D, strideD, offsetD, E, strideE, offsetE )

    Computes the L * D * L^T factorization of a real symmetric positive definite tridiagonal matrix A using alternative indexing semantics.

    var Float32Array = require( '@stdlib/array-float32' );
    
    var D = new Float32Array( [ 4.0, 5.0, 6.0 ] );
    var E = new Float32Array( [ 1.0, 2.0 ] );
    
    spttrf.ndarray( 3, D, 1, 0, E, 1, 0 );
    // D => <Float32Array>[ 4, 4.75, ~5.15789 ]
    // E => <Float32Array>[ 0.25, ~0.4210 ]

    The function has the following additional parameters:

    • strideD: stride length for D.
    • offsetD: starting index for D.
    • strideE: stride length for E.
    • offsetE: starting index for E.

    While typed array views mandate a view offset based on the underlying buffer, the offset parameters support indexing semantics based on starting indices. For example,

    var Float32Array = require( '@stdlib/array-float32' );
    
    var D = new Float32Array( [ 0.0, 4.0, 5.0, 6.0 ] );
    var E = new Float32Array( [ 0.0, 1.0, 2.0 ] );
    
    spttrf.ndarray( 3, D, 1, 1, E, 1, 1 );
    // D => <Float32Array>[ 0.0, 4.0, 4.75, ~5.15789 ]
    // E => <Float32Array>[ 0.0, 0.25, ~0.4210 ]

    Notes

    • Both functions mutate the input arrays D and E.

    • Both functions return a status code indicating success or failure. A status code indicates the following conditions:

      • 0: factorization was successful.
      • <0: the k-th argument had an illegal value, where -k equals the status code value.
      • 0 < k < N: the leading principal minor of order k is not positive and factorization could not be completed, where k equals the status code value.
      • N: the leading principal minor of order N is not positive, and factorization was completed.
    • spttrf() corresponds to the LAPACK routine spttrf.

    Examples

    var discreteUniform = require( '@stdlib/random-array-discrete-uniform' );
    var spttrf = require( '@stdlib/lapack-base-spttrf' );
    
    var opts = {
        'dtype': 'float32'
    };
    var D = discreteUniform( 5, 1, 5, opts );
    console.log( D );
    
    var E = discreteUniform( D.length-1, 1, 5, opts );
    console.log( E );
    
    // Perform the `L * D * L^T` factorization:
    var info = spttrf( D.length, D, E );
    console.log( D );
    console.log( E );
    console.log( info );

    C APIs

    Usage

    TODO

    TODO

    TODO.

    TODO

    TODO

    TODO

    Examples

    TODO

    Notice

    This package is part of stdlib, a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.

    For more information on the project, filing bug reports and feature requests, and guidance on how to develop stdlib, see the main project repository.

    Community

    Chat


    License

    See LICENSE.

    Copyright

    Copyright © 2016-2025. The Stdlib Authors.

    Visit original content creator repository
  • sheppack

    ACM TOMS Algorithm 905: SHEPPACK: Modified Shepard Algorithm for Interpolation of Scattered Multivariate Data

    SHEPPACK is a Fortran 95 package containing five versions of the modified Shepard algorithm: quadratic (Fortran 95 translations of Algorithms 660, 661, and 798), cubic (Fortran 95 translation of Algorithm 791), and linear variations of the original Shepard algorithm. An option to the linear Shepard code is a statistically robust fit, intended to be used when the data is known to contain outliers. SHEPPACK also includes a hybrid robust piecewise linear estimation algorithm RIPPLE (residual initiated polynomial-time piecewise linear estimation) intended for data from piecewise linear functions in arbitrary dimension m. The main goal of SHEPPACK is to provide users with a single consistent package containing most existing polynomial variations of Shepard’s algorithm. The algorithms target data of different dimensions. The linear Shepard algorithm, robust linear Shepard algorithm, and RIPPLE are the only algorithms in the package that are applicable to arbitrary dimensional data.

    • This code has been re-uploaded with the permission of Drs. William Thacker
      and Layne Watson.
      All comments and questions should be directed to them (see contact info at
      the bottom of this file).

    Organizational Details

    The original source code, exactly as distributed by ACM TOMS, is included in
    the src directory.
    The src directory also contains its own README and build instructions.
    Comments at the top of each subroutine document their proper usage.

    Several minor modifications to the contents of src have been made:

    • The included Makefile has been slightly modified to run all tests
      when the make all command is run (see the sketch after this list).
    • All file extensions have been changed from .f95 to .f90 for
      compiler compatibility reasons.
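
    Assuming a Fortran 95 compiler is configured in the included Makefile
    (see the README and build instructions in src for specifics), building
    the package and running the full test suite is then:

    cd src
    make all   # builds SHEPPACK and runs all of the included tests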

    Reference and Contact

    To cite this work, use:

        @article{alg905,
            author = {Thacker, William I. and Zhang, Jingwei and Watson, Layne T. and Birch, Jeffrey B. and Iyer, Manjula A. and Berry, Michael W.},
            title = {{Algorithm 905: SHEPPACK}: Modified {S}hepard Algorithm for Interpolation of Scattered Multivariate Data},
            year = {2010},
            volume = {37},
            number = {3},
            journal = {ACM Trans. Math. Softw.},
            articleno = {34},
            numpages = {20},
            doi = {10.1145/1824801.1824812}
        }
    

    Inquiries should be directed to

    William I. Thacker,
    Department of Computer Science, Winthrop University,
    Rock Hill, SC 29733;
    wthacker@winthrop.edu

    Layne T. Watson,
    Department of Computer Science, VPI&SU,
    Blacksburg, VA 24061-0106;
    (540) 231-7540;
    ltw@vt.edu

    Visit original content creator repository

  • XmlExtractor

    XmlExtractor

    The XmlExtractor is a class that parses XML very efficiently with the XMLReader object and produces an object (or array) for every desired item. This class can be used to read very large (multi-GB) XML files.

    How to Use

    Given the XML file below:

    <root>
    	<item>
    		<tag1>Value 1</tag1>
    		<tag2>
    			<subtag1>Sub Value 2</subtag1>
    		</tag2>
    	</item>
    </root>

    this is the pattern you would use to parse XML with XmlExtractor:

    $source = new XmlExtractor("root/item", "/path/to/file.xml");
    foreach ($source as $item) {
      echo $item->tag1;
      echo $item->tag2->subtag1;
    }

    Options

    There are four parameters you can pass the constructor

    XmlExtractor($rootTags, $filename, $returnArray, $mergeAttributes)
    
    • $rootTags Specify how deep to go into the structure before extracting objects. Examples are below
    • $filename Path to the XML file you want to parse. This is optional as you can pass an XML string with loadXml() method
    • $returnArray If true, every iteration will return items as an associative array. Default is false
    • $mergeAttributes If true, any attributes on extracted tags will be included in the returned record as additional tags. Examples below

    Methods

    XmlExtractor.loadXml($xml)
    

    Loads XML structure from a php string

    XmlExtractor.getRootTags()
    

    This will return the skipped root tags as objects as soon as they are available

    XmlItem.export($mergeAttributes = false)
    

    Convert this XML record into an array. If $mergeAttributes is true, any attributes are merged into the array returned

    XmlItem.getAttribute($name)
    

    Returns the record’s named attribute

    XmlItem.getAttributes()
    

    Returns this record’s attributes if any

    XmlItem.mergeAttributes($unsetAttributes = false)
    

    Merges the record’s attributes with the rest of the tags so they are accessible as regular tags. If unsetAttributes is true, the internal attribute object will be removed

    Examples

    Iterating over XML items

    A simple XML structure and straightforward PHP.

    <earth>
    	<people>
    		<person>
    			<name>
    				<first>Paul</first>
    				<last>Warelis</last>
    			</name>
    			<gender>Male</gender>
    			<skill>Javascript</skill>
    			<skill>PHP</skill>
    			<skill>Beer</skill>
    		</person>
    	</people>
    </earth>

    $source = new XmlExtractor("earth/people/person", "/path/to/above.xml");
    foreach ($source as $person) {
      echo $person->name->first; // Paul
      echo $person->gender; // Male
      foreach ($person->skill as $skill) {
        echo $skill;
      }
      $record = $person->export();
    }

    The first constructor argument is a slash-separated tag list that tells XmlExtractor you want to extract “person” records (the last tag entry) from the earth -> people structure.
    The export method on the $person object returns it in array form, which will look like this:

    array(
      'name' => array(
        'first' => 'Paul',
        'last' => 'Warelis'
      ),
      'gender' => 'Male',
      'skill' => array(
        '0' => 'Javascript',
        '1' => 'PHP',
        '2' => 'Beer'
      )
    )

    It’s important to note that the repeating tag “skill” turned into an array.

    Loading XML from a string

    First create the extractor and then use loadXml() method to get the data in.

    $xml = <<<XML
    <house>
    	<room>
    		<corner location="NW"/>
    		<corner location="SW"/>
    		<corner location="SE"/>
    		<corner location="NE"/>
    	</room>
    </house>
    XML;
    
    $source = new XmlExtractor("house/room");
    $source->loadXml($xml);
    foreach ($source as $room) {
    	var_dump($room->export());
    	var_dump($room->export(true));
    }

    The first dump will show the “corner” field that contains four empty values:

    array(
      'corner' => array(
        '0' => '',
        '1' => '',
        '2' => '',
        '3' => ''
      )
    )

    But when you merge the attributes with the tag data, the array changes to:

    array(
      'corner' => array(
        '0' => array( "location" => "NW"),
        '1' => array( "location" => "SW"),
        '2' => array( "location" => "SE"),
        '3' => array( "location" => "NE")
      )
    )

    Dealing with attributes

    This example demonstrates how to deal with attributes.

    <office address="123 Main Street">
    	<items total="2">
    		<item name="desk">
    			<size width="120" height="33" length="70">large</size>
    			<image>desk.png</image>
    		</item>
    		<item image="cubicle.jpg">
    			<name>cubicle</name>
    			<size>
    				<width>120</width>
    				<height>33</height>
    				<length>60</length>
    				<size>large</size>
    			</size>
    		</item>
    	</items>
    </office>

    There are a number of things going on with the above XML.
    The two root tags that we have to skip to get to our items have information attached.
    We can get at these with the getRootTags() method. The next issue is that both items are using attributes to define their data.
    This example is a bit contrived, but it will show the functionality behind the mergeAttributes feature.
    By the end of this example, we will have two items with identical structure.

    $office = new XmlExtractor("office/items/item", "/path/to/above.xml");
    foreach ($office as $item) {
      $compressed = $item->export(true); // true = merge attributes into the item
      var_dump($compressed);
    }
    foreach ($office->getRootTags() as $name => $tag) {
      echo "Tag name: {$name}";
      var_dump($tag->getAttributes());
    }

    Once “compressed” (exported with merged attributes) the structure of both items is the same.
    In the event of an attribute having the same name as the tag, the tag takes precedence and is never overwritten.
    The two items will end up looking like this:

    array(
      'name' => 'desk',
      'size' => array(
        'width' => '120',
        'height' => '33',
        'length' => '70',
        'size' => 'large'
      ),
      'image' => 'desk.png'
    )
    array(
      'image' => 'cubicle.jpg',
      'name' => 'cubicle',
      'size' => array(
        'width' => '120',
        'height' => '33',
        'length' => '70',
        'size' => 'large'
      )
    )

    The root tags bit will come up with this:

    Tag name: office
    array(
      'address' => '123 Main Street'
    )
    Tag name: items
    array(
      'total' => '2'
    )

    Using Wildcards (*)

    If your XML file has markup like this:

    <art>
    	<painting>
    		<name>Mona Lisa</name>
    	</painting>
    	<sculpture>
    		<name>Dying Gaul</name>
    	</sculpture>
    	<photo>
    		<name>Afghan Girl</name>
    	</photo>
    </art>

    The art tag contains many different items. To parse them, do this (notice the path to the tag):

    $art = new XmlExtractor("art/*", "/path/to/above.xml");
    foreach ($art as $name => $piece) {
      echo "Piece : " . $piece->getName();
      var_dump($piece->export());
    }

    The output would be something like this:

    Piece : painting
    array('name' => 'Mona Lisa')
    Piece : sculpture
    array('name' => 'Dying Gaul')
    Piece : photo
    array('name' => 'Afghan Girl')

    If you find bugs, post an issue. I will correct or educate.

    Enjoy!

    Contact

    pwarelis at gmail dot com

    Visit original content creator repository

  • azhangproject

    ---
    title: "Reflections on NY Phil — The NY Phil as a lens on changes in US society"
    output: html_document
    ---

    Around the turn of the century, New York City became the arts center of the world. This status not only encouraged the flourishing of American musicians but also attracted musicians from all over the world to NYC. The NY Philharmonic, as an important arts and culture institution, reflects the social and economic changes of US society over time. In this study I focus on NY Philharmonic data from three perspectives: 1. the nationalities of composers whose works are performed by the NY Philharmonic, in relation to the political environment of the US; 2. the status of women composers over time; 3. the elasticity of an arts and culture institution’s reaction to social issues, by comparing NY Phil performance data with MoMA exhibition data.

    ### 1. Getting data from the NY Philharmonic’s GitHub page

    First of all, I read the XML file from the NY Philharmonic’s GitHub page (https://github.com/nyphilarchive/PerformanceHistory/blob/master/Programs/complete.xml), counted the number of performances of each composer’s works in every season, and put the counts in a table.

    require("XML")
    require(mosaic)
    xmlfile <- xmlParse("complete.xml",encoding="UTF-8")
    rootnode = xmlRoot(xmlfile) #gives content of root
    
    incrementComp <- function(composer_stats, c, season){
      if (is.null(composer_stats[c, season])) {
        composer_stats[c, season] <- 1
      } else if (is.na(composer_stats[c,season])) {
        composer_stats[c, season] <- 1
      } else {
        composer_stats[c, season] <- composer_stats[c, season] + 1
      }
      return(composer_stats)
    }
    
    composerBySeasonComplete <- data.frame()
    for (seas in 1:xmlSize(rootnode)) {
      # DEBUG: cat(seas, "\n")
      firstlist <- xmlToList(rootnode[[seas]])
      season <- firstlist$season
      season <- paste("Season",season,sep=".")
      works <- firstlist$worksInfo
      if (is.list(works)) {     # sometimes works is actually empty
          for (i in 1:length(works)) {
            if (!is.null(works[[i]]$composerName)) {    #sometimes there is no composer
              composerBySeasonComplete <- incrementComp(composerBySeasonComplete, works[[i]]$composerName,season)
            }
          }
        }
    }
    colnames(composerBySeasonComplete)[1]="composers"
    write.csv(composerBySeasonComplete, "composerBySeasonComplete.csv")
    

    the cleaned data look like:

    composerBySeasonComplete <- read.csv("composerBySeasonComplete.csv", row.names=1, encoding="UTF-8")
    composerBySeasonComplete[1:5,1:5]
    

    To get a general sense of the data, I ordered composers by the number of works performed in descending order.

    SumComp=rowSums(composerBySeasonComplete[2:175],na.rm=TRUE)
    SumComp=cbind(composerBySeasonComplete[1],SumComp)
    SumComp1=SumComp[order(-SumComp$SumComp),]
    

    The following graph shows that most composers’ works were performed fewer than ten times, while only 16 composers’ works were performed more than 1000 times. The set of composers is therefore quite diverse, with a long tail of rarely performed names.

    require(mosaic)
    nrow(SumComp1)
    hist(SumComp1$SumComp,main="number of performance histogram",xlab="number of performance")
    
    comp1000=subset(SumComp1,SumComp>=1000)
    nrow(comp1000)
    comp1000
    
    compl1000=subset(SumComp1,SumComp<=10)
    nrow(compl1000)
    hist(compl1000$SumComp,main="number of performance histogram",xlab="number of performance")
    

    ### 2. Number of performances per year and economics

    Graph of the number of performances per year

    SumSeas=colSums(composerBySeasonComplete[2:175],na.rm=TRUE)
    require(ggplot2)
    qplot(seq_along(as.double(SumSeas)),as.double(SumSeas))+geom_line()+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("number of performance")
    

    GDP annual rate of change
    http://www.multpl.com/us-gdp-growth-rate

    According to Marx, the economic base determines the superstructure of a society: the level of economic development shapes its politics, art, and cultural activity. Originally, I planned to study the relationship between the number of contemporary composers’ works performed at the NY Philharmonic and the GDP growth rate, to see how performances of living composers reflect society’s emphasis on art and music education. But the list of composers’ birth and death years is incomplete, so I cannot determine which composers were alive when their works were performed by the NY Phil. Instead, in order to relate US economic development to NY Phil performances, I studied the relationship between the number of concerts in each season and the US GDP growth rate. The graph shows that the GDP growth rate and the number of performances do not follow similar patterns. From a micro perspective, however, the number of performances per year reflects the NY Phil’s own economic condition: for example, the boom in performances at the beginning of the twentieth century is explained by the merger of several orchestras.

    ### 3. Normalized Performance Frequency Score

    Because the number of performances changes year by year, I computed a “normalized performance frequency score” to normalize by the total number of performances: for each composer and season, I divided the number of performances of that composer’s works by the total number of performances in that season.
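
    Formally, for composer c in season s,

    \mathrm{score}_{c,s} = \frac{n_{c,s}}{\sum_{c'} n_{c',s}}

    where n_{c,s} is the number of performances of composer c’s works in season s; the scores within each season therefore sum to 1.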

    require(base)
    composerBySeasonComplete[is.na(composerBySeasonComplete)] <- 0
    composerBySeasonComplete1=composerBySeasonComplete[2:175]
    composerBySeasonComplete2=composerBySeasonComplete[1]
    popScoreComposerComplete=data.frame()
    # Total number of performances in each season (column sums).
    totalNumConcert=colSums(composerBySeasonComplete1, na.rm=TRUE)
    # Divide each composer's per-season counts by the season totals.
    for (i in 1:nrow(composerBySeasonComplete1)){
      popScoreComposerComplete[i,]=composerBySeasonComplete1[i,]/totalNumConcert
    }
    popScoreComposerComplete=cbind(composerBySeasonComplete2,popScoreComposerComplete)
    write.csv(popScoreComposerComplete,"popScoreComposerComplete.csv")
    

    The normalized performance frequency score table looks like:

    popScoreComposerComplete <- read.csv("popScoreComposerComplete.csv", row.names=1, encoding="UTF-8")
    popScoreComposerComplete[1:5,1:5]
    

    The top-twenty list in the normalized performance frequency score table does not differ much from the composers-by-season table.

    popScoreSumComp=rowSums(popScoreComposerComplete[2:175],na.rm=TRUE)
    popScoreSumComp=cbind(popScoreComposerComplete[1],popScoreSumComp)
    popScoreSumComp1=popScoreSumComp[order(-popScoreSumComp$popScoreSumComp),]
    head(popScoreSumComp1,20)
    

    require(stringr)
    popScoreComposerComplete$composers=str_replace_all(popScoreComposerComplete$composers,"[^[:graph:]]", " ") 
    popScoreComposerComplete$composers=gsub("  ", " ", popScoreComposerComplete$composers, fixed = TRUE)
    
    composerBySeasonComplete$composers=str_replace_all(composerBySeasonComplete$composers,"[^[:graph:]]", " ") 
    composerBySeasonComplete$composers=gsub("  ", " ", composerBySeasonComplete$composers, fixed = TRUE)
    

    ### 4. Composer Nationalities and Politics and Economy

    Art and politics can affect each other. In this part, I want to ask several questions:

    1. As NYC rose to become the center of art and culture, did the number of American composers’ works increase?
    2. Did the number of German composers’ works decrease during WWI and WWII?
    3. Did the number of Russian composers’ works decrease during the Cold War?
    4. As economies rose in Asian and Latin American countries, did the number of works from these regions increase over time?

    To do this, we need to identify the nationalities of the composers whose works are performed by the NY Philharmonic. The NY Philharmonic data do not include composer nationalities, so I scraped Wikipedia for them.
    I got most of the composers’ nationalities by scraping this page and the links within it (https://en.wikipedia.org/wiki/Category:Classical_composers_by_nationality) using the following Python code (shown as a screenshot below):

    require(png)
    require(grid)
    img02 <- readPNG("2016-03-26b.png")
    grid.raster(img02)
    

    However, some categories, for example the American composers page (https://en.wikipedia.org/w/index.php?title=Category:American_classical_composers), span multiple pages, and it is hard to traverse every page in my code. So I scraped each page by clicking through by hand and rbind-ed the results together in R.

    img03 <- readPNG("scrapeNationality2.png")
    grid.raster(img03)
    

    Many names are written differently, or in different languages, in the NY Phil records and on the Wikipedia pages, so they do not match exactly. I adapted a matching algorithm from an online post to match names between Wikipedia and the NY Phil records. (http://www.r-bloggers.com/merging-data-sets-based-on-partially-matched-data-elements/)

    # Build a canonical "signature" for a name: lowercase it, split on
    # spaces, sort the parts and paste them back together, so different
    # orderings of the same name parts produce the same signature.
    signature=function(x){
      sig=paste(sort(unlist(strsplit(tolower(x)," "))),collapse='')
      return(sig)
    }

    # Match two name vectors: exact signature matches first ("Duplicate"),
    # then fuzzy agrep matches within levDist ("Partial"), with the rest
    # flagged as "Unmatched".
    partialMatch=function(x,y,levDist=0.01){
      xx=data.frame(sig=sapply(x, signature),row.names=NULL)
      yy=data.frame(sig=sapply(y, signature),row.names=NULL)
      xx$raw=x
      yy$raw=y
      xx=subset(xx,subset=(sig!=''))
      xy=merge(xx,yy,by='sig',all=T)
      matched=subset(xy,subset=(!(is.na(raw.x)) & !(is.na(raw.y))))
      matched$pass="Duplicate"
      todo=subset(xy,subset=(is.na(raw.y)),select=c(sig,raw.x))
      colnames(todo)=c('sig','raw')
      todo$partials= as.character(sapply(todo$sig, agrep, yy$sig,max.distance = levDist,value=T))
      todo=merge(todo,yy,by.x='partials',by.y='sig')
      partial.matched=subset(todo,subset=(!(is.na(raw.x)) & !(is.na(raw.y))),select=c("sig","raw.x","raw.y"))
      partial.matched$pass="Partial"
      matched=rbind(matched,partial.matched)
      un.matched=subset(todo,subset=(is.na(raw.x)),select=c("sig","raw.x","raw.y"))
      if (nrow(un.matched)>0){
        un.matched$pass="Unmatched"
        matched=rbind(matched,un.matched)
      }
      matched=subset(matched,select=c("raw.x","raw.y","pass"))
     
      return(matched)
    }
    
    

    #### American

    I stacked the data from the multiple pages, cleaned them, matched them against the normalized performance frequency score table, and computed the proportion of performances of American composers’ works among all works performed by the NY Philharmonic over time.

    american1=read.csv("americantest1.csv", header = FALSE ,encoding = "UTF-8")
    american2=read.csv("americantest2.csv", header = FALSE ,encoding = "UTF-8")
    american3=read.csv("americantest3.csv", header = FALSE ,encoding = "UTF-8")
    american4=read.csv("americantest4.csv", header = FALSE ,encoding = "UTF-8")
    american5=read.csv("americantest5.csv", header = FALSE ,encoding = "UTF-8")
    american6=read.csv("americantest6.csv", header = FALSE ,encoding = "UTF-8")
    american7=read.csv("americantest7.csv", header = FALSE ,encoding = "UTF-8")
    american=c(american1,american2,american3,american4,american5,american6,american7)
    american=unique(unlist(american))
    
    american1.0=gsub("\\(composer)|\\(pianist)|\\(conductor)|\\(guitarist)|\\(musician)|\\ (musicologist)|\\(singer-songwriter)|\\ (Fluxus musician)","",american)
    american1.1=strsplit(as.character(american1.0)," ")
    
    american1.2=list(rep(0,length(american1.1)))
    for ( i in 1:length(american1.1)){
      if (length(american1.1[[i]])>1)
        american1.2[i]=paste(american1.1[[i]][length(american1.1[[i]])],paste(american1.1[[i]][1:length(american1.1[[i]])-1], collapse=" "),sep=", ")
    }
    american1.2=american1.2[!is.na(american1.2)]
    american1.4=unlist(american1.2)
    american1.4=c(american1.4,"Gershwin, George")
    american1.4=c(american1.4,"Bernstein, Leonard")
    american1.4=c(american1.4,"Foote, Arthur")
    require(ggplot2)
    
    l=list(rep(0, length(american1.4)))
    l=c()
    for ( i in 1:length(american1.4)){
      l=c(l,which(american1.4[i]==popScoreComposerComplete$composers))
    }
    americans=popScoreComposerComplete$composers[l]
    americansPop=popScoreComposerComplete[l,]
    americansPopSum=colSums(americansPop[2:175])
    qplot(seq_along(americansPopSum),americansPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("American Composers")
    

    The graph shows a general increase over time in the proportion of American composers among all composers performed, which reinforces the hypothesis that as America rose to become the world’s center of art and culture around the turn of the century, its composers gained more recognition from the NY Philharmonic.

    The top twenty American composers are

    americanTop=rowSums(americansPop[2:175],na.rm=TRUE)
    americanTop=cbind(as.data.frame(americans)[1],americanTop)
    americanTop1=americanTop[order(-americanTop$americanTop),]
    head(americanTop1,20)
    

    #### Germany

    german1=read.csv("germantest1.csv", header = FALSE ,encoding = "UTF-8")
    german2=read.csv("germantest2.csv", header = FALSE ,encoding = "UTF-8")
    german3=read.csv("germantest3.csv", header = FALSE ,encoding = "UTF-8")
    german4=read.csv("germantest4.csv", header = FALSE ,encoding = "UTF-8")
    german5=read.csv("germantest5.csv", header = FALSE ,encoding = "UTF-8")
    
    
    german=c(german1,german2,german3,german4,german5)
    german=unique(unlist(german))
    german1.0=gsub("\\(composer)","",german)
    german1.0=gsub("\\(baroque composer)","",german1.0)
    german1.0=gsub("\\(Altstadt Kantor)","",german1.0)
    german1.0=gsub("\\(Morean)","",german1.0)
    german1.0=gsub("\\(1772???1806)","",german1.0)
    german1.0=gsub("\\(conductor)","",german1.0)
    german1.0=gsub("\\(the elder)","",german1.0)
    german1.0=gsub("\\(the younger)","",german1.0)
    german1.0=gsub("\\(musician)","",german1.0)
    german1.0=gsub("\\(organist)","",german1.0)
    german1.0=gsub("\\(guitarist)","",german1.0)
    german1.0=gsub("\\(musician at Arnstadt)","",german1.0)
    german1.0=gsub("\\(Austrian composer)","",german1.0)
    german1.1=strsplit(as.character(german1.0)," ")
    
    german1.2=list(rep(0,length(german1.1)))
    for ( i in 1:length(german1.1)){
      if (length(german1.1[[i]])>1){
        german1.2[i]=paste(german1.1[[i]][length(german1.1[[i]])],paste(german1.1[[i]][1:length(german1.1[[i]])-1], collapse=" "),sep=", ")
      }
    }
    german1.2=german1.2[!is.na(german1.2)]
    
    test2=partialMatch(popScoreComposerComplete$composers,german1.2)
    test3=test2[-c(126,130,142,141,138),]
    german1.3=test3$raw.x
    save(german1.3,file="germanComps.RData")
    

    load("germanComps.RData")
    
    l=c()
    for ( i in 1:length(german1.3)){
      l=c(l,which(german1.3[i]==popScoreComposerComplete$composers))
    }
    german=popScoreComposerComplete$composers[l]
    germanPop=popScoreComposerComplete[l,]
    germanPopSum=colSums(germanPop[2:175])
    qplot(seq_along(germanPopSum),germanPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("German Composers")
    

    The graph shows a significant decrease in the proportion of German composers’ works being performed during WWI and WWII, and after WWII.

    germanTop=rowSums(germanPop[2:175],na.rm=TRUE)
    germanTop=cbind(as.data.frame(german)[1],germanTop)
    germanTop1=germanTop[order(-germanTop$germanTop),]
    head(germanTop1,20)
    

    ##### Wagner

    wagner=as.numeric(popScoreComposerComplete[81,2:175])
    qplot(seq_along(wagner),wagner)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Wagner")
    

    The graph shows that the normalized performance frequency score of Hitler’s favorite composer, Wagner, significantly decreased after WWII.

    #### Russian

    russian1=read.csv("russiantest1.csv", header = FALSE ,encoding = "UTF-8")
    russian2=read.csv("russiantest2.csv", header = FALSE ,encoding = "UTF-8")
    russian=c(russian1,russian2) 
    russian=unique(unlist(russian))
    
    russian1.0=gsub("\\(composer)","",russian)
    russian1.0=gsub("\\(conductor)","",russian1.0)
    russian1.1=strsplit(as.character(russian1.0)," ")
    
    russian1.2=list(rep(0,length(russian1.1)))
    for ( i in 1:length(russian1.1)){
      if (length(russian1.1[[i]])>1)
        russian1.2[i]=paste(russian1.1[[i]][length(russian1.1[[i]])],paste(russian1.1[[i]][1:length(russian1.1[[i]])-1], collapse=" "),sep=", ")
    }
    russian1.2=russian1.2[!is.na(russian1.2)]
    
    test2=partialMatch(popScoreComposerComplete$composers,russian1.2)
    test3=test2[-c(38,35,33,29),]
    russian1.3=test3$raw.x
    save(russian1.3,file="russianComps.RData")
    

    load("russianComps.RData")
    l=c()
    for ( i in 1:length(russian1.3)){
      l=c(l,which(russian1.3[i]==popScoreComposerComplete$composers))
    }
    
    russian=popScoreComposerComplete$composers[l]
    russianPop=popScoreComposerComplete[l,]
    russianPopSum=colSums(russianPop[2:175])
    qplot(seq_along(russianPopSum),russianPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Russian Composers")
    

    russianTop=rowSums(russianPop[2:175],na.rm=TRUE)
    russianTop=cbind(as.data.frame(russian)[1],russianTop)
    russianTop1=russianTop[order(-russianTop$russianTop),]
    head(russianTop1,20)
    

    The graph shows an increase in the normalized performance frequency score of Russian composers after WWII, during the Cold War, probably because many important Russian composers rose to prominence during that time. This suggests that the Cold War did not hinder the introduction of Russian music to the US.

    In conclusion, overt war and internal censorship may affect cultural performances and people’s attitudes toward music, but vaguer antipathy, as in the Cold War, may not influence the frequency of cultural performances. This is reflected in the NY Phil’s choice of repertoire. During the Cultural Revolution in China, Western artworks were strictly prohibited; censorship dictated the repertoire choices of Chinese music and art institutions. Comparing China to the United States suggests that, in a democratic society, attitudes and censorship sometimes do not affect arts and culture much, as shown by the increasing proportion of Russian works performed during the Cold War. During actual wartime, however, attitudes do affect arts and culture performances, as shown by the diminishing proportion of German composers’ works during and after the war years.

    #### Chinese

    In order to see how the economic rise of Asian and Latin American countries affected the performance history at the NY Phil, I needed a coherent list of Asian and Latin American composers, but I could not find such data. Instead, I used China as a single-country sample to see how performance trends changed over time as the economy of China rose.

    To do that, I found a list of common Chinese last names and matched it against composers’ last names. Note that this matching finds every composer of Chinese ethnicity rather than composers with actual Chinese nationality.

    require(rvest) # read_html() and html_table() are provided by rvest
    url <- 'http://www.bloomberg.com/visual-data/best-and-worst//most-common-in-china-surnames'
    html <- read_html(url, encoding = "UTF-8")
    tables <- html_table(html, fill=TRUE)
    tables=tables[[1]]
    lastNames=tables["Pinyin annotation"]
    ChineseLname=unlist(lastNames$`Pinyin annotation`)
    ChineseLname[73]="Dun"
    save(ChineseLname,file="ChineseLastName.RData")
    

    load("ChineseLastName.RData")
    splitname=strsplit(popScoreComposerComplete$composers,",")
    lname=c()
    for ( i in 1:length(splitname)){
      lname=c(lname,splitname[[i]][1])
    }
    
    l=c()
    for ( i in 1:length(ChineseLname)){
       l=c(l,which(ChineseLname[i]==lname))
    }
    
    asianPop=popScoreComposerComplete[l,]
    nrow(asianPop)
    nrow(asianPop)/nrow(popScoreComposerComplete)
    
    asianTop=rowSums(asianPop[2:175],na.rm=TRUE)
    asianTop=cbind(as.data.frame(asianPop)[1],asianTop)
    asianTop1=asianTop[order(-asianTop$asianTop),]
    head(unique(asianTop1),20)
    
    asian=popScoreComposerComplete$composers[l]
    asianPop=popScoreComposerComplete[l,]
    asianPopSum=colSums(asianPop[2:175])
    qplot(seq_along(asianPopSum),asianPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Chinese Composers")
    

    The graph shows that as the economy of China rose, the proportion of Chinese composers’ works being performed did not increase significantly over time. I suspect the reasons are that there are not many Chinese composers and that there are cultural communication barriers between China and the United States. As the economy of China develops, there are more and more Chinese musicians, as more money and effort is put into music and art education; however, most of them are performers rather than composers. Western music and Western music education were introduced to China only after the beginning of the twentieth century, so the history of Western music in China is still relatively short. In addition, during the Cultural Revolution, China was again isolated from the rest of the world. Therefore, even though there are good Chinese composers, their works have not been introduced to the US.

    #### French

    I also plotted the performance history of French and Italian composers over time in order to compare them with MoMA exhibition history data.

    # Wikipedia lists of French composers, scraped into four CSV files
    french1 <- read.csv("frenchtest1.csv", header = FALSE, encoding = "UTF-8")
    french2 <- read.csv("frenchtest2.csv", header = FALSE, encoding = "UTF-8")
    french3 <- read.csv("frenchtest3.csv", header = FALSE, encoding = "UTF-8")
    french4 <- read.csv("frenchtest4.csv", header = FALSE, encoding = "UTF-8")
    french <- c(french1, french2, french3, french4)
    french <- unique(unlist(french))

    # Strip parenthetical disambiguators such as "(composer)" from the names
    french1.0 <- gsub("\\(composer\\)", "", french)
    french1.0 <- gsub("\\(conductor\\)", "", french1.0)
    french1.0 <- gsub("\\(1907–1970\\)", "", french1.0)
    french1.0 <- gsub("\\(organist\\)", "", french1.0)
    french1.0 <- gsub("\\(violist\\)", "", french1.0)
    french1.0 <- gsub("\\(musician\\) ", "", french1.0)
    french1.0 <- gsub("\\(Chantilly Codex composer\\) ", "", french1.0)
    french1.0 <- gsub("\\(lutenist\\)  ", "", french1.0)
    french1.1 <- strsplit(as.character(french1.0), " ")
    
    # Reorder "First [Middle] Last" into "Last, First [Middle]" to match NY Phil records
    french1.2 <- rep(NA_character_, length(french1.1))
    for (i in 1:length(french1.1)) {
      parts <- french1.1[[i]]
      if (length(parts) > 1)
        french1.2[i] <- paste(parts[length(parts)],
                              paste(parts[1:(length(parts) - 1)], collapse = " "),
                              sep = ", ")
    }
    french1.2 <- french1.2[!is.na(french1.2)]
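
    As an aside, the same reordering can be written without an explicit loop; this sapply version is an equivalent sketch of the loop above, not a change to the analysis:

    reorder_name <- function(parts) {
      if (length(parts) < 2) return(NA_character_)   # single-token names are dropped later
      paste(parts[length(parts)], paste(parts[-length(parts)], collapse = " "), sep = ", ")
    }
    french1.2 <- sapply(french1.1, reorder_name)
    french1.2 <- french1.2[!is.na(french1.2)]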
    
    # Fuzzy-match the Wikipedia names against the NY Phil composer list
    test2 <- partialMatch(popScoreComposerComplete$composers, french1.2)
    test3 <- test2[-c(95, 98, 90, 82, 83, 86, 87, 88, 90), ]   # drop mismatched rows (indices chosen by inspection)
    french1.3 <- test3$raw.x
    save(french1.3, file = "frenchComps.RData")
    

    load("frenchComps.RData")
    l=c()
    for ( i in 1:length(french1.3)){
      l=c(l,which(french1.3[i]==popScoreComposerComplete$composers))
    }
    
    french <- popScoreComposerComplete$composers[l]
    frenchPop <- popScoreComposerComplete[l, ]
    frenchPopSum <- colSums(frenchPop[2:175])
    qplot(seq_along(frenchPopSum), frenchPopSum) +
      geom_line() + ylim(0, 1) + geom_area(colour = "black") +
      scale_x_continuous(breaks = seq(1, 175, 10),
                         labels = c("1842","1852","1862","1872","1882","1892","1902","1912","1922",
                                    "1932","1942","1952","1962","1972","1982","1992","2002","2012")) +
      theme(axis.text.x = element_text(angle = 45, size = 10, hjust = 1)) +
      xlab("seasons") + ylab("percentage of works being performed") +
      ggtitle("French Composers")
    

    # Rank French composers by total performance score
    frenchTop <- rowSums(frenchPop[2:175], na.rm = TRUE)
    frenchTop <- cbind(as.data.frame(french)[1], frenchTop)
    frenchTop1 <- frenchTop[order(-frenchTop$frenchTop), ]
    head(frenchTop1, 20)
    

    #### Italian

    # Wikipedia lists of Italian composers, scraped into five CSV files
    italian1 <- read.csv("italiantest1.csv", header = FALSE, encoding = "UTF-8")
    italian2 <- read.csv("italiantest2.csv", header = FALSE, encoding = "UTF-8")
    italian3 <- read.csv("italiantest3.csv", header = FALSE, encoding = "UTF-8")
    italian4 <- read.csv("italiantest4.csv", header = FALSE, encoding = "UTF-8")
    italian5 <- read.csv("italiantest5.csv", header = FALSE, encoding = "UTF-8")
    italian <- c(italian1, italian2, italian3, italian4, italian5)
    italian <- unique(unlist(italian))

    # Strip parenthetical disambiguators from the names
    italian1.0 <- gsub("\\(composer\\)", "", italian)
    italian1.0 <- gsub("\\(conductor\\)", "", italian1.0)
    italian1.0 <- gsub("\\(classical era composer\\)", "", italian1.0)
    italian1.0 <- gsub(" \\(senior\\)", "", italian1.0)
    italian1.1 <- strsplit(as.character(italian1.0), " ")
    
    # Reorder "First Last" into "Last, First", as with the French list above
    italian1.2 <- rep(NA_character_, length(italian1.1))
    for (i in 1:length(italian1.1)) {
      parts <- italian1.1[[i]]
      if (length(parts) > 1)
        italian1.2[i] <- paste(parts[length(parts)],
                               paste(parts[1:(length(parts) - 1)], collapse = " "),
                               sep = ", ")
    }
    italian1.2 <- italian1.2[!is.na(italian1.2)]
    
    test2 <- partialMatch(popScoreComposerComplete$composers, italian1.2)
    test3 <- test2[-c(115, 114, 108, 107), ]   # drop mismatched rows (indices chosen by inspection)
    italian1.3 <- test3$raw.x
    save(italian1.3, file = "italianComps.RData")
    

    load("italianComps.RData")
    
    l=c()
    for ( i in 1:length(italian1.3)){
      l=c(l,which(italian1.3[i]==popScoreComposerComplete$composers))
    }
    
    italian=popScoreComposerComplete$composers[l]
    italianPop=popScoreComposerComplete[l,]
    italianPopSum=colSums(italianPop[2:175])
    qplot(seq_along(italianPopSum),italianPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Italian Composers")
    

    # Rank Italian composers by total performance score
    italianTop <- rowSums(italianPop[2:175], na.rm = TRUE)
    italianTop <- cbind(as.data.frame(italian)[1], italianTop)
    italianTop1 <- italianTop[order(-italianTop$italianTop), ]
    head(italianTop1, 20)
    

    #### The Status of Women Composers
    The feminist movement accelerated in the 1960s. It began with demands for political and economic equality between men and women and then spread to the cultural sector. Can we find this reflected in the NY Phil performance data?

    I could not find a comprehensive list of women composers worldwide, so I took American composers as a sample and examined the proportion of American women composers' works performed by the NY Phil over time. To do this, I scraped this page (http://names.mongabay.com/female_names.htm) for a list of common American female first names and matched them against the NY Phil records.

    # Scrape the 500 most common American female first names
    url <- 'http://names.mongabay.com/female_names.htm'
    html <- read_html(url, encoding = "UTF-8")
    tables <- html_table(html, fill = TRUE)
    tables <- tables[[1]]
    femalename <- tables[1]
    femalename <- femalename[1:500, ]
    femalenames <- tolower(femalename)
    save(femalenames, file = "femalenames.RData")
    

    load("femalenames.RData")
    names=americansPop[1]$composers
    splitName2=strsplit(names,",")
    fname=c()
    for (i in 1:length(splitName2)){
      fname=c(fname,splitName2[[i]][2])
    }
    fname=tolower(fname)
    fname=trimws(fname)
    fname3=strsplit(fname," ")
    fname4=c()
    for (i in 1: length(fname3)){
      fname4=c(fname4,fname3[[i]][1])
    }
    
    
    # Row indices (within americansPop) of composers with a common female first name
    l <- c()
    for (i in 1:length(femalenames)) {
      l <- c(l, which(femalenames[i] == fname4))
    }
    
    woman <- americansPop[l, 1]
    woman

    # Drop matches that are actually male composers with gender-neutral names (checked by hand)
    womanTrue <- woman[-c(8, 15, 16, 19, 22)]
    womanTrue

    length(womanTrue) / nrow(americansPop)   # share of women among American composers
    
    womanPop <- americansPop[l, ]
    womanPop <- womanPop[-c(8, 15, 16, 19, 22), ]
    womanPopSum <- colSums(womanPop[2:175])
    qplot(seq_along(womanPopSum), womanPopSum) +
      geom_line() + ylim(0, 1) + geom_area(colour = "black") +
      scale_x_continuous(breaks = seq(1, 175, 10),
                         labels = c("1842","1852","1862","1872","1882","1892","1902","1912","1922",
                                    "1932","1942","1952","1962","1972","1982","1992","2002","2012")) +
      theme(axis.text.x = element_text(angle = 45, size = 10, hjust = 1)) +
      xlab("seasons") + ylab("percentage of works being performed") +
      ggtitle("American Women Composers")
    

    The female first-name list also contains some gender-neutral names, so some of the matched composers could be male; I removed these by hand. The graph shows that the proportion of women composers' works being performed did not increase significantly over time, which reflects the sad situation of women in classical music.

    Art from MoMA

    To examine how well the NY Phil performance history reflects changes in American society, I decided to create a comparable series of MoMA exhibition history graphs by nationality.

    library(dplyr)
    MoMA <- read.csv("MoMA.csv", header = TRUE, encoding = "UTF-8")

    # Keep only the nationality and the exhibition year (the first number in the Date field)
    moma.1 <- MoMA[, c("Nationality", "Date")][1:98578, ]
    moma.1$Date.1 <- as.numeric(gsub("([0-9]+).*$", "\\1", moma.1$Date))
    moma.1 <- na.omit(moma.1)
    moma.1 <- moma.1[, c("Nationality", "Date.1")]
    write.csv(moma.1, "momaSmall.csv")
    

    moma.1=read.csv("momaSmall.csv",row.names=1)
    moma.1=subset(moma.1, Date.1>=1929)
    
    test=unique(moma.1$Date.1)
    
    test=sort(test)
    tyear=rep(0,length(test))
    soviet=rep(0,length(test))
    american=rep(0,length(test))
    germanAustria=rep(0,length(test))
    french=rep(0,length(test))
    italian=rep(0,length(test))
    asianLatin=rep(0,length(test))
    
    psoviet=rep(0,length(test))
    pamerican=rep(0,length(test))
    pgermanAustria=rep(0,length(test))
    pfrench=rep(0,length(test))
    pitalian=rep(0,length(test))
    pasianLatin=rep(0,length(test))
    
    for (i in 1:length(years)) {
      yr <- subset(moma.1, Date.1 == years[i])   # all exhibition records for this year
      tyear[i] <- nrow(yr)

      american[i] <- length(grep("American", yr$Nationality)) + length(grep("USA", yr$Nationality))
      pamerican[i] <- american[i] / tyear[i]

      soviet[i] <- length(grep("Russian", yr$Nationality))
      psoviet[i] <- soviet[i] / tyear[i]

      germanAustria[i] <- length(grep("German", yr$Nationality))
      pgermanAustria[i] <- germanAustria[i] / tyear[i]

      french[i] <- length(grep("French", yr$Nationality))
      pfrench[i] <- french[i] / tyear[i]

      italian[i] <- length(grep("Italian", yr$Nationality))
      pitalian[i] <- italian[i] / tyear[i]

      asianLatin[i] <- length(grep("Chinese", yr$Nationality))
      pasianLatin[i] <- asianLatin[i] / tyear[i]
    }
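
    Since dplyr is already loaded, the same per-year proportions could also be computed without the explicit loop. This group_by/summarise version is a sketch of the equivalent computation for two of the nationalities, not part of the original analysis:

    props <- moma.1 %>%
      group_by(Date.1) %>%
      summarise(total     = n(),
                pamerican = sum(grepl("American|USA", Nationality)) / total,
                pgerman   = sum(grepl("German", Nationality)) / total)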
    
    # The plots below differ only in the series and the title, so wrap the shared
    # ggplot code in a small helper
    plot_moma <- function(p, title) {
      qplot(seq_along(p), p) +
        geom_line() + ylim(0, 1) + geom_area(colour = "black") +
        theme(axis.text.x = element_text(angle = 45, size = 10, hjust = 1)) +
        scale_x_continuous(breaks = seq(1, 90, by = 10),
                           labels = c("1929","1939","1949","1959","1969","1979","1989","1999","2009")) +
        xlab("years") + ylab("percentage of works being exhibited") +
        ggtitle(title)
    }

    plot_moma(pamerican, "American Artists")

    plot_moma(pgermanAustria, "German Artists")

    plot_moma(psoviet, "Russian Artists")

    plot_moma(pfrench, "French Artists")

    plot_moma(pitalian, "Italian Artists")
    

    sum(asianLatin)                   # total Chinese records
    sum(asianLatin) / nrow(moma.1)    # as a share of all records
    plot_moma(pasianLatin, "Chinese Artists")
    

    The graphs show that the MoMA exhibition history is more sensitive to changes in US social pressures than the NY Philharmonic performance history. For example, during WWII, exhibitions of German artists' work at MoMA were very infrequent, but around the fall of the Berlin Wall, when Americans had a great deal of sympathy for Germans, there is a large peak in the frequency of German artists' exhibitions. The fluctuations in the NY Phil composer-frequency data are smaller than the peaks in MoMA's exhibition-frequency data.
    This might be because art, as reflected by curatorial and exhibition selections, is genuinely more sensitive to social pressures than choices of music to perform. Alternatively, it might be because MoMA's exhibits are recent and contemporary while NY Phil concerts draw on a much longer history of music, and this long history somehow dilutes the effects of social attitudes; for example, Americans did not hate German music from Beethoven's era.

    Conclusion

    In this project, I studied the performance history of the NY Philharmonic and analyzed the trends of performance frequency by composer nationality and gender as a function of social attitudes derived from states of war, hostility and censorship. I also compared NY Phil performance data with MoMA exhibition data and found MoMA exhibition data to be even more sensitive to such social attitude pressures. This project tells the story of the NY Philharmonic’s performance history and tries to explain how changes in its repertoire are related to changes in social attitudes in American history. This is my first attempt to bring quantitative analysis to bear on a field in the humanities.

    Future work

    I would like to graph an individual NY Phil performer's or composer's performance history to show how he or she rose to stardom over time: is there a steady rise in the number of performances, or are there ups and downs? In addition, I'd like to study the proportion of composers whose works were performed at the NY Phil during their own lifetimes. Further, I'd like to see whether global art and culture trends, such as Impressionism or the popularity of the Ballets Russes, correspond to the NY Phil performance history and the MoMA exhibition history. I should also point out that in this research I relied on internet sources, especially Wikipedia pages, for composers' personal information. I believe that crowd intelligence can be reliable, but because these are not authoritative sources, there are bound to be some mistakes in the content. I caught and corrected some by hand, but there may be others I did not catch. Given more time and resources, I would repeat the study using authoritative sources for composers' nationalities and genders and compare it with this Wikipedia-based study, which would be one way to measure how reliable crowd intelligence is.

    Acknowledgements

    I thank Yoav Bergner for introducing me to the wonderful world of data science. I thank Vincent Dorie for teaching me debugging techniques.

    Visit original content creator repository

  • GBA_Memory-Access-Scanner

    GBA_Memory-Access-Scanner

    [ Description —————————]

    This program automates the process of setting watchpoints to detect functions accessing a structure or block of memory.
    It can present all detected functions that write to and read from a block of memory or structure.
    It detects access types (e.g., ldr is a 32-bit access, strh a 16-bit access, ldrb an 8-bit access)
    and access offsets (in str r0, [r5, 0x35], the offset is 0x35).

    From the detected access types and offsets, the program can generate a typedef structure template for the structure itself.
    However, correctly estimating the size of the structure is critical for generating the template:
    underestimating is OK, but overestimating is bad.

    Sometimes, the game may access a memory location inconsistently. This causes problems in the generation
    of a structure template, producing false structure padding. In such a case, all relevant entries are marked as
    CONFLICT in the structure template output. After fixing these conflicts manually (by keeping only one entry
    and removing the other duplicates), the template may be fed into the StructPadder module to fix the padding.

    [ Protocol ——————————]

    Setting up and running MemoryAccessDetector.lua in VBA-rr and performing the relevant in-game actions on the structure
    should generate output that looks like this:

    name=s_02001B80, size=0x84
    080050EC::080050F4 u8(0x00), 08035932::08035938 u8(0x06), 0809F99A::0809F9A0 u8(0x10), 
    0809DEA0::0809DEC0 u8(0x04), 08034EF0::08034EFC u8(0x0E), 08034F68::08034F74 u32(0x18),
    

    The first line contains meta information important to the MemoryAccessProtocol module.
    The next lines contain a repeating pattern of entries that describe a memory access.
    The format is: <function_Address>::<Memory_Access_Address> u<type_of_access>(<Offset_of_access>)
    The program attempts to find the function address by searching for a push {…, lr} somewhere above the access.
    If it encounters a pop {…, pc} first, it indicates that the function address is unknown by placing a ‘?’ in its location.

    [ Usage ———————————-]

    1. Configure the MemoryAccessDetector.lua file by
      1a. setting the base address, size, and name of the structure.
      1b. setting whether to scan on reads (LDRs) or writes (STRs) or both (or neither, oh well).
    2. Run the script in VBA-rr while playing the relevant game you’re trying to scan.
      2a. Perform actions you think are relevant to the structure to get better output.
      2b. (By default) Press ‘P’ after you’re done to make sure all memory access entries have been written out.
    3. Copy the output of the lua script into the file “input”.
    4. Run the MemoryAccessProtocol.py module to generate a structure template in stdout.

    In case the structure template contains CONFLICTS:

    1. Manually go through each conflict and remove the duplicates
      (structure members at the same location but with different types).
    2. (optional): Remove the tag ” CONFLICT” from the entry, so that the only comment is “// loc=0x22”, for example.
    3. Copy the content of the template and put it in the “input” file
      (minus the “typedef struct{” and “}structName;” lines).
    4. Run the StructPadder.py module to get correct padding.

    [ Dependencies ——————————]

    1. VBA-rr
    2. Python3
    3. A GBA ROM to scan

    Visit original content creator repository