Robofinder is a powerful Python script that searches Archive.org for historical robots.txt files of any given website. It is ideal for security researchers, web archivists, and penetration testers who want to uncover previously accessible paths or directories that were once listed in a site's robots.txt.
Features
Fetch historical robots.txt files from Archive.org.
Extract and display old paths or directories that were once disallowed or listed.
Save results to a specified output file.
Silent Mode for unobtrusive execution.
Multi-threading support for faster processing.
Option to concatenate extracted paths with the base URL for easy access.
Debug mode for detailed execution logs.
Extract old parameters from robots.txt files.
Installation
Using pipx
Install Robofinder quickly and securely using pipx:
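The exact package name and repository below are assumptions for illustration; adjust them if the project is published differently:

```sh
# If published on PyPI under this name (assumed):
pipx install robofinder

# Or install straight from the GitHub repository (hypothetical URL):
pipx install git+https://github.com/Spix0r/robofinder.git
```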
Contributions are highly welcome! If you have ideas for new features, optimizations, or bug fixes, feel free to submit a Pull Request or open an issue on the GitHub repository.
txtnish is a client for twtxt, the decentralised, minimalist microblogging service for hackers.
Instead of signing up at a closed and/or regulated microblogging platform,
getting your status updates out with twtxt is as easy as putting them in a
publicly accessible text file. The URL pointing to this file is your identity,
your account. twtxt then tracks these text files, like a feedreader, and builds
your unique timeline out of them, depending on which files you track. The
format is simple, human readable, and integrates well with UNIX command line
utilities.
All subcommands of txtnish provide extensive help, so don’t hesitate
to call them with the -h option.
If you are a new user, there is a quickstart command that will ask you some
questions and write a configuration file for you:
$ txtnish quickstart
Installation
txtnish only depends on tools you normally find in a POSIX environment: awk, sort, cut and sh. There are only two exceptions: you need curl to download twtxt files and an xargs that supports parallel processing via -P. You can use an xargs without it, but then txtnish falls back to downloading one URL after another.
Installation itself is as easy as it gets: just copy the script somewhere
in your PATH.
Subcommands
tweet
Appends a new tweet to your twtxt file. There are three different ways to input tweets: you can pipe them into tweet, pass them along as arguments, or, when you call txtnish tweet without any arguments and it's not connected to a pipe, it will call $EDITOR for you and tweet each line as a separate tweet.
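For example, a quick sketch of the three input styles:

```sh
# 1. Pipe a tweet in:
echo "Hello twtxt world" | txtnish tweet

# 2. Pass it as an argument:
txtnish tweet "Hello twtxt world"

# 3. No arguments, no pipe: $EDITOR opens and each line becomes a tweet.
txtnish tweet
```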
timeline
Retrieves your personal timeline.
publish
Publishes your twtfile. This is especially helpful after you have changed your post_tweet_hook.
follow
Adds a new source to your followings.
unfollow
Removes an existing source from your followings.
following
Prints the list of the sources you’re following.
reply
Displays a commented-out version of your timeline in $EDITOR. Every line that is not commented out after you save and exit the editor will be tweeted.
Search tweets
You can provide a search expression to filter your timeline with the -S flag. The search expression is an awk conditional with four predefined variables:
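As an illustration (the variable name text below is an assumption about one of those predefined variables):

```sh
# Show only tweets whose body matches "release":
txtnish timeline -S 'text ~ /release/'
```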
Configuration
At startup txtnish checks whether ~/.config/txtnish/config exists and will source it if it does. The configuration file must be a valid shell script.
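For example, a minimal config (the values are placeholders; the option names are documented below):

```sh
# ~/.config/txtnish/config -- sourced as a shell script
nick="alice"
twturl="https://example.com/twtxt.txt"
limit=50
use_color=1
```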
General
add_metadata
Add metadata to the twtxt file. Defaults to 0 (false).
awk
Path to the awk binary. Defaults to awk.
sed
Path to the sed binary. Defaults to sed.
limit
How many tweets should be shown in timeline. Defaults to 20.
formatter
Defines which command is used to wrap each tweet to fit on the screen. It defaults to fold -s.
sort_order
How to sort tweets. This option can be either ascending or descending: ascending prints the oldest tweet first, descending the newest. This value can be overridden with the -d and -a flags.
timeout
Maximum time in seconds that each http connection can take. Defaults
to zero.
use_color
If the output should be colorized with ANSI escape sequences. See the
section COLORS on how to change the color settings. Defaults to 1.
pager
Which pager to use if use_pager is enabled. Defaults to less -R in order to display colors. Paging itself (use_pager) defaults to 1 and can be toggled with the -p and -P flags to enable or disable the pager.
disclose_identity
If set to 1, send your nick and twturl with every HTTP request. This only makes sense if you also set twturl and nick. Defaults to 0.
nick
Your nick. This is used to collapse mentions of your twturl and is sent to all feeds you're following if disclose_identity is set to 1. Defaults to the environment variable $USER.
twturl
The URL of your feed. This is used to collapse mentions and is sent to all feeds you're following if disclose_identity is set to 1.
always_update
Always update all feeds before showing tweets. If you set this variable
to 0, you need to update manually with the update command.
http_proxy
Sets the proxy server to use for HTTP.
https_proxy
Sets the proxy server to use for HTTPS.
sign_twtfile
If set to 1, sign the twtfile with pgp. Defaults to 0.
If you are also overriding post_tweet_hook, note that signing will create a signed file in a temporary directory and change the value of twtfile accordingly. Your original twtfile will not be changed!
Signing your twtfile might break some twtxt clients, as lines without a TAB are not allowed by a strict reading of the spec.
check_signature
Verify pgp signatures and show the result in the timeline if set to 1. Defaults to 0.
sign_user
Sets a local user other than the default to sign the twtfile. txtnish will print a message indicating that an override is in place.
gpg_bin
Sets a custom name for the gpg executable.
ipfs_gateway
When you subscribe to an ipns:// address, txtnish will call this gateway to get the user's twtfile. Defaults to http://localhost:8080 and falls back to https://ipfs.io if txtnish can't reach the gateway.
Publish with scp
scp_user
Use the given username to connect to the remote server. Required to publish
with scp.
scp_host
Copy twtfile to this host. Required to publish with scp.
scp_remote_name
Name of twtfile on remote host. Defaults to the basename of the twtfile.
sftp_over_scp
Use SFTP instead of SCP if set to 1.
Publish with ftp
ftp_user
Use the given username to connect to the remote server. Required to publish
with ftp.
ftp_host
Copy twtfile to this host. Required to publish with ftp.
ftp_remote_name
Name of twtfile on remote host. Defaults to the basename of the twtfile.
Publish with IPFS
ipfs_publish
Publish the twtfile with ipfs if set to 1. Defaults to 0.
You will need the ipfs tools and a running daemon to publish to ipfs.
ipfs_wrap_with_dir
Call ipfs add with --wrap-with-dir if set to 1. Defaults to 0.
ipfs_recursive
Call ipfs add with --recursive if set to 1. The complete directory of
your twtfile will be published. Defaults to 0.
Colors
If use_color is set to 1, the nick, timestamp, mentions and hashtags
will be colorized. txtnish recognizes black, red, green, yellow, blue,
magenta, cyan and white. You can set the background color with the prefix on_.
color_nick="yellow on_white"
Additionally, a color definition can specify the attributes bold, bright, faint, italic, underline, blink and fastblink if your terminal supports them.
color_nick="yellow on_white blink"
The order of colors and attributes doesn’t matter and multiple attributes can
be combined.
To customize the behaviour of txtnish, the user can override hook functions.
pre_tweet_hook
This hook is called before a new tweet is appended to your twtfile. This can be
useful if you’re using txtnish on multiple devices and want to update your
local twtfile before appending to it. There’s a predefined function sync_twtfile that does exactly that.
pre_tweet_hook () {
sync_twtfile
}
post_tweet_hook
post_tweet_hook is called after txtnish has appended new tweets to your twtfile. It's a good place to upload your file somewhere, as in the sketch below.
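For example, a minimal sketch that publishes via scp after every tweet (the host and path are placeholders):

```sh
post_tweet_hook () {
    scp "$twtfile" user@example.com:public_html/twtxt.txt
}
```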
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program. If not, see http://www.gnu.org/licenses/.
A remark plugin for admonitions designed with Docusaurus v2 in mind.
remark-admonitions is now included out-of-the-box with @docusaurus/preset-classic!
Installation
remark-admonitions is available on NPM.
npm install remark-admonitions
unified + remark
If you're using unified/remark, just pass the plugin to use().
For example, this will compile input.md into output.html using remark, rehype, and remark-admonitions.
const unified = require('unified')
const markdown = require('remark-parse')
// require the plugin
const admonitions = require('remark-admonitions')
const remark2rehype = require('remark-rehype')
const doc = require('rehype-document')
const format = require('rehype-format')
const html = require('rehype-stringify')
const vfile = require('to-vfile')
const report = require('vfile-reporter')

const options = {}

unified()
  .use(markdown)
  // add it to unified
  .use(admonitions, options)
  .use(remark2rehype)
  .use(doc)
  .use(format)
  .use(html)
  .process(vfile.readSync('./input.md'), (error, result) => {
    console.error(report(error || result))
    if (result) {
      result.basename = 'output.html'
      vfile.writeSync(result)
    }
  })
Docusaurus v2
@docusaurus/preset-classic includes remark-admonitions.
If you aren’t using @docusaurus/preset-classic, remark-admonitions can still be used through passing a remark plugin to MDX.
Usage
Admonitions are a block element.
The titles can include inline markdown and the body can include any block markdown except another admonition.
The general syntax is
:::keyword optional title
some content
:::
For example,
:::tip pro tip
`remark-admonitions` is pretty great!
:::
The default keywords are important, tip, note, warning, and danger.
Aliases for info => important, success => tip, secondary => note and danger => warning have been added for Infima compatibility.
Options
The plugin can be configured through the options object.
Defaults
const options = {
  customTypes: customTypes, // additional types of admonitions
  tag: string, // the tag to be used for creating admonitions (default ":::")
  icons: "svg" | "emoji" | "none", // the type of icons to use (default "svg")
  infima: boolean, // whether the classes for Infima alerts should be added to the markup
}
Custom Types
The customTypes option can be used to add additional types of admonitions. You can set the svg and emoji icons as well as the keyword. You only have to include the svg/emoji fields if you are using them.
The ifmClass is only necessary if the infima setting is true and the admonition should use the look of an existing Infima alert class.
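For example, a hypothetical custom type (the exact schema of each entry is assumed from the fields named above):

```js
const options = {
  customTypes: {
    // "todo" is the keyword, enabling :::todo blocks (layout assumed)
    todo: {
      emoji: "📝",           // used when icons is "emoji"
      svg: "<svg>...</svg>", // used when icons is "svg"
      ifmClass: "info",      // only needed when infima is true
    },
  },
}
```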
We believe in a future in which the web is a preferred environment for numerical computation. To help realize this future, we’ve built stdlib. stdlib is a standard library, with an emphasis on numerical and scientific computation, written in JavaScript (and C) for execution in browsers and in Node.js.
The library is fully decomposable, being architected in such a way that you can swap out and mix and match APIs and functionality to cater to your exact preferences and use cases.
When you use stdlib, you can be absolutely certain that you are using the most thorough, rigorous, well-written, studied, documented, tested, measured, and high-quality code out there.
To join us in bringing numerical computing to the web, get started by checking us out on GitHub, and please consider financially supporting stdlib. We greatly appreciate your continued support!
spttrf
Compute the L * D * L^T factorization of a real symmetric positive definite tridiagonal matrix A.
Installation
npm install @stdlib/lapack-base-spttrf
Alternatively,
To load the package in a website via a script tag without installation and bundlers, use the ES Module available on the esm branch (see README).
If you are using Deno, visit the deno branch (see README for usage instructions).
The branches.md file summarizes the available branches and displays a diagram illustrating their relationships.
To view installation and usage instructions specific to each branch build, be sure to explicitly navigate to the respective README files on each branch, as linked to above.
Usage
var spttrf = require( '@stdlib/lapack-base-spttrf' );
spttrf( N, D, E )
Computes the L * D * L^T factorization of a real symmetric positive definite tridiagonal matrix A.
var Float32Array = require( '@stdlib/array-float32' );

var D = new Float32Array( [ 4.0, 5.0, 6.0 ] );
var E = new Float32Array( [ 1.0, 2.0 ] );

spttrf( 3, D, E );
// D => <Float32Array>[ 4, 4.75, ~5.15789 ]
// E => <Float32Array>[ 0.25, ~0.4210 ]
The function has the following parameters:
N: order of matrix A.
D: the N diagonal elements of A as a Float32Array.
E: the N-1 subdiagonal elements of A as a Float32Array.
Note that indexing is relative to the first index. To introduce an offset, use typed array views.
spttrf.ndarray( N, D, strideD, offsetD, E, strideE, offsetE )
Computes the L * D * L^T factorization of a real symmetric positive definite tridiagonal matrix A using alternative indexing semantics.
var Float32Array = require( '@stdlib/array-float32' );

var D = new Float32Array( [ 4.0, 5.0, 6.0 ] );
var E = new Float32Array( [ 1.0, 2.0 ] );

spttrf.ndarray( 3, D, 1, 0, E, 1, 0 );
// D => <Float32Array>[ 4, 4.75, ~5.15789 ]
// E => <Float32Array>[ 0.25, ~0.4210 ]
The function has the following additional parameters:
strideD: stride length for D.
offsetD: starting index for D.
strideE: stride length for E.
offsetE: starting index for E.
While typed array views mandate a view offset based on the underlying buffer, the offset parameters support indexing semantics based on starting indices. For example,
var Float32Array = require( '@stdlib/array-float32' );

var D = new Float32Array( [ 0.0, 4.0, 5.0, 6.0 ] );
var E = new Float32Array( [ 0.0, 1.0, 2.0 ] );

spttrf.ndarray( 3, D, 1, 1, E, 1, 1 );
// D => <Float32Array>[ 0.0, 4.0, 4.75, ~5.15789 ]
// E => <Float32Array>[ 0.0, 0.25, ~0.4210 ]
Notes
Both functions mutate the input arrays D and E.
Both functions return a status code indicating success or failure. A status code indicates the following conditions:
0: factorization was successful.
<0: the k-th argument had an illegal value, where k is the absolute value of the status code.
0 < k < N: the leading principal minor of order k is not positive and factorization could not be completed, where k equals the status code value.
N: the leading principal minor of order N is not positive, and factorization was completed.
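For instance, a small sketch of interpreting the status code (the input values here are illustrative):

```js
var Float32Array = require( '@stdlib/array-float32' );
var spttrf = require( '@stdlib/lapack-base-spttrf' );

var D = new Float32Array( [ 4.0, 5.0, 6.0 ] );
var E = new Float32Array( [ 1.0, 2.0 ] );

var info = spttrf( 3, D, E );
if ( info < 0 ) {
    // The (-info)-th argument had an illegal value:
    throw new Error( 'invalid argument: ' + ( -info ) );
} else if ( info > 0 ) {
    // A leading principal minor was not positive; the factorization
    // completed only in the case info === N:
    console.log( 'leading principal minor of order ' + info + ' is not positive' );
}
```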
spttrf() corresponds to the LAPACK routine spttrf.
Examples
var discreteUniform = require( '@stdlib/random-array-discrete-uniform' );
var spttrf = require( '@stdlib/lapack-base-spttrf' );

var opts = {
    'dtype': 'float32'
};
var D = discreteUniform( 5, 1, 5, opts );
console.log( D );

var E = discreteUniform( D.length-1, 1, 5, opts );
console.log( E );

// Perform the `L * D * L^T` factorization:
var info = spttrf( D.length, D, E );
console.log( D );
console.log( E );
console.log( info );
C APIs
Usage
TODO
TODO
TODO.
TODO
TODO
TODO
Examples
TODO
Notice
This package is part of stdlib, a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.
For more information on the project, filing bug reports and feature requests, and guidance on how to develop stdlib, see the main project repository.
ACM TOMS Algorithm 905: SHEPPACK: Modified Shepard Algorithm for Interpolation of Scattered Multivariate Data
SHEPPACK is a Fortran 95 package containing five versions of the modified Shepard algorithm: quadratic (Fortran 95 translations of Algorithms 660, 661, and 798), cubic (Fortran 95 translation of Algorithm 791), and linear variations of the original Shepard algorithm. An option to the linear Shepard code is a statistically robust fit, intended to be used when the data is known to contain outliers. SHEPPACK also includes a hybrid robust piecewise linear estimation algorithm RIPPLE (residual initiated polynomial-time piecewise linear estimation) intended for data from piecewise linear functions in arbitrary dimension m. The main goal of SHEPPACK is to provide users with a single consistent package containing most existing polynomial variations of Shepard’s algorithm. The algorithms target data of different dimensions. The linear Shepard algorithm, robust linear Shepard algorithm, and RIPPLE are the only algorithms in the package that are applicable to arbitrary dimensional data.
This code has been re-uploaded with the permission of Drs. William Thacker
and Layne Watson.
All comments and questions should be directed to them (see contact info at
the bottom of this file).
Organizational Details
The original source code, exactly as distributed by ACM TOMS, is included in
the src directory.
The src directory also contains its own README and build instructions.
Comments at the top of each subroutine document their proper usage.
Several minor modifications to the contents of src have been made:
The included Makefile has been slightly modified to run all tests when the make all command is run.
All file extensions have been changed from .f95 to .f90 for compiler compatibility reasons.
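Building and running the full test suite is therefore just (assuming a Fortran compiler is configured in the Makefile):

```sh
cd src
make all
```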
Reference and Contact
To cite this work, use:
@article{alg905,
author = {Thacker, William I. and Zhang, Jingwei and Watson, Layne T. and Birch, Jeffrey B. and Iyer, Manjula A. and Berry, Michael W.},
title = {{Algorithm 905: SHEPPACK}: Modified {S}hepard Algorithm for Interpolation of Scattered Multivariate Data},
year = {2010},
volume = {37},
number = {3},
journal = {ACM Trans. Math. Softw.},
articleno = {34},
numpages = {20},
doi = {10.1145/1824801.1824812}
}
Inquiries should be directed to
William I. Thacker,
Department of Computer Science, Winthrop University,
Rock Hill, SC 29733; wthacker@winthrop.edu
Layne T. Watson,
Department of Computer Science, VPI&SU,
Blacksburg, VA 24061-0106;
(540) 231-7540; ltw@vt.edu
The XmlExtractor is a class that parses XML very efficiently with the XMLReader object and produces an object (or array) for every item desired. This class can be used to read very large (multi-GB) XML files.
$rootTags Specifies how deep to go into the structure before extracting objects. Examples are below.
$filename Path to the XML file you want to parse. This is optional, as you can pass an XML string with the loadXml() method.
$returnArray If true, every iteration will return items as an associative array. Default is false.
$mergeAttributes If true, any attributes on extracted tags will be included in the returned record as additional tags. Examples below.
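A minimal sketch of typical usage (the earth/people/person structure comes from the example discussed below; the file name and require path are placeholders):

```php
<?php
require 'XmlExtractor.php'; // adjust to your autoloader or file layout

// Skip the <earth> and <people> root tags and extract each <person>:
$extractor = new XmlExtractor("earth/people/person", "people.xml");

foreach ($extractor as $person) {
    // Convert the XmlItem into an array; true merges attributes:
    print_r($person->export(true));
}
```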
Methods
XmlExtractor.loadXml($xml)
Loads an XML structure from a PHP string
XmlExtractor.getRootTags()
This will return the skipped root tags as objects as soon as they are available
XmlItem.export($mergeAttributes = false)
Convert this XML record into an array. If $mergeAttributes is true, any attributes are merged into the array returned
XmlItem.getAttribute($name)
Returns the record’s named attribute
XmlItem.getAttributes()
Returns this record’s attributes if any
XmlItem.mergeAttributes($unsetAttributes = false)
Merges the record’s attributes with the rest of the tags so they are accessible as regular tags. If unsetAttributes is true, the internal attribute object will be removed
The first constructor argument is a slash-separated tag list that tells XmlExtractor you want to extract "person" records (the last tag in the list) from an earth -> people structure.
The export method on the $person object returns it in array form, which will look like this:
There are a number of things going on in the above XML.
The two root tags that we have to skip to get to our items have information attached.
We can get at these with the getRootTags() method. The next issue is that both items use attributes to define their data.
This example is a bit contrived, but it will show the functionality behind the mergeAttributes feature.
By the end of this example, we will have two items with identical structure.
Once “compressed” (exported with merged attributes) the structure of both items is the same.
In the event of an attribute having the same name as the tag, the tag takes precedence and is never overwritten.
The two items will end up looking like this:
Reflections on NY Phil — The NY Phil as a lens on changes in US society
Around the turn of the century, New York City became the arts center of the world. Its establishment not only encouraged the flourishing of American musicians but also attracted musicians from all over the world to NYC. The NY Philharmonic, as an important arts and culture institution, reflects the social and economic changes of United States society over time. In this study I focus on NY Philharmonic data from three perspectives: 1. the nationality of composers whose works are performed by the NY Philharmonic, in relation to the political environments of the US; 2. the status of women composers over time; 3. the elasticity of an arts and culture institution's reaction to social issues, by comparing NY Phil performance data and MoMA exhibition data.
The following graph shows that most composers' works were performed fewer than ten times, and only 16 composers' works were performed more than 1000 times. Therefore, I expect the composers to be diverse.
require(mosaic)
nrow(SumComp1)
hist(SumComp1$SumComp,main="number of performances histogram",xlab="number of performances")
comp1000=subset(SumComp1,SumComp>=1000)
nrow(comp1000)
comp1000
compl1000=subset(SumComp1,SumComp<=10)
nrow(compl1000)
hist(compl1000$SumComp,main="number of performances histogram",xlab="number of performances")
According to Marx, the economic base determines the superstructure of a society: the level of economic development determines its politics, art, and cultural activity. Originally, I was thinking of studying the relationship between the number of contemporary composers' works performed at the NY Philharmonic and the GDP growth rate, to see how that number reflects society's emphasis on art and music education. But the list of composers' birth and death years is incomplete, so I cannot determine which composers were alive at the time their works were performed by the NY Phil. Thus, in order to see the relationship between US economic development and NY Phil performances, I decided to study the relationship between the number of concerts in each season and the US GDP growth rate. The graph shows that the GDP growth rate and performances don't have similar patterns. However, from a micro perspective, the number of performances per year reflects the NY Phil's own economic condition. For example, the boom in the number of performances at the beginning of the twentieth century is explained by recognizing that several orchestras merged.
3. Normalized Performance Frequency Score
Because the number of performances changes year by year, I computed a "Normalized Performance Frequency Score" to normalize by the total number of performances: I divided the number of performances for each composer in each season by the total number of performances in that season.
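In symbols, for composer $c$ in season $s$:

$$\mathrm{score}_{c,s} = \frac{n_{c,s}}{\sum_{c'} n_{c',s}},$$

where $n_{c,s}$ is the number of performances of composer $c$'s works in season $s$.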
require(base)
composerBySeasonComplete[is.na(composerBySeasonComplete)] <- 0
composerBySeasonComplete1=composerBySeasonComplete[2:175]
composerBySeasonComplete2=composerBySeasonComplete[1]
popScoreComposerComplete=data.frame()
totalNumConcert=colSums(composerBySeasonComplete1, na.rm=TRUE)
for ( i in 1:2652){
# divide each composer's season counts by that season's totals
popScoreComposerComplete[i,]=composerBySeasonComplete1[i,]/totalNumConcert
}
popScoreComposerComplete=cbind(composerBySeasonComplete2,popScoreComposerComplete)
write.csv(popScoreComposerComplete,"popScoreComposerComplete.csv")
The normalized performance frequency score table looks like this:
Art and politics can affect each other. In this part, I want to ask several questions:
As NYC rose to be the center of art and culture, did the number of American composers' works increase?
Did the number of German composers' works decrease during WWI and WWII?
Did the number of Russian composers' works decrease during the Cold War?
As the economies of Asian and Latin American countries rose, did the number of works from these areas increase over time?
To do this we need to identify the nationality of the composers whose works are performed by the NY Philharmonic. The NY Philharmonic data do not include composers' nationalities, so I scraped Wikipedia to collect them.
I got most of the composers' nationalities by scraping this page and the links in it (https://en.wikipedia.org/wiki/Category:Classical_composers_by_nationality) using Python code along the lines of the sketch below.
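A minimal sketch of the scraping approach (assumes the requests and beautifulsoup4 packages; the mw-subcategories and mw-pages element ids follow Wikipedia's category-page markup):

```python
import requests
from bs4 import BeautifulSoup

BASE = "https://en.wikipedia.org"
CATEGORY = "/wiki/Category:Classical_composers_by_nationality"

def category_members(path):
    """Return the page titles listed in one Wikipedia category page."""
    soup = BeautifulSoup(requests.get(BASE + path).text, "html.parser")
    pages = soup.find("div", id="mw-pages")
    return [a.get_text() for a in pages.find_all("a")] if pages else []

# The top-level category lists one subcategory per nationality.
top = BeautifulSoup(requests.get(BASE + CATEGORY).text, "html.parser")
subcats = top.find("div", id="mw-subcategories")
for link in (subcats.find_all("a") if subcats else []):
    print(link.get_text(), category_members(link["href"]))
```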
#### American
I stacked the data from multiple pages, cleaned them, matched them against the normalized performance frequency score table, and computed the proportion of American composers' works among all works performed by the NY Philharmonic over time.
The graph shows a general increase in the proportion of American composers over time, which reinforces the hypothesis that as America rose to become the world's center of art and culture around the turn of the century, its composers gained more recognition from the NY Philharmonic.
load("germanComps.RData")
l=c()
for ( i in 1:length(german1.3)){
l=c(l,which(german1.3[i]==popScoreComposerComplete$composers))
}
german=popScoreComposerComplete$composers[l]
germanPop=popScoreComposerComplete[l,]
germanPopSum=colSums(germanPop[2:175])
qplot(seq_along(germanPopSum),germanPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("German Composers")
The graph shows a significant decrease in the proportion of German composers' works being performed during WWI and WWII and after WWII.
wagner=as.numeric(popScoreComposerComplete[81,2:175])
qplot(seq_along(wagner),wagner)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Wagner")
The graph shows that the normalized performance frequency score of Hitler’s favorite composer, Wagner, significantly decreased after WWII.
Russian
russian1=read.csv("russiantest1.csv", header = FALSE ,encoding = "UTF-8")
russian2=read.csv("russiantest2.csv", header = FALSE ,encoding = "UTF-8")
russian=c(russian1,russian2)
russian=unique(unlist(russian))
russian1.0=gsub("\\(composer\\)","",russian)
russian1.0=gsub("\\(conductor\\)","",russian1.0)
russian1.1=strsplit(as.character(russian1.0)," ")
russian1.2=list(rep(0,length(russian1.1)))
for ( i in 1:length(russian1.1)){
if (length(russian1.1[[i]])>1)
# reorder "First [Middle] Last" into "Last, First [Middle]"
russian1.2[i]=paste(russian1.1[[i]][length(russian1.1[[i]])],paste(russian1.1[[i]][1:(length(russian1.1[[i]])-1)], collapse=" "),sep=", ")
}
russian1.2=russian1.2[!is.na(russian1.2)]
test2=partialMatch(popScoreComposerComplete$composers,russian1.2)
test3=test2[-c(38,35,33,29),]
russian1.3=test3$raw.x
save(russian1.3,file="russianComps.RData")
load("russianComps.RData")
l=c()
for ( i in 1:length(russian1.3)){
l=c(l,which(russian1.3[i]==popScoreComposerComplete$composers))
}
russian=popScoreComposerComplete$composers[l]
russianPop=popScoreComposerComplete[l,]
russianPopSum=colSums(russianPop[2:175])
qplot(seq_along(russianPopSum),russianPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Russian Composers")
The graph shows an increase in the normalized performance frequency score of Russian composers after WWII, during the Cold War, probably because many important Russian composers rose to prominence during that time. This suggests that the Cold War did not hinder the introduction of Russian music to the US.
In conclusion, overt war and internal censorship may affect cultural performances and people's attitudes toward music, but vaguer antipathy, as in the Cold War, may not influence the frequency of cultural performances. This is reflected in the NY Phil's choice of repertoire. During the Cultural Revolution in China, Western art works were strictly prohibited; censorship affected Chinese music and art institutions' repertoire choices. Comparing China to the United States suggests that, in a democratic society, attitudes and censorship sometimes do not affect art and culture performance much, as shown by the proportion of Russian works being performed increasing during the Cold War. During actual wartime, however, attitudes do affect art and culture performances, as shown by the proportion of German composers' performances diminishing during and after the war years.
Chinese
In order to see how the economic rise of Asian and Latin American countries affected the performance history of the NY Phil, I needed a coherent list of Asian and Latin American composers, but I could not find such data. Instead, I used China as a single-country sample to see how performance trends changed over time as the economy of China rose.
To do that, I found a list of common Chinese last names and matched it against composers' last names. Note that this matching finds every composer of Chinese ethnicity rather than actual Chinese nationality.
load("ChineseLastName.RData")
splitname=strsplit(popScoreComposerComplete$composers,",")
lname=c()
for ( i in 1:length(splitname)){
lname=c(lname,splitname[[i]][1])
}
l=c()
for ( i in 1:length(ChineseLname)){
l=c(l,which(ChineseLname[i]==lname))
}
asianPop=popScoreComposerComplete[l,]
nrow(asianPop)
nrow(asianPop)/nrow(popScoreComposerComplete)
asianTop=rowSums(asianPop[2:175],na.rm=TRUE)
asianTop=cbind(as.data.frame(asianPop)[1],asianTop)
asianTop1=asianTop[order(-asianTop$asianTop),]
head(unique(asianTop1),20)
asian=popScoreComposerComplete$composers[l]
asianPop=popScoreComposerComplete[l,]
asianPopSum=colSums(asianPop[2:175])
qplot(seq_along(asianPopSum),asianPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Chinese Composers")
The graph shows that as the economy of China rose, the proportion of Chinese composers' works being performed did not increase significantly over time. I suspect the reason is not only that there are few Chinese composers, but also that there are cultural communication barriers between China and the United States. As the economy of China develops, there are more and more Chinese musicians, as more money and effort is put into music and art education. However, most of them are performers rather than composers. Western music and Western music education were introduced to China only after the beginning of the twentieth century, so the history of Western music in China is still relatively short. In addition, during the Cultural Revolution, China was again isolated from the rest of the world. Therefore, even though there are good Chinese composers, their works were not introduced to the US.
#### French and Italian
I also plotted French and Italian composers' performance history over time, in order to compare them with MoMA exhibition history data.
load("frenchComps.RData")
l=c()
for ( i in 1:length(french1.3)){
l=c(l,which(french1.3[i]==popScoreComposerComplete$composers))
}
french=popScoreComposerComplete$composers[l]
frenchPop=popScoreComposerComplete[l,]
frenchPopSum=colSums(frenchPop[2:175])
qplot(seq_along(frenchPopSum),frenchPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("French Composers")
load("italianComps.RData")
l=c()
for ( i in 1:length(italian1.3)){
l=c(l,which(italian1.3[i]==popScoreComposerComplete$composers))
}
italian=popScoreComposerComplete$composers[l]
italianPop=popScoreComposerComplete[l,]
italianPopSum=colSums(italianPop[2:175])
qplot(seq_along(italianPopSum),italianPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("Italian Composers")
#### The Status of Women Composers
The feminist movement accelerated in the 1960s. It started with political and economic equality between men and women, and spread to the cultural sector. Can we find this reflected in NY Phil performance data?
I could not find a comprehensive list of women composers worldwide, so I took American composers as a sample and examined the proportion of American women composers' works performed over time by the NY Phil. To do this, I scraped this page (http://names.mongabay.com/female_names.htm) to get a list of common American female first names and matched them against the NY Phil records.
load("femalenames.RData")
names=americansPop[1]$composers
splitName2=strsplit(names,",")
fname=c()
for (i in 1:length(splitName2)){
fname=c(fname,splitName2[[i]][2])
}
fname=tolower(fname)
fname=trimws(fname)
fname3=strsplit(fname," ")
fname4=c()
for (i in 1: length(fname3)){
fname4=c(fname4,fname3[[i]][1])
}
l=c()
for ( i in 1:length(femalenames)){
l=c(l,which(femalenames[i]==fname4))
}
woman=americansPop[l,1]
woman
womanTrue=woman[-c(8,15,16,19,22)]
womanTrue
length(womanTrue)/nrow(americansPop)
womanPop=americansPop[l,]
womanPop=womanPop[-c(8,15,16,19,22),]
womanPopSum=colSums(womanPop[2:175])
qplot(seq_along(womanPopSum),womanPopSum)+geom_line()+ylim(0,1)+geom_area(colour="black")+scale_x_continuous(breaks=seq(1,175,10),labels=c("1842","1852","1862","1872","1882","1892","1902","1912","1922","1932","1942","1952","1962","1972","1982","1992","2002","2012"))+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+xlab("seasons")+ylab("percentage of works being performed")+ggtitle("American Women Composers")
There are also some gender-neutral names in the female first name list, so some of the people matched could be male; I removed them by hand. The graph shows that the proportion of women composers' works being performed did not increase significantly over time, which reflects the sad situation of women in classical music.
Art from MoMA
To compare with how the NY Phil performance history reflects changes in American society, I decided to make a series of MoMA exhibition history graphs by country.
qplot(seq_along(pamerican),pamerican)+geom_line()+ylim(0,1)+geom_area(colour="black")+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+scale_x_continuous(breaks=seq(1,90,by=10),labels=c("1929","1939","1949","1959","1969","1979","1989","1999","2009"))+xlab("years")+ylab("percentage of works being exhibited")+ggtitle("American Artists")
qplot(seq_along(pgermanAustria),pgermanAustria)+geom_line()+ylim(0,1)+geom_area(colour="black")+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+scale_x_continuous(breaks=seq(1,90,by=10),labels=c("1929","1939","1949","1959","1969","1979","1989","1999","2009"))+xlab("years")+ylab("percentage of works being exhibited")+ggtitle("German Artists")
qplot(seq_along(psoviet),psoviet)+geom_line()+ylim(0,1)+geom_area(colour="black")+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+scale_x_continuous(breaks=seq(1,90,by=10),labels=c("1929","1939","1949","1959","1969","1979","1989","1999","2009"))+xlab("years")+ylab("percentage of works being exhibited")+ggtitle("Russian Artists")
qplot(seq_along(pfrench),pfrench)+geom_line()+ylim(0,1)+geom_area(colour="black")+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+scale_x_continuous(breaks=seq(1,90,by=10),labels=c("1929","1939","1949","1959","1969","1979","1989","1999","2009"))+xlab("years")+ylab("percentage of works being exhibited")+ggtitle("French Artists")
qplot(seq_along(pitalian),pitalian)+geom_line()+ylim(0,1)+geom_area(colour="black")+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+scale_x_continuous(breaks=seq(1,90,by=10),labels=c("1929","1939","1949","1959","1969","1979","1989","1999","2009"))+xlab("years")+ylab("percentage of works being exhibited")+ggtitle("Italian Artists")
sum(asianLatin)
sum(asianLatin)/nrow(moma.1)
qplot(seq_along(pasianLatin),pasianLatin)+geom_line()+ylim(0,1)+geom_area(colour="black")+ theme(axis.text.x = element_text(angle = 45,size=10, hjust = 1))+scale_x_continuous(breaks=seq(1,90,by=10),labels=c("1929","1939","1949","1959","1969","1979","1989","1999","2009"))+xlab("years")+ylab("percentage of works being exhibited")+ggtitle("Asian and Latin American Artists")
The graphs show that MoMA exhibition history is more sensitive to changes in US social pressures than the NY Philharmonic performance history. For example, during WWII, exhibitions of German artists' work at MoMA were very infrequent. But later, before and after the Berlin Wall fell, when Americans had a lot of sympathy for Germans, there was a big peak in the frequency of German artists' exhibitions. The fluctuation is smaller in the NY Phil composer-frequency data than the peaks in MoMA's exhibition frequency data.
This might be because art, as reflected by curatorial and exhibition selections, is actually more sensitive to social pressures than are choices of music to perform. Alternatively, it might be because MoMA's exhibits are recent and contemporary, while NY Phil concerts draw on a much longer history of music, and this long history somehow dilutes the effects of social attitudes. For example, Americans did not hate German music from Beethoven's era.
Conclusion
In this project, I studied the performance history of the NY Philharmonic and analyzed the trends of performance frequency by composer nationality and gender as a function of social attitudes derived from states of war, hostility and censorship. I also compared NY Phil performance data with MoMA exhibition data and found MoMA exhibition data to be even more sensitive to such social attitude pressures. This project tells the story of the NY Philharmonic’s performance history and tries to explain how changes in its repertoire are related to changes in social attitudes in American history. This is my first attempt to bring quantitative analysis to bear on a field in the humanities.
Future Work
I would like to graph an individual NY Phil performer's or composer's performance history to show how he or she rose to stardom over time: is there a steady rise in the number of performances, or are there ups and downs? In addition, I'd like to study the proportion of composers whose works were performed at the NY Phil during their own lifetimes. Further, I'd like to see whether global art and culture trends, such as Impressionism and the popularity of the Ballets Russes, correspond to NY Phil performance history and MoMA exhibition history. I also want to point out that in this research I relied on internet sources, especially Wikipedia pages, for composers' personal information. I believe that crowd intelligence can be reliable, but because these are not authoritative sources, there must be some mistakes in the content. I caught some of them and corrected them by hand, but there may be other faults I did not catch. If I had more time and resources, I'd redo the study using authoritative sources for composers' nationalities and gender and compare it with this study based on Wikipedia pages, which would be a way to measure how reliable crowd intelligence is.
Acknowledgements
I thank Yoav Bergner for introducing me to the wonderful world of data science. I thank Vincent Dorie for teaching me debugging techniques.
This program automates the process of setting watchpoints to detect functions accessing a structure or block of memory.
It can present all detected functions that write to and read from a block of memory or structure.
It detects access types (ldr having a type of 32, strh having a type of 16, ldrb having a type of 8, etc.)
and access offsets (in str r0, [r5, 0x35], 0x35 is the offset).
Through the detected access types and offsets, the program can generate a typedef structure template for the structure itself.
However, correctly estimating the size of the structure is critical for generating the template.
Underestimating is OK, but overestimating is bad.
Sometimes, the game may access a memory location inconsistently. This causes problems in the generation of a structure template, producing false structure padding. In such a case, all relevant entries are marked as CONFLICT in the structure template output. By fixing these conflicts manually (choosing only one entry and removing the other duplicates), the template may be fed to the StructPadder module to fix the padding.
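For illustration, a generated template might look like the following sketch (the member names, the u8/u16/u32 type names, and the conflict layout are assumptions based on the description above; the duplicate members at one offset are exactly what has to be resolved by hand):

```c
/* assuming u8/u16/u32 are fixed-width typedefs (e.g. over stdint.h types) */
typedef struct {
    u8  unk_0x00;     // loc=0x00
    u8  pad_0x01[3];
    u32 unk_0x04;     // loc=0x04
    u16 unk_0x08;     // loc=0x08 CONFLICT
    u32 unk_0x08b;    // loc=0x08 CONFLICT (same offset, different access size)
} structName;
```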
[ Protocol ------------------------------ ]
Setting up and running MemoryAccessDetector.lua in VBA-rr and performing actions relevant to the structure in game should generate output that looks like this:
The first line contains meta information important to the MemoryAccessProtocol module.
The next lines contain a repeating pattern of entries that describe a memory access.
The format is: <function_address>::<memory_access_address> u<type_of_access>(<offset_of_access>)
The program attempts to find the function address by searching for a push {…, lr} somewhere above.
If it detects a pop {…, pc} first, it indicates that the function address is unknown by placing a '?' in its location.
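For example, access entries in this protocol might look like the sketch below (the addresses are invented, and the leading meta line is not shown):

```
0x08012345::0x02014804 u32(0x04)
0x080138AB::0x0201480A u16(0x0A)
?::0x02014801 u8(0x01)
```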
[ Usage ------------------------------ ]
Configure the MemoryAccessDetector.lua file by
1a. setting the base address, size, and name of the structure.
1b. setting whether to scan on reads (LDRs), writes (STRs), or both (or neither, oh well).
Run the script in VBA-rr while playing the relevant game you're trying to scan.
2a. Perform actions you think are relevant to the structure to get better output.
2b. (By default) Press 'P' after you're done to make sure all memory access entries have been written out.
Copy the output of the lua script into the file “input”.
Run the MemoryAccessProtocol.py module to generate a structure template on stdout.
In case the structure template contains CONFLICTs:
Manually go through each conflict, and remove duplicates
(structure members of the same location yet different types).
(optional): Remove the tag " CONFLICT" from the entry, so that the only comment is "// loc=0x22", for example.
Copy the content of the template and put it in the “input” file.
(minus the "typedef struct{" lines and "}structName;" lines)
Run the StructPadder.py module to get correct padding.