ipynbdiff-0.4.7/
ipynbdiff-0.4.7/Gemfile.lock
GEM
  remote: https://rubygems.org/
  specs:
    ast (2.4.2)
    binding_ninja (0.2.3)
    coderay (1.1.3)
    diff-lcs (1.4.4)
    diffy (3.3.0)
    json (2.5.1)
    parser (3.0.2.0)
      ast (~> 2.4.1)
    proc_to_ast (0.1.0)
      coderay
      parser
      unparser
    rspec (3.10.0)
      rspec-core (~> 3.10.0)
      rspec-expectations (~> 3.10.0)
      rspec-mocks (~> 3.10.0)
    rspec-core (3.10.1)
      rspec-support (~> 3.10.0)
    rspec-expectations (3.10.1)
      diff-lcs (>= 1.2.0, < 2.0)
      rspec-support (~> 3.10.0)
    rspec-mocks (3.10.2)
      diff-lcs (>= 1.2.0, < 2.0)
      rspec-support (~> 3.10.0)
    rspec-parameterized (0.5.0)
      binding_ninja (>= 0.2.3)
      parser
      proc_to_ast
      rspec (>= 2.13, < 4)
      unparser
    rspec-support (3.10.2)
    unparser (0.6.0)
      diff-lcs (~> 1.3)
      parser (>= 3.0.0)

PLATFORMS
  ruby
  x86_64-darwin-20

DEPENDENCIES
  diffy (= 3.3.0)
  json (= 2.5.1)
  rspec (= 3.10.0)
  rspec-parameterized (= 0.5.0)

BUNDLED WITH
   2.2.30
ipynbdiff-0.4.7/README.md
# IpynbDiff: Better diff for Jupyter Notebooks

This is a simple diff tool that cleans up Jupyter notebooks: it transforms each [notebook](example/1/from.ipynb) into a [readable markdown file](example/1/from_html.md), keeping the output of cells, and then runs the diff. The markdown is generated using an opinionated Jupyter-to-Markdown conversion, so the entire file is readable in the diff.

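The transform-then-compare idea can be sketched without the gem. The snippet below is only an illustration under stated assumptions: `notebook_to_lines` and `naive_line_changes` are hypothetical helpers, not part of IpynbDiff; the cell header loosely follows the `%% Cell type:` format used in `lib/transformer.rb`; and the comparison is a naive added/removed line listing, not the Diffy-based diff the gem runs.

```ruby
require 'json'

# Hypothetical helper (not part of the gem): flattens a notebook JSON string
# into markdown-like lines, one header per cell followed by its source.
def notebook_to_lines(notebook_json)
  JSON.parse(notebook_json).fetch('cells', []).flat_map do |cell|
    type = cell['cell_type'] || 'raw'
    # `source` may be a single string or an array of strings
    source = Array(cell['source']).map(&:rstrip)
    ["%% Cell type:#{type}", '', *source, '']
  end
end

# Naive comparison (an assumption for illustration, not a real diff):
# lines found only in `to_lines` count as added, lines found only in
# `from_lines` count as removed.
def naive_line_changes(from_lines, to_lines)
  { added: to_lines - from_lines, removed: from_lines - to_lines }
end

from = { 'cells' => [{ 'cell_type' => 'code', 'source' => ["x = 1\n"] }] }.to_json
to   = { 'cells' => [{ 'cell_type' => 'code', 'source' => ["x = 2\n"] }] }.to_json

changes = naive_line_changes(notebook_to_lines(from), notebook_to_lines(to))
puts changes.inspect
```

Because both notebooks are reduced to plain lines first, the only reported change is the edited source line; the surrounding JSON structure never appears in the comparison.
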
The result is diffs that are much easier to read:

| Diff                                | IpynbDiff                                             |
| ----------------------------------- | ----------------------------------------------------- |
| [Diff text](example/diff.txt)       | [IpynbDiff text](example/ipynbdiff_percent.txt)       |
| ![Diff image](example/img/diff.png) | ![IpynbDiff image](example/img/ipynbdiff_percent.png) |

This started as a port of [ipynbdiff](https://gitlab.com/gitlab-org/incubation-engineering/mlops/poc/ipynbdiff), but it now has extended functionality, although it does not work as a git driver.

## Usage

### Generating diffs

```ruby
IpynbDiff.diff(from_path, to_path, options)
```

Options:

```ruby
@default_transform_options = {
  preprocess_input: true, # Whether the input should be transformed
  write_output_to: nil, # Pass a path to save the output to a file
  format: :text, # These are the formats Diffy accepts https://github.com/samg/diffy
  sources_are_files: false, # Whether to use from/to as strings or as paths to files
  raise_if_invalid_notebook: false, # Raises an error if the notebooks are invalid, otherwise returns nil
  transform_options: @default_transform_options, # See below for transform options
  diff_opts: {
    include_diff_info: false # These are passed to Diffy https://github.com/samg/diffy
  }
}
```

### Transforming the notebooks

It might be necessary to have the transformed files in addition to the diff.

```ruby
IpynbDiff.transform(notebook, options)
```

Options:

```ruby
@default_transform_options = {
  include_frontmatter: false, # Whether to include the notebook metadata (kernel, language, etc.)
}
```
ipynbdiff-0.4.7/ipynbdiff.gemspec
# frozen_string_literal: true

lib = File.expand_path('lib/..', __dir__)
$LOAD_PATH.unshift lib unless $LOAD_PATH.include?(lib)

require 'lib/version'

Gem::Specification.new do |s|
  s.name        = 'ipynbdiff'
  s.version     = IpynbDiff::VERSION
  s.summary     = 'Human Readable diffs for Jupyter Notebooks'
  s.description = 'Better diff for Jupyter Notebooks by first preprocessing them and removing clutter'
  s.authors     = ['Eduardo Bonet']
  s.email       = 'ebonet@gitlab.com'
  # Specify which files should be added to the gem when it is released.
  # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
  s.files = Dir.chdir(File.expand_path(__dir__)) do
    `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(spec|example)/}) }
  end
  s.homepage = 'https://gitlab.com/gitlab-org/incubation-engineering/mlops/rb-ipynbdiff'
  s.license  = 'MIT'

  s.require_paths = ['lib']

  s.add_runtime_dependency 'diffy', '~> 3.3'
  s.add_runtime_dependency 'json', '~> 2.5', '>= 2.5.1'

  s.add_development_dependency 'bundler', '~> 2.2'
  s.add_development_dependency 'guard-rspec'
  s.add_development_dependency 'pry'
  s.add_development_dependency 'rake'
  s.add_development_dependency 'rspec'
  s.add_development_dependency 'rspec-parameterized'

  s.metadata = { 'rubygems_mfa_required' => 'true' }
end
ipynbdiff-0.4.7/.rubocop.yml
inherit_from: .rubocop_todo.yml
ipynbdiff-0.4.7/.VERSION.TMPL
# lib/version.rb
module IpynbDiff
  VERSION = "GEM_VERSION"
end
ipynbdiff-0.4.7/.gitignore
.tool-versions
.bundle
*.gem
ipynbdiff-0.4.7/.rubocop_todo.yml
# This configuration was generated by
# `rubocop --auto-gen-config`
# on 2021-12-22 14:13:29 UTC using RuboCop version 1.23.0.
# The point is for the user to remove these configuration records
# one by one as the offenses are removed from the code base.
# Note that changes in the inspected code, or installation of new
# versions of RuboCop, may require this file to be generated again.

# Offense count: 1
# Configuration parameters: Include.
# Include: **/*.gemspec
Gemspec/RequiredRubyVersion:
  Exclude:
    - 'ipynbdiff.gemspec'

AllCops:
  NewCops: enable

Style/StringConcatenation:
  Enabled: false

# Offense count: 6
# Configuration parameters: CountComments, CountAsOne, ExcludedMethods, IgnoredMethods.
# IgnoredMethods: refine
Metrics/BlockLength:
  Enabled: false

# Offense count: 3
# Configuration parameters: CountComments, CountAsOne, ExcludedMethods, IgnoredMethods.
Metrics/MethodLength:
  Enabled: false
ipynbdiff-0.4.7/lib/
ipynbdiff-0.4.7/lib/version.rb
# lib/version.rb
module IpynbDiff
  VERSION = "0.4.7"
end
ipynbdiff-0.4.7/lib/symbolized_markdown_helper.rb
# frozen_string_literal: true

module IpynbDiff
  # Helper functions
  module SymbolizedMarkdownHelper
    def _(symbol = nil, content = '')
      { symbol: symbol, content: content }
    end

    def array_if_not_array(thing)
      thing.is_a?(Array) ? thing : [thing]
    end

    def symbolize_array(symbol, content, &block)
      if content.is_a?(Array)
        content.map.with_index { |l, idx| _(symbol / idx, block.call(l)) }
      else
        _(symbol, content)
      end
    end
  end

  # Simple wrapper for a string
  class JsonSymbol < String
    def /(other)
      JsonSymbol.new((other.is_a?(Array) ?
                      [self, *other] : [self, other]).join('.'))
    end
  end
end
ipynbdiff-0.4.7/lib/diff.rb
# frozen_string_literal: true

# Custom differ for Jupyter Notebooks
module IpynbDiff
  require 'delegate'

  # The result of a diff object
  class Diff < SimpleDelegator
    require 'diffy'

    attr_reader :from, :to

    def initialize(from, to, diffy_opts)
      super(Diffy::Diff.new(from.as_text, to.as_text, **diffy_opts))

      @from = from
      @to = to
    end
  end
end
ipynbdiff-0.4.7/lib/ipynbdiff.rb
# frozen_string_literal: true

# Human Readable Jupyter Diffs
module IpynbDiff
  require 'transformer'
  require 'diff'

  def self.diff(from, to, raise_if_invalid_nb: false, include_frontmatter: false, hide_images: false, diffy_opts: {})
    transformer = Transformer.new(include_frontmatter: include_frontmatter, hide_images: hide_images)

    Diff.new(transformer.transform(from), transformer.transform(to), diffy_opts)
  rescue InvalidNotebookError
    raise if raise_if_invalid_nb
  end

  def self.transform(notebook, raise_errors: false, include_frontmatter: true, hide_images: false)
    return unless notebook

    Transformer.new(include_frontmatter: include_frontmatter, hide_images: hide_images).transform(notebook).as_text
  rescue InvalidNotebookError
    raise if raise_errors
  end
end
ipynbdiff-0.4.7/lib/ipynb_symbol_map.rb
# frozen_string_literal: true

module IpynbDiff
  class InvalidTokenError < StandardError
  end

  # Creates a symbol map for an ipynb file (JSON format)
  class IpynbSymbolMap
    class << self
      def parse(notebook, objects_to_ignore = [])
        IpynbSymbolMap.new(notebook, objects_to_ignore).parse('')
      end
    end

    attr_reader :current_line, :char_idx, :results

    WHITESPACE_CHARS = ["\t", "\r", ' ', "\n"].freeze

    VALUE_STOPPERS = [',', '[', ']', '{', '}', *WHITESPACE_CHARS].freeze

    def initialize(notebook, objects_to_ignore = [])
      @chars = notebook.chars
      @current_line = 0
      @char_idx = 0
      @results = {}
      @objects_to_ignore = objects_to_ignore
    end

    def parse(prefix = '.')
      raise_if_file_ended

      skip_whitespaces

      if (c = current_char) == '"'
        parse_string
      elsif c == '['
        parse_array(prefix)
      elsif c == '{'
        parse_object(prefix)
      else
        parse_value
      end

      results
    end

    def parse_array(prefix)
      # [1, 2, {"some": "object"}, [1]]
      i = 0

      current_should_be '['

      loop do
        raise_if_file_ended

        break if skip_beginning(']')

        new_prefix = "#{prefix}.#{i}"

        add_result(new_prefix, current_line)

        parse(new_prefix)

        i += 1
      end
    end

    def parse_object(prefix)
      # {"name":"value", "another_name": [1, 2, 3]}

      current_should_be '{'

      loop do
        raise_if_file_ended

        break if skip_beginning('}')

        prop_name = parse_string(return_value: true)

        next_and_skip_whitespaces

        current_should_be ':'

        next_and_skip_whitespaces

        if @objects_to_ignore.include? prop_name
          skip
        else
          new_prefix = "#{prefix}.#{prop_name}"

          add_result(new_prefix, current_line)

          parse(new_prefix)
        end
      end
    end

    def parse_string(return_value: false)
      current_should_be '"'
      init_idx = @char_idx

      loop do
        increment_char_index
        raise_if_file_ended

        if current_char == '"' && !prev_backslash?
          init_idx += 1
          break
        end
      end

      @chars[init_idx...@char_idx].join if return_value
    end

    def add_result(key, line_number)
      @results[key] = line_number
    end

    def parse_value
      increment_char_index until raise_if_file_ended || VALUE_STOPPERS.include?(current_char)
    end

    def skip_whitespaces
      while WHITESPACE_CHARS.include?(current_char)
        raise_if_file_ended
        check_for_new_line
        increment_char_index
      end
    end

    def increment_char_index
      @char_idx += 1
    end

    def next_and_skip_whitespaces
      increment_char_index
      skip_whitespaces
    end

    def current_char
      raise_if_file_ended

      @chars[@char_idx]
    end

    def prev_backslash?
      @chars[@char_idx - 1] == '\\' && @chars[@char_idx - 2] != '\\'
    end

    def current_should_be(another_char)
      raise InvalidTokenError unless current_char == another_char
    end

    def check_for_new_line
      @current_line += 1 if current_char == "\n"
    end

    def raise_if_file_ended
      @char_idx >= @chars.size && raise(InvalidTokenError)
    end

    def skip
      raise_if_file_ended

      skip_whitespaces

      if (c = current_char) == '"'
        parse_string
      elsif c == '['
        skip_array
      elsif c == '{'
        skip_object
      else
        parse_value
      end
    end

    def skip_array
      loop do
        raise_if_file_ended

        break if skip_beginning(']')

        skip
      end
    end

    def skip_object
      loop do
        raise_if_file_ended

        break if skip_beginning('}')

        parse_string

        next_and_skip_whitespaces

        current_should_be ':'

        next_and_skip_whitespaces

        skip
      end
    end

    def skip_beginning(closing_char)
      check_for_new_line

      next_and_skip_whitespaces

      return true if current_char == closing_char

      next_and_skip_whitespaces if current_char == ','
    end
  end
end
ipynbdiff-0.4.7/lib/transformed_notebook.rb
# frozen_string_literal: true

module IpynbDiff
  # Notebook that was transformed into md, including location of source cells
  class TransformedNotebook
    attr_reader :blocks

    def as_text
      @blocks.map { |b| b[:content] }.join("\n")
    end

    private

    def initialize(lines = [], symbol_map = {})
      @blocks = lines.map do |line|
        { content: line[:content], source_symbol: (symbol = line[:symbol]), source_line: symbol && symbol_map[symbol] }
      end
    end
  end
end
ipynbdiff-0.4.7/lib/transformer.rb
# frozen_string_literal: true

module IpynbDiff
  class InvalidNotebookError < StandardError
  end

  # Returns a markdown version of the Jupyter Notebook
  class Transformer
    require 'json'
    require 'yaml'
    require 'output_transformer'
    require 'symbolized_markdown_helper'
    require 'ipynb_symbol_map'
    require 'transformed_notebook'

    include SymbolizedMarkdownHelper

    @include_frontmatter = true
    @objects_to_ignore =
      ['application/javascript', 'application/vnd.holoviews_load.v0+json']

    def initialize(include_frontmatter: true, hide_images: false)
      @include_frontmatter = include_frontmatter
      @hide_images = hide_images
      @output_transformer = OutputTransformer.new(hide_images: hide_images)
    end

    def validate_notebook(notebook)
      notebook_json = JSON.parse(notebook)

      return notebook_json if notebook_json.key?('cells')

      raise InvalidNotebookError
    rescue JSON::ParserError
      raise InvalidNotebookError
    end

    def transform(notebook)
      return TransformedNotebook.new unless notebook

      notebook_json = validate_notebook(notebook)
      transformed = transform_document(notebook_json)
      symbol_map = IpynbSymbolMap.parse(notebook)

      TransformedNotebook.new(transformed, symbol_map)
    end

    def transform_document(notebook)
      symbol = JsonSymbol.new('.cells')

      transformed_blocks = notebook['cells'].map.with_index do |cell, idx|
        decorate_cell(transform_cell(cell, notebook, symbol / idx), cell, symbol / idx)
      end

      transformed_blocks.prepend(transform_metadata(notebook)) if @include_frontmatter
      transformed_blocks.flatten
    end

    def decorate_cell(rows, cell, symbol)
      tags = cell['metadata']&.fetch('tags', [])
      type = cell['cell_type'] || 'raw'

      [
        _(symbol, %(%% Cell type:#{type} id:#{cell['id']} tags:#{tags&.join(',')})),
        _,
        rows,
        _
      ]
    end

    def transform_cell(cell, notebook, symbol)
      cell['cell_type'] == 'code' ?
        transform_code_cell(cell, notebook, symbol) : transform_text_cell(cell, symbol)
    end

    def transform_code_cell(cell, notebook, symbol)
      [
        _(symbol / 'source', %(``` #{notebook.dig('metadata', 'kernelspec', 'language') || ''})),
        symbolize_array(symbol / 'source', cell['source'], &:rstrip),
        _(nil, '```'),
        cell['outputs'].map.with_index do |output, idx|
          @output_transformer.transform(output, symbol / ['outputs', idx])
        end
      ]
    end

    def transform_text_cell(cell, symbol)
      symbolize_array(symbol / 'source', cell['source'], &:rstrip)
    end

    def transform_metadata(notebook_json)
      as_yaml = {
        'jupyter' => {
          'kernelspec' => notebook_json['metadata']['kernelspec'],
          'language_info' => notebook_json['metadata']['language_info'],
          'nbformat' => notebook_json['nbformat'],
          'nbformat_minor' => notebook_json['nbformat_minor']
        }
      }.to_yaml

      as_yaml.split("\n").map { |l| _(nil, l) }.append(_(nil, '---'), _)
    end
  end
end
ipynbdiff-0.4.7/lib/output_transformer.rb
# frozen_string_literal: true

module IpynbDiff
  # Transforms Jupyter output data into markdown
  class OutputTransformer
    require 'symbolized_markdown_helper'

    include SymbolizedMarkdownHelper

    HIDDEN_IMAGE_OUTPUT = ' [Hidden Image Output]'

    ORDERED_KEYS = {
      'execute_result' => %w[image/png image/svg+xml image/jpeg text/markdown text/latex text/plain],
      'display_data' => %w[image/png image/svg+xml image/jpeg text/markdown text/latex],
      'stream' => %w[text]
    }.freeze

    def initialize(hide_images: false)
      @hide_images = hide_images
    end

    def transform(output, symbol)
      transformed = case (output_type = output['output_type'])
                    when 'error'
                      transform_error(output['traceback'], symbol / 'traceback')
                    when 'execute_result', 'display_data'
                      transform_non_error(ORDERED_KEYS[output_type], output['data'], symbol / 'data')
                    when 'stream'
                      transform_element('text', output['text'], symbol)
                    end

      transformed ?
        decorate_output(transformed, output, symbol) : []
    end

    def decorate_output(output_rows, output, symbol)
      [
        _,
        _(symbol, %(%%%% Output: #{output['output_type']})),
        _,
        *output_rows
      ]
    end

    def transform_error(traceback, symbol)
      traceback.map.with_index do |t, idx|
        t.split("\n").map do |l|
          _(symbol / idx, l.gsub(/\[[0-9][0-9;]*m/, '').sub("\u001B", ' ').gsub(/\u001B/, '').rstrip)
        end
      end
    end

    def transform_non_error(accepted_keys, elements, symbol)
      accepted_keys.filter { |key| elements.key?(key) }.map do |key|
        transform_element(key, elements[key], symbol)
      end
    end

    def transform_element(output_type, output_element, symbol_prefix)
      new_symbol = symbol_prefix / output_type
      case output_type
      when 'image/png', 'image/jpeg'
        transform_image(output_type + ';base64', output_element, new_symbol)
      when 'image/svg+xml'
        transform_image(output_type + ';utf8', output_element, new_symbol)
      when 'text/markdown', 'text/latex', 'text/plain', 'text'
        transform_text(output_element, new_symbol)
      end
    end

    def transform_image(image_type, image_content, symbol)
      return _(nil, HIDDEN_IMAGE_OUTPUT) if @hide_images

      lines = image_content.is_a?(Array) ?
        image_content : [image_content]

      single_line = lines.map(&:strip).join.gsub(/\s+/, ' ')
      _(symbol, " ![](data:#{image_type},#{single_line})")
    end

    def transform_text(text_content, symbol)
      symbolize_array(symbol, text_content) { |l| " #{l.rstrip}" }
    end
  end
end
ipynbdiff-0.4.7/Gemfile
# frozen_string_literal: true

source 'https://rubygems.org'

gem 'diffy', '3.3.0'
gem 'json', '2.5.1'
gem 'rspec', '3.10.0'
gem 'rspec-parameterized', '0.5.0'
ipynbdiff-0.4.7/.gitlab-ci.yml
# You can override the included template(s) by including variable overrides
# SAST customization: https://docs.gitlab.com/ee/user/application_security/sast/#customizing-the-sast-settings
# Secret Detection customization: https://docs.gitlab.com/ee/user/application_security/secret_detection/#customizing-settings
# Dependency Scanning customization: https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#customizing-the-dependency-scanning-settings
# Note that environment variables can be set in several places
# See https://docs.gitlab.com/ee/ci/variables/#cicd-variable-precedence
image: ruby:2.7

stages:
  - test
  - build
  - rubygems

specs:
  stage: test
  script:
    - bundle install
    - bundle exec rspec

build-gem:
  stage: build
  script:
    - bundle install
    - cat .VERSION.TMPL | sed s/GEM_VERSION/0.0.0/ > lib/version.rb
    - gem build ipynbdiff.gemspec
  artifacts:
    paths:
      - ipynbdiff-0.0.0.gem
  needs:
    - specs

deploy-gem:
  stage: rubygems
  script:
    - bundle install
    - cat .VERSION.TMPL | sed s/GEM_VERSION/$CI_COMMIT_TAG/ > lib/version.rb
    - gem build ipynbdiff.gemspec
    - gem push ipynbdiff-$CI_COMMIT_TAG.gem
  only:
    - tags
  except:
    - branches
  needs:
    - build-gem
  when: manual

include:
  - template: Security/Dependency-Scanning.gitlab-ci.yml