pax_global_header00006660000000000000000000000064134574560330014524gustar00rootroot0000000000000052 comment=0e0a80bad9c270d07f4a306f9dd15ec57d28fe70 rinku-2.0.6/000077500000000000000000000000001345745603300126615ustar00rootroot00000000000000rinku-2.0.6/.gitignore000066400000000000000000000000661345745603300146530ustar00rootroot00000000000000.ruby-version lib/rinku.bundle tmp/ Gemfile.lock /pkg rinku-2.0.6/.gitmodules000066400000000000000000000001161345745603300150340ustar00rootroot00000000000000[submodule "sundown"] path = sundown url = git://github.com/vmg/sundown.git rinku-2.0.6/.travis.yml000066400000000000000000000001301345745603300147640ustar00rootroot00000000000000language: ruby cache: bundler sudo: false rvm: - 2.0.0 - 2.1.6 - 2.2.4 - 2.3.1 rinku-2.0.6/COPYING000066400000000000000000000013451345745603300137170ustar00rootroot00000000000000ISC License Copyright (c) 2011, Vicent Marti Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
rinku-2.0.6/Gemfile000066400000000000000000000000471345745603300141550ustar00rootroot00000000000000
source 'https://rubygems.org'

gemspec
rinku-2.0.6/README.markdown000066400000000000000000000116311345745603300153640ustar00rootroot00000000000000
Rinku does linking
==================

[![Build Status](https://travis-ci.org/vmg/rinku.svg?branch=master)](https://travis-ci.org/vmg/rinku)
[![Dependency Status](https://www.versioneye.com/ruby/rinku/badge.svg)](https://www.versioneye.com/ruby/rinku)

Rinku is a Ruby library that does autolinking.
It parses text and turns anything that remotely resembles a link into an HTML link,
just like the Ruby on Rails `auto_link` method -- but it's about 20 times faster,
because it's written in C, and it's about 20 times smarter when linking,
because it does actual parsing instead of RegEx replacements.

Rinku is a Ruby Gem
-------------------

Rinku is available as a Ruby gem:

    $ [sudo] gem install rinku

The Rinku source is available at GitHub:

    $ git clone git://github.com/vmg/rinku.git

Rinku is a standalone library
-----------------------------

It exports a single method called `Rinku.auto_link`.

~~~~~ruby
require 'rinku'

Rinku.auto_link(text, mode=:all, link_attr=nil, skip_tags=nil)
Rinku.auto_link(text, mode=:all, link_attr=nil, skip_tags=nil) { |link_text| ... }
~~~~~~

Parses a block of text looking for "safe" urls or email addresses,
and turns them into HTML links with the given attributes.

NOTE: The block of text may or may not be HTML; if the text is HTML,
Rinku will skip the relevant tags to prevent double-linking and linking
inside `pre` blocks by default.

NOTE: If the input text is HTML, it's expected to be already escaped.
Rinku will perform no escaping.

NOTE: Currently the following protocols are considered safe and are the
only ones that will be autolinked.

    http:// https:// ftp:// mailto://

Email addresses are also autolinked by default. URLs without a protocol
specifier but starting with 'www.'
will also be autolinked, defaulting to the 'http://' protocol.

-   `text` is a string in plain text or HTML markup. If the string is formatted in
HTML, Rinku is smart enough to skip the links that are already enclosed in `<a>`
tags.

-   `mode` is a symbol, either `:all`, `:urls` or `:email_addresses`,
which specifies which kind of links will be auto-linked.

-   `link_attr` is a string containing the link attributes for each link that
will be generated. These attributes are not sanitized and will be included as-is
in each generated link, e.g.

~~~~~ruby
auto_link('http://www.pokemon.com', :all, 'target="_blank"')
# => '<a href="http://www.pokemon.com" target="_blank">http://www.pokemon.com</a>'
~~~~~

This string can be autogenerated from a hash using the Rails `tag_options` helper.

-   `skip_tags` is a list of strings with the names of HTML tags that will be skipped
when autolinking. If `nil`, this defaults to the value of the global `Rinku.skip_tags`,
which is initially `["a", "pre", "code", "kbd", "script"]`.

-   `&block` is an optional block argument. If a block is passed, it will
be yielded for each found link in the text, and its return value will be used instead
of the name of the link. E.g.

~~~~~ruby
auto_link('Check it out at http://www.pokemon.com') do |url|
  "THE POKEMAN WEBSITEZ"
end
# => 'Check it out at <a href="http://www.pokemon.com">THE POKEMAN WEBSITEZ</a>'
~~~~~~

Rinku is a drop-in replacement for Rails 3.1 `auto_link`
----------------------------------------------------

Auto-linking functionality has been removed from Rails 3.1,
and is instead offered as a standalone gem, `rails_autolink`. You can
choose to use Rinku instead.

~~~~ruby
require 'rails_rinku'
include ActionView::Helpers::TextHelper
post_body = "Welcome to my new blog at http://www.myblog.com/."
auto_link(post_body, :html => { :target => '_blank' }) do |text|
  truncate(text, :length => 15)
end
# => "Welcome to my new blog at <a href=\"http://www.myblog.com/\" target=\"_blank\">http://www.m...</a>."
~~~~

The `rails_rinku` package monkeypatches Rails with an `auto_link` method that
mimics the original one exactly, parameter for parameter.
It's just faster.

Developing
----------

```
$ gem install rake-compiler
$ rake
```

Rinku is written by me
----------------------

I am Vicent Marti, and I wrote Rinku.
While Rinku is busy doing autolinks, you should be busy following me on twitter.
[`@vmg`](http://twitter.com/vmg). Do it.

Rinku has an awesome license
----------------------------

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
rinku-2.0.6/Rakefile000066400000000000000000000004221345745603300143240ustar00rootroot00000000000000
require 'bundler/setup'
require 'bundler/gem_tasks'
require 'rake/extensiontask'
require 'rake/testtask'

task default: :test

Rake::ExtensionTask.new('rinku') # defines compile task

Rake::TestTask.new(test: :compile) do |t|
  t.test_files = FileList['test/*_test.rb']
end
rinku-2.0.6/ext/000077500000000000000000000000001345745603300134615ustar00rootroot00000000000000
rinku-2.0.6/ext/rinku/000077500000000000000000000000001345745603300146115ustar00rootroot00000000000000
rinku-2.0.6/ext/rinku/autolink.c000066400000000000000000000154371345745603300166150ustar00rootroot00000000000000
/*
 * Copyright (c) 2016, GitHub, Inc
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
* * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ #include #include #include #include #include #include "buffer.h" #include "autolink.h" #include "utf8.h" #if defined(_WIN32) #define strncasecmp _strnicmp #endif static int is_valid_hostchar(const uint8_t *link, size_t link_len) { size_t pos = 0; int32_t ch = utf8proc_next(link, &pos); return !utf8proc_is_space(ch) && !utf8proc_is_punctuation(ch); } bool autolink_issafe(const uint8_t *link, size_t link_len) { static const size_t valid_uris_count = 5; static const char *valid_uris[] = { "/", "http://", "https://", "ftp://", "mailto:" }; size_t i; for (i = 0; i < valid_uris_count; ++i) { size_t len = strlen(valid_uris[i]); if (link_len > len && strncasecmp((char *)link, valid_uris[i], len) == 0 && rinku_isalnum(link[len])) return true; } return false; } static bool autolink_delim(const uint8_t *data, struct autolink_pos *link) { int32_t cclose, copen = 0; size_t i; for (i = link->start; i < link->end; ++i) if (data[i] == '<') { link->end = i; break; } while (link->end > link->start) { if (strchr("?!.,:", data[link->end - 1]) != NULL) link->end--; else if (data[link->end - 1] == ';') { size_t new_end = link->end - 2; while (new_end > 0 && rinku_isalnum(data[new_end])) new_end--; if (new_end < link->end - 2) { if (new_end > 0 && data[new_end] == '#') new_end--; if (data[new_end] == '&') { link->end = new_end; continue; } } link->end--; } else break; } if (link->end == link->start) return false; cclose = utf8proc_rewind(data, link->end); copen = 
utf8proc_open_paren_character(cclose); if (copen != 0) { /* Try to close the final punctuation sign in this link; if * there's more closing than opening punctuation symbols in the * URL, we conservatively remove one closing punctuation from * the end of the URL. * * Examples: * * foo http://www.pokemon.com/Pikachu_(Electric) bar * => http://www.pokemon.com/Pikachu_(Electric) * * foo (http://www.pokemon.com/Pikachu_(Electric)) bar * => http://www.pokemon.com/Pikachu_(Electric) * * foo http://www.pokemon.com/Pikachu_(Electric)) bar * => http://www.pokemon.com/Pikachu_(Electric) * * (foo http://www.pokemon.com/Pikachu_(Electric)) bar * => http://www.pokemon.com/Pikachu_(Electric) */ size_t closing = 0; size_t opening = 0; size_t i = link->start; while (i < link->end) { int32_t c = utf8proc_next(data, &i); if (c == copen) opening++; else if (c == cclose) closing++; } if (copen == cclose) { if (opening > 0) utf8proc_back(data, &link->end); } else { if (closing > opening) utf8proc_back(data, &link->end); } } return true; } static bool autolink_delim_iter(const uint8_t *data, struct autolink_pos *link) { size_t prev_link_end; int iterations = 0; while(link->end != 0) { prev_link_end = link->end; if (!autolink_delim(data, link)) return false; if (prev_link_end == link->end || iterations > 5) { break; } iterations++; } return true; } static bool check_domain(const uint8_t *data, size_t size, struct autolink_pos *link, bool allow_short) { size_t i, np = 0, uscore1 = 0, uscore2 = 0; if (!rinku_isalnum(data[link->start])) return false; for (i = link->start + 1; i < size - 1; ++i) { if (data[i] == '_') { uscore2++; } else if (data[i] == '.') { uscore1 = uscore2; uscore2 = 0; np++; } else if (!is_valid_hostchar(data + i, size - i) && data[i] != '-') break; } if (uscore1 > 0 || uscore2 > 0) return false; link->end = i; if (allow_short) { /* We don't need a valid domain in the strict sense (with * least one dot; so just make sure it's composed of valid * domain characters and 
return the length of the valid
		 * sequence. */
		return true;
	} else {
		/* a valid domain needs to have at least a dot.
		 * that's as far as we get */
		return (np > 0);
	}
}

bool
autolink__www(
	struct autolink_pos *link,
	const uint8_t *data,
	size_t pos,
	size_t size,
	unsigned int flags)
{
	int32_t boundary;
	assert(data[pos] == 'w' || data[pos] == 'W');

	if ((size - pos) < 4 ||
		(data[pos + 1] != 'w' && data[pos + 1] != 'W') ||
		(data[pos + 2] != 'w' && data[pos + 2] != 'W') ||
		data[pos + 3] != '.')
		return false;

	boundary = utf8proc_rewind(data, pos);
	if (boundary &&
		!utf8proc_is_space(boundary) &&
		!utf8proc_is_punctuation(boundary))
		return false;

	link->start = pos;
	link->end = 0;

	if (!check_domain(data, size, link, false))
		return false;

	link->end = utf8proc_find_space(data, link->end, size);
	return autolink_delim_iter(data, link);
}

bool
autolink__email(
	struct autolink_pos *link,
	const uint8_t *data,
	size_t pos,
	size_t size,
	unsigned int flags)
{
	int nb = 0, np = 0;
	assert(data[pos] == '@');

	link->start = pos;
	link->end = pos;

	for (; link->start > 0; link->start--) {
		uint8_t c = data[link->start - 1];

		if (rinku_isalnum(c))
			continue;

		if (strchr(".+-_%", c) != NULL)
			continue;

		break;
	}

	if (link->start == pos)
		return false;

	for (; link->end < size; link->end++) {
		uint8_t c = data[link->end];

		if (rinku_isalnum(c))
			continue;

		if (c == '@')
			nb++;
		else if (c == '.'
&& link->end < size - 1) np++; else if (c != '-' && c != '_') break; } if ((link->end - pos) < 2 || nb != 1 || np == 0 || (np == 1 && data[link->end - 1] == '.')) return false; return autolink_delim(data, link); } bool autolink__url( struct autolink_pos *link, const uint8_t *data, size_t pos, size_t size, unsigned int flags) { assert(data[pos] == ':'); if ((size - pos) < 4 || data[pos + 1] != '/' || data[pos + 2] != '/') return false; link->start = pos + 3; link->end = 0; if (!check_domain(data, size, link, flags & AUTOLINK_SHORT_DOMAINS)) return false; link->start = pos; link->end = utf8proc_find_space(data, link->end, size); while (link->start && rinku_isalpha(data[link->start - 1])) link->start--; if (!autolink_issafe(data + link->start, size - link->start)) return false; return autolink_delim_iter(data, link); } rinku-2.0.6/ext/rinku/autolink.h000066400000000000000000000027261345745603300166170ustar00rootroot00000000000000/* * Copyright (c) 2016, GitHub, Inc * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
*/ #ifndef RINKU_AUTOLINK_H #define RINKU_AUTOLINK_H #include #include #include "buffer.h" #ifdef __cplusplus extern "C" { #endif enum { AUTOLINK_SHORT_DOMAINS = (1 << 0), }; struct autolink_pos { size_t start; size_t end; }; bool autolink_issafe(const uint8_t *link, size_t link_len); bool autolink__www(struct autolink_pos *res, const uint8_t *data, size_t pos, size_t size, unsigned int flags); bool autolink__email(struct autolink_pos *res, const uint8_t *data, size_t pos, size_t size, unsigned int flags); bool autolink__url(struct autolink_pos *res, const uint8_t *data, size_t pos, size_t size, unsigned int flags); #ifdef __cplusplus } #endif #endif /* vim: set filetype=c: */ rinku-2.0.6/ext/rinku/buffer.c000066400000000000000000000105461345745603300162340ustar00rootroot00000000000000/* * Copyright (c) 2008, Natacha Porté * Copyright (c) 2011, Vicent Martí * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
*/ #define BUFFER_MAX_ALLOC_SIZE (1024 * 1024 * 16) //16mb #include "buffer.h" #include #include #include #include /* MSVC compat */ #if defined(_MSC_VER) # define _buf_vsnprintf _vsnprintf #else # define _buf_vsnprintf vsnprintf #endif int bufprefix(const struct buf *buf, const char *prefix) { size_t i; assert(buf && buf->unit); for (i = 0; i < buf->size; ++i) { if (prefix[i] == 0) return 0; if (buf->data[i] != prefix[i]) return buf->data[i] - prefix[i]; } return 0; } /* bufgrow: increasing the allocated size to the given value */ int bufgrow(struct buf *buf, size_t neosz) { size_t neoasz; void *neodata; assert(buf && buf->unit); if (neosz > BUFFER_MAX_ALLOC_SIZE) return BUF_ENOMEM; if (buf->asize >= neosz) return BUF_OK; neoasz = buf->asize + buf->unit; while (neoasz < neosz) neoasz += buf->unit; neodata = realloc(buf->data, neoasz); if (!neodata) return BUF_ENOMEM; buf->data = neodata; buf->asize = neoasz; return BUF_OK; } /* bufnew: allocation of a new buffer */ struct buf * bufnew(size_t unit) { struct buf *ret; ret = malloc(sizeof (struct buf)); if (ret) { ret->data = 0; ret->size = ret->asize = 0; ret->unit = unit; } return ret; } /* bufnullterm: NULL-termination of the string array */ const char * bufcstr(struct buf *buf) { assert(buf && buf->unit); if (buf->size < buf->asize && buf->data[buf->size] == 0) return (char *)buf->data; if (buf->size + 1 <= buf->asize || bufgrow(buf, buf->size + 1) == 0) { buf->data[buf->size] = 0; return (char *)buf->data; } return NULL; } /* bufprintf: formatted printing to a buffer */ void bufprintf(struct buf *buf, const char *fmt, ...) 
{ va_list ap; int n; assert(buf && buf->unit); if (buf->size >= buf->asize && bufgrow(buf, buf->size + 1) < 0) return; va_start(ap, fmt); n = _buf_vsnprintf((char *)buf->data + buf->size, buf->asize - buf->size, fmt, ap); va_end(ap); if (n < 0) { #ifdef _MSC_VER va_start(ap, fmt); n = _vscprintf(fmt, ap); va_end(ap); #else return; #endif } if ((size_t)n >= buf->asize - buf->size) { if (bufgrow(buf, buf->size + n + 1) < 0) return; va_start(ap, fmt); n = _buf_vsnprintf((char *)buf->data + buf->size, buf->asize - buf->size, fmt, ap); va_end(ap); } if (n < 0) return; buf->size += n; } /* bufput: appends raw data to a buffer */ void bufput(struct buf *buf, const void *data, size_t len) { assert(buf && buf->unit); if (buf->size + len > buf->asize && bufgrow(buf, buf->size + len) < 0) return; memcpy(buf->data + buf->size, data, len); buf->size += len; } /* bufputs: appends a NUL-terminated string to a buffer */ void bufputs(struct buf *buf, const char *str) { bufput(buf, str, strlen(str)); } /* bufputc: appends a single uint8_t to a buffer */ void bufputc(struct buf *buf, int c) { assert(buf && buf->unit); if (buf->size + 1 > buf->asize && bufgrow(buf, buf->size + 1) < 0) return; buf->data[buf->size] = c; buf->size += 1; } /* bufrelease: decrease the reference count and free the buffer if needed */ void bufrelease(struct buf *buf) { if (!buf) return; free(buf->data); free(buf); } /* bufreset: frees internal data of the buffer */ void bufreset(struct buf *buf) { if (!buf) return; free(buf->data); buf->data = NULL; buf->size = buf->asize = 0; } /* bufslurp: removes a given number of bytes from the head of the array */ void bufslurp(struct buf *buf, size_t len) { assert(buf && buf->unit); if (len >= buf->size) { buf->size = 0; return; } buf->size -= len; memmove(buf->data, buf->data + len, buf->size); } rinku-2.0.6/ext/rinku/buffer.h000066400000000000000000000056311345745603300162400ustar00rootroot00000000000000/* * Copyright (c) 2008, Natacha Porté * Copyright (c) 2011, 
Vicent Martí * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ #ifndef BUFFER_H__ #define BUFFER_H__ #include #include #include #ifdef __cplusplus extern "C" { #endif #if defined(_MSC_VER) #define __attribute__(x) #define inline #endif typedef enum { BUF_OK = 0, BUF_ENOMEM = -1, } buferror_t; /* struct buf: character array buffer */ struct buf { uint8_t *data; /* actual character data */ size_t size; /* size of the string */ size_t asize; /* allocated size (0 = volatile buffer) */ size_t unit; /* reallocation unit size (0 = read-only buffer) */ }; /* CONST_BUF: global buffer from a string litteral */ #define BUF_STATIC(string) \ { (uint8_t *)string, sizeof string -1, sizeof string, 0, 0 } /* VOLATILE_BUF: macro for creating a volatile buffer on the stack */ #define BUF_VOLATILE(strname) \ { (uint8_t *)strname, strlen(strname), 0, 0, 0 } /* BUFPUTSL: optimized bufputs of a string litteral */ #define BUFPUTSL(output, literal) \ bufput(output, literal, sizeof literal - 1) /* bufgrow: increasing the allocated size to the given value */ int bufgrow(struct buf *, size_t); /* bufnew: allocation of a new buffer */ struct buf *bufnew(size_t) __attribute__ ((malloc)); /* bufnullterm: NUL-termination of the string array (making a C-string) */ const char *bufcstr(struct buf *); /* bufprefix: compare the beginning of a 
buffer with a string */ int bufprefix(const struct buf *buf, const char *prefix); /* bufput: appends raw data to a buffer */ void bufput(struct buf *, const void *, size_t); /* bufputs: appends a NUL-terminated string to a buffer */ void bufputs(struct buf *, const char *); /* bufputc: appends a single char to a buffer */ void bufputc(struct buf *, int); /* bufrelease: decrease the reference count and free the buffer if needed */ void bufrelease(struct buf *); /* bufreset: frees internal data of the buffer */ void bufreset(struct buf *); /* bufslurp: removes a given number of bytes from the head of the array */ void bufslurp(struct buf *, size_t); /* bufprintf: formatted printing to a buffer */ void bufprintf(struct buf *, const char *, ...) __attribute__ ((format (printf, 2, 3))); #ifdef __cplusplus } #endif #endif rinku-2.0.6/ext/rinku/extconf.rb000066400000000000000000000001401345745603300165770ustar00rootroot00000000000000require 'mkmf' $CFLAGS += ' -fvisibility=hidden' dir_config('rinku') create_makefile('rinku') rinku-2.0.6/ext/rinku/rinku.c000066400000000000000000000117231345745603300161110ustar00rootroot00000000000000/* * Copyright (c) 2016, GitHub, Inc * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
 */
#include #include #include #include
#include "rinku.h"
#include "autolink.h"
#include "buffer.h"
#include "utf8.h"

typedef enum {
	HTML_TAG_NONE = 0,
	HTML_TAG_OPEN,
	HTML_TAG_CLOSE,
} html_tag;

typedef enum {
	AUTOLINK_ACTION_NONE = 0,
	AUTOLINK_ACTION_WWW,
	AUTOLINK_ACTION_EMAIL,
	AUTOLINK_ACTION_URL,
	AUTOLINK_ACTION_SKIP_TAG
} autolink_action;

typedef bool (*autolink_parse_cb)(
	struct autolink_pos *, const uint8_t *, size_t, size_t, unsigned int);

static autolink_parse_cb g_callbacks[] = {
	NULL,
	autolink__www,	/* 1 */
	autolink__email,/* 2 */
	autolink__url,	/* 3 */
};

static const char *g_hrefs[] = {
	NULL,
	"<a href=\"http://",
	"<a href=\"mailto:",
	"<a href=\"",
};

/* Write the href value, escaping double quotes so that they cannot
 * terminate the attribute early. */
static void
print_link(struct buf *ob, const uint8_t *link, size_t size)
{
	size_t i = 0, org;

	while (i < size) {
		org = i;

		while (i < size && link[i] != '"')
			i++;

		if (i > org)
			bufput(ob, link + org, i - org);

		if (i >= size)
			break;

		BUFPUTSL(ob, "&quot;");
		i++;
	}
}

/* From sundown/html/html.c */
static int
html_is_tag(const uint8_t *tag_data, size_t tag_size, const char *tagname)
{
	size_t i;
	int closed = 0;

	if (tag_size < 3 || tag_data[0] != '<')
		return HTML_TAG_NONE;

	i = 1;

	if (tag_data[i] == '/') {
		closed = 1;
		i++;
	}

	for (; i < tag_size; ++i, ++tagname) {
		if (*tagname == 0)
			break;

		if (tag_data[i] != *tagname)
			return HTML_TAG_NONE;
	}

	if (i == tag_size)
		return HTML_TAG_NONE;

	if (rinku_isspace(tag_data[i]) || tag_data[i] == '>')
		return closed ?
HTML_TAG_CLOSE : HTML_TAG_OPEN; return HTML_TAG_NONE; } static size_t autolink__skip_tag( struct buf *ob, const uint8_t *text, size_t size, const char **skip_tags) { size_t i = 0; while (i < size && text[i] != '>') i++; while (*skip_tags != NULL) { if (html_is_tag(text, size, *skip_tags) == HTML_TAG_OPEN) break; skip_tags++; } if (*skip_tags != NULL) { for (;;) { while (i < size && text[i] != '<') i++; if (i == size) break; if (html_is_tag(text + i, size - i, *skip_tags) == HTML_TAG_CLOSE) break; i++; } while (i < size && text[i] != '>') i++; } return i; } int rinku_autolink( struct buf *ob, const uint8_t *text, size_t size, autolink_mode mode, unsigned int flags, const char *link_attr, const char **skip_tags, void (*link_text_cb)(struct buf *, const uint8_t *, size_t, void *), void *payload) { size_t i, end; char active_chars[256] = {0}; int link_count = 0; if (!text || size == 0) return 0; active_chars['<'] = AUTOLINK_ACTION_SKIP_TAG; if (mode & AUTOLINK_EMAILS) active_chars['@'] = AUTOLINK_ACTION_EMAIL; if (mode & AUTOLINK_URLS) { active_chars['w'] = AUTOLINK_ACTION_WWW; active_chars['W'] = AUTOLINK_ACTION_WWW; active_chars[':'] = AUTOLINK_ACTION_URL; } if (link_attr != NULL) { while (rinku_isspace(*link_attr)) link_attr++; } bufgrow(ob, size); i = end = 0; while (i < size) { struct autolink_pos link; bool link_found; char action = 0; while (end < size && (action = active_chars[text[end]]) == 0) end++; if (end == size) { if (link_count > 0) bufput(ob, text + i, end - i); break; } if (action == AUTOLINK_ACTION_SKIP_TAG) { end += autolink__skip_tag(ob, text + end, size - end, skip_tags); continue; } link_found = g_callbacks[(int)action]( &link, text, end, size, flags); if (link_found && link.start >= i) { const uint8_t *link_str = text + link.start; const size_t link_len = link.end - link.start; bufput(ob, text + i, link.start - i); bufputs(ob, g_hrefs[(int)action]); print_link(ob, link_str, link_len); if (link_attr) { BUFPUTSL(ob, "\" "); bufputs(ob, link_attr); 
				bufputc(ob, '>');
			} else {
				BUFPUTSL(ob, "\">");
			}

			if (link_text_cb) {
				link_text_cb(ob, link_str, link_len, payload);
			} else {
				bufput(ob, link_str, link_len);
			}

			BUFPUTSL(ob, "</a>");

			link_count++;
			end = i = link.end;
		} else {
			end = end + 1;
		}
	}

	return link_count;
}
rinku-2.0.6/ext/rinku/rinku.h000066400000000000000000000007271345745603300161200ustar00rootroot00000000000000
#ifndef _RINKU_H
#define _RINKU_H

#include
#include "buffer.h"

typedef enum {
	AUTOLINK_URLS = (1 << 0),
	AUTOLINK_EMAILS = (1 << 1),
	AUTOLINK_ALL = AUTOLINK_URLS|AUTOLINK_EMAILS
} autolink_mode;

int
rinku_autolink(
	struct buf *ob,
	const uint8_t *text,
	size_t size,
	autolink_mode mode,
	unsigned int flags,
	const char *link_attr,
	const char **skip_tags,
	void (*link_text_cb)(struct buf *, const uint8_t *, size_t, void *),
	void *payload);

#endif
rinku-2.0.6/ext/rinku/rinku_rb.c000066400000000000000000000157541345745603300166040ustar00rootroot00000000000000
/*
 * Copyright (c) 2016, GitHub, Inc
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/ #include #define RUBY_EXPORT __attribute__ ((visibility ("default"))) #include #include #include "rinku.h" #include "autolink.h" static VALUE rb_mRinku; struct callback_data { VALUE rb_block; rb_encoding *encoding; }; static rb_encoding * validate_encoding(VALUE rb_str) { rb_encoding *encoding; Check_Type(rb_str, T_STRING); encoding = rb_enc_get(rb_str); if (!rb_enc_asciicompat(encoding)) rb_raise(rb_eArgError, "Invalid encoding"); if (rb_enc_str_coderange(rb_str) == ENC_CODERANGE_BROKEN) rb_raise(rb_eArgError, "invalid byte sequence in %s", rb_enc_name(encoding)); return encoding; } static void autolink_callback(struct buf *link_text, const uint8_t *url, size_t url_len, void *block) { struct callback_data *data = block; VALUE rb_link, rb_link_text; rb_link = rb_enc_str_new((const char *)url, url_len, data->encoding); rb_link_text = rb_funcall(data->rb_block, rb_intern("call"), 1, rb_link); if (validate_encoding(rb_link_text) != data->encoding) rb_raise(rb_eArgError, "encoding mismatch"); bufput(link_text, RSTRING_PTR(rb_link_text), RSTRING_LEN(rb_link_text)); } const char **rinku_load_tags(VALUE rb_skip) { const char **skip_tags; size_t i, count; Check_Type(rb_skip, T_ARRAY); count = RARRAY_LEN(rb_skip); skip_tags = xmalloc(sizeof(void *) * (count + 1)); for (i = 0; i < count; ++i) { VALUE tag = rb_ary_entry(rb_skip, i); Check_Type(tag, T_STRING); skip_tags[i] = StringValueCStr(tag); } skip_tags[count] = NULL; return skip_tags; } /* * Document-method: auto_link * * call-seq: * auto_link(text, mode=:all, link_attr=nil, skip_tags=nil, flags=0) * auto_link(text, mode=:all, link_attr=nil, skip_tags=nil, flags=0) { |link_text| ... } * * Parses a block of text looking for "safe" urls or email addresses, * and turns them into HTML links with the given attributes. * * NOTE: The block of text may or may not be HTML; if the text is HTML, * Rinku will skip the relevant tags to prevent double-linking and linking * inside `pre` blocks by default. 
 *
 * NOTE: If the input text is HTML, it's expected to be already escaped.
 * Rinku will perform no escaping.
 *
 * NOTE: Currently the following protocols are considered safe and are the
 * only ones that will be autolinked.
 *
 *     http:// https:// ftp:// mailto://
 *
 * Email addresses are also autolinked by default. URLs without a protocol
 * specifier but starting with 'www.' will also be autolinked, defaulting to
 * the 'http://' protocol.
 *
 * - `text` is a string in plain text or HTML markup. If the string is formatted in
 * HTML, Rinku is smart enough to skip the links that are already enclosed in `<a>`
 * tags.
 *
 * - `mode` is a symbol, either `:all`, `:urls` or `:email_addresses`,
 * which specifies which kind of links will be auto-linked.
 *
 * - `link_attr` is a string containing the link attributes for each link that
 * will be generated. These attributes are not sanitized and will be included as-is
 * in each generated link, e.g.
 *
 *   ~~~~~ruby
 *   auto_link('http://www.pokemon.com', :all, 'target="_blank"')
 *   # => '<a href="http://www.pokemon.com" target="_blank">http://www.pokemon.com</a>'
 *   ~~~~~
 *
 * This string can be autogenerated from a hash using the Rails `tag_options` helper.
 *
 * - `skip_tags` is a list of strings with the names of HTML tags that will be skipped
 * when autolinking. If `nil`, this defaults to the value of the global `Rinku.skip_tags`,
 * which is initially `["a", "pre", "code", "kbd", "script"]`.
 *
 * - `flags` is an optional integer bitmask. Pass `Rinku::AUTOLINK_SHORT_DOMAINS` to
 * recognize 'http://foo' as a valid domain instead of requiring at least one '.'.
 * It defaults to 0.
 *
 * - `&block` is an optional block argument. If a block is passed, it will
 * be yielded for each found link in the text, and its return value will be used instead
 * of the name of the link. E.g.
 *
 * ~~~~~ruby
 * auto_link('Check it out at http://www.pokemon.com') do |url|
 *   "THE POKEMAN WEBSITEZ"
 * end
 * # => 'Check it out at <a href="http://www.pokemon.com">THE POKEMAN WEBSITEZ</a>'
 * ~~~~~~
 */
static VALUE
rb_rinku_autolink(int argc, VALUE *argv, VALUE self)
{
	static const char *SKIP_TAGS[] = {"a", "pre", "code", "kbd", "script", NULL};

	VALUE result, rb_text, rb_mode, rb_html, rb_skip, rb_flags, rb_block;
	rb_encoding *text_encoding;
	struct buf *output_buf;
	int link_mode = AUTOLINK_ALL, count;
	unsigned int link_flags = 0;
	const char *link_attr = NULL;
	const char **skip_tags = NULL;
	struct callback_data cbdata;

	rb_scan_args(argc, argv, "14&", &rb_text, &rb_mode,
		&rb_html, &rb_skip, &rb_flags, &rb_block);

	text_encoding = validate_encoding(rb_text);

	if (!NIL_P(rb_mode)) {
		ID mode_sym;

		Check_Type(rb_mode, T_SYMBOL);
		mode_sym = SYM2ID(rb_mode);

		if (mode_sym == rb_intern("all"))
			link_mode = AUTOLINK_ALL;
		else if (mode_sym == rb_intern("email_addresses"))
			link_mode = AUTOLINK_EMAILS;
		else if (mode_sym == rb_intern("urls"))
			link_mode = AUTOLINK_URLS;
		else
			rb_raise(rb_eTypeError,
				"Invalid linking mode "
				"(possible values are :all, :urls, :email_addresses)");
	}

	if (!NIL_P(rb_html)) {
		Check_Type(rb_html, T_STRING);
		link_attr = RSTRING_PTR(rb_html);
	}

	if (!NIL_P(rb_flags)) {
		Check_Type(rb_flags, T_FIXNUM);
		link_flags = FIX2INT(rb_flags);
	}

	if (NIL_P(rb_skip))
		rb_skip = rb_iv_get(self, "@skip_tags");

	if (NIL_P(rb_skip)) {
		skip_tags = SKIP_TAGS;
	} else {
		skip_tags = rinku_load_tags(rb_skip);
	}

	output_buf = bufnew(32);
	cbdata.rb_block = rb_block;
	cbdata.encoding = text_encoding;

	count = rinku_autolink(
		output_buf,
		(const uint8_t *)RSTRING_PTR(rb_text),
		(size_t)RSTRING_LEN(rb_text),
		link_mode,
		link_flags,
		link_attr,
		skip_tags,
		RTEST(rb_block) ?
&autolink_callback : NULL, (void*)&cbdata); if (count == 0) result = rb_text; else { result = rb_enc_str_new((char *)output_buf->data, output_buf->size, text_encoding); } if (skip_tags != SKIP_TAGS) xfree(skip_tags); bufrelease(output_buf); return result; } void RUBY_EXPORT Init_rinku() { rb_mRinku = rb_define_module("Rinku"); rb_define_module_function(rb_mRinku, "auto_link", rb_rinku_autolink, -1); rb_define_const(rb_mRinku, "AUTOLINK_SHORT_DOMAINS", INT2FIX(AUTOLINK_SHORT_DOMAINS)); } rinku-2.0.6/ext/rinku/utf8.c000066400000000000000000000212401345745603300156420ustar00rootroot00000000000000#include #include #include #include #include "utf8.h" /** 1 = space, 2 = punct, 3 = digit, 4 = alpha, 0 = other */ static const uint8_t ctype_class[256] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, /* 1 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 2 */ 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, /* 3 */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, /* 4 */ 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, /* 5 */ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, /* 6 */ 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, /* 7 */ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 0, /* 8 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 9 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* a */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* b */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* c */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* d */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* e */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* f */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; bool rinku_isspace(char c) { return ctype_class[(uint8_t)c] == 1; } bool rinku_ispunct(char c) { return ctype_class[(uint8_t)c] == 2; } bool rinku_isdigit(char c) { return ctype_class[(uint8_t)c] == 3; } bool rinku_isalpha(char c) { return ctype_class[(uint8_t)c] == 4; } bool rinku_isalnum(char c) { 
uint8_t cls = ctype_class[(uint8_t)c]; return (cls == 3 || cls == 4); } static const int8_t utf8proc_utf8class[256] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0}; static int32_t read_cp(const uint8_t *str, int8_t length) { switch (length) { case 1: return str[0]; case 2: return ((str[0] & 0x1F) << 6) + (str[1] & 0x3F); case 3: return ((str[0] & 0x0F) << 12) + ((str[1] & 0x3F) << 6) + (str[2] & 0x3F); case 4: return ((str[0] & 0x07) << 18) + ((str[1] & 0x3F) << 12) + ((str[2] & 0x3F) << 6) + (str[3] & 0x3F); default: return 0xFFFD; // replacement character } } int32_t utf8proc_next(const uint8_t *str, size_t *pos) { const size_t p = *pos; const int8_t length = utf8proc_utf8class[str[p]]; (*pos) += length; return read_cp(str + p, length); } int32_t utf8proc_back(const uint8_t *str, size_t *pos) { const size_t p = *pos; int8_t length = 0; if (!p) return 0x0; if ((str[p - 1] & 0x80) == 0x0) { (*pos) -= 1; return str[p - 1]; } if (p > 1 && utf8proc_utf8class[str[p - 2]] == 2) length = 2; else if (p > 2 && utf8proc_utf8class[str[p - 3]] == 3) length = 3; else if (p > 3 && utf8proc_utf8class[str[p - 4]] == 4) length = 4; (*pos) -= length; return read_cp(&str[*pos], length); } size_t utf8proc_find_space(const uint8_t *str, size_t pos, size_t size) { while (pos 
< size) { const size_t last = pos; int32_t uc = utf8proc_next(str, &pos); if (uc == 0xFFFD) return size; else if (utf8proc_is_space(uc)) return last; } return size; } int32_t utf8proc_rewind(const uint8_t *data, size_t pos) { int8_t length = 0; if (!pos) return 0x0; if ((data[pos - 1] & 0x80) == 0x0) return data[pos - 1]; if (pos > 1 && utf8proc_utf8class[data[pos - 2]] == 2) length = 2; else if (pos > 2 && utf8proc_utf8class[data[pos - 3]] == 3) length = 3; else if (pos > 3 && utf8proc_utf8class[data[pos - 4]] == 4) length = 4; return read_cp(&data[pos - length], length); } int32_t utf8proc_open_paren_character(int32_t cclose) { switch (cclose) { case '"': return '"'; case '\'': return '\''; case ')': return '('; case ']': return '['; case '}': return '{'; case 65289: return 65288; /* () */ case 12305: return 12304; /* 【】 */ case 12303: return 12302; /* 『』 */ case 12301: return 12300; /* 「」 */ case 12299: return 12298; /* 《》 */ case 12297: return 12296; /* 〈〉 */ } return 0; } bool utf8proc_is_space(int32_t uc) { return (uc == 9 || uc == 10 || uc == 12 || uc == 13 || uc == 32 || uc == 160 || uc == 5760 || (uc >= 8192 && uc <= 8202) || uc == 8239 || uc == 8287 || uc == 12288); } bool utf8proc_is_punctuation(int32_t uc) { if (uc < 128) return rinku_ispunct(uc); return (uc == 161 || uc == 167 || uc == 171 || uc == 182 || uc == 183 || uc == 187 || uc == 191 || uc == 894 || uc == 903 || (uc >= 1370 && uc <= 1375) || uc == 1417 || uc == 1418 || uc == 1470 || uc == 1472 || uc == 1475 || uc == 1478 || uc == 1523 || uc == 1524 || uc == 1545 || uc == 1546 || uc == 1548 || uc == 1549 || uc == 1563 || uc == 1566 || uc == 1567 || (uc >= 1642 && uc <= 1645) || uc == 1748 || (uc >= 1792 && uc <= 1805) || (uc >= 2039 && uc <= 2041) || (uc >= 2096 && uc <= 2110) || uc == 2142 || uc == 2404 || uc == 2405 || uc == 2416 || uc == 2800 || uc == 3572 || uc == 3663 || uc == 3674 || uc == 3675 || (uc >= 3844 && uc <= 3858) || uc == 3860 || (uc >= 3898 && uc <= 3901) || uc == 3973 || (uc >= 
4048 && uc <= 4052) || uc == 4057 || uc == 4058 || (uc >= 4170 && uc <= 4175) || uc == 4347 || (uc >= 4960 && uc <= 4968) || uc == 5120 || uc == 5741 || uc == 5742 || uc == 5787 || uc == 5788 || (uc >= 5867 && uc <= 5869) || uc == 5941 || uc == 5942 || (uc >= 6100 && uc <= 6102) || (uc >= 6104 && uc <= 6106) || (uc >= 6144 && uc <= 6154) || uc == 6468 || uc == 6469 || uc == 6686 || uc == 6687 || (uc >= 6816 && uc <= 6822) || (uc >= 6824 && uc <= 6829) || (uc >= 7002 && uc <= 7008) || (uc >= 7164 && uc <= 7167) || (uc >= 7227 && uc <= 7231) || uc == 7294 || uc == 7295 || (uc >= 7360 && uc <= 7367) || uc == 7379 || (uc >= 8208 && uc <= 8231) || (uc >= 8240 && uc <= 8259) || (uc >= 8261 && uc <= 8273) || (uc >= 8275 && uc <= 8286) || uc == 8317 || uc == 8318 || uc == 8333 || uc == 8334 || (uc >= 8968 && uc <= 8971) || uc == 9001 || uc == 9002 || (uc >= 10088 && uc <= 10101) || uc == 10181 || uc == 10182 || (uc >= 10214 && uc <= 10223) || (uc >= 10627 && uc <= 10648) || (uc >= 10712 && uc <= 10715) || uc == 10748 || uc == 10749 || (uc >= 11513 && uc <= 11516) || uc == 11518 || uc == 11519 || uc == 11632 || (uc >= 11776 && uc <= 11822) || (uc >= 11824 && uc <= 11842) || (uc >= 12289 && uc <= 12291) || (uc >= 12296 && uc <= 12305) || (uc >= 12308 && uc <= 12319) || uc == 12336 || uc == 12349 || uc == 12448 || uc == 12539 || uc == 42238 || uc == 42239 || (uc >= 42509 && uc <= 42511) || uc == 42611 || uc == 42622 || (uc >= 42738 && uc <= 42743) || (uc >= 43124 && uc <= 43127) || uc == 43214 || uc == 43215 || (uc >= 43256 && uc <= 43258) || uc == 43310 || uc == 43311 || uc == 43359 || (uc >= 43457 && uc <= 43469) || uc == 43486 || uc == 43487 || (uc >= 43612 && uc <= 43615) || uc == 43742 || uc == 43743 || uc == 43760 || uc == 43761 || uc == 44011 || uc == 64830 || uc == 64831 || (uc >= 65040 && uc <= 65049) || (uc >= 65072 && uc <= 65106) || (uc >= 65108 && uc <= 65121) || uc == 65123 || uc == 65128 || uc == 65130 || uc == 65131 || (uc >= 65281 && uc <= 65283) || (uc >= 
65285 && uc <= 65290) || (uc >= 65292 && uc <= 65295) || uc == 65306 || uc == 65307 || uc == 65311 || uc == 65312 || (uc >= 65339 && uc <= 65341) || uc == 65343 || uc == 65371 || uc == 65373 || (uc >= 65375 && uc <= 65381) || (uc >= 65792 && uc <= 65794) || uc == 66463 || uc == 66512 || uc == 66927 || uc == 67671 || uc == 67871 || uc == 67903 || (uc >= 68176 && uc <= 68184) || uc == 68223 || (uc >= 68336 && uc <= 68342) || (uc >= 68409 && uc <= 68415) || (uc >= 68505 && uc <= 68508) || (uc >= 69703 && uc <= 69709) || uc == 69819 || uc == 69820 || (uc >= 69822 && uc <= 69825) || (uc >= 69952 && uc <= 69955) || uc == 70004 || uc == 70005 || (uc >= 70085 && uc <= 70088) || uc == 70093 || (uc >= 70200 && uc <= 70205) || uc == 70854 || (uc >= 71105 && uc <= 71113) || (uc >= 71233 && uc <= 71235) || (uc >= 74864 && uc <= 74868) || uc == 92782 || uc == 92783 || uc == 92917 || (uc >= 92983 && uc <= 92987) || uc == 92996 || uc == 113823); } rinku-2.0.6/ext/rinku/utf8.h000066400000000000000000000025471345745603300156600ustar00rootroot00000000000000/* * Copyright (c) 2016, GitHub, Inc * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
*/ #ifndef RINKU_UTF8_H #define RINKU_UTF8_H #include #include bool rinku_isspace(char c); bool rinku_ispunct(char c); bool rinku_isdigit(char c); bool rinku_isalpha(char c); bool rinku_isalnum(char c); int32_t utf8proc_rewind(const uint8_t *data, size_t pos); int32_t utf8proc_next(const uint8_t *str, size_t *pos); int32_t utf8proc_back(const uint8_t *data, size_t *pos); size_t utf8proc_find_space(const uint8_t *str, size_t pos, size_t size); int32_t utf8proc_open_paren_character(int32_t cclose); bool utf8proc_is_space(int32_t uc); bool utf8proc_is_punctuation(int32_t uc); #endif rinku-2.0.6/lib/000077500000000000000000000000001345745603300134275ustar00rootroot00000000000000rinku-2.0.6/lib/rails_rinku.rb000066400000000000000000000015651345745603300163050ustar00rootroot00000000000000require 'rinku' module RailsRinku def rinku_auto_link(text, *args, &block) return '' if text.blank? options = args.size == 2 ? {} : args.extract_options! unless args.empty? options[:link] = args[0] || :all options[:html] = args[1] || {} options[:skip] = args[2] end options.reverse_merge!(:link => :all, :html => {}) text = h(text) unless text.html_safe? 
tag_options_method = if Gem::Version.new(Rails.version) >= Gem::Version.new("5.1") # Rails >= 5.1 tag_builder.method(:tag_options) else # Rails <= 5.0 method(:tag_options) end Rinku.auto_link( text, options[:link], tag_options_method.call(options[:html]), options[:skip], &block ).html_safe end end module ActionView::Helpers::TextHelper include RailsRinku alias_method :auto_link, :rinku_auto_link end rinku-2.0.6/lib/rinku.rb000066400000000000000000000002051345745603300151010ustar00rootroot00000000000000module Rinku VERSION = "2.0.6" class << self attr_accessor :skip_tags end self.skip_tags = nil end require 'rinku.so' rinku-2.0.6/rinku.gemspec000066400000000000000000000021361345745603300153600ustar00rootroot00000000000000Gem::Specification.new do |s| s.name = 'rinku' s.version = '2.0.6' s.summary = "Mostly autolinking" s.description = <<-EOF A fast and very smart autolinking library that acts as a drop-in replacement for Rails `auto_link` EOF s.email = 'vicent@github.com' s.homepage = 'https://github.com/vmg/rinku' s.authors = ["Vicent Marti"] s.license = 'ISC' # = MANIFEST = s.files = %w[ COPYING README.markdown Rakefile ext/rinku/autolink.c ext/rinku/autolink.h ext/rinku/buffer.c ext/rinku/buffer.h ext/rinku/extconf.rb ext/rinku/rinku.c ext/rinku/rinku.h ext/rinku/rinku_rb.c ext/rinku/utf8.c ext/rinku/utf8.h lib/rails_rinku.rb lib/rinku.rb rinku.gemspec test/autolink_test.rb ] # = MANIFEST = s.test_files = ["test/autolink_test.rb"] s.extra_rdoc_files = ["COPYING"] s.extensions = ["ext/rinku/extconf.rb"] s.require_paths = ["lib"] s.add_development_dependency "rake" s.add_development_dependency "rake-compiler" s.add_development_dependency "minitest", ">= 5.0" s.required_ruby_version = '>= 2.0.0' end rinku-2.0.6/test/000077500000000000000000000000001345745603300136405ustar00rootroot00000000000000rinku-2.0.6/test/autolink_test.rb000066400000000000000000000456351345745603300170670ustar00rootroot00000000000000require 'bundler/setup' $LOAD_PATH.unshift 
File.expand_path('../../lib', __FILE__) require 'minitest/autorun' require 'cgi' require 'uri' require 'rinku' class RinkuAutoLinkTest < Minitest::Test def generate_result(link_text, href = nil) href ||= link_text href = "http://" + href unless href =~ %r{\A(\w+://|mailto:)} %{#{CGI.escapeHTML link_text}} end def assert_linked(expected, url) assert_equal expected, Rinku.auto_link(url) end def test_segfault Rinku.auto_link("a+b@d.com+e@f.com", :all) end def test_escapes_quotes assert_linked %(http://website.com/"onmouseover=document.body.style.backgroundColor="pink";//), %(http://website.com/"onmouseover=document.body.style.backgroundColor="pink";//) end def test_global_skip_tags assert_nil Rinku.skip_tags Rinku.skip_tags = ['pre'] assert_equal Rinku.skip_tags, ['pre'] Rinku.skip_tags = ['pa'] url = 'This is just a http://www.pokemon.com test' assert_equal Rinku.auto_link(url), url Rinku.skip_tags = nil refute_equal Rinku.auto_link(url), url end def test_auto_link_with_single_trailing_punctuation_and_space url = "http://www.youtube.com" url_result = generate_result(url) assert_equal url_result, Rinku.auto_link(url) ["?", "!", ".", ",", ":"].each do |punc| assert_equal "link: #{url_result}#{punc} foo?", Rinku.auto_link("link: #{url}#{punc} foo?") end end def test_terminates_on_ampersand url = "http://example.com" assert_linked "hello '#{url}' hello", "hello '#{url}' hello" end def test_does_not_segfault assert_linked "< this is just a test", "< this is just a test" end def test_skips_tags html = <<-html This is just a test. http://www.pokemon.com
More test http://www.amd.com
  CODE www.less.es
html result = <<-result This is just a test. http://www.pokemon.com
More test http://www.amd.com
  CODE www.less.es
    result
    assert_equal result, Rinku.auto_link(html, :all, nil, ["div", "a"])
  end

  def test_auto_link_with_brackets
    link1_raw = 'http://en.wikipedia.org/wiki/Sprite_(computer_graphics)'
    link1_result = generate_result(link1_raw)
    assert_equal link1_result, Rinku.auto_link(link1_raw)
    assert_equal "(link: #{link1_result})", Rinku.auto_link("(link: #{link1_raw})")

    link2_raw = 'http://en.wikipedia.org/wiki/Sprite_[computer_graphics]'
    link2_result = generate_result(link2_raw)
    assert_equal link2_result, Rinku.auto_link(link2_raw)
    assert_equal "[link: #{link2_result}]", Rinku.auto_link("[link: #{link2_raw}]")

    link3_raw = 'http://en.wikipedia.org/wiki/Sprite_{computer_graphics}'
    link3_result = generate_result(link3_raw)
    assert_equal link3_result, Rinku.auto_link(link3_raw)
    assert_equal "{link: #{link3_result}}", Rinku.auto_link("{link: #{link3_raw}}")
  end

  def test_auto_link_with_multiple_trailing_punctuations
    url = "http://youtube.com"
    url_result = generate_result(url)
    assert_equal url_result, Rinku.auto_link(url)
    assert_equal "(link: #{url_result}).", Rinku.auto_link("(link: #{url}).")
  end

  def test_auto_link_with_block
    url = "http://api.rubyonrails.com/Foo.html"
    email = "fantabulous@shiznadel.ic"

    assert_equal %(<p><a href="#{url}">#{url[0...7]}...</a><br /><a href="mailto:#{email}">#{email[0...7]}...</a><br /></p>), Rinku.auto_link("<p>#{url}<br />#{email}<br /></p>") { |_url| _url[0...7] + '...'}
  end

  def test_auto_link_with_block_with_html
    pic = "http://example.com/pic.png"
    url = "http://example.com/album?a&b=c"

    expect = %(My pic: -- full album here #{generate_result(url)})
    text = "My pic: #{pic} -- full album here #{CGI.escapeHTML url}"

    assert_equal expect, Rinku.auto_link(text) { |link|
      if link =~ /\.(jpg|gif|png|bmp|tif)$/i
        %()
      else
        link
      end
    }
  end

  def test_auto_link_already_linked
    linked1 = generate_result('Ruby On Rails', 'http://www.rubyonrails.com')
    linked2 = %('www.example.com')
    linked3 = %('www.example.com')
    linked4 = %('www.example.com')
    linked5 = %('close www.example.com')
    assert_equal linked1, Rinku.auto_link(linked1)
    assert_equal linked2, Rinku.auto_link(linked2)
    assert_equal linked3, Rinku.auto_link(linked3)
    assert_equal linked4, Rinku.auto_link(linked4)
    assert_equal linked5, Rinku.auto_link(linked5)

    linked_email = %Q(Mail me)
    assert_equal linked_email, Rinku.auto_link(linked_email)
  end

  def test_auto_link_at_eol
    url1 = "http://api.rubyonrails.com/Foo.html"
    url2 = "http://www.ruby-doc.org/core/Bar.html"

    assert_equal %(<p><a href="#{url1}">#{url1}</a><br /><a href="#{url2}">#{url2}</a><br /></p>), Rinku.auto_link("<p>#{url1}<br />#{url2}<br /></p>")
  end

  def test_block
    link = Rinku.auto_link("Find ur favorite pokeman @ http://www.pokemon.com") do |url|
      assert_equal url, "http://www.pokemon.com"
      "POKEMAN WEBSITE"
    end

    assert_equal link, "Find ur favorite pokeman @ <a href=\"http://www.pokemon.com\">POKEMAN WEBSITE</a>"
  end

  def test_autolink_works
    url = "http://example.com/"
    assert_linked "<a href=\"#{url}\">#{url}</a>", url
  end

  def test_autolink_options_for_short_domains
    url = "http://google"
    linked_url = "<a href=\"#{url}\">#{url}</a>"
    flags = Rinku::AUTOLINK_SHORT_DOMAINS

    # Specifying use short_domains in the args
    url = "http://google"
    linked_url = "<a href=\"#{url}\">#{url}</a>"
    assert_equal Rinku.auto_link(url, nil, nil, nil, flags), linked_url

    # Specifying no short_domains in the args
    url = "http://google"
    linked_url = "<a href=\"#{url}\">#{url}</a>"
    assert_equal Rinku.auto_link(url, nil, nil, nil, 0), url
  end

  def test_not_autolink_www
    assert_linked "Awww... man", "Awww... man"
  end

  def test_does_not_terminate_on_dash
    url = "http://example.com/Notification_Center-GitHub-20101108-140050.jpg"
    assert_linked "<a href=\"#{url}\">#{url}</a>", url
  end

  def test_does_not_include_trailing_gt
    url = "http://example.com"
    assert_linked "&lt;<a href=\"#{url}\">#{url}</a>&gt;", "&lt;#{url}&gt;"
  end

  def test_links_with_anchors
    url = "https://github.com/github/hubot/blob/master/scripts/cream.js#L20-20"
    assert_linked "<a href=\"#{url}\">#{url}</a>", url
  end

  def test_links_like_rails
    urls = %w(http://www.rubyonrails.com
              http://www.rubyonrails.com:80
              http://www.rubyonrails.com/~minam
              https://www.rubyonrails.com/~minam
              http://www.rubyonrails.com/~minam/url%20with%20spaces
              http://www.rubyonrails.com/foo.cgi?something=here
              http://www.rubyonrails.com/foo.cgi?something=here&and=here
              http://www.rubyonrails.com/contact;new
              http://www.rubyonrails.com/contact;new%20with%20spaces
              http://www.rubyonrails.com/contact;new?with=query&string=params
              http://www.rubyonrails.com/~minam/contact;new?with=query&string=params
              http://en.wikipedia.org/wiki/Wikipedia:Today%27s_featured_picture_%28animation%29/January_20%2C_2007
              http://www.mail-archive.com/rails@lists.rubyonrails.org/
              http://www.amazon.com/Testing-Equal-Sign-In-Path/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1198861734&sr=8-1
              http://en.wikipedia.org/wiki/Sprite_(computer_graphics)
              http://en.wikipedia.org/wiki/Texas_hold%27em
              https://www.google.com/doku.php?id=gps:resource:scs:start
            )

    urls.each do |url|
      assert_linked %(<a href="#{CGI.escapeHTML url}">#{CGI.escapeHTML url}</a>), CGI.escapeHTML(url)
    end
  end

  def test_links_like_autolink_rails
    email_raw = 'david@loudthinking.com'
    email_result = %{<a href="mailto:#{email_raw}">#{email_raw}</a>}
    email2_raw = '+david@loudthinking.com'
    email2_result = %{<a href="mailto:#{email2_raw}">#{email2_raw}</a>}
    link_raw = 'http://www.rubyonrails.com'
    link_result = %{<a href="#{link_raw}">#{link_raw}</a>}
    link2_raw = 'www.rubyonrails.com'
    link2_result = %{<a href="http://#{link2_raw}">#{link2_raw}</a>}
    link3_raw = 'http://manuals.ruby-on-rails.com/read/chapter.need_a-period/103#page281'
    link3_result = %{<a href="#{link3_raw}">#{link3_raw}</a>}
    link4_raw = CGI.escapeHTML 'http://foo.example.com/controller/action?parm=value&p2=v2#anchor123'
    link4_result = %{<a href="#{link4_raw}">#{link4_raw}</a>}
    link5_raw = 'http://foo.example.com:3000/controller/action'
    link5_result = %{<a href="#{link5_raw}">#{link5_raw}</a>}
    link6_raw = 'http://foo.example.com:3000/controller/action+pack'
    link6_result = %{<a href="#{link6_raw}">#{link6_raw}</a>}
    link7_raw = CGI.escapeHTML 'http://foo.example.com/controller/action?parm=value&p2=v2#anchor-123'
    link7_result = %{<a href="#{link7_raw}">#{link7_raw}</a>}
    link8_raw = 'http://foo.example.com:3000/controller/action.html'
    link8_result = %{<a href="#{link8_raw}">#{link8_raw}</a>}
    link9_raw = 'http://business.timesonline.co.uk/article/0,,9065-2473189,00.html'
    link9_result = %{<a href="#{link9_raw}">#{link9_raw}</a>}
    link10_raw = 'http://www.mail-archive.com/ruby-talk@ruby-lang.org/'
    link10_result = %{<a href="#{link10_raw}">#{link10_raw}</a>}

    assert_linked %(Go to #{link_result} and say hello to #{email_result}), "Go to #{link_raw} and say hello to #{email_raw}"
    assert_linked %(<p>Link #{link_result}</p>), "<p>Link #{link_raw}</p>"
    assert_linked %(<p>#{link_result} Link</p>), "<p>#{link_raw} Link</p>"
    assert_linked %(Go to #{link_result}.), %(Go to #{link_raw}.)
    assert_linked %(<p>Go to #{link_result}, then say hello to #{email_result}.</p>), %(<p>Go to #{link_raw}, then say hello to #{email_raw}.</p>)
    assert_linked %(<p>Link #{link2_result}</p>), "<p>Link #{link2_raw}</p>"
    assert_linked %(<p>#{link2_result} Link</p>), "<p>#{link2_raw} Link</p>"
    assert_linked %(Go to #{link2_result}.), %(Go to #{link2_raw}.)
    assert_linked %(<p>Say hello to #{email_result}, then go to #{link2_result},</p>), %(<p>Say hello to #{email_raw}, then go to #{link2_raw},</p>)
    assert_linked %(<p>Link #{link3_result}</p>), "<p>Link #{link3_raw}</p>"
    assert_linked %(<p>#{link3_result} Link</p>), "<p>#{link3_raw} Link</p>"
    assert_linked %(Go to #{link3_result}.), %(Go to #{link3_raw}.)
    assert_linked %(<p>Go to #{link3_result}. seriously, #{link3_result}? i think I'll say hello to #{email_result}. instead.</p>), %(<p>Go to #{link3_raw}. seriously, #{link3_raw}? i think I'll say hello to #{email_raw}. instead.</p>)
    assert_linked %(<p>Link #{link4_result}</p>), "<p>Link #{link4_raw}</p>"
    assert_linked %(<p>#{link4_result} Link</p>), "<p>#{link4_raw} Link</p>"
    assert_linked %(<p>#{link5_result} Link</p>), "<p>#{link5_raw} Link</p>"
    assert_linked %(<p>#{link6_result} Link</p>), "<p>#{link6_raw} Link</p>"
    assert_linked %(<p>#{link7_result} Link</p>), "<p>#{link7_raw} Link</p>"
    assert_linked %(<p>Link #{link8_result}</p>), "<p>Link #{link8_raw}</p>"
    assert_linked %(<p>#{link8_result} Link</p>), "<p>#{link8_raw} Link</p>"
    assert_linked %(Go to #{link8_result}.), %(Go to #{link8_raw}.)
    assert_linked %(<p>Go to #{link8_result}. seriously, #{link8_result}? i think I'll say hello to #{email_result}. instead.</p>), %(<p>Go to #{link8_raw}. seriously, #{link8_raw}? i think I'll say hello to #{email_raw}. instead.</p>)
    assert_linked %(<p>Link #{link9_result}</p>), "<p>Link #{link9_raw}</p>"
    assert_linked %(<p>#{link9_result} Link</p>), "<p>#{link9_raw} Link</p>"
    assert_linked %(Go to #{link9_result}.), %(Go to #{link9_raw}.)
    assert_linked %(<p>Go to #{link9_result}. seriously, #{link9_result}? i think I'll say hello to #{email_result}. instead.</p>), %(<p>Go to #{link9_raw}. seriously, #{link9_raw}? i think I'll say hello to #{email_raw}. instead.</p>)
    assert_linked %(<p>#{link10_result} Link</p>), "<p>#{link10_raw} Link</p>"
    assert_linked email2_result, email2_raw
    assert_linked "#{link_result} #{link_result} #{link_result}", "#{link_raw} #{link_raw} #{link_raw}"
    assert_linked 'Ruby On Rails', 'Ruby On Rails'
  end

  def test_copies_source_encoding
    str = "http://www.bash.org"

    ret = Rinku.auto_link str
    assert_equal str.encoding, ret.encoding

    str.encode! 'binary'
    ret = Rinku.auto_link str
    assert_equal str.encoding, ret.encoding
  end

  def test_valid_encodings_are_generated
    str = "<a href='http://gi.co'>gi.co</a>\xC2\xA0r"
    assert_equal Encoding::UTF_8, str.encoding

    res = Rinku.auto_link(str)
    assert_equal Encoding::UTF_8, res.encoding
    assert res.valid_encoding?
  end

  def test_polish_wikipedia_haha
    url = "https://pl.wikipedia.org/wiki/Komisja_śledcza_do_zbadania_sprawy_zarzutu_nielegalnego_wywierania_wpływu_na_funkcjonariuszy_policji,_służb_specjalnych,_prokuratorów_i_osoby_pełniące_funkcje_w_organach_wymiaru_sprawiedliwości"
    input = "A wikipedia link (#{url})"
    expected = "A wikipedia link (<a href=\"#{url}\">#{url}</a>)"

    assert_linked expected, input
  end

  def test_only_valid_encodings_are_accepted
    str = "this is invalid \xA0 utf8"
    assert_equal Encoding::UTF_8, str.encoding
    assert !str.valid_encoding?
assert_raises ArgumentError do Rinku.auto_link(str) end end NBSP = "\xC2\xA0".freeze def test_the_famous_nbsp input = "at http://google.com/#{NBSP};" expected = "at http://google.com/#{NBSP};" assert_linked expected, input end def test_does_not_include_trailing_nonbreaking_spaces url = "http://example.com/" assert_linked "#{url}#{NBSP}and", "#{url}#{NBSP}and" end def test_identifies_preceeding_nonbreaking_spaces url = "http://example.com/" assert_linked "#{NBSP}#{url} and", "#{NBSP}#{url} and" end def test_urls_with_2_wide_UTF8_characters url = "http://example.com/?foo=¥&bar=1" assert_linked "#{url} and", "#{url} and" end def test_urls_with_4_wide_UTF8_characters url = "http://example.com/?foo=&bar=1" assert_linked "#{url} and", "#{url} and" end def test_handles_urls_with_emoji_properly url = "http://foo.com/💖a" assert_linked "#{url} and", "#{url} and" end def test_identifies_nonbreaking_spaces_preceeding_emails email_raw = 'david@loudthinking.com' assert_linked "email#{NBSP}#{email_raw}", "email#{NBSP}#{email_raw}" end def test_identifies_unicode_spaces assert_linked( %{This is just a test. http://www.pokemon.com\u202F\u2028\u2001}, "This is just a test. http://www.pokemon.com\u202F\u2028\u2001" ) end def test_www_is_case_insensitive url = "www.reddit.com" assert_linked generate_result(url), url url = "WWW.REDDIT.COM" assert_linked generate_result(url), url url = "Www.reddit.Com" assert_linked generate_result(url), url url = "WwW.reddit.CoM" assert_linked generate_result(url), url end def test_non_emails_ending_in_periods assert_linked "abc/def@ghi.", "abc/def@ghi." assert_linked "abc/def@ghi. ", "abc/def@ghi. " assert_linked "abc/def@ghi. x", "abc/def@ghi. x" assert_linked "abc/def@ghi.< x", "abc/def@ghi.< x" assert_linked "abc/def@ghi.x", "abc/def@ghi.x" assert_linked "abc/def@ghi.x. a", "abc/def@ghi.x. 
a" end def test_urls_with_entities_and_parens assert_linked "<http://www.google.com>", "<http://www.google.com>" assert_linked "<http://www.google.com>)", "<http://www.google.com>)" # this produces invalid output, but limits how much work we will do assert_linked "<http://www.google.com>)<)<)<)<)<)<)", "<http://www.google.com>)<)<)<)<)<)<)" url = "http://pokemon.com/bulbasaur" assert_linked "URL is #{generate_result(url)}.", "URL is #{url}." assert_linked "(URL is #{generate_result(url)}.)", "(URL is #{url}.)" url = "www.pokemon.com/bulbasaur" assert_linked "URL is #{generate_result(url)}.", "URL is #{url}." assert_linked "(URL is #{generate_result(url)}.)", "(URL is #{url}.)" url = "abc@xyz.com" assert_linked "URL is #{generate_result(url, "mailto:#{url}")}.", "URL is #{url}." assert_linked "(URL is #{generate_result(url, "mailto:#{url}")}.)", "(URL is #{url}.)" end def test_urls_with_parens assert_linked "(http://example.com)", "(http://example.com)" assert_linked "((http://example.com/()))", "((http://example.com/()))" assert_linked "[http://example.com/()]", "[http://example.com/()]" assert_linked "(http://example.com/)", "(http://example.com/)" assert_linked "【http://example.com/】", "【http://example.com/】" assert_linked "『http://example.com/』", "『http://example.com/』" assert_linked "「http://example.com/」", "「http://example.com/」" assert_linked "《http://example.com/》", "《http://example.com/》" assert_linked "〈http://example.com/〉", "〈http://example.com/〉" end def test_urls_with_quotes assert_linked "'http://example.com'", "'http://example.com'" assert_linked "\"http://example.com\"\"", "\"http://example.com\"\"" end def test_underscore_in_domain assert_linked "http://foo_bar.com", "http://foo_bar.com" end def test_underscore_in_subdomain assert_linked "http://foo_bar.xyz.com", "http://foo_bar.xyz.com" end def test_regression_84 assert_linked "https://www.keepright.atの情報をもとにエラー修正", "https://www.keepright.atの情報をもとにエラー修正" end end
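
The tests above build their expected strings with the `generate_result` helper. As a rough illustration of what that helper appears to produce, here is a minimal standalone sketch that needs only the standard library and does not load Rinku. The name `expected_link` is hypothetical, and the `<a href="...">` markup is an assumption based on the README's description of turning URLs into HTML links, not a statement of the library's exact output.

```ruby
require 'cgi'

# Hypothetical standalone counterpart of the test helper `generate_result`.
# Assumption: an autolinked URL is rendered as <a href="URL">TEXT</a>.
def expected_link(link_text, href = nil)
  href ||= link_text
  # Text without a scheme (e.g. "www.reddit.com") is assumed to default to http://
  href = "http://" + href unless href =~ %r{\A(\w+://|mailto:)}
  %{<a href="#{CGI.escapeHTML href}">#{CGI.escapeHTML link_text}</a>}
end

puts expected_link("www.reddit.com")
puts expected_link("abc@xyz.com", "mailto:abc@xyz.com")
puts expected_link("http://example.com/album?a&b=c")
```

Both the `href` and the link text are HTML-escaped, which is why a raw `&` in a query string would show up as `&amp;` in the expected markup.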