
Michel Rodriguez

Freelance developer: Perl, XML, Web

About Michel Rodriguez

I am a freelance developer working on a range of projects.

My main area of expertise is XML, and technical publishing in general. I have served for a long time as the resident XML expert for a standards organisation.

Beyond this I also develop a range of projects, often dealing with data. The data is usually either difficult to obtain (Web Scraping), heterogeneous (Document Management System) or incomplete (Machine Learning) and requires effort and creativity to be turned into a useful output.

I am the author of XML::Twig, one of the most popular modules for processing XML, and of a handful of other modules mostly XML/HTML related that can be found in my (meta)CPAN directory.

“Data is a precious thing and will last longer than the systems themselves.”

Tim Berners-Lee

The types of services I offer:

  • XML related: from document modeling and conversion to XML, through XML QC and integrating XML data with other sources or clients, to general XML processing. Very good knowledge of XML for standards (STS, JATS) and of MathML, with more than 20 years of experience in the field of XML for techpub.
  • Data cleanup and improvement: converting various formats to XML, PDF, JSON, CSV or loading them into databases, adding structure to data (XML or other), dealing with character encodings, improving incomplete data using machine learning.
  • Web micro-services: building REST APIs on top of various services, like machine learning or exposing XML repositories.
  • SaaS: complete modern web systems, mobile-ready, using Bootstrap, hosted in the cloud if appropriate.
  • Web Scraping: collecting and structuring publicly accessible data from websites, including dynamic websites.


Recent Projects

  • SaaS: management tool for the network of external experts for a consulting firm, including full-text search of common types of documents, email alerts...
    stack: linux, apache + starman, perl, dancer2, DBIx::Class, Text::XSlate, postgresql, bootstrap, jquery, postmark
    challenges: deal with a range of document formats, manage automatic email interactions with users, manage all the aspects of SaaS
  • XML Quality Checking Tool: automatically checks standards against a range of rules: numbering, structure restrictions, style rules... The tool allows the creation of custom rules and the visualization of error context in the original XML or in HTML or PDF created from the XML, and includes a wiki for teams to share know-how about common errors. Its current targets are STS and S1000D, but other formats can be added.
    stack: linux, apache + starman, perl, dancer2, SQLite, XML::Twig
    challenges: allow creation of new rules in an easy way, report problems in a helpful way
  • Request Tracker (RT): participating in the development of RT 5. New features, "coring" existing extensions, integration with external services, custom development for clients, support.
    stack: perl, mason, DBIx::SearchBuilder
    challenges: deal with complex existing software, deliver clean code, understand customers' requirements and implement them within the RT framework.
  • Machine Learning: categorization of tender notices, assigning a CPV (Common Procurement Vocabulary) code based on their title. Implemented as a micro-service returning results in JSON
    stack: linux, starman, perl, TensorFlow, grocery, Lingua::Stem, dancer2
    challenges: deal with the quality of the training data, monitor and adjust the behaviour of the system
  • Web Scraping: scraping a range of public websites for tender information, cleaning up the data, normalizing it and outputting it as XML to feed a search engine
    stack: linux, perl, phantomJS/CasperJS, puppeteer, SQLite, XML::Twig
    challenges: deal with a variety of website technologies, add structure to the data, monitor activity to detect changes in website structure and adjust processing
  • mif2mml: tool converting FrameMaker's MIF equations into MathML (mif2mml on github).
    stack: perl, Parse::RecDescent, XML::Twig
    challenges: cover the entire MIF specification, based on Adobe's docs, deal with quirks in the way FrameMaker generates equations.
  • Open Source: maintenance of the XML::Twig Perl module, widely used to process XML data in Perl
    challenges: make sure the module installs and passes the tests on a wide variety of configurations, help users and keep it relevant.
    I am somewhat active on Stack Overflow, under the moniker mirod, usually answering questions about XML::Twig.
  • Standards Dictionary: on-line tool for querying a dictionary built from the definition section of all IEEE Standards.
    stack: linux, apache + starman, perl, dancer2, SQLite
    challenges: gather and QC the XML data, and interface the system with the standards management tools.
  • XML expert for the IEEE Standards Department: adaptation of the STS (Standards Tag Suite) DTD to the IEEE standards
    challenge: making sure that all of the necessary data structures were captured by the STS/JATS-based DTD used by the internal client, and that the XML could be used, through an XML toolchain, to generate standards with the same quality as before


More

Modern Perl is a set of modern conventions, tools and methods that effective Perl programmers use to write powerful, maintainable, scalable and concise code. It relies on CPAN, a distributed repository of thousands of Open-Source modules. Modern Perl favors OO, with a powerful ORM on top of the database. The Perl culture is also very focused on testing and offers frameworks and numerous tools for building and maintaining tests.

I have been part of the Perl community for quite a while now, and I have followed the evolution of the language and its culture.

I can bring you clear, documented, tested code that uses existing libraries as much as possible to limit the amount of new code that needs to be written. This allows me to be extremely efficient and to deliver results in a timely fashion.

“Easy things should be easy, and hard things should be possible.”

Larry Wall

Years of hands-on experience with XML in the Tech Pub industry have taught me (often the hard way!) what is and isn't possible to do with the current technology, while keeping an eye on what will be possible in the future.

I can help you with the whole life-cycle of your documents: help you design your DTDs, go through the data to check that it can be modeled properly, help with the conversion process, advise on choosing or designing an editing system and help you use the XML data to create new products.

“My definition of an expert in any field is a person who knows enough about what's really going on to be scared.”

P.J. Plauger

RT is an open source ticketing system developed by Best Practical Solutions.

It helps you manage your customer interactions by storing all the exchanges, both with the customer and within the support team, in one place. It is also extremely customizable, to fit your process and integrate with your other systems.

From installation to developing custom extensions, I can help you with all aspects of RT deployment.

I have a very good knowledge of the software, what it can do out of the box and how to get it to do what it cannot.

“If it’s not in the ticket, it didn’t happen.”

unknown

Machine learning is an exciting field that promises a lot. Can it deliver for you?

I have a good amount of experience with short-text classification, which often helps add missing information to data, based on the text that's available. It can be a simple and (relatively) easy way to improve the quality of the data, and deals fairly well with data sets that are noisy and of less than ideal quality.

“When you’re fundraising, it’s AI / When you’re hiring, it’s ML / When you’re implementing, it’s linear regression / When you’re debugging, it’s printf()”

Wrangling data out of modern web sites can be difficult, but there are quite a few technologies that can help with it.
From extracting data from web sites ("web scraping"), to custom browser extensions, I have long experience in automating processes to get data into a company's systems.

“It’s automation, not automagic.”

Jim Hazen

Every organization needs tools: tools that help fix common problems with the data, that add structure to less than ideally structured data, tools that get data into the system in a safe way, tools that allow delivery of derived data from a main XML repository to a specific customer... tools that help with all the behind-the-scenes processes that any company needs to function efficiently.
My most successful tool is XML::Twig, a Perl module (library) that processes XML documents of all sizes in a convenient and efficient way. It has been used for nearly 20 years by developers all over the world to process XML. You can get an idea of what it is used for on Stack Overflow.

“The expectations of life depend upon diligence; the mechanic that would perfect his work must first sharpen his tools.”

Confucius

Open-Source is great. I tend to work mostly with open-source software: from Operating System (Linux) to Programming Language (Perl, JavaScript), DBMS (mostly SQLite and PostgreSQL) and machine learning libraries (TensorFlow), pretty much any type of software I need is available.
I try to contribute back to Open-Source projects, mostly in Perl, either through releasing my own Perl modules on CPAN, or through patches (and tests!) when I find bugs that I am able to fix in other modules.

“I think, fundamentally, open source does tend to be more stable software. It's the right way to do things.”

Linus Torvalds

Contact

Call
+39 348 935 0910
Address
Loc. Capraia 1, Fraz. Fondagno, 55064 Pescaglia, LU - Italy