Tuesday, July 26, 2011

How to Create a Web-Based Video Converter

My wife teaches cooking classes near Boston and blogs about food. Recently, she's become interested in video. She's already made a few videos using Windows Movie Maker (WMM), like this one on how to dice an onion. But, she's finding that free movie editing software just doesn't cut it. The kicker was when she published a video, learned that the volume was too low and couldn't find any way to raise the volume in WMM---it only allows limited volume adjustment.

She had already started researching video software options and settled on Sony Vegas. Yes, people complain that it crashes and/or runs slowly, but those same people are trying to process relatively large movies. My wife is looking to publish 5-minute non-HD videos on YouTube. So far, Sony Vegas has worked well for her. The one problem she's encountered is that Sony Vegas can't import Flip Video AVI files. She found a free Windows-based converter. But, it leaves a blatant watermark and screws up a half-second of the audio track.

I figured this would be an easy problem to solve with Linux software. Sure enough, a bit of searching and I discovered ffmpeg, which is available in Debian. After a few minutes of man-page reading, I had a command-line to perform the conversion:

ffmpeg -i myvideo.avi -target ntsc-vcd myvideo.mpg
Note that this generates NTSC video. If you're in Europe, you might want PAL, which you'd get by changing the target to pal-vcd.
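
For more than one clip, the same command can be wrapped in a loop. What follows is only a minimal sketch: the function names mpg_name and convert_all are made up for illustration, and it assumes ffmpeg is on the PATH and accepts the -target ntsc-vcd option used above.

```shell
# Hypothetical helpers (names are illustrative, not from any library).

# Derive the .mpg output name from an input name: a trailing .avi
# (in any letter case) is replaced, anything else just gets .mpg appended.
mpg_name() {
  case "$1" in
    *.[aA][vV][iI]) printf '%s.mpg\n' "${1%.*}" ;;
    *)              printf '%s.mpg\n' "$1" ;;
  esac
}

# Convert every file given as an argument, one at a time.
convert_all() {
  for f in "$@"; do
    ffmpeg -i "$f" -target ntsc-vcd "$(mpg_name "$f")"
  done
}

# Usage: convert_all *.avi
```

Quoting "$f" keeps file names with spaces intact, which matters for clips copied off a camera.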

But, this wouldn't cut it. My wife didn't want to have to copy to Linux, convert, then copy back or wait for me to get home just so she could start editing a video. So, I set to work on creating a web-based converter. Creating a script to upload the video is straightforward and easy to find. Here's an example of file upload HTML and PHP. But, what this page doesn't tell you about are the hard limits PHP has on file and memory sizes. Here is a PHP "bug" which describes the max file size ("exceeds the limit of 8388608 bytes") problem I quickly encountered. Sniper provides the config settings that need to be edited:

post_max_size = 256M
upload_max_filesize = 256M
memory_limit = 256M
I modified these settings in /etc/php5/apache2/php.ini, restarted my web server, and then was able to upload video files larger than 8 megs. The final question was how to push the converted video back to my wife's web browser. For some reason, all the pages I found on how to upload a file didn't mention anything about the possibility of pushing binary data back to the user. Finally, I stumbled upon the PHP readfile function. Occasionally, PHP is nice in that it provides tools and examples for what you probably want to do, like push an entire file to a user's web browser. The readfile manual page provides a full example for how to do this, including the necessary HTTP headers and proper output buffer management.

Here's what I ended up with. Note that this script is unsafe because it executes a shell command. Also, it relies on /tmp being the usual "temp" directory and can fail if a file already exists with the name $outfile. But, it serves its purpose for me.

<?php
if (isset($_FILES["video"]) && $_FILES["video"]["name"]) {
  $pattern = '/(.+)\.avi$/i';
  $replacement = '${1}.mpg';
  $outfile = preg_replace($pattern, $replacement, basename($_FILES["video"]["name"]));
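  // Quote both file names so spaces or shell metacharacters in them cannot alter the command
  $cmd = "ffmpeg -i " . escapeshellarg($_FILES["video"]["tmp_name"]) . " -target ntsc-vcd " . escapeshellarg($outfile);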
  chdir("/tmp");
  shell_exec($cmd);
  header('Content-Type: application/octet-stream');
  header('Content-Disposition: attachment; filename=' . $outfile);
  header('Content-Transfer-Encoding: binary');
  header('Expires: 0');
  header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
  header('Pragma: public');
  header('Content-Length: ' . filesize($outfile));
  ob_clean();
  flush();
  readfile($outfile);
  unlink($outfile);
  exit;
} else {
?>
<html>
<body>
<form enctype="multipart/form-data" action="video.php" method="POST">
Choose a file to upload: <input name="video" type="file" /><br />
<input type="submit" value="Upload Video" />
</form>
</body>
</html>
<?php
}
?>

Tuesday, July 19, 2011

How to Create a Gmail Filter Based on Reply-To

Today, I found myself trying to filter a message in Gmail based on the "Reply-to:" header. "Reply-to:" was the only user-friendly header that clearly distinguished it from other types of messages I receive. I knew that Gmail allows list-based filtering using a "list:" prefix in the "Has the words:" field. But, I haven't found a resource to tell you what, if any, other prefixes are allowed. I tried "reply-to:" without success. Then, I searched and found another poor soul with the same dilemma. A bit more futzing and I discovered that "replyto:" was the correct prefix. Unfortunately, the Google help forum won't let me post a reply (!). So, I am hoping that romadatnvwisp will read this post and learn how easy it is to filter based on the "Reply-to:" header. Maybe he's figured it out himself?

To recap... let's say you want to filter messages with a Reply-to: address of "me@sample.com". Here's how you do it:

  1. Click the gear icon in the upper-right-hand-corner of Gmail, then select "Mail settings"
  2. Click the "filters" tab, scroll to the bottom and click "Create a new filter"
  3. Type "replyto:me@sample.com" in the "Has the words" field
  4. Click "Next Step" and finish creating the filter; note that matching messages should appear below after clicking "Next Step"

Sunday, July 10, 2011

How to Add Share Buttons in Blogger

Google has redesigned the Blogger administration interface.  As a result, the instructions posted by Blogger a month ago on how to add the "+1" button are no longer valid.  "Design >  Page Elements" no longer exists.  Instead, you need to click "Layout" then click the tiny "Edit" link at the bottom right-hand corner of the "Blog Posts" box.  This will give you a list of post options.  Make sure the "Show Share Buttons" checkbox is checked, then scroll to the bottom and hit "Save".

Didn't work?  That's expected if your blog has existed for a while (more than 1 year?).  Fortunately, this Blogger Templates post on how to add share buttons describes how to hack it (scroll down).  This trick worked nicely for all but one of the blogs I administer.  Unfortunately, for my wife's Beyond Salmon blog, which was created six years ago, it only sort-of worked---the buttons came out very small and the "+1" button didn't appear (even after I moused over the other share buttons, which was the only way "+1" would appear on some other blogs).  Bleh.

Friday, July 1, 2011

I Want My Old Google Calendar Back!

My wife woke up this morning to discover that her Google Calendar had been transformed via Google's effort to evolve the Google design and experience.  I fail to see how this effort is a good idea.  The result is that my wife can see fewer events on her "Month" calendar (which is the primary view she uses).  She used to be able to see six events per day.  Now she can see four and she has to click the "more" link to see all events on a day with 5 or 6 events.  It seems that this is due to wasted space.  There is more padding/spacing around everything, including day-of-the-week names and date numbers.  What's the point of spreading-out everything in the display if it distances you from useful information without improving usability?  I'm at a loss.  Fortunately, it's easy to revert to the old UI---click the gear icon in the upper-right-hand-corner, then select "Use the classic look".

Disclosure: I joined Google via the ITA Software acquisition and have no involvement in the above-discussed UI redesign.

Update 7/4/11: Apparently, I'm not the only one who thinks the Google Calendar redesign is a bad idea.

Tuesday, June 7, 2011

Using Multiple Google Accounts in Chrome

Now that I'm an employee of Google, I have two Google accounts, corporate and personal. Google has made it relatively easy to manage two accounts with "account switching". If I'm in Gmail, I can easily switch between accounts by clicking on my email address in the upper-right-hand corner of the screen, then selecting "switch account" followed by the account I want. But, the choice of account isn't completely persistent. Occasionally, I find that Gmail and other Google applications switch users without any input from me. This seems to be especially common when I return to the computer after an extended break (e.g. overnight). One of the Google applications will ask me to re-authenticate and this will screw up the choice of account in other tabs. Also, it occasionally switches accounts on me when I click a link.

One obvious solution is to use two different browsers, e.g. Chrome for corporate and Firefox for personal. But, because of the differences in UI, I feel that this is worse than the occasional random account switch that I get by using a single Chrome instance. A better solution seems to be separate profiles. Chrome allows you to specify the location for your user data. So, you can run two instances of Chrome, one with the default location and one with a custom location:

$ mkdir ~/.config/personal
$ google-chrome --user-data-dir="$HOME/.config/personal"
If you then use one account exclusively with each instance, there should never be any switching issues.
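
If you launch the second instance often, a small shell function saves some typing. This is only a sketch: profile_dir and chrome_profile are hypothetical names, it assumes the binary is called google-chrome as above, and it spells the path with $HOME because the shell may not expand a ~ that appears in the middle of an argument.

```shell
# Hypothetical helpers; the profile name is whatever you choose.

# Print the user-data directory for a named profile.
profile_dir() {
  printf '%s/.config/%s\n' "$HOME" "$1"
}

# Create the directory if needed, then start Chrome with it in the background.
chrome_profile() {
  dir=$(profile_dir "$1")
  mkdir -p "$dir"
  google-chrome --user-data-dir="$dir" &
}

# Usage: chrome_profile personal
```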

Tuesday, April 5, 2011

Custom Search in Chrome

At home, I have little use for domain-specific search engines---Google usually gives me the right answer. But, at work, Google doesn't have access to much of the information I need. Yes, we have an intranet search, but it is of little-to-no help when I'm looking for a particular bug, ticket or code revision. Firefox has "bookmarklets" which let you trigger a custom search based on a keyword. Though it's less obvious, Chrome lets you do the same thing via its "search engine" configuration. In the "Manage Search Engines" window, enter a keyword and a URL, with a "%s" in the URL where the search string should be substituted. Then, finding a particular bug can be as easy as typing "b 12345" in the omnibox. I got this tip from Lifehacker.
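
To make the "%s" substitution concrete, here is a rough shell imitation of what happens to the URL. It is only an illustration: expand_search is a hypothetical name, bugs.example.com is a placeholder URL, and the sketch does no URL-encoding of the query.

```shell
# Hypothetical helper: fill a search-engine URL template with a query.
# The template is used as a printf format, so it must contain exactly one
# "%s" and no other "%" characters.
expand_search() {
  printf "$1\n" "$2"
}

# Usage: expand_search 'https://bugs.example.com/show?id=%s' 12345
```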

Friday, April 1, 2011

Nohup or Disown?

In the past when I've started long-running jobs, I've used nohup. But, what if you forget to use nohup or find that a job is taking longer than expected? Today, I learned about disown which allows you to easily prevent a process from being killed when the parent is killed. Ksplice provides a detailed tutorial on using disown. The quick-and-dirty version is:

$ long-running-process.sh
^Z
[1]+  Stopped     sh long-running-process.sh
$ bg
$ disown %1
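
For the case where you do remember up front, the nohup route can be wrapped in a function. This is a minimal sketch under a few assumptions: run_detached is a hypothetical name, and the log file path is whatever you pass as the first argument.

```shell
# Hypothetical helper: run a command immune to hangups from the start.
# $1 is a log file for its output; the remaining arguments are the command.
run_detached() {
  log=$1
  shift
  nohup "$@" >"$log" 2>&1 &
  echo "started PID $!"
}

# Usage: run_detached /tmp/job.log long-running-process.sh
```

Sending output to a log file also avoids leaving a nohup.out file in whatever directory you happened to start the job from.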

Friday, March 25, 2011

XChat Configuration

I'm an IRC newbie even though I've been using email for 15+ years. IRC seems great for those issues that benefit from the experience of many but aren't large or important enough to warrant hitting the mailboxes of 100 people. It also seems useful for determining the magnitude of an issue. Anyway, now that I'm using IRC, I wanted to make my client of choice, XChat, less annoying to use:

  • XChat highlights a channel when any new messages show up, including join/quit. So, as long as join/quit messages are displayed, the channel highlight isn't very informative. I learned that it's easy to disable join/quit messages. Note that in XChat 2.8.6, the "hide" option is under "Settings".
  • By default, XChat starts with no server connection and no channels. You can get it to automatically connect and open channels via the "XChat"/"Network List..." menu. "Edit..." your server, check "Auto connect to this server at startup" and click the button next to "Favorite channels" to provide a list of auto-connect channels.

Tuesday, March 22, 2011

Deleting Old Files

Deleting old files should be an easy task, right? find can locate them and rm can delete them. But, what's the right combination? xargs must be useful as it converts newline-separated data into space-separated data, but the first thing I tried didn't work because there were spaces in some of my file names:

$ find -mtime +90 | xargs ls
ls: cannot access ./download: No such file or directory
...
The xargs man page provides one solution:
...filenames containing blanks and/or new‐lines are incorrectly processed by xargs. In these situations it is better to use the -0 option, which prevents such problems. When using this option you will need to ensure that the program which produces the input for xargs also uses a null character as a separator. If that program is GNU find for example, the -print0 option does this for you.
Another option is to tell xargs to use newline as the delimiter. 'course, I'd recommend that you first try listing the files to see what you'll be deleting. Hence the ls in these examples. You should change to rm when you're ready to destroy.
$ find -mtime +90 -print0 | xargs -0 ls
$ find -mtime +90 | xargs -d "\n" ls
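
The list-first, delete-second routine can be captured in two small functions. This is only a sketch, assuming GNU find and xargs: list_old and delete_old are hypothetical names, and the -type f test and the -r flag (skip the command when nothing matches) are additions that do not appear in the commands above.

```shell
# Hypothetical helpers; both take a directory and an age in days.

# List regular files not modified in more than $2 days.
list_old() {
  find "$1" -type f -mtime +"$2" -print0 | xargs -0 -r ls
}

# Delete the same set of files. Run list_old first to review them.
delete_old() {
  find "$1" -type f -mtime +"$2" -print0 | xargs -0 -r rm --
}

# Usage: list_old . 90    then, once satisfied:    delete_old . 90
```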

Thursday, March 10, 2011

Google Chrome 10: Flash Out of Date

If you're like me and just upgraded to Google Chrome 10 (10.0.648.127), you may find that you get Flash "Out of Date" error messages. IIUC, Chrome 9 used its own built-in Flash player, but Chrome 10 has switched back to using the Adobe one. Also IIUC, Chrome 10 expects Flash 10 and will complain if the available Flash is 9 or older. At first, I thought I was out of luck since this Adobe TechNote says that Chrome's Flash is built-in, but I found that after uninstalling (Adobe) Flash, not even the "Run this time" option would work. So, I tried reinstalling Adobe Flash. It grabbed version 10 of Adobe Flash, and, Poof! Flash was working again without the warning message. This even though I'm on Debian oldstable (5.0 aka lenny). So, if you run into this issue, I'd recommend installing Adobe Flash 10. It appears that on Debian-like distributions (e.g. Ubuntu), this is as easy as:

$ sudo apt-get remove libflashplugin-nonfree
$ sudo apt-get install libflashplugin-nonfree

Update (3/24/11): Users on the Google Help Forum have suggested checking chrome://plugins to see if Chrome is using Firefox's outdated plugin (/home/username/.mozilla/plugins/). If so, try deleting the outdated plugin and restarting Chrome.

Wednesday, February 23, 2011

Reloading the Mutt Configuration File

I find it a bit annoying that I have to enter my password twice every time I (re-)start mutt when I want to update my configuration file. Well, it appears I don't need to restart mutt. The MuttWiki describes a way to re-load the configuration after mutt is running. Type the following into mutt after you've changed your muttrc file:

:source /path/to/your/muttrc
Note that shell expansions may not work---you should spell out the full path.

Monday, February 7, 2011

Too many open files

Certain applications, such as a web server with lots of database connections, require a large number of open files. Most Linux systems are, by default, configured to allow a relatively small number of open files, e.g. 1024. How to change this limit isn't as obvious as one might hope.

ulimit will show you current limits and let you change limits for the current session. But, one rarely cares about a temporary change. For a permanent change, one must realize that these limits are in place for security purposes---so that it is difficult for a single user to bring down the entire machine. So, the limits are configured in /etc/security/limits.conf. Adding the following lines to /etc/security/limits.conf should help if you are having "too many open files" troubles:

* soft nofile 16384
* hard nofile 65536
Note that this is also the place to "unlimit" the number of processes a user can run, e.g.:
* soft nproc 4096
* hard nproc 16384
Note that a "soft" limit is the limit a user will get when starting a shell. The "hard" limit is the highest limit they can set without "root" privileges.

Note that when checking limits using ulimit, soft limits are shown by default. Use the -H option to get hard limits. The -a option shows "all" limits. So, run the following two commands to see soft, then hard limits, respectively:

ulimit -a
ulimit -a -H
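
If you only care about the open-file limit, the -n option narrows the output to that one value. The function below is a small sketch (show_nofile is a hypothetical name) that prints the soft and hard values side by side, which is handy for confirming that a limits.conf change took effect after logging in again.

```shell
# Hypothetical helper: print the soft and hard open-file limits
# for the current shell on one line.
show_nofile() {
  printf 'soft=%s hard=%s\n' "$(ulimit -Sn)" "$(ulimit -Hn)"
}

# Usage: show_nofile
# To raise the soft limit to the hard limit for this session only:
#   ulimit -Sn "$(ulimit -Hn)"
```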

Tuesday, January 25, 2011

Indenting python code

Since python determines scoping by indentation, it's imperative to be able to indent and de-indent blocks of code. I just learned how to do this in emacs: M-x python-shift-right or M-x python-shift-left. Note that you have to be in python-mode for these to work. Here's a discussion of all the possible ways to do it. A good, non-python-specific solution is C-x TAB which calls indent-rigidly, but you need to give it an argument to indent more than a single space.

Sunday, January 16, 2011

Writing udev rules

One annoying thing about having a USB-connected Vantage Pro2 weather station is that, occasionally, there is sufficient noise in the line that Linux drops the connection and immediately reconnects. The immediate reconnection is good. What's annoying is that Linux usually assigns it a different device name upon reconnection. So, if I configure my weather station software, weewx, with the ttyUSB0 device name, I have to change the configured name after every reconnect (every few months) and risk losing data (though considering that the VP2 data logger keeps up to 2560 records, or 8+ days with a 5 minute interval, it's a small risk).

The solution to this problem is to use a custom udev rule. Udev allows you to do a variety of things with the /dev directory, including providing a single, consistent name for each USB device you connect (no matter what /dev name it gets assigned by the kernel). The following rule creates a symbolic link from /dev/vpro to the current device name for my VP2:

ACTION=="add", ATTRS{interface}=="CP2102 USB to UART Bridge Controller", SYMLINK+="vpro"
This says to add the "vpro" link whenever a device with a matching "interface" attribute is "add"ed (i.e. connected/attached). What's a little tricky about adding a udev rule is figuring out what attribute to match against so that the rule triggers on the device you want (and only that device). For that, you need to know what device name the kernel assigns (what /dev entry).

To find out what to match against, you need to get the information about the device. The following command provides that. You will need to replace /dev/ttyUSB0 with the device name that you are interested in.

udevadm info --attribute-walk --path $(udevadm info --query=path --name=/dev/ttyUSB0) 
The command will yield a hierarchy of information about the device. Information at the beginning is probably too minimal to be useful for matching; information at the bottom is probably too general and will match too many different devices. Look for something towards the top which sounds like it might describe the device. Use the key and value to write a rule like the one I gave above. Put this rule in a file named like 66-vpro.rules in
/etc/udev/rules.d
Note that the number must be two-digit and the extension must be rules. Make sure that the file is world-readable.

To test your rule, restart udev:

sudo /etc/init.d/udev restart
then unplug your device of interest, wait a second, and plug it back in. You should be able to see the effect of your rule. In my case, udev creates a /dev/vpro symlink to the kernel dev entry (e.g. /dev/ttyUSB0 for my VP2).
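
A quick way to confirm the result is to ask where the symlink points. The function below is a sketch, not part of udev: dev_target is a hypothetical name, and it assumes a readlink that supports -f (as GNU coreutils does).

```shell
# Hypothetical helper: print what a /dev symlink currently points at,
# or complain on stderr if the link is not there.
dev_target() {
  if [ -L "$1" ]; then
    readlink -f "$1"
  else
    echo "$1 is not a symlink (rule not applied yet?)" >&2
    return 1
  fi
}

# Usage: dev_target /dev/vpro
```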

Wednesday, January 12, 2011

Parallel Programming

My first introduction to parallel programming was in the summer of 1994 at the Pennsylvania Governor's School for the Sciences. For my final project, we wrote Fortran77 code with an MPI library to parallelize neural network code for image recognition on a Cray T3D supercomputer with 256 processors. It was cute & fun, but neither terribly practical nor useful :-)

Four years ago, I had my second parallel programming experience, writing a recommendation engine for StyleFeeder. This work was in Java utilizing threads to take advantage of multiple cores on a single machine. I got to see performance per core degrade significantly as the number of cores increased and the joy of the Java garbage collector (which would lock-up the entire program for minutes at a time). I've heard that Java GC performance continues to improve...

My most recent experience with parallel programming could be considered the most primitive one (single-threaded python code). But, considering the difficulty of parallel programming, I'm not convinced that it's any worse than the other frameworks. We use twisted to simplify communication between processes and since twisted is an event loop (single thread), programming is greatly simplified---no need to worry about your code being interrupted anywhere.

A colleague of mine just discovered Is Parallel Programming Hard, And, If So, What Can You Do About It? by Paul E. McKenney. The table of contents piques my interest. I hope I'll find time to read it...