Pidgin HTML log parser

I want to put my chat logs in a database so I can do all sorts of fun things with them. I wrote this PHP script to read Pidgin's HTML logs; putting the results into a database will be a separate piece.
I use the HTML logs instead of the text logs because they avoid ambiguity (for example, when someone copies a chat log and sends it to you, the text log can't tell the pasted lines apart from real messages) and they carry all the information (which nickname is you? The one colored blue).
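To see how the color identifies the speaker, here is a minimal PHP sketch that applies the same message pattern parse_log() uses in the script to two sample lines. The sample markup, the #A82F2F color for the other party, and the classify_line() helper are assumptions for illustration; the script itself only relies on your own name being colored #16569E.

```php
<?php
// Hypothetical helper: split one Pidgin HTML log line into its parts and
// flag whether you wrote it, using the same regex as parse_log().
function classify_line($line) {
    $pattern = '@<font color="#(.*?)"><font size="2">\((.*?)\)</font> <b>(.*?):</b></font> (.*)<br/>@u';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // not a message line (header, closing tags, etc.)
    }
    return array(
        'is_user' => ($m[1] === '16569E'), // blue name = you
        'time'    => $m[2],
        'speaker' => $m[3],
        'content' => $m[4],
    );
}

// Hand-written sample lines (assumed format; 'buddy' is made up)
$mine  = '<font color="#16569E"><font size="2">(11:58:02 PM)</font> <b>mgcclx:</b></font> hi there<br/>';
$their = '<font color="#A82F2F"><font size="2">(11:58:10 PM)</font> <b>buddy:</b></font> hello<br/>';

var_dump(classify_line($mine)['is_user']);  // bool(true)
var_dump(classify_line($their)['is_user']); // bool(false)
```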

To get what I really want, it would be much better to write a plugin for Pidgin. I heard there was a Remote Logging plugin (exactly what I need), but I never saw anything come of it. I hope someone revives the project.

Some notes:
1. Pidgin can change the log format at any time, which would make this script unusable.
2. This is only really useful for one-on-one IMs. In group conversations, it can only record whether you said something or someone else did.
3. A log is not parsed if it is incomplete (i.e. it does not end with the closing </body></html> tags).
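Note 3 comes down to a single check on the file's last line. A minimal PHP sketch of that check, assuming the file has already been read into an array of lines with file(); is_complete_log() is a hypothetical name and the sample arrays are made up:

```php
<?php
// A log counts as complete only if its last line, ignoring trailing
// whitespace such as the newline kept by file(), is "</body></html>".
function is_complete_log(array $lines) {
    if (count($lines) === 0) {
        return false; // empty file
    }
    return rtrim($lines[count($lines) - 1]) === '</body></html>';
}

// Made-up inputs shaped like the result of file()
$done       = array("<html><head>...</head><body>\n", "...<br/>\n", "</body></html>\n");
$unfinished = array("<html><head>...</head><body>\n", "...<br/>\n");

var_dump(is_complete_log($done));       // bool(true)
var_dump(is_complete_log($unfinished)); // bool(false)
```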

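<?php
// This script reads Pidgin HTML log files.

// ---- Configuration ----
// Log directory, without the trailing slash
$f = 'C:\Documents and Settings\UserXP\Application Data\.purple\logs';
// $s is an array of service names, $u the matching array of usernames:
// account $i is $u[$i] on service $s[$i]
$s = array();
$u = array();

$s[] = 'aim';
$u[] = 'mgcclx';

// Keep HTML in the message text (1) or convert it to plain text (0)
$html = 1;
// If keeping HTML, which tags are allowed. strip_tags() ignores self-closing
// forms such as <br/> in this list; <br> keeps both <br> and <br/>.
$html_allow = '<br><span><font><p><a>';
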
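// Write your own logging function.
// The script calls it once for every message:
//   $service        service name, e.g. 'aim'
//   $user           your username on that service
//   $other          the other party (name of the log subdirectory)
//   $user_or_other  1 if you sent the message, 0 otherwise
//   $time           date and time of the message
//   $speaker        nickname shown in the log
//   $content        message text
function logging_function($service,$user,$other,$user_or_other,$time,$speaker,$content){
	return true;
}
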
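// Walk logs/<service>/<account>/<buddy>/*.html and pass every message
// to logging_function()
for($i=count($s)-1;$i>-1;$i--){
	$account_dir = $f.'/'.$s[$i].'/'.$u[$i];
	$o = scandir($account_dir);
	if($o===FALSE){
		continue;
	}
	// One subdirectory per conversation partner; drop '.' and '..'
	$o = array_values(array_diff($o,array('.','..')));
	for($j=count($o)-1;$j>-1;$j--){
		$d = $account_dir.'/'.$o[$j].'/';
		if(!is_dir($d)){
			continue;
		}
		$files = file_list($d,'html');
		for($k=0;$k<count($files);$k++){
			$log = parse_log($d.$files[$k],$html,$html_allow);
			if($log===FALSE){
				continue;
			}
			for($l=0;$l<count($log);$l++){
				logging_function($s[$i],$u[$i],$o[$j],$log[$l][0],$log[$l][1],$log[$l][2],$log[$l][3]);
			}
		}
	}
}
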
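// Parses one Pidgin HTML log file. Returns FALSE if the file is unreadable,
// incomplete or has no recognizable header; otherwise an array with one entry
// per message: array(1 if you sent it / 0 otherwise, "n/j/Y time", speaker, content)
function parse_log($file_name,$html=0,$html_allow = '<br><span><font><p><a>'){
	$line = file($file_name);
	if($line===FALSE){
		return FALSE;
	}
	$c = count($line);
	// Skip logs that are still being written (no closing tags yet)
	if($c==0 || rtrim($line[$c-1])!='</body></html>'){
		return FALSE;
	}
	// Header: "Conversation with BUDDY at DATE TIME on ACCOUNT (SERVICE)"
	if(preg_match("@Conversation with (.*?) at (.*?) (.*?) on (.*?) \((.*?)\)@u", $line[0], $match)!=1){
		return FALSE;
	}
	$date=$match[2];
	$prev = 'AM';
	$log = array();

	for($i=1;$i<$c;$i++){
		if(preg_match('@<font color="#(.*?)"><font size="2">\((.*?)\)</font> <b>(.*?):</b></font> (.*)<br/>@u', $line[$i], $match)==1){
			// Your own name is colored #16569E; anything else is the other party
			if($match[1]=="16569E"){
				$match[1]=1;
			}else{
				$match[1]=0;
			}
			// Messages carry only a time of day: an AM after a PM means midnight passed
			if(substr($match[2],-2)=='AM'&&$prev=='PM'){
				$t = explode('/',$date);
				$date = gmdate("n/j/Y", gmmktime(0,0,0,$t[0],$t[1],$t[2])+86401);
			}
			$prev = substr($match[2],-2);
			$match[2] = $date.' '.$match[2];
			// Move the auto-reply marker from the speaker name to the message
			if(strpos($match[3],' &lt;AUTO-REPLY&gt;')!== FALSE){
				$match[3] = str_replace(' &lt;AUTO-REPLY&gt;','',$match[3]);
				$match[4] = '&lt;AUTO-REPLY&gt; '.$match[4];
			}
			if($html){
				$match[4] = strip_tags($match[4],$html_allow);
			}else{
				$match[4] = str_replace('<br/>',"\n",$match[4]);
				$match[4] = strip_tags($match[4]);
			}
			$log[] = array($match[1],$match[2],$match[3],$match[4]);
		}
	}
	return $log;
}
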
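// Returns the names of the regular files in directory $d whose names end
// with $x (all files if $x is empty).
// Based on a function found at http://us3.php.net/manual/en/function.scandir.php
// by phpdotnet at lavavortex dot com
function file_list($d,$x){
	$l = array();
	foreach(array_diff(scandir($d),array('.','..')) as $f){
		if(is_file($d.'/'.$f) && ($x=='' || preg_match('/'.preg_quote($x,'/').'$/',$f))){
			$l[] = $f;
		}
	}
	return $l;
}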
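parse_log() only gets a full date from the header line; each message carries just a 12-hour time of day, so the date is advanced whenever an AM timestamp follows a PM one. A minimal PHP sketch of that rule in isolation, with next_day() and stamp_messages() as hypothetical names and made-up timestamps:

```php
<?php
// Advance an "n/j/Y" date by one day. GMT has no DST, so midnight plus
// 86400 seconds is exactly midnight of the next day.
function next_day($date) {
    list($month, $day, $year) = explode('/', $date);
    return gmdate('n/j/Y', gmmktime(0, 0, 0, (int) $month, (int) $day, (int) $year) + 86400);
}

// Prefix each time of day with a date, starting from the header date and
// rolling over whenever an AM time follows a PM time.
function stamp_messages($date, array $times) {
    $prev = 'AM';
    $out = array();
    foreach ($times as $time) {
        $half = substr($time, -2);
        if ($half === 'AM' && $prev === 'PM') {
            $date = next_day($date);
        }
        $prev = $half;
        $out[] = $date . ' ' . $time;
    }
    return $out;
}

print_r(stamp_messages('12/31/2008', array('11:58:02 PM', '12:01:15 AM', '12:03:40 AM')));
```

Like the script, this assumes no gap of a day or more between consecutive messages: a message at 10 AM followed by one at 9 AM the next day would keep the old date.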

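The script leaves logging_function() as a stub, while the goal stated at the top is to get the messages into a database. One way to fill it in, sketched under assumptions: PHP's PDO with the pdo_sqlite extension, an in-memory database so the example leaves nothing behind, and a made-up messages table with a hypothetical chat_db() helper. Point the DSN at a file (for example sqlite:chatlogs.sqlite) to keep the data.

```php
<?php
// Hypothetical helper: open the database once and create the table.
function chat_db() {
    static $db = null;
    if ($db === null) {
        $db = new PDO('sqlite::memory:');
        $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $db->exec('CREATE TABLE IF NOT EXISTS messages (
            service TEXT, account TEXT, buddy TEXT,
            from_user INTEGER, sent_at TEXT, speaker TEXT, content TEXT)');
    }
    return $db;
}

// Same signature as the stub in the script; called once per message.
// Uses one prepared statement for all inserts.
function logging_function($service,$user,$other,$user_or_other,$time,$speaker,$content){
    static $insert = null;
    if ($insert === null) {
        $insert = chat_db()->prepare('INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?, ?)');
    }
    return $insert->execute(array($service, $user, $other, $user_or_other, $time, $speaker, $content));
}

// Usage with made-up values
logging_function('aim', 'mgcclx', 'buddy', 1, '12/31/2008 11:58:02 PM', 'mgcclx', 'hi there');
echo chat_db()->query('SELECT COUNT(*) FROM messages')->fetchColumn(), "\n"; // 1
```

The timestamp is stored as the text parse_log() builds, which does not sort chronologically; converting it with something like strtotime() before inserting would make date queries easier.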