stream.php - search engine needs help

Ra? · 2 August 2008, 15:52

Hello everyone,

I need help. Below is the code of one file from my search engine. It uses cURL to search for streams and enters them into the database. Instead of 2 sites, I would like it to add more sites.

I would also like to drop onlinestreams and onlinemoviez completely.

These are the sites I want to add:

Lustige Videos, Video Clips und Werbespots gibts bei uns - MyVideo
streamload.IN - the number #1 archiv
stream-collector.com
UPDATE: Link removed due to questionable content
UPDATE: Link removed due to questionable content

Source code

<?php
//var_dump(stream::lookup("rambo"));
/*
var_dump(stream::onlinestreams_org("rambo"));
$url="http://www.onlinestreams.org/ajax/suche.php";
$post="suche="."rambo";
$ref="http://www.onlinestreams.org/index.php";
$osorg=stream::curl_post($url,$post,$ref);
preg_match_all("#<tr><td>(.*?)</td></tr></table>#ims", $osorg, $div);
echo $div[1][0]." ".$div[1][1];
die();
*/
class stream{
//searches db, if not found, scrapes and adds
public static function lookup($title){
$found=stream::search_title_db($title);
if (empty($found)){
//nothing in db, scrape it
$found=stream::scrape_sites($title);
if (empty($found)){return;}
for ($i=0;$i<count($found);$i++){
//watch out for other array structure as before!!!
//$obj=$found[$i][2];
$obj=str_replace('</center>','',$found[$i][2]);
stream::add_db($found[$i][0],strtolower($found[$i][1]), $obj ,$found[$i][3]);
}
}
return $found;
}
public static function scrape_sites($title){
$erg1=stream::onlinemoviezzz_blogspot_com($title);
$erg2=stream::onlinestreams_org($title);
// FROM
// array[0]=title, [1]=playerurl_object [2]=info
// TO
// url, title, obj, desc
$j=0;
$comp=array();
for($i=0;$i<count($erg1);$i++){
$comp[$j][0]=stream::extract_link($erg1[$i][1]);
$comp[$j][1]=strtolower($erg1[$i][0]);
$comp[$j][2]="<object>".str_replace('<param name="wmode" value="transparent"></param>','',$erg1[$i][1]);
$comp[$j][3]=$erg1[$i][2];
$j++;
}
for($i=0;$i<count($erg2);$i++){
$comp[$j][0]=stream::extract_link($erg2[$i][1]);
$comp[$j][1]=strtolower($erg2[$i][0]);
$comp[$j][2]="<object>".str_replace('<param name="wmode" value="transparent"></param>','',$erg2[$i][1]);
$comp[$j][3]=$erg2[$i][2];
$j++;
}
//for additional scrapesites
/*
for($i=0;$i<count($erg3);$i++){
$comp[$j][0]=stream::extract_link($erg3[$i][1]);
$comp[$j][1]=strtolower($erg3[$i][0]);
$comp[$j][2]="<object>".str_replace('<param name="wmode" value="transparent"></param>','',$erg3[$i][1]);
$comp[$j][3]=$erg3[$i][2];
$j++;
}
*/
return $comp;
}
//adds to db
public static function add_db($url, $title, $object, $desc){
if ($url == '' || $title == '' || $object == '' ) {return;}
if (strlen($url) < 3 || strlen($title) < 3 || strlen($object) < 3 ) {return;}
$link=mysql_connect('localhost', '', '') or die("Could not connect to MYSQL host");
mysql_select_db('streamsearcher', $link) or die("Couldnt connect to databank.");
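// escape the values so quotes in titles or HTML cannot break the query (SQL injection)
$result = mysql_query("INSERT INTO data VALUES ('".mysql_real_escape_string($url,$link)."','".mysql_real_escape_string($title,$link)."','".mysql_real_escape_string($object,$link)."','".mysql_real_escape_string($desc,$link)."',1)", $link);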
//if (!$result) { die("Couldnt send data to db: ". mysql_error());}
}
//searches db for title
public static function search_title_db($title){
$title=strip_tags($title);
$title=trim(strtolower($title));
$link=mysql_connect('localhost', '', '') or die("Could not connect to MYSQL host");
mysql_select_db('streamsearcher', $link) or die("Couldnt connect to databank.");
$result = mysql_query("SELECT DISTINCT * FROM data WHERE title LIKE '%".mysql_real_escape_string($title,$link)."%'", $link);
if (!$result) { die("Couldnt send data to db: ". mysql_error());}
$out=array();
while ($row=mysql_fetch_array($result)){
$out[]=$row;
}
return $out;
}
public static function titleonly_db(){
$link=mysql_connect('localhost', '', '') or die("Could not connect to MYSQL host");
mysql_select_db('streamsearcher', $link) or die("Couldnt connect to databank.");
$result = mysql_query("SELECT DISTINCT title FROM data");
if (!$result) { die("Couldnt send data to db: ". mysql_error());}
$out=array();
while ($row=mysql_fetch_array($result)){
$out[]=$row;
}
return $out;
}
//returns array[0]=title, [1]=playerurl_object [2]=info
public static function onlinestreams_org($search){
$url="http://www.onlinestreams.org/ajax/suche.php";
$post="suche=".urlencode($search);
$ref="http://www.onlinestreams.org/index.php";
$osorg=stream::curl_post($url,$post,$ref);
preg_match_all("#playMovie\('filme',(.*?)\)#ims", $osorg,$match);
$url="http://www.onlinestreams.org/ajax/movie_play.php";
$ref="http://www.onlinestreams.org/index.php";
preg_match_all("#<tr><td>(.*?)</td></tr></table>#ims", $osorg, $titles);
$found=array();
$ret=array();
for ($i=0;$i<count($match[1]);$i++){
$found[]=substr($match[1][$i], strpos($match[1][$i], ",") );
$post="id=".$found[$i];
$ergsite=stream::curl_post($url,$post,$ref);
preg_match_all("#<div (.*?)left: -40px\">#ims", $ergsite, $div);
for ($j=0;$j<count($div[1]);$j++){
$tdiv=$div[1][$j];
$tdiv= substr( $tdiv, strpos($tdiv, "<param name="));
$endp=strpos($tdiv, "</object>")+9;
$purl=substr($tdiv, 0, $endp);
$info=strip_tags(substr($tdiv, $endp));
if (substr_count($info, "Megavideo") > 0) {
$info=substr($info,236);
}
if (substr_count($info, "Veoh") > 0) {
$info=substr($info,213);
}
$kat=array("Alle gemischt","Abenteuer","Action","Action-Komödie",
"Anime","Drama","Fantasy","Horror","Kinderfilme","Kino","Kinofilme",
"Komödie","Konzerte","Kriegsfilme","Krimikomödie","Melodram","Romantik",
"Satire","Science Fiction","Special","Sport","Thriller","TVRiP's","XXX","Zeichentrick",
"kategorie:","Kategorie:");
for($p=0;$p<count($kat);$p++){
$info=str_replace($kat[$p],"",$info);
}
//echo $purl."<br>\n".$info."<br><br>\n\n";
$ret[$i][0]=$titles[1][$i];
$ret[$i][1]=$purl;
$ret[$i][2]=$info;
}
}
return $ret;
}
//returns array[0]=title, [1]=playerurl_object [2]=info
public static function onlinemoviezzz_blogspot_com($search){
$url="http://onlinemoviezzz.blogspot.com/search?q=".urlencode($search);
$ref="http://onlinemoviezzz.blogspot.com";
$omz=stream::curl_get($url,$ref);
$ret=array();
preg_match_all("#Permanent Link to (.*?)'#ims", $omz,$titles);
preg_match_all("#<embed(.*?)/>#ims", $omz,$match);
$m1count=count($match[1]);
for ($i=0;$i<$m1count;$i++){
$m1="<embed".$match[1][$i];
$endp=strrpos($m1,">");
$m1=substr($m1,0,$endp+1);
// remove parentheses and the hoster names (any capitalization) from the title
$thistitle=str_ireplace(array("(",")","veoh","megavideo"),"",$titles[1][$i]);
$ret[$i][0]=$thistitle;
$ret[$i][1]=$m1;
}
preg_match_all("#<div style=\"text-align: center; (.*?)</div>#ims", $omz,$match2);
$m2count=count($match2[1]);
if ($m2count != $m1count)
{
preg_match_all("#Beschreibung:(.*?)</div>#ims", $omz,$match2);
$m2count=count($match2[1]);
}
for ($i=0;$i<$m2count;$i++){
$m2=$match2[1][$i];
$m2=preg_replace('#<center>(.*?)<\/center>#ims','', $m2);
$m2=str_replace("color: rgb(0, 0, 0);\">","",$m2);
$m2=strip_tags($m2);
$ret[$i][2]=trim($m2);
//echo $m2;
}
return $ret;
}
public static function extract_link($obj){
// return the first src="..." attribute, or '' if the object contains none
if (preg_match("#src=\"(.*?)\" #ims", $obj, $erg)) {return $erg[1];}
return '';
}
public static function curl_get($url,$ref="")
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_REFERER, $ref);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
public static function curl_post($url, $params, $ref="") {
$handle = fopen( "cookiejar.txt", "w+" );
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_REFERER, $ref);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookiejar.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookiejar.txt');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $params);
$output = curl_exec($ch);
curl_close($ch);
@fclose($handle);
return $output;
}
}
?>

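In `scrape_sites` every site needs its own copy of the same loop to turn the `[0]=title, [1]=playerurl_object, [2]=info` rows into `url, title, obj, desc` rows (see the FROM/TO comment). A minimal sketch of one helper that handles the results of any number of sites; `normalize_results` is a hypothetical name, and this `extract_link` is a slightly looser variant of the original that does not require a space after the closing quote.

```php
<?php
// Returns the first src="..." attribute value, or '' if there is none.
// Unlike the original extract_link, no space is required after the closing quote.
function extract_link($obj) {
    if (preg_match('#src="(.*?)"#i', $obj, $m)) {
        return $m[1];
    }
    return '';
}

// Converts the per-site rows [0]=title, [1]=player object, [2]=info
// into rows [0]=url, [1]=title, [2]=object, [3]=description.
// $siteResults holds one result array per scraped site.
function normalize_results(array $siteResults) {
    $comp = array();
    foreach ($siteResults as $rows) {
        foreach ($rows as $row) {
            $object = str_replace('<param name="wmode" value="transparent"></param>', '', $row[1]);
            $comp[] = array(
                extract_link($row[1]),
                strtolower($row[0]),
                '<object>' . $object,
                isset($row[2]) ? $row[2] : ''
            );
        }
    }
    return $comp;
}
```

With such a helper, `scrape_sites` could pass `array($erg1, $erg2, $erg3)` and adding a site would no longer mean copying a loop.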

Torben Brodt · 2 August 2008, 16:05

What exactly is the problem? Every site has its own set of rules for extracting the items from the page source.
You have to create those rules for your new sites.
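To illustrate what "its own set of rules" per site can mean: a minimal sketch, assuming a site's result page can be described by one regular expression for the titles and one for the player embeds. The site key, the patterns and `extract_items()` are hypothetical; real pages usually need more rules (descriptions, cleanup), as the original `onlinestreams_org` and `onlinemoviezzz_blogspot_com` methods show.

```php
<?php
// Hypothetical per-site rules: each site gets its own patterns for titles and players.
$rules = array(
    'example-site' => array(
        'title'  => '#<h2 class="title">(.*?)</h2>#is',
        'player' => '#<embed(.*?)/>#is',
    ),
);

// Applies one site's rules to a page and returns rows of
// [0]=title, [1]=player embed, similar to the original site methods.
function extract_items($html, array $rule) {
    preg_match_all($rule['title'], $html, $titles);
    preg_match_all($rule['player'], $html, $players);
    $items = array();
    // pair titles and players by position; unmatched extras are dropped
    $n = min(count($titles[1]), count($players[1]));
    for ($i = 0; $i < $n; $i++) {
        $items[] = array(trim($titles[1][$i]), '<embed' . $players[1][$i] . '/>');
    }
    return $items;
}
```

Adding a site then means adding one entry to `$rules` instead of writing another method.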

In general, I would rethink the whole class design. Your construct is not extensible.
Ideally, define an interface with the standard methods of a crawler and then implement it.

Source code

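class stream {
// one crawler object per site
private $crawler = array();
public function addCrawler(Crawler $c) {
$this->crawler[] = $c;
}
// runs all crawlers and merges their results into one list
public function scrape_sites($title) {
$return = array();
foreach($this->crawler as $c) {
$return = array_merge($return, $c->crawl($title));
}
return $return;
}
}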

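A self-contained sketch of that design, assuming `crawl()` receives the search title and returns rows in the `url, title, obj, desc` format that `scrape_sites` produces today. `DummyCrawler` and `StreamSearch` are hypothetical names; a real crawler would fetch and parse its site inside `crawl()` instead of returning fixed data.

```php
<?php
// Interface for one site crawler (method name from the snippet; the
// $title parameter and the row format are assumptions).
interface Crawler {
    // returns rows of [0]=url, [1]=title, [2]=object, [3]=description
    public function crawl($title);
}

// Hypothetical example crawler that returns fixed data instead of fetching a site.
class DummyCrawler implements Crawler {
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function crawl($title) {
        $url = 'http://example.com/' . $this->name . '/' . rawurlencode($title);
        return array(array($url, strtolower($title), '<object></object>', 'found by ' . $this->name));
    }
}

class StreamSearch {
    private $crawlers = array();

    public function addCrawler(Crawler $c) {
        $this->crawlers[] = $c;
    }

    // runs every registered crawler and merges the rows into one flat list
    public function scrapeSites($title) {
        $rows = array();
        foreach ($this->crawlers as $c) {
            $rows = array_merge($rows, $c->crawl($title));
        }
        return $rows;
    }
}

$search = new StreamSearch();
$search->addCrawler(new DummyCrawler('site-a'));
$search->addCrawler(new DummyCrawler('site-b'));
$rows = $search->scrapeSites('Rambo');
```

A new site is then one new class implementing `Crawler` plus one `addCrawler()` call; the merging code stays unchanged.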

Furthermore, you shouldn't call the methods statically.
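What calling the methods non-statically can look like, as a minimal sketch: the logic of `stream::lookup()` (search the database, otherwise scrape and store) moves into an object that receives its storage and its scraper through the constructor. `StreamStore`, `ArrayStore` and `StreamLookup` are hypothetical names; `ArrayStore` only imitates the `data` table in memory, and a MySQL-backed class implementing the same interface would take its place in practice.

```php
<?php
// Hypothetical storage abstraction; a MySQL-backed class could implement it.
interface StreamStore {
    public function searchTitle($title);
    public function add(array $row);
}

// In-memory stand-in for the `data` table (rows: url, title, object, description).
class ArrayStore implements StreamStore {
    private $rows = array();

    public function searchTitle($title) {
        $title = trim(strtolower($title));
        $found = array();
        foreach ($this->rows as $row) {
            // rough equivalent of: title LIKE '%...%'
            if ($title !== '' && strpos($row[1], $title) !== false) {
                $found[] = $row;
            }
        }
        return $found;
    }

    public function add(array $row) {
        $this->rows[] = $row;
    }
}

// Instance-based version of lookup(): dependencies are passed in once
// instead of being reached through static calls.
class StreamLookup {
    private $store;
    private $scraper;

    public function __construct(StreamStore $store, callable $scraper) {
        $this->store = $store;
        $this->scraper = $scraper; // takes a title, returns rows
    }

    public function lookup($title) {
        $found = $this->store->searchTitle($title);
        if (empty($found)) {
            // nothing stored yet: scrape and remember the results
            $found = (array) call_user_func($this->scraper, $title);
            foreach ($found as $row) {
                $this->store->add($row);
            }
        }
        return $found;
    }
}
```

Because the dependencies are passed in, each part can be swapped or tested on its own, and one database connection could live in the store object instead of a new `mysql_connect()` in every method.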

Ra? · 2 August 2008, 16:07

Hi,

I don't understand a single word. A colleague coded the search engine for me.
UPDATE: Link removed due to questionable content

Can you explain it to me in simpler terms?
What do I have to do?

Do you have ICQ?

Regards

Torben Brodt · 2 August 2008, 16:12

Well, if I explain it without the "coding jargon", that won't help you implement it either.
Ask a colleague whether he can make something of my answer. Otherwise, turn it into a job request and I'll move the thread to the job forum.

Ra? · 2 August 2008, 16:16

Can't you just do it here?
Send me your ICQ number by PM and I'll send you the script xD

Go ahead and move it to the job forum.
I've been looking for a coder to help me for weeks. It's this and one more small thing to change. That's really all.

Torben Brodt · 2 August 2008, 16:22

I'm not taking on new jobs at the moment. Good luck anyway. I've moved the thread.

Ra? · 2 August 2008, 16:23

Can you tell me where to insert your code and explain it in plain terms?
I'd like to give it a try!
