The logic and future of Music scrapers?

February 7, 2017, 7:45 am

i'm currently working on python based scrapers for artists and albums.
since i'm pretty clueless on the scraping process, i'm hoping to get some feedback from more experienced devs.

so far, i came across three things i don't quite understand:

1) if the 'prefer online info' setting is disabled, we pass the artistname to the artist scraper.
if the setting is enabled, we pass the artist mbid to the scraper.

why don't we always pass the mbid (if available) regardless of this setting?

ref: https://github.com/xbmc/xbmc/blob/99c25f...#L220-L223
ref: https://github.com/xbmc/xbmc/blob/99c25f...#L569-L583

2) if the album scraper returns no results, we completely skip the artist scraper. why?

ref: https://github.com/xbmc/xbmc/blob/99c25f...#L843-L883

3) if the 'prefer online info' setting is enabled, and 'show song and album artists' is enabled:
this causes the same artist to be listed twice in your library if the artistname in your tags does not 100% match the artistname the scraper returns.

for instance "The B-52's" vs. "The B-52s":
3.1) i have all songs of an album tagged with artist "The B-52's"
3.2) we start the album scanner and it returns the mbid for this artist
3.3) we pass this mbid to the artist scraper and it returns info for "The B-52s" and kodi adds it to the db.
3.4) kodi now scans all songs for 'additional' artists. it finds "The B-52's" and checks if it's already in the db... nope
3.5) we pass "The B-52's" to the artist scraper and it returns info for whatever closest match it can find and kodi adds this artist to the db

ref: https://github.com/xbmc/xbmc/blob/99c25f...#L843-L883
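The duplicate in point 3 comes from comparing artists by name only. A minimal Python sketch of the alternative, treating the mbid as the artist's identity whenever one is known, is shown here. It assumes the song tags also carry the mbid, which the post does not state. The dict-based library, the function names and the placeholder mbid are all hypothetical and are not Kodi's database code.

```python
# Hypothetical stand-in for the artist table: maps an identity key to a name.
# This is NOT Kodi's real schema, only an illustration of the lookup order.

def artist_key(name, mbid=None):
    """Prefer the MusicBrainz ID as identity; fall back to the tag name."""
    return ("mbid", mbid) if mbid else ("name", name)


def add_artist(library, name, mbid=None):
    """Add the artist unless an entry with the same key exists. Returns True if added."""
    key = artist_key(name, mbid)
    if key in library:
        return False
    library[key] = name
    return True


# Placeholder value, not the real mbid of any artist.
MBID = "00000000-0000-0000-0000-000000000000"

# Matching on mbid: the second spelling is recognised as the same artist.
by_mbid = {}
first_added = add_artist(by_mbid, "The B-52s", MBID)    # step 3.3
second_added = add_artist(by_mbid, "The B-52's", MBID)  # step 3.4

# Matching on name only: the two spellings become two artists (steps 3.4 and 3.5).
by_name = {}
add_artist(by_name, "The B-52s")
add_artist(by_name, "The B-52's")
```

With mbid matching the library ends up with one entry, while name-only matching ends up with two, which is the behaviour described above.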

↧

[split] The logic of Music scrapers?

February 7, 2017, 10:11 pm

ronie, not sure about your problem, and you prob know this, but

id3 tags are not saved in ASF, MP4 or WAV based files.

if i missed what you were asking, well, that's me: always shoot and miss

↧

Javascript scrapers?

February 17, 2017, 3:02 am

(2017-02-07 17:45)ronie Wrote: i'm currently working on python based scrapers for artists and albums.

Sorry if this is the wrong place to ask, as I know that Kodi has had an integrated Python interpreter for many years now.

But can I ask if there are any plans for Kodi to also get native support for scrapers written in JavaScript?

I see that Razze made a "Kodi addon generator" which enables you to use Node.js, which could allow the use of JavaScript in addons.

http://forum.kodi.tv/showthread.php?tid=305664

It would however be more convenient if Kodi had an integrated (embedded) JavaScript engine for scrapers, plugins, and addons.

For example: V8 (JavaScript engine) from the Chromium project, though it might be overkill? Or Cesanta's "V7" embedded JavaScript engine.

https://en.wikipedia.org/wiki/V8_(JavaScript_engine)
https://developers.google.com/v8/

Cesanta "V7" is an embedded JavaScript engine for C/C++, which claims to be the world's smallest JavaScript engine written in C:

https://docs.cesanta.com/v7/master/
https://github.com/cesanta/v7

V8 integration might be better if Kodi still has plans to also integrate CEF (Chromium Embedded Framework)?

https://www.phoronix.com/scan.php?page=n...e-Embedded

↧

Skip CreateSearchUrl?

March 3, 2017, 7:00 pm

I'm on my first attempt to write a very simple scraper for an adult site but have run into a problem.

Conveniently, the files you download from the site already contain an ID in their filename that can be used to point directly to the info page.

Reading the scraper wiki, I find that without an NFO I have to create a search URL first, but I have no need to do that since I can point straight to the correct page without searching.

Is it possible for a scraper to skip the search and only provide an address ready for use in DownloadDetailsPage?

I guess this would be the same as using an NFO where the XML would just return a URL ready to go, but triggered from FILENAME instead of an NFO?
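To make the idea concrete, here is a small Python sketch of the mapping being asked for: pull the ID out of the filename and build the details URL directly, with no search step. It only illustrates the logic, not how Kodi's XML scraper engine would be wired up. The filename layout (`<name>_<digits>.<ext>`), `BASE_URL` and the function name are assumptions made for the example, since the post gives none of them.

```python
import re

# Hypothetical values: the post names neither the site nor the filename layout.
BASE_URL = "http://example.com/info/"
ID_PATTERN = re.compile(r"_(\d+)\.[A-Za-z0-9]+$")


def details_url_from_filename(filename):
    """Return the info-page URL for a filename, or None if it carries no ID."""
    match = ID_PATTERN.search(filename)
    if match is None:
        return None
    return BASE_URL + match.group(1)


url = details_url_from_filename("some_clip_12345.mp4")
missing = details_url_from_filename("noid.mp4")
```

A filename without an ID yields `None`, which is the case where a normal search step would still be needed.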

↧

Movie information scraper on all my Kodi

March 6, 2017, 1:17 pm

Hi,
I'm new to this forum and would like some insight into how the movie database scrapers work. Right now the moviedb and imdb scrapers fetch movie information for my whole video library and for very few add-on movie lists, but I would like the movie scraper to scrape all the titles in my Kodi, including those in all video add-ons.
Is there a way to see movie information in all video add-ons?

Please help.

↧

first_air_date regex bug in metadata.tvshows.themoviedb.org version 1.3.1?

March 28, 2017, 4:37 am

I think I might have stumbled upon an issue while debugging tv shows not appearing in the library. I'm using the TV show scraper from The Movie Database from http://mirrors.kodi.tv/addons/krypton/me...-1.3.1.zip

For the search function GetSearchResults, it runs a RegExp on first_air_date that expects either ([0-9]+) or null. Looking at the JSON API response, The Movie Database API is returning "" (blank) instead.

Searching for Lorraine returns 5 results. The first entry is "Lorraine", but its first_air_date is set to "". According to the kodi debug logs, GetSearchResults returned "Audience with Lorriane". Note that this is the wrong show, but it matches the first_air_date regex. My debug logs then show kodi returning the GetVideoDetails for the id of the original "Lorraine" show. This tv show is in the kodi video library.

Searching for Unreported World returns 1 result, which has first_air_date set to "". Since this is the only result, the search function GetSearchResults returns empty. The next line in the logs is the warning "No information found for item xxx, it won't be added to the library."

I've seen this behavior on multiple entries now. See FYI Daily and Location, Location, Location.

Other programs where first_air_date is set are working fine. One example is The Big Bang Theory.

How do I pass this information to the maintainer?
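The mismatch is easy to reproduce outside Kodi. The Python sketch below uses an assumed form of the pattern, since the post only says the scraper expects digits or null and does not quote the real expression. The JSON fragments are made-up samples, not actual API responses. The tolerant variant only shows one possible way to accept a blank date.

```python
import re

# Assumed patterns: digits-or-null as described in the post, plus a variant
# that also accepts an empty string. Neither is copied from the real scraper.
STRICT = re.compile(r'"first_air_date":(?:"([0-9]+)|null)')
TOLERANT = re.compile(r'"first_air_date":(?:"([0-9]*)|null)')

# Made-up sample fragments covering the three cases.
samples = {
    "dated": '{"id":1,"first_air_date":"2000-01-01"}',
    "null": '{"id":2,"first_air_date":null}',
    "blank": '{"id":3,"first_air_date":""}',
}

strict_hits = {name: bool(STRICT.search(text)) for name, text in samples.items()}
tolerant_hits = {name: bool(TOLERANT.search(text)) for name, text in samples.items()}
```

Under these assumptions the strict pattern skips only the blank entry, which would explain why such shows drop out of the search results, while the tolerant pattern matches all three.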

↧

read or send movietitle when KODI start play movie

March 28, 2017, 7:38 am

Hey guys, I plan to trigger a scene via Domoticz when KODI starts playing a movie, with a different scene for each movie. Can somebody tell me how to read the movie title from Domoticz, or how KODI can send the movie title to Domoticz? Thanks.
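One way to read the title from outside Kodi is its JSON-RPC interface, whose `Player.GetItem` method returns the item currently playing. The Python sketch below only builds the request body and parses a response, so it stays independent of how the request is sent. Using `playerid` 1 for video is an assumption (the active player can be looked up with `Player.GetActivePlayers`), the sample response is made up, and the Domoticz side is not covered at all.

```python
import json


def build_get_item_request(player_id=1, request_id=1):
    """Build the JSON-RPC body asking Kodi for the title of the playing item."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "Player.GetItem",
        "params": {"playerid": player_id, "properties": ["title"]},
        "id": request_id,
    })


def title_from_response(response_text):
    """Extract the title (or the label as a fallback) from a Player.GetItem reply."""
    item = json.loads(response_text).get("result", {}).get("item", {})
    return item.get("title") or item.get("label")


# Made-up response in the shape Player.GetItem replies with.
sample_response = (
    '{"id":1,"jsonrpc":"2.0","result":{"item":'
    '{"label":"Some Movie","title":"Some Movie","type":"movie"}}}'
)
request_body = build_get_item_request()
title = title_from_response(sample_response)
```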

↧

Scraper which parses another addon's JSONRPC response

April 9, 2017, 3:28 pm

Hi,

I wrote a script type plugin which builds some media information if I call it with JSONRPC and pass a movie title. But I want to create a scraper which parses my script's JSONRPC response.

I think I can make the regex extract from the JSON, but how can I make a request from the scraper to the script? Is it possible?

If not, is there any other way to do this?

Many thanks!

↧

Subtitles scraper

April 13, 2017, 2:56 pm

Hello everyone,

I am trying to make a subtitle scraper for TV media using the Kodi code base. Does Kodi fetch the subtitles from the video stream, or should the subtitles always be provided from external sources like opensubtitles?

Kind regards

↧

TheaudioDB

September 1, 2016, 8:33 pm

Guys, why not make a scraper for the sites below? They have much more information than many others online.
TheAudioDB is nothing compared to them.

Website: https://www.vagalume.com.br/ or http://www.midomi.com/

↧

Banned Scraper Question

April 24, 2017, 5:54 pm

I want to apologize to the admin for my question about a banned add-on (scraper) I mentioned. I am fairly new here. I have asked some questions here and there, but I certainly did not want to break the forum's rules and there was no intent. Lesson learned.

↧

Is THETVDB scraper using the new TVDB 2.0 api?

May 26, 2017, 6:23 am

Ronseal.

Is it? If not, will a new version be ready in time for the October 1st shut off?

Thanks!

↧

AMDB Arabic Movies Scraper

September 10, 2016, 9:15 am

AMDB, The Arabic Movie Database, is the first Arabic movie scraper that works with Kodi (XBMC).

↧

Style for TV Show Episodes, Local Scraper.. Help

September 11, 2016, 3:20 pm

Hello,

I'm using EMM to edit my KODI related media files. I never scrape with the online scrapers included in KODI.

The style in which TV show episodes are listed in Kodi is not consistent. Some are shown as:

1x01
1x02
1x03
...

others as:

01.
02.
03.
...

Which command do I have to use to make them look the same? I really don't like the 1x01 style. I compared some .nfo files where the style is correct with .nfo files where the style is incorrect, but I couldn't see the command for it.

Please help me.

Thank you.

↧

Scraper reading a xml file

September 22, 2016, 2:34 pm

Hello everyone, I would like to develop my first scraper. I would like to know if it is possible for the scraper to read an xml file in which the details of the film were previously stored.
Thanks to everyone who can help me.

↧

using Kodi scrapers for personal project

September 30, 2016, 12:59 pm

Hi,

I would like to ask about the stance of the kodi project on a possible scraper reuse in another project (also GPL). And possibly any best practice you would recommend.
I was toying with the idea of writing my own scraper, but then saw the giant library of kodi scrapers, read how to write a scraper, and figured implementing your "engine" (which BTW I find very ingenious) would be a lot easier than writing/maintaining my own scrapers. I currently have a proof of concept working (themoviedb scraper loads and gets info for a movie).

Of course I have no idea if it will go anywhere, but if it does, I'd like to be on good terms with you guys.

Thanks,

serafean.

↧

Scraping recorded TV-Shows - extend TVDB Scraper - get function calls right

October 6, 2016, 3:33 pm

I have already spent days trying to get this working: I have a local PVR that records movies and TV shows. For recorded movie files I have already succeeded in scraping them with the information my PVR gives me by calling its XML API. With the TV shows I still fail, and I now hope somebody can help me.

I thought the task would be simple: just make an HTTP call to the PVR API, get the TVDB ID there for the file to be scraped, and then go on with the regular TVDB scraper using this ID. So mainly just modifying the functions "CreateSearchUrl" or maybe also "GetSearchResults".

Yet the main problem I have is calling a function in the right way to do this TVDB ID lookup. I tried many ways, two of which are shown below.

What I have still not been able to figure out is how to call a function in the right way. My main questions are:

When I use a function to get some XML file from another site, how do I trigger it to really GET the content from that site? At what point in the execution of the scraper are URLs really evaluated? My log files suggest that this is not triggered just by calling a function: the functions mainly just extend the URLs and make them richer with code.
Do I need to enclose the results of a function in any XML tags? Almost all of the code samples put the results between <details> and </details>, but why? Why "details", and could I also use "url", for example? I don't understand how this is used. I just noticed that if I don't use any enclosing tags at all, the function simply doesn't show up as being executed in the log file.

So this is my code ...

First try: extend CreateSearchUrl to call the PVR API to get the TVDB ID and then pass it on as usual to GetSearchResults:

Code:

<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="&lt;chain function=&quot;GetTVDBIdFromEpisode&quot;&gt;\1&lt;/chain&gt;" dest="5">
        <expression>(?:%20| |_)([]0-9]+)(?:\.ts|$)</expression>
    </RegExp>
    <RegExp input="$$5" output="&lt;url&gt;http://thetvdb.com/api/GetEpisode.php?id=\1&amp;language=$INFO[language]&lt;/url&gt;" dest="3">
        <expression noclean="1" />
    </RegExp>
</CreateSearchUrl>

<GetTVDBIdFromEpisode dest="3">
    <RegExp input="$$4" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
        <RegExp input="$$1" output="&lt;url function=&quot;ParseTVDBIdFromEpisode&quot;&gt;http://10.0.0.1:8081/record.onexml?id=\1&lt;/url&gt;" dest="4">
            <expression />
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetTVDBIdFromEpisode>

<ParseTVDBIdFromEpisode dest="5">
    <RegExp input="$$1" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
        <expression>&lt;t_thetvdbid&gt;([0-9]+)&lt;/t_thetvdbid&gt;</expression>
    </RegExp>
</ParseTVDBIdFromEpisode>

Not working - both the API call and the function ParseTVDBIdFromEpisode are working, but the result is not passed on to the top. So the value for the ID to be used in CreateSearchUrl is finally empty.

Code:

15:30 T:1826356112   DEBUG: std::vector<CScraperUrl> ADDON::CScraper::FindMovie(XFILE::CCurlFile&, const string&, bool): Searching for '20160830 rtl Bones - Die Knochenjaegerin 2026' using IPTV PVR TV Series Scraper scraper (path: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')

15:30 T:1826356112   DEBUG: scraper: CreateSearchUrl returned <url>http://thetvdb.com/api/GetEpisode.php?id=<chain function="GetTVDBIdFromEpisode">2026</chain>&language=de</url>

15:30 T:1826356112   DEBUG: scraper: GetTVDBIdFromEpisode returned <details><url function="ParseTVDBIdFromEpisode">http://10.0.0.1:8081/record.onexml?id=2026</url></details>

15:30 T:1826356112   DEBUG: CurlFile::Open(0x6e1624b0) http://10.0.0.1:8081/record.onexml?id=2026

15:30 T:1826356112    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://10.0.0.1

15:30 T:1826356112   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://10.0.0.1:8081/record.onexml?id=2026"

15:30 T:1826356112   DEBUG: scraper: ParseTVDBIdFromEpisode returned <details>4818866</details>

15:30 T:1826356112   DEBUG: CurlFile::Open(0x6e1624b0) http://thetvdb.com/api/GetEpisode.php?id=

15:30 T:1826356112   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://thetvdb.com/api/GetEpisode.php?id="

15:30 T:1826356112   DEBUG: scraper: GetSearchResults returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><results></results>
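For reference, the data flow the first try is aiming for can be written down as plain sequential steps. The Python sketch below is not how the XML scraper engine evaluates chains and does not explain why the chained result gets lost. It only shows the intended order: recording ID from the title, then the PVR lookup, then the TVDB search URL. The two regular expressions and URLs are taken from the code and log above. The function names and the `<record>` wrapper in the sample PVR reply are assumptions.

```python
import re

# Patterns copied from the scraper code above.
RECORDING_ID = re.compile(r"(?:%20| |_)([]0-9]+)(?:\.ts|$)")
TVDB_ID = re.compile(r"<t_thetvdbid>([0-9]+)</t_thetvdbid>")


def recording_id(title):
    """Step 1: extract the PVR recording ID from the end of the title."""
    match = RECORDING_ID.search(title)
    return match.group(1) if match else None


def pvr_lookup_url(rec_id):
    """Step 2: URL of the PVR API call that returns the recording's metadata."""
    return "http://10.0.0.1:8081/record.onexml?id=" + rec_id


def tvdb_id(pvr_xml):
    """Step 3: extract the TVDB ID from the PVR reply."""
    match = TVDB_ID.search(pvr_xml)
    return match.group(1) if match else None


def search_url(tvdb, language):
    """Step 4: the URL CreateSearchUrl is ultimately supposed to return."""
    return f"http://thetvdb.com/api/GetEpisode.php?id={tvdb}&language={language}"


rec = recording_id("20160830 rtl Bones - Die Knochenjaegerin 2026")
# Sample reply; the <record> wrapper element is an assumption.
reply = "<record><t_thetvdbid>4818866</t_thetvdbid></record>"
final_url = search_url(tvdb_id(reply), "de")
```

In the log above, the last step runs before the third one has delivered its value, which is why the request goes out with an empty `id=`.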

Second try - extend the function GetSearchResults to make the API-call:

Code:

<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="&lt;url&gt;http://10.0.0.1:8081/record.onexml?id=\1&lt;/url&gt;" dest="3">
        <expression noclean="1">(?:%20| |_)([]0-9]+)(?:\.ts|$)</expression>
    </RegExp>
</CreateSearchUrl>

<GetSearchResults dest="8">
    <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;&lt;entity&gt;\1&lt;/entity&gt;&lt;/results&gt;" dest="8">
        <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5">
            <expression>&lt;t_caption&gt;([^&lt;]*)&lt;/t_caption&gt;</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url cache=&quot;tt\1.xml&quot; function=&quot;GetTVDBIdFromEpisode&quot;&gt;http://thetvdb.com/api/GetEpisode.php?id=\1&amp;language=$INFO[language]&lt;/url&gt;" dest="5+">
            <expression>&lt;t_thetvdbid&gt;([0-9]+)&lt;/t_thetvdbid&gt;</expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</GetSearchResults>

<GetTVDBIdFromEpisode dest="3">
    <RegExp input="$$1" output="&lt;url cache=&quot;\1-$INFO[language].xml&quot;&gt;http://thetvdb.com/api/1D62F2F90030C444/series/\1/all/$INFO[language].zip&lt;/url&gt;" dest="3">
        <expression>&lt;seriesid&gt;([0-9]+)&lt;/seriesid&gt;</expression>
    </RegExp>
</GetTVDBIdFromEpisode>

Still not working - the API function call is not done during execution of GetSearchResults. Instead the whole code for calling the function is passed on to the function "GetDetails" and there it is not working.

Code:

11:04 T:1825258144   DEBUG: std::vector<CScraperUrl> ADDON::CScraper::FindMovie(XFILE::CCurlFile&, const string&, bool): Searching for '20160830 rtl Bones - Die Knochenjaegerin 2026' using IPTV PVR TV Series Scraper scraper (path: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')

11:04 T:1825258144   DEBUG: scraper: CreateSearchUrl returned <url>http://10.0.0.1:8081/record.onexml?id=2026</url>

11:04 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://10.0.0.1:8081/record.onexml?id=2026

11:04 T:1825258144    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://10.0.0.1

11:04 T:1825258144   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://10.0.0.1:8081/record.onexml?id=2026"

11:04 T:1825258144   DEBUG: scraper: GetSearchResults returned <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results><entity><title>Bones - Die Knochenj&#xE4;gerin</title><url cache="tt4818866.xml" function="GetTVDBIdFromEpisode">http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de</url></entity></results>

11:04 T:1825258144   DEBUG: bool ADDON::CScraper::GetVideoDetails(XFILE::CCurlFile&, const CScraperUrl&, bool, CVideoInfoTag&): Reading movie 'http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de' using IPTV PVR TV Series Scraper scraper (file: '/storage/emulated/0/Android/data/org.xbmc.kodi/files/.kodi/addons/metadata.iptvpvr.tvdb', content: 'tvshows', version: '1.0.0')

11:04 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de

11:04 T:1825258144    INFO: void XCURL::DllLibCurlGlobal::easy_aquire(const char*, const char*, XCURL::CURL_HANDLE**, XCURL::CURLM**) - Created session to http://thetvdb.com

11:05 T:1825258144   DEBUG: static bool CScraperUrl::Get(const CScraperUrl::SUrlEntry&, std::string&, XFILE::CCurlFile&, const string&): Using "UTF-8" charset for XML "http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de"

11:05 T:1825258144   DEBUG: scraper: GetDetails returned <?xml version="1.0" encoding="utf-8" standalone="yes"?><details><id></id><chain function="GetArt"></chain><episodeguide><url cache="-.xml">http://thetvdb.com/api/GetEpisode.php?id=4818866&language=de</url></episodeguide></details>

11:05 T:1825258144   DEBUG: scraper: GetArt returned <details><url function="ParseArt" cache="-de.xml">http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml</url></details>

11:05 T:1825258144   DEBUG: CurlFile::Open(0x71f0c220) http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml

11:05 T:1825258144   ERROR: CCurlFile::Open failed with code 404 for http://thetvdb.com/api/1D62F2F90030C444/series//banners.xml

Thank you very much for your help or your ideas. Highly appreciated!
Gerald

↧

No scraper or fanart for TV shows

October 17, 2016, 6:27 pm

I'm using LibreELEC on a Raspberry Pi 3, with TVHeadend as the backend and of course Kodi as the frontend. I'm using the tvheadend pvr client. I also have two HDHomeruns, an HDHR and an HDHR3. I'm trying to get my TV show library to show up. I've chosen the folder and set it up for TVDB. It scans and finds the content of the folder, but I don't get any information or fanart under TV Shows: whether I click on Title, recently added episodes, etc., there is nothing there. I have to go into the folder under Files to see them. If I highlight the show there, I can see a picture associated with the show. How do you get the information to show up?

↧

TMDB TV Series

October 26, 2016, 4:28 pm

How can a job be left half done? TMDB was set up to accept the language "PT-BR", but they forgot the TV series.

↧

Merging

November 3, 2016, 2:38 pm

Many videos I have don't scrape because they aren't on any database. I'd like Kodi to add them regardless, using their filename as title. So there's a scraper that does that, and this is the code:

Code:

<CreateSearchUrl dest="3" clearbuffers="no">
    <RegExp dest="3" output="&lt;url&gt;http://search.yahoo.com/search?p=$$7&lt;/url&gt;" input="$$7">
        <RegExp dest="7" output="\1" input="$$1">
            <expression noclean="1" trim="1">(.+)</expression>
        </RegExp>
        <expression noclean="1"></expression>
    </RegExp>
</CreateSearchUrl>

<GetSearchResults dest="8" clearbuffers="no">
    <RegExp dest="8" output="&lt;results sorted=&quot;yes&quot;&gt;\1&lt;/results&gt;" input="$$5">
        <RegExp dest="5" output="&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;url&gt;http://search.yahoo.com/search?p=$$7&lt;/url&gt;&lt;/entity&gt;" input="$$1">
            <expression noclean="1" trim="1">&lt;title&gt;([^"]*)\-\sYahoo Search</expression>
        </RegExp>
        <expression noclean="1"></expression>
    </RegExp>
</GetSearchResults>

<GetDetails dest="3" clearbuffers="no">
    <RegExp dest="3" output="&lt;details&gt;\1&lt;/details&gt;" input="$$5">
        <RegExp dest="5+" output="&lt;title&gt;\1&lt;/title&gt;" input="$$1">
            <expression noclean="1" trim="1">&lt;title&gt;(.+)\s\-\sYahoo Search Results&lt;/title&gt;</expression>
        </RegExp>
        <expression noclean="1" trim="1"></expression>
    </RegExp>
</GetDetails>

I'd like to merge this into the code of either tmdb.xml or universal.xml, so that I have one scraper that gets most movies from TMDB and then falls back to the one above when TMDB can't match the filename with anything. I don't know where to put it, though. If I put it at the top, it takes priority over TMDB to the point where TMDB is not used at all, and if I put it at the very end it gets ignored.
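The control flow being asked for is "try the real lookup first, and use the filename as the title only when it finds nothing". The Python sketch below shows just that ordering. It says nothing about where such logic would go inside tmdb.xml or universal.xml, and the function names, the dict-shaped result and the stand-in lookup are all hypothetical.

```python
import os


def title_from_filename(path):
    """Turn a file path into a readable title: drop directory and extension."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.replace(".", " ").replace("_", " ").strip()


def scrape(path, primary_lookup):
    """Use the primary scraper's result if any, else fall back to the filename."""
    title = title_from_filename(path)
    details = primary_lookup(title)
    if details:
        return details
    return {"title": title, "source": "filename"}


# Stand-in for a TMDB lookup: knows exactly one (made-up) title.
KNOWN = {"Known Movie 2016": {"title": "Known Movie", "source": "primary"}}


def fake_lookup(title):
    return KNOWN.get(title)


matched = scrape("/videos/Known.Movie.2016.mkv", fake_lookup)
fallback = scrape("/videos/Home_Movie.2016.mkv", fake_lookup)
```

The fallback entry keeps a `source` marker so that items added this way can be told apart from properly matched ones later.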

↧