extract only the description Russian (Regex issue)

makinero
Senior Member
Posts: 268
Joined: 2013-10-26, 10:05 UTC

Post by *makinero »

I have an HTML file and EmEditor 64-bit (the only tool that really opens very large text files).
I want a regex that extracts only the text contained between
<br> text </br>
or
<br> text <br>

My regex is invalid / works very badly:

(<br>(.*)</br>)
ERROR MESSAGE
http://www.imagebam.com/image/6c4a10530595417

What regex would extract only the code and the text, like this?
<br> text </br>

example:

HTML
<br>Анкеты предлагаются в раздел \"Предложить новость\", к<br>
or
<br>Анкеты предлагаются в раздел \"Предложить новость\", к</br>
Ovg
Power Member
Posts: 755
Joined: 2014-01-06, 16:26 UTC

Post by *Ovg »

(?<=<br>)[^\\]*?(?=(</br>|<br>))
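A minimal Python sketch of how this pattern behaves, for anyone who wants to try it outside EmEditor. Using Python's `re` module is an assumption here (EmEditor's own engine may differ in details), and the function name `extract_br_text` and the sample strings are made up for illustration.

```python
import re

# The pattern from the post, unchanged:
#   (?<=<br>)         lookbehind: the match must start right after "<br>"
#   [^\\]*?           as few non-backslash characters as possible
#   (?=(</br>|<br>))  lookahead: the match must end right before "</br>" or "<br>"
PATTERN = re.compile(r"(?<=<br>)[^\\]*?(?=(</br>|<br>))")


def extract_br_text(text):
    """Return every piece of text matched by the pattern (tags not included)."""
    # group(0) is the whole match; the lookarounds consume no characters,
    # so the tags themselves are never part of the result.
    return [m.group(0) for m in PATTERN.finditer(text)]


plain = "<br>first line</br> noise <br>second line<br>"
escaped = '<br>has an escaped \\" quote</br>'

print(extract_br_text(plain))    # ['first line', 'second line']
print(extract_br_text(escaped))  # []
```

The second sample returns nothing because the text contains a backslash, which the character class `[^\\]` refuses to cross.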
Last edited by Ovg on 2017-02-04, 13:12 UTC, edited 3 times in total.
It's impossible to lead us astray for we don't care even to choose the way.
#259941, TC 11.01 x64, Windows 7 SP1 x64

Post by *makinero »

It works, but I would also like to remove the HTML code (optional):

<br>CODE or TEXT</br>

It should not include anything between <br> and </br> if the text contains even one of this character:

\

Ignore / Exclude:

<br>Анкеты предлагаются в раздел \"Предложить новость\", к</br>

duration":290,"url":"http:\/\/p17\/ea754b44ba4fd2=hNV-kW6KieVSbTVAO5s7WNVMdoFW5_ryU_wPKJY4O0pMT-6A_LcRLQHcTl9SQzd8Zry55GksCLSsSS7IMGpbqLTgW9IfH0D_yxL2qTJ2f68H3Uwu4","performer":"Animal 
Анкеты предлагаются в раздел \"Предложить новость
Include:

<br>Анкеты. предлагаются - () в раздел,  1. 12.345 klj9к</br>

Post by *Ovg »


Post by *makinero »

Very nice.
Exclude these characters as well:

\
{
}
[
]

(?<=<br>)[^\\^[^{]*?(?=(</br>|<br>))

Post by *Ovg »

To makinero:

(?<=<br>)[^\\\[\]\{\}]*?(?=(</br>|<br>))
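A hedged Python sketch that runs this extended pattern against the "Include" and "Exclude" samples posted earlier in the thread. Python's `re` module stands in for EmEditor's engine, which is an assumption; `extract_clean` and `json_like` are hypothetical names introduced only for this example.

```python
import re

# Same lookarounds as before; the character class now also rejects
# the brackets and braces makinero asked to exclude: \ [ ] { }
PATTERN = re.compile(r"(?<=<br>)[^\\\[\]\{\}]*?(?=(</br>|<br>))")


def extract_clean(text):
    """Return the text between tags, skipping any span with an excluded character."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# Samples taken from the thread ("Include" and "Ignore / Exclude").
include = "<br>Анкеты. предлагаются - () в раздел,  1. 12.345 klj9к</br>"
exclude = '<br>Анкеты предлагаются в раздел \\"Предложить новость\\", к</br>'
# Made-up sample with braces, similar to the JSON fragments in the file.
json_like = '<br>{"duration":290}</br>'

print(extract_clean(include))    # one match: the text without the tags
print(extract_clean(exclude))    # [] because of the backslashes
print(extract_clean(json_like))  # [] because of the braces
```

Inside a character class only `\`, `]`, `^` and `-` strictly need care, but escaping `[`, `{` and `}` as Ovg does is harmless and easier to read.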

Post by *makinero »

Everything fine, thanks! You are great.
nsp
Power Member
Posts: 1951
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Post by *nsp »

makinero wrote: Everything fine, thanks! You are great.
Keep in mind that what sits between your br tags is not plain text but HTML-encoded text. Once extracted, you will need an extra decoding step if it contains an "&" character!
For minimal decoding:
&amp; => &
&lt; => <
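The decoding step nsp describes can be sketched in Python with the standard library's `html.unescape`, which handles `&amp;`, `&lt;`, `&gt;` and the other named and numeric entities in one call. The sample string is invented for illustration; doing this in Python rather than inside EmEditor is an assumption.

```python
import html

# Text as it might look right after extraction: entities are still encoded.
extracted = "Tom &amp; Jerry &lt;3 &gt; others"

# html.unescape converts HTML character references back to plain characters.
decoded = html.unescape(extracted)
print(decoded)  # Tom & Jerry <3 > others
```

Decoding should happen after the regex extraction, otherwise a decoded `<` or `>` could be mistaken for part of a tag.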

Post by *makinero »

nsp wrote:
makinero wrote: Everything fine, thanks! You are great.
Keep in mind that what sits between your br tags is not plain text but HTML-encoded text. Once extracted, you will need an extra decoding step if it contains an "&" character!
For minimal decoding:
&amp; => &
&lt; => <

Moreover, I have also found this one:

&gt; >
онажей, или написать вместе с вами сказку!!!<br><br>>обожаешь магию~<br>>играешь в warcraft<br>>интересуешься КореейЯпонией<br>>мечтаешь путешествовать<br>

Post by *makinero »

Ovg wrote:

(?<=<br>)[^\\\[\]\{\}]*?(?=(</br>|<br>))
This does not work for the following text between tags:

<br>Обожаю животных и растения.  Увлекаюсь аниме  , сериалами  и чтением . Последнее наверное самое любимое.   Я восхищаюсь  книгами "Дом , в котором. .. " , " Дом странных детей " , " Скотный  двор " , " Рыцарь на золотом коне " . На самом деле могу продолжать этот список целую вечность.  Книги для меня - способ уйти от проблем и погрузиться в удивительный,  сказочный мир.<br>

Post by *Ovg »

makinero wrote: Ignore / Exclude:
<br>Анкеты предлагаются в раздел "Предложить новость", к</br>

Exclude:

\
{
}
[
]

Post by *makinero »

There is always some problem with the regex, because the text between the tags varies so much :( :( :(

Wrong regex:

<br>[^\:]+<br>

Text 1:
I need a correct regex that detects the entire text in
<br>TEXT<br> (or with </br> as the closing tag)

<br><br>Ищу человека,  ищу!<br><br>Рассной особе. Начнем-с?<br><br>Раз. Я дико люи \/однаю.<br><br>Два. Музы вмиг.<br><br>Три. Рисую, а если точнее, срисовываю. Плохо, правда, срисовываю, но некоторым нравится. Это что-то типа через раз получается. И обязательно с чашкой чая \\0\/<br><br>Четыре. Пишу фанфики и просто истории. Кроме того, я и ролевик тоже.<br><br>Пять. Запоями читаю книги. Или не читаю неделями, вместочто вспомнить сейчас почти что нереально. Как и сериалы. Но сейчас я зависла на сериале \"Сотня\". Иногда удивляюсь, почему у него такой низкий рейтинг?..<br><br>Шестакое счастье с:<br><br>Забыла сказать садевчонки\/мальчишки могут дать нам интересную тему для рассуждений. Так что, удачи Вам, если Вы захотели написать мне письмо.<br><br>P.S. Если у Вас есть какая-то проблема, Вы так же можете мне написать. Ведь лучше, когда выговоришься, не правда ли?<br>

Text2:

"post_type":"post","text":"Здравствуйте)<br>И, навтникам этой чудесной группы \/особенно к жителям Беларуси\/. Я поного внимания., как мне кажется.<br>Это моя подруга. 27 фе.<br>Мне<br>За адресом - в лс)<br><br>P.S На фото её рисунок:3"

"post_type":"post","text":T E X T"
TAG TEXT TAG

<br>any text with any character<br>
Any text with any character</br>

Example of the characters that may appear: ][:\/()-?_+|" etc.

What is the correct regex to detect the tags and the text between them?

The editor's regular expression syntax is based on Perl regular expression syntax.

<br>Here can include all letters, numbers, all characters, unicode, everything<b><b>Here can include all letters, numbers, all characters, unicode, everything<br></br><br>Here can include all letters, numbers, all characters, unicode, everything<b><b>Here can include all letters, numbers, all characters, unicode, everything</br>

So I want to detect everything between these tags, and only these tags: <b><b> or <b></br>
Understand?
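One possible way to match "any text with any character" between the tags is a lazy `.*?` instead of a character class, sketched here in Python. This is an illustration under assumptions, not a tested EmEditor recipe: it uses Python's `re` module, treats both `<br>` and `</br>` as terminators, and the names `ANY_TEXT` and `segments_between_br` and the sample string are hypothetical.

```python
import re

# <br>        consume the opening tag
# (.*?)       capture as little as possible; with DOTALL "." also matches newlines
# (?=</?br>)  stop right before the next "<br>" or "</br>" without consuming it,
#             so a closing "<br>" can serve as the opening tag of the next segment
ANY_TEXT = re.compile(r"<br>(.*?)(?=</?br>)", re.DOTALL)


def segments_between_br(text):
    """Return all non-empty text segments found between br tags."""
    # Consecutive tags such as "<br><br>" produce empty captures; drop them.
    return [seg for seg in ANY_TEXT.findall(text) if seg]


# Made-up sample with backslashes, brackets, braces, colons and quotes.
sample = '<br>Раз. \\/ [x]: {y}<br><br>Два "z"</br>'
print(segments_between_br(sample))
```

Because nothing is excluded, this also captures the JSON-like fragments that the earlier character-class patterns were written to skip, so it trades filtering for completeness.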