Parsing a Telegram channel with Python

mmat16/telegram_channel_parser

README.md

To use this parser, you need to install Python 3 and several third-party libraries with the following command:

On Windows: pip install -r requirements.txt

On macOS and Linux: pip3 install -r requirements.txt

You will also need to register your own Telegram application. To do so, go to https://my.telegram.org/apps, log in to your Telegram account, and create an application (Create new application). Fill in:

  • App title: the application name (any name will do)
  • Short name: an abbreviated name (letters and digits only, 5-32 characters)
  • Platform: select Other

The remaining fields can be left empty. Click Create application. At this step Telegram often refuses to let you proceed for no apparent reason; keep trying. Sometimes clicking again without changing anything helps, and sometimes you need to change the App title or Short name. Once the application is registered, the next page shows its registration details. Store all of them in a safe place; the parser itself needs only the App api_id and App api_hash fields. Put them into the variables of the same name in config.py.
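As a rough illustration of that last step, config.py might look like the sketch below. Only the variable names api_id and api_hash come from the README; the placeholder values and the check_config helper are hypothetical. The check also assumes that api_id is a positive integer and api_hash a 32-character hexadecimal string, which is how Telegram usually issues them but is not something the parser documents.

```python
import re

# Placeholder credentials (hypothetical): replace them with the values
# shown on https://my.telegram.org/apps.
api_id = 1234567
api_hash = "0123456789abcdef0123456789abcdef"


def check_config(app_id, app_hash):
    """Return a list of problems found in the credentials (empty if none)."""
    problems = []
    # bool is a subclass of int, so it is excluded explicitly.
    if not isinstance(app_id, int) or isinstance(app_id, bool) or app_id <= 0:
        problems.append("api_id should be a positive integer")
    # Assumption: api_hash is 32 lowercase hexadecimal characters.
    if not isinstance(app_hash, str) or not re.fullmatch(r"[0-9a-f]{32}", app_hash):
        problems.append("api_hash should be 32 lowercase hex characters")
    return problems


if __name__ == "__main__":
    for problem in check_config(api_id, api_hash):
        print("config.py:", problem)
```

Running the file directly prints one line per problem and nothing when both values look plausible.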

Once the libraries are installed and the application is registered, the parser is ready to use:

  • go to the source code directory and run the parser with the command "python3 parser.py"
  • on the first run you will have to confirm the login through Telegram (it is best to disable two-factor authentication for the time being):
    • the console will show a prompt asking for the phone number linked to your Telegram account
    • after the next prompt, enter the Telegram confirmation code

    Once the link has been received, message collection starts immediately.
    A folder named after the channel ID appears in the script directory, along with a .log file that records the script's activity. Inside the channel folder, subfolders named after message IDs appear; each contains a text file with the message text and its embedded hyperlinks, plus a text file with "metadata": the link to the message and the date and time it was sent. Any media attached to the message is downloaded into the same folder.
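The folder layout described above can be sketched with the standard library alone. This is not the parser's actual code: the save_message helper and the file names text.txt and meta.txt are hypothetical, since the README does not name the files, and media download is left out.

```python
from pathlib import Path


def save_message(root, channel_id, message_id, text, link, sent_at):
    """Write one message into <root>/<channel_id>/<message_id>/."""
    message_dir = Path(root) / str(channel_id) / str(message_id)
    message_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file names: one file for the text, one for the metadata.
    (message_dir / "text.txt").write_text(text, encoding="utf-8")
    meta = f"link: {link}\nsent: {sent_at.isoformat()}\n"
    (message_dir / "meta.txt").write_text(meta, encoding="utf-8")
    return message_dir
```

A call such as save_message(".", 1234567890, 42, "Hello", "https://t.me/example_channel/42", datetime.now()) would create ./1234567890/42/ with the two files inside; the channel ID and link here are made-up values.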

    By default (on the first run) the script collects messages from the last three months. If, on a later run, the script directory already contains a folder with previously collected messages from the channel, only new messages are collected.
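One way to implement this rule is to look for the highest message ID already saved and fall back to a date cutoff when nothing has been saved yet. The sketch below is an assumption about how such a check could work, not the parser's own logic; the function names are hypothetical and "three months" is approximated as 90 days.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def last_saved_message_id(channel_dir):
    """Return the highest numeric subfolder name, or None if there is none."""
    channel_dir = Path(channel_dir)
    if not channel_dir.is_dir():
        return None
    ids = [
        int(p.name)
        for p in channel_dir.iterdir()
        if p.is_dir() and p.name.isdecimal()
    ]
    return max(ids, default=None)


def collection_start(channel_dir, now=None):
    """Decide where to start: ("after_id", n) or ("since", datetime)."""
    last_id = last_saved_message_id(channel_dir)
    if last_id is not None:
        # Earlier results exist: collect only messages newer than last_id.
        return ("after_id", last_id)
    now = now or datetime.now(timezone.utc)
    # First run: approximate "the last three months" as 90 days.
    return ("since", now - timedelta(days=90))
```

Keying the resume point to message IDs rather than timestamps avoids depending on the clock of the machine that ran the previous collection.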

    Parse Telegram channels and users

    alevikpes/telegram-parser

    README.md

    This script logs in as a user and can perform actions on behalf of the currently logged-in user.

    At the moment it can list all the groups the user is subscribed to and get the list of users of some of those groups. It cannot get the users of every group, because of restrictions that group admins place on their groups.

    To start using this script, you need to obtain an API ID and an API HASH for your user account.

    This can be done via https://my.telegram.org/. Enter your phone number, verify with the code you are sent, and go to the API Development Tools page. There, create an app and copy the API ID and API HASH.

    If you plan to use a bot, create it via the BotFather in any Telegram application (see the Telegram instructions). The bot will have a name and a token, which must also be stored.

    NOTE Bots cannot perform all actions, so creating a user application, as described above, may be necessary for certain tasks.

    WARNING Never give anyone the credentials of your user application or your bot. Also add the files that store the credentials to the ignore list of your VCS.

    Create a file .env and save in it the API ID, the API HASH, the bot name, the bot token, a database name (can be any), a session name (can be any) and, possibly, other required data in the following format:

    APP_API_HASH=
    APP_API_ID=
    DB_NAME=
    SESSION_NAME=
    TG_BOT_NAME=
    TG_BOT_TOKEN=
    ...

    NOTE Do not use spaces or other special characters in your custom names.
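For illustration, a KEY=VALUE file of this shape can be read with a few lines of standard-library Python. The project may well use a dedicated package for this instead; the load_env and missing_keys helpers below are hypothetical, and so is the choice of which keys count as required.

```python
from pathlib import Path

# Assumption: these four keys are needed even when no bot is used.
REQUIRED_KEYS = ("APP_API_HASH", "APP_API_ID", "DB_NAME", "SESSION_NAME")


def load_env(path=".env"):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Checking missing_keys(load_env()) at startup gives a clear error before any Telegram call is attempted.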

    Create a Python virtual environment (search online for how to do it on your OS). On Debian-based Linux distributions it can be done with:

    sudo apt install python3-venv -y
    python3 -m venv /path/to/virtual-environment

    Start your virtual environment and install the requirements:

    source /path/to/virtual-environment/bin/activate
    pip3 install -r requirements.txt

    Initialisation (see init.py for more details) creates an SQLite database file with two tables, group and user. The name of the file is read from the .env file.
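A minimal sketch of such an initialisation with the standard sqlite3 module follows. Only the table names group and user come from the README; the init_db helper and every column are hypothetical, and the real schema is defined in init.py.

```python
import sqlite3


def init_db(db_name):
    """Create the two tables if they do not exist and return the connection."""
    conn = sqlite3.connect(db_name)
    # "group" is an SQL keyword, so the table name has to be quoted.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "group" ('
        "id INTEGER PRIMARY KEY, username TEXT, title TEXT)"
    )
    # Hypothetical columns; group_id links a user to the group it was seen in.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "user" ('
        "id INTEGER PRIMARY KEY, username TEXT, "
        'group_id INTEGER REFERENCES "group"(id))'
    )
    conn.commit()
    return conn
```

Because of IF NOT EXISTS, calling init_db again on an existing file leaves the stored data untouched.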

    After the initialisation, everything is ready for parsing channels and users.

    NOTE Always start your virtual environment before executing the scripts:

    source /path/to/virtual-environment/bin/activate

    Run main.py to start the parsing:

    python3 main.py

    The script will parse the channels and save their info and the data of their participants into the database.

    To parse only one channel, use the optional -g argument with the channel username (the name that starts with the @ symbol and can be found on the channel info page; specifying @ is not necessary):

    python3 main.py -g <channel username>
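An optional argument like this is typically declared with argparse. The sketch below is an assumption about how main.py could handle -g, not a copy of it; the destination name group and both helper functions are hypothetical.

```python
import argparse


def build_parser():
    """Build a command-line parser with a single optional -g argument."""
    parser = argparse.ArgumentParser(description="Parse Telegram channels and users")
    parser.add_argument(
        "-g",
        dest="group",
        default=None,
        help="username of a single channel to parse (leading @ is optional)",
    )
    return parser


def parse_args(argv=None):
    """Parse argv and normalise the channel username."""
    args = build_parser().parse_args(argv)
    if args.group is not None:
        # Accept both "@name" and "name".
        args.group = args.group.lstrip("@")
    return args
```

With no -g given, args.group stays None, which the calling code can treat as "parse every subscribed channel".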

    A Python program designed to scrape posts from Telegram channels using HTTP requests and HTML parsing, rather than Telegram’s API. This is useful, as selfbots are against Telegram’s ToS.

    Steelio/Telegram-Post-Scraper

    README.md

    Telegram Post Scraper via Python

    Telegram-Post-Scraper is a Python program designed to scrape posts from Telegram channels using HTTP requests and HTML parsing, rather than Telegram’s API. It is useful when creating bots or using Telegram’s API is not feasible or is against Telegram’s terms of service. It can also download multimedia (videos and images) from a Telegram post. On top of this, it can save posts and the bulk data to text files for ease of access.

    • Version 2.0 released. The code was rewritten and should be more efficient.
    • Added support for multiple links, separated with a comma. Example: (t.me/somegroup/540,t.me/someothergroup/250)
    • Added video downloading support.
    • If you find any bugs, please submit an issue ticket. Whipped this up semi-buzzed, so I may have overlooked something. Much love, enjoy y'all ♥

    • Scrapes posts from Telegram channels using HTTP requests and HTML parsing.
    • Can copy the content of the posts and download media such as images and videos.
    • Supports scraping multiple links in one session. Separate the links with commas when entering them at the start of the program. (t.me/groupID/333,t.me/someotherID/444,t.me/anotherOne/555)
    • Does not require a bot or an API key.
    • Useful for situations where using Telegram’s API or creating a bot is not feasible or against Telegram’s terms of service.
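The HTML-parsing half of this approach can be sketched with the standard library. The README does not show how the program fetches or parses pages, so everything here is an assumption: the class name tgme_widget_message_text is what Telegram's public post pages appeared to use for the post text at the time of writing and may change, and PostTextParser and extract_post_text are hypothetical names. Fetching the page over HTTP is left out.

```python
from html.parser import HTMLParser

# Assumed class name of the element that holds the post text.
TEXT_CLASS = "tgme_widget_message_text"


class PostTextParser(HTMLParser):
    """Collect the text inside the first <div> whose classes include TEXT_CLASS."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # <div> nesting depth inside the target element
        self.parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1
            elif tag == "br":
                self.parts.append("\n")
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if TEXT_CLASS in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_post_text(html):
    """Return the post text found in an HTML string, or "" if none is found."""
    parser = PostTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Link text inside the post is kept as plain text, and line breaks marked with br tags become newlines.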

    To use Telegram-Post-Scraper, you need Python 3 installed on your system, as well as the Python packages listed in requirements.txt. This program was built on Python 3.10.10 (64-bit).

    You can install these packages using pip by running the following command:

    pip install -r requirements.txt

    To use Telegram-Post-Scraper, you just provide it with the URL of a Telegram post.

    1. Open Command Prompt, PowerShell, or Terminal.
    2. Run "py main.py" or "python3 main.py".
    3. Enter your Telegram post URL. (Format: https://t.me/SOMEGROUP/NUMERICID)
       3a. You can find the link of a Telegram post by right-clicking it and pressing "Copy Link".
    4. Follow the prompts in the console window.
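The link format above, together with the comma-separated form used for several posts, can be validated before any request is made. The following is a sketch under assumptions rather than the program's own input handling: parse_post_links is a hypothetical helper, and the pattern assumes channel names consist of letters, digits and underscores.

```python
import re

# Matches t.me/<channel>/<numeric id>, with or without an http(s):// prefix.
POST_LINK = re.compile(r"^(?:https?://)?t\.me/([A-Za-z0-9_]+)/(\d+)/?$")


def parse_post_links(user_input):
    """Split comma-separated links and return (channel, post_id) tuples."""
    posts = []
    for raw in user_input.split(","):
        link = raw.strip()
        if not link:
            continue
        match = POST_LINK.match(link)
        if match is None:
            raise ValueError(f"not a Telegram post link: {link!r}")
        posts.append((match.group(1), int(match.group(2))))
    return posts
```

Rejecting malformed input with a ValueError up front means a typo in one link is reported before any of the other posts are downloaded.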

    If you find any bugs or have suggestions for improvements, feel free to create an issue or submit a pull request.

    Was this program useful to you? If you want to donate ♥ :

    BTC: bc1qvrm0tepx6jdxcsr99z5xqmswcl9ad333nenkeg LTC: LSuSA99uMbC1BtQ4eJxpczAsv3W7KbtahF
