About the author:
Oleksandr Oliinyk is a Ruby / NodeJS Developer at Syndicode and the Team Lead of a project with a microservice architecture. The application consists of services written in Ruby and NodeJS, several PostgreSQL databases, Elasticsearch, Amazon S3, Stripe, and others.
What is Instagram and what do we need from it?
Instagram is a public resource with millions of popular bloggers, models, stars, coaches, and other influencers. But, as we know, the service doesn’t provide extensive filtering tools.
Instagram offers a “TOP” search that shows the most relevant hashtags, accounts, and places, plus a separate tab for each of those.
Let’s imagine that you need to find bloggers who post about food and have more than 1k followers. Doing it manually is not effective, because you would need to go to the #food hashtag page (screenshot 1 below) and check the owner of every post. The goal is to find profiles where almost all content is about food and cooking (screenshot 2 below), not ones where the owner used #food once (screenshot 3 below).
This is where https://github.com/dilame/instagram-private-api comes to the rescue.
Automate it!
1. Connect the lib, generate a device, and log in to the service.
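require('dotenv/config')
const { IgApiClient } = require('instagram-private-api')

const { USERNAME, PASSWORD } = process.env
// The client has to be created before it is used below.
// The trailing semicolon keeps the next line from being parsed as a call.
const ig = new IgApiClient();

(async () => {
  // The username is used as the seed for the generated device
  ig.state.generateDevice(USERNAME)
  await ig.simulate.preLoginFlow()
  await ig.account.login(USERNAME, PASSWORD)
  await ig.simulate.postLoginFlow()
})()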
2. Add a delay between requests and simulate user-like behavior.
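// The trailing semicolon keeps the next line from being parsed as a call
const bluebird = require('bluebird');

(async () => {
  // login part
  await bluebird.delay(2000)
  await ig.feed.news().items()
  await bluebird.delay(2000)
  // A feed object sends a request only when items() is called
  await ig.feed.discover().items()
})()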
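A fixed two-second pause between every request can look mechanical. A minimal sketch of a randomized pause follows; `randomInt` and `humanDelay` are hypothetical helpers (they are not part of instagram-private-api or bluebird), and the default bounds are arbitrary assumptions.

```javascript
// Hypothetical helpers, not part of instagram-private-api or bluebird.

// Returns a random integer between min and max, both inclusive.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min
}

// Resolves after a random pause between minMs and maxMs milliseconds
// and reports how long the pause was. The defaults are arbitrary.
function humanDelay(minMs = 1500, maxMs = 4000) {
  const ms = randomInt(minMs, maxMs)
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms))
}
```

Inside the async function, `await humanDelay()` could take the place of `await bluebird.delay(2000)`.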
3. Create an object that gets posts from the “TOP” tab of the hashtag page.
(async () => {
  // prev part
  let hashtagFeed = ig.feed.tags('food', 'top')
  let posts = await hashtagFeed.items()
})()
A post looks like this:
{
  taken_at: 1586967673,
  pk: '2287973129702462351',
  id: '2287973129702462351_4513550583',
  device_timestamp: 158696758248788,
  code: 'B_Ag3BzjdOP',
  image_versions2: {},
  original_width: 1349,
  original_height: 1687,
  user: {
    pk: 4513550583,
    username: 'alex__oliinyk',
    full_name: 'Alex Oiinyk',
    is_private: false,
  },
  comment_count: 29,
  like_count: 2097,
  caption: {
    text: 'Grilled Octopus ? via @alex__oliinyk\n' +
      '#dinner #food #yummy #complex #foodpics #foodphotography #hungry #lovefood',
  },
}
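Only a few of these fields matter for the rest of the script. As a rough sketch, a hypothetical `summarizePost` helper (the name and the shape of the result are assumptions) could pick them out, turn `taken_at` into a `Date`, and build the public post URL from `code`. The sample object is a trimmed copy of the post above.

```javascript
// Hypothetical helper: keeps only the fields the rest of the script relies on.
function summarizePost(post) {
  return {
    // Public post pages are addressed by the short code
    url: `https://www.instagram.com/p/${post.code}/`,
    // taken_at is in seconds, while Date expects milliseconds
    takenAt: new Date(post.taken_at * 1000),
    userPk: post.user.pk,
    username: post.user.username,
    isPrivate: post.user.is_private,
    // Assumption: the caption can be missing, so fall back to an empty string
    captionText: post.caption ? post.caption.text : '',
  }
}

// Trimmed copy of the sample post shown above
const samplePost = {
  taken_at: 1586967673,
  code: 'B_Ag3BzjdOP',
  user: { pk: 4513550583, username: 'alex__oliinyk', is_private: false },
  caption: { text: 'Grilled Octopus\n#dinner #food #yummy' },
}

const summary = summarizePost(samplePost)
console.log(summary.url, summary.username)
```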
4. Now we need to decide whether an account belongs to a food blogger or not. My simple solution is to match a predefined list of hashtags against the captions of the user’s posts. If at least half of the posts contain some of the hashtags, it’s probably a relevant profile. Add a function that gets full info about the post’s owner, fetches the user’s latest posts, and checks their relevance. If the profile qualifies, save the user.
const hashtags = [
  'food',
  'instafood',
  'cooking',
  'muffin',
  'apple',
  // and so on
]

// Returns true if the post's caption contains at least one of the hashtags
function isContainHashtags(post) {
  // Guard against posts that have no caption
  const text = post.caption ? post.caption.text : ''
  return hashtags.some((hashtag) => text.includes(`#${hashtag}`))
}

function saveUser(user) {
  // save user to db or another place
}

async function postHandler(post) {
  // ig.user.info() returns a promise, so it has to be awaited
  const fullUser = await ig.user.info(post.user.pk)
  const userFeed = ig.feed.user(post.user.pk)
  await bluebird.delay(2000)
  const userPosts = await userFeed.items()
  let relevantPostsCount = 0
  userPosts.forEach((userPost) => {
    if (isContainHashtags(userPost)) relevantPostsCount += 1
  })
  if (fullUser.follower_count >= 1000 && relevantPostsCount > 6) {
    saveUser(fullUser)
  }
}
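Two details of this check are worth a second look. `text.includes('#food')` is a substring test, so it also matches `#foodpics`, and a fixed count of 6 depends on how many posts the feed happens to return. The sketch below shows one possible alternative with exact hashtag matching and a share of relevant posts instead of a count. All three helper names are hypothetical, and the regular expression is an assumption about what counts as a hashtag.

```javascript
// Hypothetical helpers illustrating exact hashtag matching.

// Extracts lowercase hashtags (without '#') from a caption text.
// Assumption: a hashtag is '#' followed by letters, digits, or underscores.
function extractHashtags(text) {
  const matches = (text || '').match(/#[\p{L}\p{N}_]+/gu) || []
  return matches.map((tag) => tag.slice(1).toLowerCase())
}

// True when the post has at least one hashtag from the wanted list.
function hasWantedHashtag(post, wanted) {
  const text = post.caption ? post.caption.text : ''
  const wantedSet = new Set(wanted)
  return extractHashtags(text).some((tag) => wantedSet.has(tag))
}

// Share of posts (from 0 to 1) that contain at least one wanted hashtag.
function relevantShare(posts, wanted) {
  if (posts.length === 0) return 0
  const relevant = posts.filter((post) => hasWantedHashtag(post, wanted)).length
  return relevant / posts.length
}
```

With these helpers, the condition in `postHandler` could read `relevantShare(userPosts, hashtags) >= 0.5`, which matches the "at least half of the posts" rule regardless of how many posts were fetched.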
5. Now let’s finish our script. Note the plain “for” loop: we don’t need any parallelism here. The posts should be checked one by one, as if you were a user of the Instagram Android application.
require('dotenv/config')
const { IgApiClient } = require('instagram-private-api')
const bluebird = require('bluebird')

const { USERNAME, PASSWORD } = process.env
const hashtags = [
  'food',
  'instafood',
  'cooking',
  'muffin',
  'apple',
  // and so on
]
const ig = new IgApiClient()

// Returns true if the post's caption contains at least one of the hashtags
function isContainHashtags(post) {
  // Guard against posts that have no caption
  const text = post.caption ? post.caption.text : ''
  return hashtags.some((hashtag) => text.includes(`#${hashtag}`))
}

function saveUser(user) {
  // save user to db or another place
}

async function postHandler(post) {
  // ig.user.info() returns a promise, so it has to be awaited
  const fullUser = await ig.user.info(post.user.pk)
  const userFeed = ig.feed.user(post.user.pk)
  await bluebird.delay(2000)
  const userPosts = await userFeed.items()
  let relevantPostsCount = 0
  userPosts.forEach((userPost) => {
    if (isContainHashtags(userPost)) relevantPostsCount += 1
  })
  if (fullUser.follower_count >= 1000 && relevantPostsCount > 6) {
    saveUser(fullUser)
  }
}

(async () => {
  ig.state.generateDevice(USERNAME)
  await ig.simulate.preLoginFlow()
  await ig.account.login(USERNAME, PASSWORD)
  await ig.simulate.postLoginFlow()
  await bluebird.delay(2000)
  await ig.feed.news().items()
  await bluebird.delay(2000)
  // A feed object sends a request only when items() is called
  await ig.feed.discover().items()

  // Posts from the "TOP" tab, checked one by one
  let hashtagFeed = ig.feed.tags('food', 'top')
  let posts = await hashtagFeed.items()
  for (let i = 0; i < posts.length; i += 1) {
    if (!posts[i]) continue
    // eslint-disable-next-line no-await-in-loop
    await postHandler(posts[i])
  }
  await bluebird.delay(2000)

  // Posts from the "recent" tab; ig is a plain variable here, not this.ig
  hashtagFeed = ig.feed.tags('food', 'recent')
  posts = await hashtagFeed.items()
  for (let i = 0; i < posts.length; i += 1) {
    if (!posts[i]) continue
    // eslint-disable-next-line no-await-in-loop
    await postHandler(posts[i])
  }
})()
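The script repeats the same one-by-one loop for both tabs, and a single failed request stops the whole run. One way to address both, shown here only as a sketch, is a small helper that processes items strictly in sequence, pauses between them, and collects errors instead of throwing. `processSequentially` and its parameters are hypothetical names, not part of the library.

```javascript
// Hypothetical helper; the name and the parameters are assumptions.
// Runs handler for each item strictly one after another, waits pauseMs
// between items, and collects errors instead of stopping at the first one.
async function processSequentially(items, handler, pauseMs = 2000) {
  const errors = []
  for (let i = 0; i < items.length; i += 1) {
    // Skip empty slots, as the original loop does
    if (!items[i]) continue
    try {
      // The next item is not started until this one has finished
      await handler(items[i])
    } catch (error) {
      errors.push({ index: i, error })
    }
    // No pause is needed after the last item
    if (i < items.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, pauseMs))
    }
  }
  return errors
}
```

Each loop in the script could then become `await processSequentially(posts, postHandler)`, with the returned list inspected or logged afterwards.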
This is only a simple example of what you can do with this library. It provides almost all the functionality that the Android app has. Other interesting algorithms could include searching in exact locations, getting a user’s followers, and so on.
Also, check my previous article about Developing Scalable NodeJS Web Scraper. Combining the two approaches can work magic 😉