Leetchi visualisation challenge

Charles Perez
24 March 2017
5 min read

Description of the challenge

Crowdfunding platforms have grown increasingly successful over the last few years. With more than 43 billion euros raised in 2016, this alternative to traditional financing has attracted significant interest from the research community. While reward-based crowdfunding has been extensively studied, only a few works have so far focused on donation-based crowdfunding platforms. In this challenge, we propose to analyze the Leetchi crowdfunding platform and the impact of 70 variables on the success of a Leetchi money pot.

The variables are extracted from Twitter and the Leetchi website and fall into four groups: the Leetchi project page, the Twitter community, the tweet content, and the tweet diffusion process.

The dataset is composed of 1,415 Leetchi projects along with 50,119 project-related tweets.

The goal of the challenge is to provide a set of recommendations for decision support: which strategies to apply and which key factors to take into account when trying to raise funds through donations.

Software requirements

Download the latest version of Tableau

Click the link above and select Get Started. On the form, enter your university email address under “Business email” and the name of your school under “Organization”.

A quick start guide for Tableau software is available.

A Tableau YouTube channel is also available.

Dataset

The Leetchi dataset is available for download on Harvard Dataverse.

Deliverable

List of all variables included in the dataset

Target: indicates whether the project is dedicated to a single person, a group of people, or an organization
#Facebook: the number of times the word “Facebook” and Facebook URLs are detected in the description. Previous work found that the presence of Facebook links is correlated with the success of all-or-nothing projects
#Twitter: the number of times the “Twitter” word and Twitter URLs are detected in the description
#Media: the number of videos and images that are used in the project page
Length: the number of characters used to write a description
SMOG: an estimate of the years of education required to understand the description. The formula relies on the number of sentences and the number of polysyllabic words observed in these sentences. The textstat.py Python library was used to compute the SMOG index as well as the other text-complexity metrics.
FleschKincaidGrade: the grade score using the Flesch-Kincaid Grade Formula. This formula is computed from the average number of words per sentence and the average number of syllables per word.
ColemanLiauIndex: the grade score using the Coleman-Liau Formula. This formula relies on the average number of letters per 100 words and the average number of sentences per 100 words.
AutomatedReadabilityIndex: the grade score computed using the Automated Readability Index, which outputs a number approximating the grade level needed to comprehend the text. It relies on the average number of characters per word and the average number of words per sentence.
LinsearWriteFormula: the grade score using the Linsear Write Formula. This formula is based on the number of easy words and hard words in a sample of the text.
GunningFog: the Gunning Fog index of the text, a weighted average of the number of words per sentence and the number of long words per word.
DifficultWords: the number of difficult words, using the Dale-Chall Word List of familiar words as a reference. This list contains three thousand familiar words that at least 80% of children in grade 5 recognize when reading.
DaleChallReadabilityScore: the grade level using the New Dale-Chall Formula. This metric is computed from the ratio of difficult words to total words and the ratio of words to sentences.
#Click: the number of times the word “click” was used (e.g. “Click on donate to contribute to the project”)
#Secure_payment: the number of times the expression “secure payment” was used
#Currency: the number of times currency symbols were used
#Numbers: the number of times numbers were used in the project description
#Link: the number of links that are used in the description
#Email: the number of e-mails that are used in the description
Language: the language used in the description (French, English, German, Spanish, other). The language was detected using the ISO 3166-1 alpha-2 country code available in the project URL. When unavailable, the langid.py Python library was used to identify the language.
#Promoters: the number of unique profiles that tweeted/retweeted about the project
#Tweets: the number of tweets mentioning the project
#Replies: the number of replies to tweets mentioning the project
#Retweets: the number of retweets mentioning the project
#Mentions: number of mentions in project-related tweets
#Duration: the number of days between the first and the last project-related tweets observed in the dataset (duration of the campaign)
#ActiveDays: the number of days on which at least one tweet was recorded during the campaign
#InactiveDays: the number of days of the campaign with no observed activity. This feature is computed as #Duration - #ActiveDays
AVG_favorites: the average number of tweets the Twitter promoters liked before joining the community
MAX_favorites: the highest number of likes of profiles belonging to the community
AVG_statuses: the average number of tweets published by the promoters before joining the community
MAX_statuses: the number of tweets published by the most active promoter before joining the community
AVG_friends: the average number of friends of unique promoters of the community
MAX_friends: the maximum number of friends that promoters have
#Influencers500: the number of promoters in the community having more than 500 followers on Twitter
#Influencers1k: the number of promoters in the community having more than 1,000 followers on Twitter
#Influencers10k: the number of promoters in the community having more than 10,000 followers on Twitter
#Influencers50k: the number of promoters in the community having more than 50,000 followers on Twitter
#Influencers100k: the number of promoters in the community having more than 100,000 followers on Twitter
MAX_followers: the number of followers of the most influential profile in the community
AVG_followers: the average number of followers of unique promoters of the community
Leetchi: indicates whether the project was promoted by the official Leetchi Twitter account
#Help: the number of times the word “Help” was mentioned in project-related tweets
#HashtagRt: the number of times the hashtag #Rt was used in tweets
#Retweet: the number of times the word “Retweet” was used in tweets
#RT: the number of times the word “RT” was used in tweets
#Mobilise: the number of times the word “mobilise” was used in tweets
#Solidarity: the number of times words containing “solidarit” were used in tweets
#Important: the number of times the word “Important” and urge were used in tweets
#Thank: the number of times acknowledgment-related words were used in tweets
#Hashtags: the number of hashtags used during the project-related Twitter campaign
AVG_hashtags: the average number of hashtags by tweet
AVG_mentions: the average number of mentions by tweet
Tweetsentiment: the general sentiment of tweets (computed using the AFINN dictionary of words tagged with sentiment scores [IMM2011-06010])
MAX_Sentiment: the most positive tweet score
MIN_Sentiment: the most negative tweet score
AVG_Tweetsentiment: the average sentiment of tweets
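
Several of the readability variables in the list above are defined by standard published formulas. As a minimal sketch, the snippet below computes five of them from raw text counts, using the commonly cited coefficients. The function names and the example counts are illustrative assumptions, not part of the dataset, and the values stored in the dataset were produced by the textstat.py library, whose syllable counting and rounding may differ from this sketch.

```python
import math

# Standard readability formulas computed from raw counts.
# Assumption: the counts (words, sentences, syllables, ...) are already
# available; how textstat.py derives them may differ from any counting
# method you choose.

def flesch_kincaid_grade(words, sentences, syllables):
    """Grade level from words per sentence and syllables per word."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def coleman_liau_index(letters, words, sentences):
    """Grade level from letters and sentences per 100 words."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

def automated_readability_index(characters, words, sentences):
    """Grade level from characters per word and words per sentence."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

def smog_index(polysyllables, sentences):
    """Years of education from polysyllabic words, scaled to 30 sentences."""
    return 1.043 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

def gunning_fog(words, sentences, complex_words):
    """Weighted sum of words per sentence and percentage of complex words."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

# Hypothetical counts for a single project description (made up for the demo).
counts = {"words": 100, "sentences": 5, "syllables": 150,
          "letters": 450, "polysyllables": 10}

scores = {
    "FleschKincaidGrade": flesch_kincaid_grade(
        counts["words"], counts["sentences"], counts["syllables"]),
    "ColemanLiauIndex": coleman_liau_index(
        counts["letters"], counts["words"], counts["sentences"]),
    "AutomatedReadabilityIndex": automated_readability_index(
        counts["letters"], counts["words"], counts["sentences"]),
    "SMOG": smog_index(counts["polysyllables"], counts["sentences"]),
    "GunningFog": gunning_fog(
        counts["words"], counts["sentences"], counts["polysyllables"]),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

All five scores are expressed as approximate school grade levels, so they tend to be strongly correlated with one another. This is worth keeping in mind when placing several of them side by side in a visualisation.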

Charles Perez, PhD
