4.1 Applied Python (Advanced Course) https://brilliant.org/courses/python-text-analysis/dictionaries-and-counters/?from_llp=computer-science


Lesson 3/7 Counting Unique Words

Instead of a dictionary, use a Counter (from the collections module). It is handier.

There is no need to initialize a key before adding to it:

Python
from collections import Counter
# make a counter object
counter = Counter()
# make a string
word = 'horror'
# loop over the letters in the string and add to the counter
for letter in word:
    counter[letter] += 1

# print the value of key "r"
print(counter['r'])
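
Looking up a key that is not in the Counter returns 0 instead of raising a KeyError:

Python
from collections import Counter
counter = Counter("horror")
key = 's'
print(key, counter[key])
Output
s 0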

Python
from collections import Counter
counter = Counter('horror')
key = 's'
print("Before:",key, counter[key])
counter[key] += 1
print("After :",key, counter[key])
Output
Before: s 0
After : s 1
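To see why the Counter lets us skip the initialization step, here is a minimal side-by-side sketch. The two function names are hypothetical, made up for illustration: a plain dict needs an `if` check before incrementing, while a Counter treats a missing key as 0. Note that merely reading a missing key from a Counter does not add it.

```python
from collections import Counter


def count_with_dict(text):
    """Hypothetical helper: count characters with a plain dict."""
    counts = {}
    for ch in text:
        if ch not in counts:
            counts[ch] = 0  # without this, += 1 would raise KeyError
        counts[ch] += 1
    return counts


def count_with_counter(text):
    """Hypothetical helper: count characters with a Counter (missing keys act like 0)."""
    counts = Counter()
    for ch in text:
        counts[ch] += 1
    return counts


word = 'horror'
print(count_with_dict(word))      # {'h': 1, 'o': 2, 'r': 3}
print(count_with_counter(word))   # Counter({'r': 3, 'o': 2, 'h': 1})

c = Counter(word)
print(c['s'])      # 0: a missing key reads as zero...
print('s' in c)    # False: ...but reading it does not insert it
```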

Modify this program to use a Counter rather than a dictionary, and print the 3 most common words in the text. You should be able to get rid of the if statement.

I have changed it to use Counter() in place of a dictionary (printing the 5 most common words instead of 3).

Python
from collections import Counter

word_counts = Counter()
with open('data/jekyll.txt') as reader:
    for line in reader:
        for word in line.split():
            # strip surrounding punctuation and lowercase the word
            clean_word = word.strip('.;,-“’”:?—‘!()_').lower()
            # no "if" needed: a missing key starts at 0
            word_counts[clean_word] += 1
print(word_counts.most_common(5))
Output
[('the', 1608), ('and', 972), ('of', 937), ('to', 640), ('i', 640)]
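A Counter can also be filled in one step from any iterable, which removes the explicit `+= 1` loop. The sketch below uses a short made-up sample string instead of `data/jekyll.txt` and a hypothetical helper `clean_words`; unlike the exercise, it also skips tokens that were pure punctuation (such as a lone "—"), so the empty string is never counted. It prints the 3 most common words, as the exercise asks.

```python
from collections import Counter

# Same punctuation set as the exercise above
PUNCTUATION = '.;,-“’”:?—‘!()_'


def clean_words(text):
    """Hypothetical helper: yield lowercased words with surrounding punctuation stripped."""
    for word in text.split():
        clean_word = word.strip(PUNCTUATION).lower()
        if clean_word:  # skip tokens that were only punctuation
            yield clean_word


# Made-up sample text standing in for the novel
sample = "The cat sat. The dog sat! A cat, the cat; and the mat."
word_counts = Counter(clean_words(sample))
print(word_counts.most_common(3))   # [('the', 4), ('cat', 3), ('sat', 2)]
```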

Lesson 4/7 Bigrams and Mutability

Python
my_dict = {"Title": "Dr", "Surname": "Jekyll"}
my_dict.pop("Surname")
print(my_dict)

Output
{'Title': 'Dr'}

What kind of data type is a dictionary?

Dictionaries and counters are mutable (changeable), but their keys must be immutable (strictly, hashable): strings and tuples work as keys, lists do not.
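A minimal sketch of both halves of that answer, reusing the `my_dict` example from above: `pop()` changes the dictionary in place, and a tuple is accepted as a key while a list is rejected with a `TypeError`. The `bigram_counts` dictionary and the `list_key_rejected` flag are made up for illustration.

```python
my_dict = {"Title": "Dr", "Surname": "Jekyll"}

# Dictionaries are mutable: pop() removes the key in place and returns its value
surname = my_dict.pop("Surname")
print(surname)   # Jekyll
print(my_dict)   # {'Title': 'Dr'}

# Keys must be hashable: a tuple of strings is accepted (illustrative dict)...
bigram_counts = {("mr", "hyde"): 1}
print(bigram_counts[("mr", "hyde")])   # 1

# ...but a list is rejected, because a list is mutable and therefore unhashable
list_key_rejected = False
try:
    bigram_counts[["mr", "hyde"]] = 1
except TypeError:
    list_key_rejected = True
print(list_key_rejected)   # True
```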


Lesson 5/7 Bigram Frequency and Tuples

Which operation works with a list, but not with a tuple?

We can iterate through a tuple, or retrieve an element, but methods like append and pop that change lists don't exist for tuples, because tuples are immutable.
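A short sketch of that difference, using an illustrative bigram: reading from a tuple works, changing it in place does not, and the usual workaround is to build a new tuple.

```python
bigram = ('mr', 'utterson')

# Reading works: indexing and iteration are fine on a tuple
print(bigram[0])   # mr
for word in bigram:
    print(word)

# Changing does not: a list has append/pop, a tuple has neither
print(hasattr(['mr', 'utterson'], 'append'))               # True
print(hasattr(bigram, 'append'), hasattr(bigram, 'pop'))   # False False

# Item assignment is rejected as well
assignment_rejected = False
try:
    bigram[0] = 'dr'
except TypeError:
    assignment_rejected = True
print(assignment_rejected)   # True

# To "change" a tuple, build a new one from slices of the old one
new_bigram = ('dr',) + bigram[1:]
print(new_bigram)   # ('dr', 'utterson')
```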

Bigram Frequency

This program applies the same logic to the whole text. We'll use it to find out more about the frequency of different bigrams.

Python
from collections import Counter

bigram_counter = Counter()
window = []  # sliding window holding the last two cleaned words

with open('data/jekyll.txt') as reader:
    for line in reader:
        for word in line.split():
            clean_word = word.strip('.;,-“’”:?—‘!()_').lower()
            window.append(clean_word)
            if len(window) >= 2:
                # lists can't be keys, so convert the window to a tuple
                window_tuple = tuple(window)
                bigram_counter[window_tuple] += 1
                window.pop(0)  # slide the window forward by one word
print(bigram_counter.most_common(10))
Output
[(('of', 'the'), 175), (('in', 'the'), 139), (('it', 'was'), 94), (('and', 'the'), 80), (('to', 'the'), 73), (('of', 'a'), 72), (('mr', 'utterson'), 71), (('the', 'lawyer'), 65), (('of', 'my'), 61), (('on', 'the'), 60)]
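The sliding-window list is one way to form bigrams. A common alternative, swapped in here for comparison and not the course's method, pairs a list of words with the same list shifted by one using `zip`. This sketch assumes the whole text can be held as one list of cleaned words, and it uses a made-up sample sentence and a hypothetical `bigrams` helper instead of `data/jekyll.txt`.

```python
from collections import Counter

PUNCTUATION = '.;,-“’”:?—‘!()_'


def bigrams(words):
    """Hypothetical helper: ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    # zip stops at the shorter input, so the last word gets no partner
    return list(zip(words, words[1:]))


# Made-up sample text standing in for the novel
sample = "It was the lawyer. It was Mr. Utterson, the lawyer."
words = [w.strip(PUNCTUATION).lower() for w in sample.split()]
bigram_counter = Counter(bigrams(words))

print(bigram_counter[('it', 'was')])        # 2
print(bigram_counter[('mr', 'utterson')])   # 1
```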

We can check how often each character appears in the text by looking up their names as bigrams.

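Note: the names must be written in lowercase and without punctuation (for example "mr", not "Mr."), because every word was cleaned that way before it was counted.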

Python
# ... program continues ...
# Our counter object is called bigram_counter
# Add your code here.
print(bigram_counter[("mr", "hyde")])
print(bigram_counter[("dr", "jekyll")])
print(bigram_counter[("mr", "utterson")])

  • To use the bigram as a key, we converted it to a tuple.

  • To count the number of times each bigram appears, we used a Counter object.
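Because of the cleaning step, a lookup that uses the raw spelling from the book does not fail loudly; it silently returns 0. A minimal sketch, with made-up counts and a hypothetical `clean` helper that mirrors the cleaning used when the counter was built:

```python
from collections import Counter

PUNCTUATION = '.;,-“’”:?—‘!()_'


def clean(word):
    """Hypothetical helper: strip punctuation and lowercase, as done before counting."""
    return word.strip(PUNCTUATION).lower()


# Made-up counts standing in for the real bigram_counter
bigram_counter = Counter({('mr', 'hyde'): 2, ('dr', 'jekyll'): 1})

print(bigram_counter[('Mr.', 'Hyde')])                  # 0: raw spelling is not a key
print(bigram_counter[(clean('Mr.'), clean('Hyde'))])    # 2
```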


Lesson 6/7 Bigram Analysis

N-gram Analysis

The three bigrams are ("the", "honour"), ("honour", "the") and ("the", "sanity").

What should a dictionary built from just these three bigrams look like?

Python
successor_map = {}

bigram1 = ('the', 'honour')
bigram2 = ('honour', 'the')
bigram3 = ('the', 'sanity')
bigrams = (bigram1, bigram2, bigram3)

# map each first word to the list of words that follow it
for bigram in bigrams:
    if bigram[0] not in successor_map:
        successor_map[bigram[0]] = [bigram[1]]
    else:
        successor_map[bigram[0]].append(bigram[1])

print(successor_map)

Output
{'the': ['honour', 'sanity'], 'honour': ['the']}
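Just as a Counter removed the `if` statement when counting, `collections.defaultdict(list)` can remove the if/else from the successor-map loop: it creates an empty list the first time a key is used. This is a swapped-in alternative for comparison, not the course's own solution; it builds the same map from the same three bigrams.

```python
from collections import defaultdict

bigrams = [('the', 'honour'), ('honour', 'the'), ('the', 'sanity')]

# defaultdict(list) supplies an empty list for any new key,
# so no "if key not in successor_map" check is needed
successor_map = defaultdict(list)
for first, second in bigrams:
    successor_map[first].append(second)

print(dict(successor_map))   # {'the': ['honour', 'sanity'], 'honour': ['the']}
```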