Counting Words in an English Essay

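1. Objective

"The world is yours, as well as ours, but in the last analysis, it is yours" is a well-known quotation from Chairman Mao Zedong. world.txt is an English essay on this theme, generated with an AI tool. Write a program that parses the essay and prints the number of distinct words it uses (that is, the count after deduplication). The contents of world.txt are as follows: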


The phrase "The world is yours and yet, it is primarily yours" resonates with a profound truth that transcends generations. It speaks to the inherent promise of tomorrow being shaped by the youth, while also acknowledging the shared responsibility we have for the present and future state of our planet. This essay delves into the implications of this statement, reflecting on the role of young people in driving progress and change, as well as the collective duty we all bear in fostering a sustainable and equitable world.

At its core, this saying underscores the potential of the youth to innovate, inspire, and implement transformative changes. History is replete with examples where young visionaries—be it scientists, activists, artists, or leaders—have propelled society forward. Figures like Marie Curie, whose groundbreaking work in radioactivity paved the way for countless advancements in science; Malala Yousafzai, who became the youngest Nobel laureate for her advocacy for girls' education; and Greta Thunberg, whose unwavering stance on climate action has galvanized a global movement, exemplify how the energy, creativity, and idealism of the young can lead to significant societal shifts.

However, the statement also serves as a reminder that while the torch of progress is passed to the younger generation, it is not without the accumulated wisdom, experience, and infrastructure built by those who came before. The 'world is ours' part acknowledges the contributions of past and present generations in creating the platforms upon which the youth can stand tall. It emphasizes that progress is rarely linear or isolated; rather, it is a continuum where each generation builds upon the achievements and lessons of the last.

Moreover, the phrase highlights the urgency and responsibility placed upon the shoulders of the young. In an era marked by unprecedented challenges such as climate change, social inequality, and technological disruption, the world needs fresh perspectives and bold actions more than ever. The youth, equipped with digital literacy and a global outlook fostered by interconnectedness, are uniquely positioned to address these complex issues creatively and collaboratively. Their voices, when amplified, can challenge established norms, drive policy changes, and inspire collective action towards a more sustainable and just world.

Yet, this responsibility is not solely the domain of the young. The phrase 'but ultimately it is primarily yours' implies a call to action for all, encouraging mentorship, support, and collaboration across generations. It underscores the importance of creating environments where young minds can thrive, access quality education, and have the opportunities to contribute meaningfully. It also calls for older generations to listen, learn from the youth, and facilitate their growth, recognizing that the solutions to many of today's problems lie in embracing diversity of thought and experience.

In conclusion, "The world is yours and yet, it is primarily yours" encapsulates both the immense potential and the profound responsibility bestowed upon the youth. It serves as a rallying cry for young people to embrace their agency in shaping the future while acknowledging the interconnectedness of progress across generations. It reminds us all that the task of nurturing this potential and ensuring a smooth transition of knowledge, values, and responsibilities is paramount. Only through such collaborative efforts can we truly create a world that benefits all, now and for generations to come.

2. Concepts Covered

  1. Using sets
  2. Reading files
  3. The string replace method
  4. The string split method
  5. The string lower method
  6. continue

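3. Approach and Implementation

The overall approach is:

  1. Read the file line by line
  2. Remove the punctuation from each line
  3. Split the line into words with the split method
  4. Convert each word to lowercase, because an English word is capitalized when it starts a sentence
  5. Put the words into a set and let the set handle deduplication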
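
Step 1, reading a file line by line, is not shown on its own in the sections that follow, so here is a minimal sketch of it. The temporary file is a hypothetical stand-in for world.txt so the example runs anywhere, and the UTF-8 encoding is an assumption.

```python
import os
import tempfile

# Create a small stand-in file so the example is self-contained
# (hypothetical content, not the real world.txt).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("The world is yours.\nIt is also ours.\n")
    path = tmp.name

lines = []
with open(path, encoding="utf-8") as f:
    for line in f:            # iterating a file object yields one line at a time
        lines.append(line)    # each line still ends with "\n"

os.remove(path)               # clean up the stand-in file
print(lines)  # ['The world is yours.\n', 'It is also ours.\n']
```

Because every line keeps its trailing newline, the later steps call strip before doing anything else.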

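3.1 Removing punctuation

To remove a character from a string, use the replace method. Its first argument is the substring to be replaced and its second argument is the replacement. If the second argument is an empty string, the first argument is simply deleted from the original string.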

line = line.strip().replace(",", "").replace(".", "").replace("!", "")

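Both strip and replace return the processed string, so further string methods can be chained directly onto their results. Note that this chain removes only commas, periods, and exclamation marks. Other marks that appear in the essay, such as quotation marks, semicolons, and apostrophes, stay attached to the neighbouring word unless more replace calls are added.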
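
As an alternative to a long replace chain, the sketch below uses str.translate to drop all ASCII punctuation in one pass. This is a swapped-in technique, not the method used in this article. The names PUNCT_TABLE and strip_punctuation are made up for illustration, and treating the em dash as a word separator is an assumption based on the sample essay.

```python
import string

# Map every ASCII punctuation character to None, which deletes it.
PUNCT_TABLE = {ord(ch): None for ch in string.punctuation}
# The em dash joins two words in the essay ("visionaries—be"), so it is
# replaced with a space rather than deleted (assumption for this essay).
PUNCT_TABLE[ord("—")] = " "


def strip_punctuation(line):
    """Return the line without surrounding whitespace or punctuation."""
    return line.strip().translate(PUNCT_TABLE)


print(strip_punctuation("young visionaries—be it scientists, activists;\n"))
# young visionaries be it scientists activists
```

One side effect to be aware of: apostrophes are deleted too, so "today's" becomes "todays" and "girls'" becomes "girls".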

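3.2 Splitting the string

Use the split method to split a string. You can pass a space as the separator, as I do here, or call split() with no argument. The two are not quite the same: split(" ") cuts at every single space, so consecutive spaces or an empty line produce empty strings, whereas split() cuts at runs of any whitespace and never returns empty strings.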

array = line.split(" ")

The split method returns a list.
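
The difference between the two ways of calling split is easiest to see on a small input. The sample string below is made up for illustration.

```python
# Two spaces after "The" and one trailing space (made-up sample).
line = "The  world is yours "

by_space = line.split(" ")     # cut at every single space
by_whitespace = line.split()   # cut at runs of whitespace

print(by_space)        # ['The', '', 'world', 'is', 'yours', '']
print(by_whitespace)   # ['The', 'world', 'is', 'yours']

# An empty line behaves differently as well.
print("".split(" "))   # ['']
print("".split())      # []
```

This is why the loop in the next section has to skip empty strings when split(" ") is used.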

3.3 Adding the words to a set

        for word in array:
            if not word:
                continue

            word_set.add(word.lower())

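Inside the loop I check the value of word: for an empty string (or None), not word evaluates to True. An empty string is not a word and should not be counted, so continue skips it. Empty strings show up here because split(" ") produces them for blank lines and for consecutive spaces. Before adding a word to the set, it is converted to lowercase. A frequently used word such as it becomes It when it starts a sentence; both are the same word, and lowercasing everything keeps them from being counted as two different words.

Finally, print the number of distinct words. The author's run printed 135; the exact figure depends on the contents of world.txt and on which punctuation marks are stripped.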

print(len(word_set))   # 135
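
The effect of the empty-string check and of lower can be verified on a tiny hand-made list. The word list below is made up for illustration.

```python
# Made-up input: one empty string and two words in mixed case.
words = ["It", "is", "", "it", "Yours", "yours"]

word_set = set()
for word in words:
    if not word:              # empty string: not a word, skip it
        continue
    word_set.add(word.lower())  # a set silently ignores duplicates

print(sorted(word_set))  # ['is', 'it', 'yours']
print(len(word_set))     # 3
```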

4. Complete code

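# Holds every distinct word; a set discards duplicates automatically.
word_set = set()

with open("./data/world.txt") as f:
    for line in f:
        # Drop the trailing newline, then remove commas, periods and exclamation marks.
        line = line.strip().replace(",", "").replace(".", "").replace("!", "")
        array = line.split(" ")
        for word in array:
            # split(" ") yields empty strings for blank lines and double spaces.
            if not word:
                continue

            # Lowercase so that "It" and "it" count as one word.
            word_set.add(word.lower())

print(len(word_set))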
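
For comparison, here is a sketch of a stricter variant that extracts words with a regular expression instead of removing punctuation. This is a different technique from the one used above, and it will generally give a different count. WORD_RE, count_unique_words, and the sample lines are hypothetical names and data introduced for illustration.

```python
import re

# A word is a run of letters, optionally followed by an apostrophe and
# more letters, so "today's" stays a single word.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_unique_words(lines):
    """Count distinct lowercase words in an iterable of text lines."""
    word_set = set()
    for line in lines:
        # findall returns every match, so quotes, semicolons and dashes
        # are ignored without having to list them.
        word_set.update(WORD_RE.findall(line.lower()))
    return len(word_set)


sample = [
    'It speaks to the youth; the youth, in turn, answer: "It is ours."\n',
    "Today's problems need today's visionaries—be it scientists or artists.\n",
]
print(count_unique_words(sample))  # 18
```

A file object is also an iterable of lines, so the function can be given the open file directly, for example count_unique_words(f) inside a with open("./data/world.txt", encoding="utf-8") as f: block. The UTF-8 encoding is an assumption about how world.txt was saved.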
