
What is a good way to store strings in Python?

I am building a conversation bot in Python. While I would like to generate as much text as possible from scratch, I still need a way to catalog and store a bunch of dialog fragment strings. Ideally I would like to keep some sort of hierarchy/classification among the strings. For example:

Greetings:

"Oh, nice to meet you {0}" "My name is Bob, how about you?"

Flirtation:

"Stop it" "I'm blushing" "How flattering"

etc...

While I could store these in a database, it would be nice to have a different format that people could easily edit by hand. CSV? JSON? Is there any precedent for stuff like this?

That depends on how you want to use it. If the strings are only meant to be used by Python, you should consider storing them in their very own .py file. Yes, a module, but it is also a simple text file that happens to be interpretable by Python :)

A lot of projects use .py files as configuration files (Django, for example), and importing their contents is very easy: you only have to do import answer_strings and you already have them in variables or classes.

For example, you can do this:

# bot answers module

greetings = ["hello {0}", "what's up {0}"]
farewells = ["see you soon {0}", "nos vemos {0}"]
...

You can then return equivalent answers at random, etc.
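
The random selection mentioned above could look roughly like this. It is only a minimal sketch: the lists are repeated inline so the snippet runs on its own (in practice they would be imported from the answer_strings module), and pick_answer is a hypothetical helper name, not something from the original answer.

```python
import random

# In a real project these lists would live in answer_strings.py and be
# pulled in with "import answer_strings"; they are inlined here so the
# snippet is self-contained.
greetings = ["hello {0}", "what's up {0}"]
farewells = ["see you soon {0}", "nos vemos {0}"]


def pick_answer(templates, name):
    """Pick one template at random and fill in the {0} placeholder."""
    return random.choice(templates).format(name)


print(pick_answer(greetings, "Alice"))
```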

On the other hand, if these are also meant to be read by JavaScript, Java, Node.js, or any technology other than Python, then a more universal format should be used: JSON, XML, YAML, you name it.

I think this is better kept in a text file (a project resource) than in a database since, as you mentioned, that way it is easier to customize. I would also recommend using a format that has semantics built in. A CSV file is IMHO very cold, just a bunch of data dumped to a file. With XML, JSON, etc. you can group your data into categories like "Greetings", "Farewells", and so on.
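
As a rough illustration of grouping by category, here is how a JSON document could be read with the standard json module. The file contents are embedded as a string so the sketch is self-contained; the category names and the PHRASES_JSON variable are assumptions made for the example.

```python
import json

# Hypothetical contents of a hand-edited phrases.json file.
PHRASES_JSON = """
{
    "greetings": ["Oh, nice to meet you {0}", "My name is Bob, how about you?"],
    "flirtation": ["Stop it", "I'm blushing", "How flattering"]
}
"""

# The result is a plain dict mapping each category to a list of strings.
phrases = json.loads(PHRASES_JSON)

for category, lines in phrases.items():
    print(category, "->", len(lines), "phrases")
```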

Don't forget that, since you have several options, it would be a very good idea to build your code in a modular, decoupled way. Then, if you make a decision now and need to change it in the future, the switch will be as seamless as possible.
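
One possible way to get that decoupling, sketched under my own assumptions: the bot only talks to a small class, and the storage format is hidden behind a loader function. PhraseBook, load_from_python and load_from_json are hypothetical names used purely for illustration.

```python
import json
import random


class PhraseBook:
    """Hides where the phrases come from behind one small interface."""

    def __init__(self, loader):
        # loader is any zero-argument callable returning
        # a dict of {category: [template, ...]}.
        self._phrases = loader()

    def say(self, category, *args):
        """Return a random phrase from the category, with placeholders filled."""
        return random.choice(self._phrases[category]).format(*args)


def load_from_python():
    # Stand-in for values imported from a .py module.
    return {"greetings": ["hello {0}"], "farewells": ["see you soon {0}"]}


def load_from_json():
    # Stand-in for reading a JSON resource file.
    return json.loads(
        '{"greetings": ["hello {0}"], "farewells": ["see you soon {0}"]}'
    )


# Swapping the storage format only changes which loader is passed in.
for loader in (load_from_python, load_from_json):
    bot = PhraseBook(loader)
    print(bot.say("greetings", "Alice"))
```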

Hope this helps!

It depends on how much information you want to store with the strings.

I think that for a simple case, where the "database" is just a list of strings, you could go with plain text, one string per line. An advantage is that such plain text files are easy to search/edit/manipulate with a plethora of tools, from GNU command-line tools (like grep, sed...) to GUI editors.

This can also be easily extended by using a pre-defined file naming and directory hierarchy. For example, a structure like

data/
data/en_GB/greetings
data/en_GB/farewells
data/en_US/greetings
data/en_US/farewells
data/de_DE/greetings
data/de_DE/farewells

could allow you to pick your data by language, and even deploy only the relevant languages on some systems.
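
Reading such a layout could be sketched as follows. The load_category helper is a hypothetical name, and the snippet builds a throwaway copy of one file in a temporary directory purely so that it can run anywhere; a real bot would point at its own data/ directory instead.

```python
import tempfile
from pathlib import Path


def load_category(data_dir, locale, category):
    """Read <data_dir>/<locale>/<category>, one phrase per line, skipping blanks."""
    path = Path(data_dir) / locale / category
    with path.open(encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


# Build a tiny throwaway copy of the layout so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    greetings_file = Path(tmp) / "en_GB" / "greetings"
    greetings_file.parent.mkdir(parents=True)
    greetings_file.write_text("hello {0}\nwhat's up {0}\n", encoding="utf-8")

    demo_greetings = load_category(tmp, "en_GB", "greetings")

print(demo_greetings)
```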

If your only problem with this is strings that contain newlines, you could still use the above approach plus some kind of separator line like ~~~~ between entries.
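
A minimal sketch of that separator idea, assuming the separator sits alone on its own line; split_entries and the sample text are made up for illustration.

```python
SEPARATOR = "~~~~"


def split_entries(text, separator=SEPARATOR):
    """Split text into entries on lines that contain only the separator."""
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == separator:
            entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    entries.append("\n".join(current))
    # Drop entries that are empty, e.g. after a trailing separator.
    return [entry for entry in entries if entry.strip()]


sample = "Stop it\n~~~~\nI'm blushing.\nReally, I am.\n~~~~\nHow flattering\n"
print(split_entries(sample))
```

Entries keep their internal newlines, so a multi-line phrase survives as a single string.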


However, if you plan slightly more complicated structures and/or you expect them to change, a full serialization/markup language could make a lot of sense. One of my favorites is YAML, which is rich, mature, language-agnostic with libraries for the major languages, and easily understood and edited by humans (look at their site: it's in YAML!).

# you can have comments for editors in YAML

# informal greetings are allowed
greetings:
    - "hello {0}"
    - "what's up {0}"

# bye, etc.
farewells:
    - "see you soon {0}"
    - "nos vemos {0}"

# please be polite here
flirtation:
    - "Stop it"
    - "I'm blushing"
    - "How flattering"
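
Loading a file like this from Python might look like the sketch below. It assumes the third-party PyYAML package is installed (it is not part of the standard library), and it embeds a shortened copy of the YAML as a string so the snippet is self-contained.

```python
try:
    import yaml  # PyYAML, a third-party package ("pip install pyyaml")
except ImportError:
    yaml = None

PHRASES_YAML = """
# informal greetings are allowed
greetings:
    - "hello {0}"
    - "what's up {0}"

# please be polite here
flirtation:
    - "Stop it"
    - "I'm blushing"
"""

if yaml is not None:
    # safe_load only builds plain Python objects (dicts, lists, strings).
    phrases = yaml.safe_load(PHRASES_YAML)
    print(phrases["greetings"][0].format("Alice"))
else:
    print("PyYAML is not installed")
```

The comments in the YAML are for human editors only; they are discarded when the file is parsed.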

