How to generate all possible word combinations between columns of a CSV file?

I would like to take a CSV file in which each column is a category of words and generate all possible combinations among them. Here is the simplified CSV file I am using to test the script (the real CSV will be larger, with a dozen or more columns):

$ cat file.csv 
Color,Pet,Action
black,dog,barks
brown,cat,runs
white,bird,flies
,hamster,
red,,swims

As you can see, some columns have more words than others (i.e. there could be more "colors" than "pets", or more "pets" than "actions").

Here's what I have so far:

import csv
import itertools

with open('file.csv', newline='') as csvfile:
    next(csvfile, None) #skip header row
    data = list(csv.reader(csvfile))

for combination in itertools.product(*data):
    print(combination)

And here's an excerpt of the output I am getting:

$ python3 combiner.py 
('black', 'brown', 'white', '', 'red')
('black', 'brown', 'white', '', '')
('black', 'brown', 'white', '', 'swims')
('black', 'brown', 'white', 'hamster', 'red')
('black', 'brown', 'white', 'hamster', '')
('black', 'brown', 'white', 'hamster', 'swims')
('black', 'brown', 'white', '', 'red')
('black', 'brown', 'white', '', '')
('black', 'brown', 'white', '', 'swims')
('black', 'brown', 'bird', '', 'red')
('black', 'brown', 'bird', '', '')
[...]

What I would like to accomplish:

  • no more than one item from the same category (column) in the same output line
  • no parentheses, quotes or commas in the output (I believe I can accomplish that by converting the array to a string before printing)

So, to give an example of the output I am trying to get:

black
black dog
black dog barks
black dog runs
black dog flies
black dog swims
black cat
black cat barks
black cat runs
black cat flies
black cat swims
brown
brown dog
brown dog barks
[...]
black hamster
black hamster flies
[...]
red fish runs
[...]

If anyone has a suggestion on the most efficient way to accomplish this (or a specific library or approach to take), I would appreciate it greatly.

The trick is to group the values by column before passing them to itertools.product, so that each combination takes exactly one word from each category.
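As a minimal sketch of that grouping step, assume the rows have already been read into a list of lists (the `rows` literal below is a hypothetical stand-in for the parsed CSV body, not part of the original answer). `zip(*rows)` regroups the cells by column, and a filter drops the empty cells:

```python
import itertools

# Hypothetical in-memory copy of the CSV body (header already removed).
rows = [
    ["black", "dog", "barks"],
    ["brown", "cat", "runs"],
    ["white", "bird", "flies"],
    ["", "hamster", ""],
    ["red", "", "swims"],
]

# zip(*rows) regroups the cells by column; empty cells are dropped.
# Note: zip stops at the shortest row, so this assumes that every row
# has the same number of fields.
columns = [[cell for cell in column if cell] for column in zip(*rows)]

# Each tuple now holds exactly one word per category.
combinations = list(itertools.product(*columns))
print(len(combinations))
print(combinations[0])
```

Passing the rows themselves to itertools.product, as in the question, picks one word per row instead, which is why several colors ended up on the same line.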

To also print rows such as "black" and "black dog", which contain only a prefix of a full combination, store the first combination as a list, then compare each subsequent combination against it position by position, updating the list and printing the prefix whenever a value changes.

The solution below generalizes to any number of columns.

import csv
import itertools

with open("file.csv", "r", newline="", encoding="utf-8") as csvfile:
    reader = csv.reader(csvfile)
    header_row = next(reader)
    columns = [[] for _ in header_row]
    for row in reader:
        for i, value in enumerate(row):
            if value:
                columns[i].append(value)

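# One word per column in every combination.
product_iter = itertools.product(*columns)

# Print every prefix of the first combination.
current_combination = list(next(product_iter))
for i in range(len(current_combination)):
    print(" ".join(current_combination[:i + 1]))

for combination in product_iter:
    changed = False
    for i in range(len(combination)):
        # Once one column changes, every longer prefix is new too,
        # even if a later column holds the same word as before.
        if changed or combination[i] != current_combination[i]:
            changed = True
            current_combination[i] = combination[i]
            print(" ".join(current_combination[:i + 1]))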

Output:

black
black dog
black dog barks
black dog runs
black dog flies
black dog swims
black cat
black cat barks
...
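
An alternative that avoids tracking the previous combination is to build the prefixes directly with a small recursive generator. This is only a sketch, not part of the original answer: the function name `prefixes` and the in-memory `columns` list are assumptions made for illustration. It yields the lines in the same order as the output above.

```python
def prefixes(columns, prefix=()):
    """Yield every non-empty prefix combination, one word per column."""
    if not columns:
        return
    first, rest = columns[0], columns[1:]
    for word in first:
        current = prefix + (word,)
        yield " ".join(current)
        # Extend the current prefix with words from the remaining columns.
        yield from prefixes(rest, current)


# Hypothetical columns, as they would look after reading file.csv
# and dropping the empty cells.
columns = [
    ["black", "brown", "white", "red"],
    ["dog", "cat", "bird", "hamster"],
    ["barks", "runs", "flies", "swims"],
]

lines = list(prefixes(columns))
for line in lines[:8]:
    print(line)
```

The number of lines grows multiplicatively with the number of words per column, so for a CSV with a dozen or more columns it is probably better to iterate over the generator directly than to collect everything into a list as done here.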
