How to generate all possible word combinations between columns of a CSV file?
I would like to take a CSV file with each column being a category of words and generate all possible combinations among them. Here is the simplified CSV file I am using to test the script (the real CSV will be larger, with a dozen or more columns):
$ cat file.csv
Color,Pet,Action
black,dog,barks
brown,cat,runs
white,bird,flies
,hamster,
red,,swims
As you can see, some columns have more words than others (i.e. there could be more "colors" than "pets", or more "pets" than "actions").
Here's what I have so far:
import csv
import itertools

with open('file.csv', newline='') as csvfile:
    next(csvfile, None)  # skip header row
    data = list(csv.reader(csvfile))

for combination in itertools.product(*data):
    print(combination)
And here's an excerpt of the output I am getting:
$ python3 combiner.py
('black', 'brown', 'white', '', 'red')
('black', 'brown', 'white', '', '')
('black', 'brown', 'white', '', 'swims')
('black', 'brown', 'white', 'hamster', 'red')
('black', 'brown', 'white', 'hamster', '')
('black', 'brown', 'white', 'hamster', 'swims')
('black', 'brown', 'white', '', 'red')
('black', 'brown', 'white', '', '')
('black', 'brown', 'white', '', 'swims')
('black', 'brown', 'bird', '', 'red')
('black', 'brown', 'bird', '', '')
[...]
What I would like to accomplish:
So, to give an example of the output I am trying to get:
black
black dog
black dog barks
black dog runs
black dog flies
black dog swims
black cat
black cat barks
black cat runs
black cat flies
black cat swims
brown
brown dog
brown dog barks
[...]
black hamster
black hamster flies
[...]
red fish runs
[...]
If anyone has a suggestion on the most efficient way to accomplish this (or a specific library or approach to take), I would appreciate it greatly.
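The trick is to group the values by column before passing them to itertools.product. csv.reader yields rows, so itertools.product(*data) combines one cell from each row rather than one word from each category, which is why the output above mixes colors with each other.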
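As a small illustration of that grouping step on its own, here is a minimal sketch that transposes rows into columns with zip and drops empty cells. The in-memory `rows` list is a stand-in for what csv.reader would yield after the header is skipped, and it assumes every row has the same number of cells.

```python
import itertools

# Stand-in for the rows csv.reader would yield (header already skipped).
rows = [
    ["black", "dog", "barks"],
    ["brown", "cat", "runs"],
    ["", "hamster", ""],
]

# zip(*rows) transposes rows into columns; empty cells are filtered out
# so that shorter categories do not contribute blank words.
columns = [[cell for cell in column if cell] for column in zip(*rows)]

# One word from each category per combination.
combos = list(itertools.product(*columns))
```

With the columns grouped this way, each tuple in `combos` holds exactly one color, one pet, and one action.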
To also print partial rows such as "black" and "black dog", store the first combination as a list, then compare each subsequent combination against it position by position. Wherever a value changes, update the list and print the prefix up to that position.
The solution below generalizes to any number of columns.
import csv
import itertools

with open("file.csv", "r", newline="", encoding="utf-8") as csvfile:
    reader = csv.reader(csvfile)
    header_row = next(reader)
    # Build one list per column, skipping empty cells.
    columns = [[] for _ in header_row]
    for row in reader:
        for i, value in enumerate(row):
            if value:
                columns[i].append(value)

product_iter = itertools.product(*columns)

# Print every prefix of the first combination.
current_combination = list(next(product_iter))
for i in range(len(current_combination)):
    print(" ".join(current_combination[:i + 1]))

for combination in product_iter:
    changed = False
    for i in range(len(combination)):
        # Once one position differs, every longer prefix is new as well,
        # even if a later column repeats its previous value
        # (this matters when a column holds a single word).
        if changed or combination[i] != current_combination[i]:
            changed = True
            current_combination[i] = combination[i]
            print(" ".join(current_combination[:i + 1]))
Output:
black
black dog
black dog barks
black dog runs
black dog flies
black dog swims
black cat
black cat barks
...
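An alternative sketch, not part of the answer's own method: the same prefix-style output can be produced directly with a small recursive generator, which avoids comparing successive combinations. The function name `prefixes` and its `start` parameter are made up for this illustration, and `columns` is assumed to be the list of per-column word lists built as above.

```python
from typing import Iterator, List, Sequence


def prefixes(columns: Sequence[Sequence[str]], start: int = 0) -> Iterator[List[str]]:
    """Yield every non-empty prefix combination in depth-first order."""
    if start == len(columns):
        return
    for value in columns[start]:
        # The prefix that ends at this column...
        yield [value]
        # ...followed by every extension into the remaining columns.
        for rest in prefixes(columns, start + 1):
            yield [value] + rest


# Small hypothetical input, including a column with a single word.
demo_columns = [["a", "b"], ["x"], ["1", "2"]]
lines = [" ".join(p) for p in prefixes(demo_columns)]
```

Because it is a generator, lines are produced lazily, which helps when a dozen columns make the total count very large. One difference to be aware of: if a column is empty, this sketch still yields the prefixes that stop before it, whereas itertools.product yields nothing at all.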