Finding the sequence using Python

Question

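I have a problem statement that I am looking for some guidance on. I have a table like the one below - Source dataset

Each name has a dependency. Some items in Name have no dependency, while others depend on 2 or 3 items from the Name column. I want a target dataset with an additional column called sequence, whose value is derived as follows. If the value in Name has no dependency, the sequence should be 1. If an item in Name has 1 dependency, and that dependency has no further dependencies of its own, the sequence should be 2. Similarly, if an item in Name has 2 dependencies, for example Country has City and Address, and City in turn depends on Pincode, which has no further dependencies, the sequence should be 3, and so on. Here is what I would like the target dataset to look like -

Input dataset for Boris: [image not available]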

Answer 1

You can use the csv library and compute the count by looping over the rows and columns:

import csv

with open('testdata1.csv', 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    next(csvreader)  # skip the header row
    for row in csvreader:
        i = 0
        for col in row:
            if col in (None, ""):
                continue  # ignore empty cells
            if col.find(',') != -1:
                # comma-separated cell: one plus the number of listed items
                i = 1 + len(col.split(","))
            else:
                i = i + 1
        print(i)
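
The loop above appears to yield one plus the number of direct dependencies in each row, which matches the requested sequence only when none of those dependencies has dependencies of its own. As a rough sketch of following the whole chain with the same csv module: the column headers Name and Dependency, the helper names, and the in-memory sample (built from the Country/City/Pincode example in the question) are all assumptions, since the real file layout is not shown.

```python
import csv
import io

# Hypothetical sample; the real file and its headers are not shown in the question.
SAMPLE = """Name,Dependency
Country,"City,Address"
City,Pincode
Address,
Pincode,
"""

def read_dependencies(csvfile):
    """Map each name to the list of names it depends on."""
    deps = {}
    for row in csv.DictReader(csvfile):
        raw = row["Dependency"] or ""
        deps[row["Name"]] = [d.strip() for d in raw.split(",") if d.strip()]
    return deps

def chain_length(name, deps):
    """1 for a name without dependencies, else 1 + the longest chain below it."""
    # Assumes no circular dependencies and that every dependency is also a Name.
    children = deps[name]
    if not children:
        return 1
    return 1 + max(chain_length(child, deps) for child in children)

deps = read_dependencies(io.StringIO(SAMPLE))
for name in deps:
    print(name, chain_length(name, deps))
```

To read from a file instead, the io.StringIO object can be replaced by a file opened with newline=''.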

Answer 2

A solution using pandas looks like this:

import pandas as pd

data = pd.read_excel(r'D:\Desktop\data.xlsx')

sequence = []
for i in range(len(data['Name'])):
    # Here we store heads of 
    # the chains we are currently on
    # [<name>, <length>]
    deps_chains = [[data['Name'][i], 1]]

    # Currently maximal length
    # of the dependency chain
    max_dep_length = 1

    # Whether there are dependencies
    # to proceed in the next iteration
    is_longer = True

    while is_longer:
        # Here are the heads we will
        # consider in the next iteration
        next_deps_chain = []

        for dep, length in deps_chains:
            dep_idx = data[data['Name'] == dep].index[0]

            # Dependencies of the current dependency
            dependencies = data['Dependency'][dep_idx]

            if pd.isnull(dependencies):
                # If the current dependency
                # has no dependencies of
                # its own, check whether
                # this chain is the longest
                # found so far
                max_dep_length = max(max_dep_length, length)
            else:
                # Dependencies of the current
                # dependency will be considered
                # in the next iteration
                next_deps_chain += [
                    # strip() tolerates spaces after the commas
                    [d.strip(), length + 1] for d in dependencies.split(',')
                ]

        # Change for the next iteration
        deps_chains = next_deps_chain

        # Whether there are dependencies
        # for the next iteration
        is_longer = len(next_deps_chain) > 0

    # We found the longest dependency chain
    sequence.append(max_dep_length)

# Here we set the column 'sequence' 
# to our result
data['sequence'] = sequence

data.to_excel(r'D:\Desktop\data.xlsx', index=False)
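
One caveat: the while loop above never terminates if the data contains a circular dependency (for example A depending on B and B on A), and the index lookup fails if a dependency is missing from the Name column. Below is a minimal sketch of one way to guard against both, using plain dictionaries rather than pandas. The function name assign_sequences and the sample mapping are hypothetical, and the level-by-level pass is a swapped-in technique, not the algorithm used in this answer.

```python
def assign_sequences(deps):
    """Return {name: sequence} for a {name: [dependencies]} mapping.

    Raises ValueError on unknown or circular dependencies.
    """
    for name, children in deps.items():
        for child in children:
            if child not in deps:
                raise ValueError(f"{name!r} depends on unknown name {child!r}")

    sequence = {}
    pending = set(deps)
    while pending:
        # Names whose dependencies all have a sequence already
        ready = [n for n in pending if all(c in sequence for c in deps[n])]
        if not ready:
            # Nothing can be resolved, so the remaining names form a cycle
            raise ValueError(f"circular dependency among {sorted(pending)}")
        for name in ready:
            sequence[name] = 1 + max((sequence[c] for c in deps[name]), default=0)
        pending.difference_update(ready)
    return sequence

# Hypothetical data modelled on the example in the question
example = {
    "Country": ["City", "Address"],
    "City": ["Pincode"],
    "Address": [],
    "Pincode": [],
}
print(assign_sequences(example))
```

With a DataFrame, the mapping could be built from the Name and Dependency columns first, and the result assigned back as the sequence column.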

Answer 3

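For lack of detail, some of this has to be pseudocode.

Unlike the other answers, I believe the OP is asking how to calculate the sequence # given the dependencies and names.

One way is to use recursive calls, made more efficient by a dictionary of previously computed sequences. The general idea is that if the dependencies are empty the sequence number is 1, otherwise it is the maximum sequence number of the dependencies plus 1. You could even implement this in Excel if you wanted.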
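
Written as a recurrence, with D(n) denoting the set of names that n depends on (notation introduced here only for illustration):

```latex
\operatorname{seq}(n) =
\begin{cases}
1, & D(n) = \varnothing \\
1 + \max_{d \in D(n)} \operatorname{seq}(d), & \text{otherwise}
\end{cases}
```

For the example in the question this gives seq(Pincode) = 1, seq(City) = 2 and seq(Country) = 3.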

class DepSeqTable:
    def __init__(self, datasource):
        # Assumption: datasource is an iterable of (name, dependency) pairs,
        # where dependency is a comma-separated string, or None/"" if there is none
        self.seqlookup = dict()
        self.deplookup = dict()
        for name, dependency in datasource:
            # parse the dependency column into a list
            listOfDeps = [d.strip() for d in (dependency or "").split(",") if d.strip()]
            self.deplookup[name] = listOfDeps
        for name in self.deplookup:
            self.SeqOf(name)

    def SeqOf(self, name):
        # return the previously computed sequence, if any
        if name in self.seqlookup:
            return self.seqlookup[name]
        deps = self.deplookup.get(name)
        if deps is None:
            # name was not defined in the table; raising is one option,
            # returning a sentinel value (1 or maybe negative?) is another
            raise KeyError(f"{name!r} is not defined in the table")
        if len(deps) == 0:
            self.seqlookup[name] = 1
            return 1
        maxDepSeq = 0
        for dep in deps:
            depseq = self.SeqOf(dep)
            if depseq > maxDepSeq:
                maxDepSeq = depseq
        self.seqlookup[name] = maxDepSeq + 1
        return maxDepSeq + 1

Usage would be:

table = DepSeqTable(datasource)
#draw whatever info you want out of table

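You may need to add more "get"-type functions to access the data in DepSeqTable, depending on your needs. Also, if you only want to evaluate sequences on demand, you may want to remove the second for loop in __init__.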
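
For the on-demand variant mentioned above, one possible sketch uses functools.lru_cache from the standard library in place of the hand-written seqlookup dictionary. The make_seq_of helper and the sample mapping are hypothetical names, not part of the class in this answer, and the sketch assumes there are no circular dependencies.

```python
from functools import lru_cache

def make_seq_of(deplookup):
    """Build a cached seq_of(name) function over a {name: [deps]} mapping."""

    @lru_cache(maxsize=None)
    def seq_of(name):
        deps = deplookup[name]  # KeyError if name is not in the table
        if not deps:
            return 1
        return 1 + max(seq_of(dep) for dep in deps)

    return seq_of

# Hypothetical data modelled on the example in the question
seq_of = make_seq_of({
    "Country": ["City", "Address"],
    "City": ["Pincode"],
    "Address": [],
    "Pincode": [],
})

# Nothing is computed until a name is requested
print(seq_of("Country"))
# Number of names cached as a side effect of that one call
print(seq_of.cache_info().currsize)
```

Very long dependency chains could hit Python's recursion limit with this approach, so an iterative pass may suit such data better.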

3 solutions

Solution 1
0 2020-04-17 10:32:46

Solution 2
0 Accepted 2020-04-17 11:06:25

Solution 3
0 2020-04-17 11:19:59
