Python: analyze data in columns and cells from a CSV file

I am trying to write code for the following data:

List of Carbon A and Carbon B

I have imported the data using this code:

import csv
import itertools
import pandas as pd

input_file="computation.csv"
cmd=pd.read_csv(input_file)
subset = cmd[['Carbon A', 'Carbon B']]
carbon_pairs = [tuple(y) for y in subset.values]
c_pairs = carbon_pairs

I want to write code that produces the following output:

1 is connected to
  2
  4
  6
  7 
  8
2 is connected to
  1
  4
  5

Note that for carbon 2, I would like the output to repeat that it is connected to carbon 1. I thought some kind of permutation might achieve this, but I am unsure where to start. Basically, the code needs to do the following:

for every cell with the same value, print adjacent cell

You can get the desired output without the pandas dependency using the following function (Python 2). It lets you pass in any filename and choose which column indices (zero-based) to query. This solution assumes that the data is sorted as in the example you provided.

import csv

def printAdjacentNums(filename, firstIdx, secondIdx):
    with open(filename, 'rb') as csvfile:
        # handle header line
        header = next(csvfile)
        reader = csv.reader(csvfile)
        current_val = ''
        current_adj = []
        # dict of lists for lookback
        lookback = {}
        for row in reader:
            if current_val == '':
                current_val = row[firstIdx]
            if row[firstIdx] == current_val:
                current_adj.append(row[secondIdx])
            else:
                # check lookback
                for k, v in lookback.items():
                    if current_val in v:
                        current_adj.append(k)

                # print what we need to
                print current_val + ' is connected to'
                for i in current_adj:
                    print i

                # append current vals to lookback
                lookback[current_val] = current_adj

                # reassign
                current_val = row[firstIdx]
                current_adj = [row[secondIdx]]

    # print final set
    for k, v in lookback.items():
        if current_val in v:
            current_adj.append(k)
    print current_val + ' is connected to'
    for i in current_adj:
        print i

Then call it like so, based on your example:

printAdjacentNums('computation.csv', 0, 1)
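
For readers on Python 3, the same idea can be sketched roughly as follows. This is not the answer's original code: the helper name `adjacent_values` is hypothetical, it returns a dict instead of printing directly, it assumes the first line is a header, and the sample rows are made up. It records each link in both directions, which plays the role of the lookback step above and does not rely on the rows being sorted.

```python
import csv
import io


def adjacent_values(lines, first_idx, second_idx):
    """Map each value to the values it is paired with.

    Hypothetical helper: `lines` is any iterable of CSV text lines
    (an open file works), and the first line is assumed to be a header.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    connections = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        a, b = row[first_idx], row[second_idx]
        # record the link in both directions, skipping duplicates
        for src, dst in ((a, b), (b, a)):
            neighbours = connections.setdefault(src, [])
            if dst not in neighbours:
                neighbours.append(dst)
    return connections


# Made-up sample data standing in for computation.csv
sample = io.StringIO("Carbon A,Carbon B\n1,2\n1,4\n2,4\n2,5\n")
connections = adjacent_values(sample, 0, 1)
for key in ("1", "2"):
    print(key + " is connected to")
    for value in connections[key]:
        print("  " + value)
```

With a real file, something like `with open('computation.csv', newline='') as f: adjacent_values(f, 0, 1)` should work the same way. Note that the values stay strings, as in the original function.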

Starting from the end of your question:

c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)]

You presumably want to end up with something more like:

groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}

There are many ways to obtain this.

A very fast way, if you know your data is sorted, is to use itertools.groupby, e.g.:

first_item = lambda (a, b): a
for key, items in itertools.groupby(c_pairs, first_item):
    print '%s is connected to' % key
    for (a, b) in items:
        print '  %s' % b
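
The snippet above is Python 2 only: tuple-unpacking lambda parameters and the print statement were both removed in Python 3. A rough Python 3 equivalent, using `operator.itemgetter` in place of the lambda and collecting the output in a list (the `lines` variable is just for illustration), might look like this:

```python
import itertools
from operator import itemgetter

c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)]

lines = []
# groupby yields (key, iterator over the consecutive pairs sharing that key)
for key, items in itertools.groupby(c_pairs, key=itemgetter(0)):
    lines.append('%s is connected to' % key)
    for _, b in items:
        lines.append('  %s' % b)

print('\n'.join(lines))
```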

If your data is not sorted, this is probably still the fastest approach; simply sort it first:

c_pairs = sorted(c_pairs, key=first_item)
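
The sort matters because itertools.groupby only groups consecutive items with equal keys. A small Python 3 sketch with made-up, deliberately unsorted pairs (the helper `group_keys` is hypothetical) shows the difference:

```python
import itertools
from operator import itemgetter

# Made-up, deliberately unsorted pairs
unsorted_pairs = [(2, 4), (1, 2), (2, 5), (1, 4)]


def group_keys(pairs):
    """Return the key of each group that groupby produces, in order."""
    return [key for key, _ in itertools.groupby(pairs, key=itemgetter(0))]


# Without sorting, each key starts a new group every time it reappears
print(group_keys(unsorted_pairs))
# After sorting on the first item, each key forms exactly one group
print(group_keys(sorted(unsorted_pairs, key=itemgetter(0))))
```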

A more do-it-yourself solution is to use a defaultdict or a standard dictionary to build a mapping from each first item to its list of second items.

import collections

groups = collections.defaultdict(list)
for a, b in c_pairs:
    groups[a].append(b)

which is equivalent to the following, without collections:

groups = {}
for a, b in c_pairs:
    groups.setdefault(a, [])  # many ways to do this as well
    groups[a].append(b)
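
Once the mapping exists, printing it in the format the question asks for is a short loop. A minimal Python 3 sketch, assuming the same c_pairs as above (the helper name `format_groups` is hypothetical):

```python
import collections

c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)]

# Build the mapping exactly as in the defaultdict version
groups = collections.defaultdict(list)
for a, b in c_pairs:
    groups[a].append(b)


def format_groups(groups):
    """Render the mapping as '<key> is connected to' plus indented values."""
    lines = []
    for key in sorted(groups):
        lines.append('%s is connected to' % key)
        lines.extend('  %s' % value for value in groups[key])
    return '\n'.join(lines)


print(format_groups(groups))
```

Unlike groupby, this does not depend on the pairs being sorted; sorting the keys only affects the order in which the groups are printed.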
