
Convert crosstab to columns without using pandas in Python

How do I convert the crosstab data from the input file below into columns, based on the input list, without using pandas?

Input list

[A,B,C]

Input data file

Labels A, B, C are only for representation; the original file only has the numeric values. Columns XX and YY can be ignored based on the length of the input list.

  A B C XX YY
A 0 2 3 4  8
B 4 0 6 4  8
C 7 8 0 5  8

Output (output needs to have labels)

A A 0
A B 2
A C 3
B A 4
B B 0
B C 6
C A 7
C B 8
C C 0

The labels need to be present in the output file even though they are not present in the input file, hence I have shown them in the output above.

NB: In reality the labels are city names, sorted in ascending order and without duplicates, not single letters like A or B.

Unfortunately, this would have been easier if I could install pandas on the server and use unstack(), but installations aren't allowed on this old server right now. This is on Python 3.5.

Considering you tagged the post csv, I'm assuming the actual input data is a .csv file without a header, as you indicated.

So example data would look like:

0,2,3,4,8
4,0,6,4,8
7,8,0,5,8

If the labels are provided as a list matching the order of the columns and rows (i.e. ['A', 'B', 'C']), the output for the example would be:

'A','A',0
'A','B',2
'A','C',3
'B','A',4
etc.

Note that this implies the number of rows and columns in the file cannot exceed the number of labels provided.

You indicate that the columns you label 'XX' and 'YY' are to be ignored, but not how that is supposed to be communicated. You do mention that the length of the input list determines it, so I assume this means 'everything after column n can be ignored'.

This is a simple implementation:

from csv import reader


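def unstack_csv(fn, columns, labels):
    # yield one (row label, column label, value) tuple per cell
    with open(fn) as f:
        cr = reader(f)
        row = 0
        for line in cr:
            col = 0
            # only the first `columns` values of each line are used
            for x in line[:columns]:
                yield labels[row], labels[col], x
                col += 1
            row += 1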


print(list(unstack_csv('unstack.csv', 3, ['A', 'B', 'C'])))

Or, if you like it short and sweet:

from csv import reader

with open('unstack.csv') as f:
    content = reader(f)
    labels = ['A', 'B', 'C']
    print([(labels[row], labels[col], x)
           for row, data in enumerate(content)
           for col, x in enumerate(data) if col < 3])

(I'm also assuming that using numpy is out, for the same reason as pandas, but that something like csv is fine, since it's part of the standard library.)

If you don't want to provide the labels explicitly, but just want them generated, you could do something like:
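Since the question asks for an output file and says the number of usable columns follows from the length of the label list, the same idea could be adapted roughly as below. This is only a sketch under those assumptions: the function name `unstack_to_file` and its `src`/`dst` parameters are made up for illustration, and it avoids anything newer than Python 3.5.

```python
import csv


def unstack_to_file(src, dst, labels):
    """Write one 'row_label,col_label,value' line per crosstab cell.

    Assumption: the labels are in the same order as the rows and
    columns of the file, and only the first len(labels) columns count.
    """
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        writer = csv.writer(fout)
        # zip stops at the shorter input, so rows beyond the labels are skipped
        for row_label, line in zip(labels, csv.reader(fin)):
            # the same applies per line: values past the last label
            # (the XX and YY columns in the example) are dropped
            for col_label, value in zip(labels, line):
                writer.writerow([row_label, col_label, value])
```

Calling `unstack_to_file('unstack.csv', 'unstacked.csv', ['A', 'B', 'C'])` (the output file name is hypothetical) would write nine lines for the example data, starting with `A,A,0`. Extra rows or columns are dropped silently here; if that should be an error instead, a length check would need to be added.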

def label(n):
    r = n // 26
    c = chr(65 + (n % 26))
    if r > 0:
        return label(r-1)+c
    else:
        return c

And then, of course, just remove labels from the examples and replace the lookups with calls to label(col) and label(row).
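To make that substitution concrete, here is a minimal sketch that combines the `label` helper with the unstacking loop. The name `unstack_rows` and the inline `sample` data are assumptions for illustration; the sample lines mirror the example file, and `csv.reader` is fed a list of strings so the snippet runs without a file on disk.

```python
import csv


def label(n):
    # same helper as above: 0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB', ...
    r = n // 26
    c = chr(65 + (n % 26))
    if r > 0:
        return label(r - 1) + c
    return c


def unstack_rows(rows, columns):
    """Yield (row_label, col_label, value) using generated labels."""
    for row, line in enumerate(rows):
        # ignore everything after the first `columns` values
        for col, x in enumerate(line[:columns]):
            yield label(row), label(col), x


# csv.reader accepts any iterable of lines, not only open files
sample = ['0,2,3,4,8', '4,0,6,4,8', '7,8,0,5,8']
result = list(unstack_rows(csv.reader(sample), 3))
print(result[:4])
# [('A', 'A', '0'), ('A', 'B', '2'), ('A', 'C', '3'), ('B', 'A', '4')]
```

Note that the values come back as strings, exactly as read from the file; convert them with `int(x)` if numbers are needed.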
