
How can I generate binary (pairwise) combinations of a set of data?

I have a set of values arranged vertically (one set per column):

2,4
4,7
5,8
  9

I want every two-value (binary) combination within each column, for example 2 and 4, 2 and 5, and so on for the first column, and the same for the second column:

2 4
2 5
4 5
4 7
4 8 
 .
 .
 .

OK, that seems very complicated, so to make it easier I converted my data to a horizontal layout.

I have two rows, 2,4,5 and 4,7,8,9.

I want the binary combinations of the first row:

2 4
2 5
4 5  

and the binary combinations of the second row:

4 7
4 8
4 9
7 8
7 9
8 9

I think I understand. Try this code:

test.py

#!/usr/bin/env python

# put items side by side
# take the first item and put the next item beside it
# if there are any more items after the next, put that item beside the first item
# if there are no more items after the next, switch to the next item in the list
# repeat
def two_items_side_by_side(mylist):
    list_len = len(mylist)
    for i in range(list_len):
        for j in range(i+1, list_len):
            # print() with a single argument works in both Python 2 and 3
            print('{} {}'.format(mylist[i], mylist[j]))

# -------------------------------------------------------------------

# these are two lists
list1 = [2, 4, 5]
list2 = [4, 7, 8, 9]

two_items_side_by_side(list1)
two_items_side_by_side(list2)

When you run this, your results will look like this:

Result

python test.py
2 4
2 5
4 5
4 7
4 8
4 9
7 8
7 9
8 9

If your test case is a string in which each line contains comma-separated values, like the one below, you can use test2.py as an example:

2,4
4,7
5,8
 ,9

test2.py

#!/usr/bin/env python

# put items side by side
# take the first item and put the next item beside it
# if there are any more items after the next, put that item beside the first item
# if there are no more items after the next, switch to the next item in the list
# repeat
def two_items_side_by_side(mylist):
    list_len = len(mylist)
    for i in range(list_len):
        for j in range(i+1, list_len):
            # print() with a single argument works in both Python 2 and 3
            print('{} {}'.format(mylist[i], mylist[j]))

# -------------------------------------------------------------------

# process the data and store them into a list
# then do the same work as we did in the first example
def convert_data_into_lists():
    lines = data.split('\n')
    for line in lines:
        # ignore empty lines
        if len(line.strip()) < 1:
            continue

        # split by comma and ignore if we don't get 2 or more values
        items = line.split(',')
        if len(items) < 2:
            continue

        # put first item in list1 and second item in list2
        if len(items[0].strip()) > 0: list1.append(items[0].strip())
        if len(items[1].strip()) > 0: list2.append(items[1].strip())

# -------------------------------------------------------------------

# this is my string
data = """
2,4
4,7
5,8
 ,9
"""

list1 = []
list2 = []

convert_data_into_lists()
two_items_side_by_side(list1)
two_items_side_by_side(list2)

Result

python test2.py
2 4
2 5
4 5
4 7
4 8
4 9
7 8
7 9
8 9

There are more elegant ways to write this code. I have written it in a way that should help you understand the code and try it out yourself.
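
One of those more compact ways, as a sketch rather than a drop-in replacement: the standard library's itertools.combinations yields every 2-element combination in the same order as the nested loops above. The helper name pairs_side_by_side is made up for this illustration, and it returns the formatted lines instead of printing them directly.

```python
from itertools import combinations


def pairs_side_by_side(values):
    """Return 'x y' strings for every pair of values, in input order."""
    # combinations(values, 2) yields (values[i], values[j]) for all i < j
    return ['{} {}'.format(x, y) for x, y in combinations(values, 2)]


list1 = [2, 4, 5]
list2 = [4, 7, 8, 9]

for line in pairs_side_by_side(list1) + pairs_side_by_side(list2):
    print(line)
```

Returning the lines rather than printing inside the function makes the helper easier to reuse and test.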

Requirement change

Based on the change in requirements, the data is now in a text file. We will take three test cases (see the results). To accommodate this, I am going to reuse the code from test2.py. Instead of creating an individual list for each column in the text file, I will create one list that dynamically holds as many lists as there are columns in the file.

Code

#!/usr/bin/env python

# put items side by side
# take the first item and put the next item beside it
# if there are any more items after the next, put that item beside the first item
# if there are no more items after the next, switch to the next item in the list
# repeat
def two_items_side_by_side(mylist):
    list_len = len(mylist)
    for i in range(list_len):
        for j in range(i+1, list_len):
            # print() with a single argument works in both Python 2 and 3
            print('{} {}'.format(mylist[i], mylist[j]))

# -------------------------------------------------------------------

# process the data and store them into a list
# then do the same work as we did in the first example
def convert_data_into_lists():

    with open(data) as f:
        lines = f.readlines()

    for line in lines:
        # ignore empty lines
        if len(line.strip()) < 1:
            continue

        # split by comma; each value goes into the list for its column
        items = line.split(',')

        counter = 0
        for item in items:

            if len(mylist) < counter + 1:
                mylist.append([])
            if len(item.strip()) > 0:
                mylist[counter].append(item.strip())
            counter += 1

# -------------------------------------------------------------------

# this is the path to my data file
data = 'test.txt'

mylist = []

convert_data_into_lists()
for individual_list in mylist:
    two_items_side_by_side(individual_list)

Result

Case 1

Data:
2,4
4,7
5,8
 ,9

Results:
2 4
2 5
4 5
4 7
4 8
4 9
7 8
7 9
8 9

Case 2

Data:
2,4
4,7
5,8
6,9

Results:
2 4
2 5
2 6
4 5
4 6
5 6
4 7
4 8
4 9
7 8
7 9
8 9

Case 3

Data:
2,4,10
4,7,11
5,8,
 ,9,13

Results:
2 4
2 5
4 5
4 7
4 8
4 9
7 8
7 9
8 9
10 11
10 13
11 13
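
For comparison, here is a hedged sketch of the same column-wise idea using the standard csv and itertools modules. The function name column_pairs is hypothetical, and the sample text is read from an in-memory string (via io.StringIO) rather than test.txt so that the snippet runs on its own under Python 3.

```python
import csv
import io
from itertools import combinations


def column_pairs(text):
    """Return one list of 2-value combinations per column of comma-separated text."""
    columns = []
    for row in csv.reader(io.StringIO(text)):
        for index, cell in enumerate(row):
            # grow the list of columns the first time a new column index appears
            if len(columns) <= index:
                columns.append([])
            cell = cell.strip()
            if cell:  # skip blank cells such as the gap in " ,9,13"
                columns[index].append(cell)
    return [list(combinations(column, 2)) for column in columns]


sample = "2,4,10\n4,7,11\n5,8,\n ,9,13\n"  # the Case 3 data
for pairs in column_pairs(sample):
    for first, second in pairs:
        print(first, second)
```

To read a real file instead, pass an open file object to csv.reader in place of the io.StringIO wrapper.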

If your values are stored in two collections, use a list comprehension:

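try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2
a = [(1, 'a'), (2, 'b'), (3, None)]
b, c = zip_longest(*a)  # b = (1, 2, 3), c = ('a', 'b', None)
# pair every non-empty value in b with every non-empty value in c
d = [(i, j) for i in b if i for j in c if j]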
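
To make explicit what this comprehension computes: it pairs each truthy value from b with each truthy value from c, which is a filtered Cartesian product across two collections rather than the combinations within one collection. A rough equivalent with itertools.product, assuming Python 3 (where izip_longest is named zip_longest). Filtering on `is not None` instead of truthiness is a choice made here so that a value such as 0 would be kept.

```python
from itertools import product, zip_longest

a = [(1, 'a'), (2, 'b'), (3, None)]
b, c = zip_longest(*a)  # b == (1, 2, 3), c == ('a', 'b', None)

# drop the None placeholders, then pair each value of b with each value of c
b_values = [i for i in b if i is not None]
c_values = [j for j in c if j is not None]
d = list(product(b_values, c_values))
print(d)
```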

EDIT

By modifying the zipper code to take only a single parameter, we can read the contents of a CSV file and (using some form of delimiter) produce the combinations for every line in the data set. Just call total_zipper() and replace 'filename.txt' with your file name.

def total_zipper():

    def zipper(a):
        lst = []
        for i in range(1,len(a)+1):
            lst+=zip(a,a[i:])
        return sorted(lst)

    # each line is read as a string, so split it on the delimiter (a comma
    # here) and drop blank fields before pairing the values
    with open('filename.txt','r') as f:
        return [zipper([v.strip() for v in line.split(',') if v.strip()]) for line in f]

Each line is read as a string and split on the comma delimiter before pairing, so the paired values are strings. For line-by-line reading to work, I believe you need a line break (return) at the end of each line in the text file. See the Input and Output page of the Python documentation for more.


Here's the shortest version I could come up with. You can use the built-in zip() function. Combined with list slicing, it gives a Pythonic way to pair the values in the required order.

def zipper(a,b):
    lst = []
    for i in range(1,len(b)+1):
        lst+=zip(a,b[i:])
    return sorted(lst)

Now simply call zipper on each row of data.

>>> a = [2,4,5]
>>> b = [4,7,8,9]
>>> print(zipper(a,a))
[(2, 4), (2, 5), (4, 5)]
>>> print(zipper(b,b))
[(4, 7), (4, 8), (4, 9), (7, 8), (7, 9), (8, 9)]
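
To see why slicing plus zip() produces every pair, it can help to look at what each pass of the loop contributes. Below is a small trace using the same list a; the names steps and all_pairs are only for illustration. Offset i pairs each element with the element i positions later.

```python
a = [2, 4, 5]

steps = {}
# offsets of len(a) or more would pair against an empty slice, so stop before that
for i in range(1, len(a)):
    # a[i:] is a shifted i places to the left, so zip pairs a[k] with a[k + i]
    steps[i] = list(zip(a, a[i:]))
    print(i, steps[i])

# offset 1 gives the adjacent pairs, offset 2 the pairs two positions apart
all_pairs = sorted(pair for pairs in steps.values() for pair in pairs)
print(all_pairs)
```

Because zip() stops at the shorter input, no pair ever runs past the end of the list.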

As a side note, I tried to use a list comprehension to make the code shorter. For example, the following code is intended to do the same thing as zipper(a):

def zipper(a):
    return list(zip(a,a[i:]) for i in range(1,len(a)+1))

However, because zip() returns a lazy iterator in Python 3 rather than a list, the result is a list of zip objects instead of a flat list of pairs, so it isn't as "clean" as the output from the version above. I'd have to consume each zip object in the list returned by zipper (for example with next()) to get the same output, which is tedious. Does anyone have suggestions for making the list comprehension work?
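
One possible way to make the comprehension work, offered as a sketch: add a second for clause so that the pairs produced by each zip() call are flattened into a single list, instead of collecting the zip objects themselves. This should behave the same on Python 2 and 3.

```python
def zipper(a):
    # the outer clause picks the offset i; the inner clause unpacks each pair
    # from zip(a, a[i:]), so the result is one flat, sorted list of tuples
    return sorted(pair for i in range(1, len(a)) for pair in zip(a, a[i:]))


print(zipper([2, 4, 5]))
print(zipper([4, 7, 8, 9]))
```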

