
Get the subarray with same numbers and consecutive index

I have a text file like this

   0, 23.00, 78.00, 75.00, 105.00,  2,0.97
   1, 371.00, 305.00, 38.00, 48.00,  0,0.85
   1, 24.00, 78.00, 75.00, 116.00,  2,0.98
   1, 372.00, 306.00, 37.00, 48.00,  0,0.84
   2, 28.00, 87.00, 74.00, 101.00,  2,0.97
   2, 372.00, 307.00, 35.00, 47.00,  0,0.80
   3, 32.00, 86.00, 73.00, 98.00,  2,0.98
   3, 363.00, 310.00, 34.00, 46.00,  0,0.83
   4, 40.00, 77.00, 71.00, 98.00,  2,0.94
   4, 370.00, 307.00, 38.00, 47.00,  0,0.84
   4, 46.00, 78.00, 74.00, 116.00,  2,0.97
   5, 372.00, 308.00, 34.00, 46.00,  0,0.57
   5, 43.00, 66.00, 67.00, 110.00,  2,0.96

Code I tried

frames = []
x = []
y = []
labels = []
with open(file, 'r') as lb:
    for line in lb:
        line = line.replace(',', ' ')
        arr = line.split()
        frames.append(arr[0])
        x.append(arr[1])
        y.append(arr[2])
        labels.append(arr[5])
    print(np.shape(frames))
    for d, a in enumerate(frames):
        compare = []
        if a == frames[d+2]:
            compare.append(x[d])
            compare.append(x[d+1])
            compare.append(x[d+2])
            xm = np.argmin(compare)
            label = {0: int(labels[d]), 1: int(labels[d+1]), 2: int(labels[d+2])}.get(xm)
        elif a == frames[d+1]:
            compare.append(x[d])
            compare.append(x[d+1])
            xm = np.argmin(compare)
            label = {0: int(labels[d]), 1: int(labels[d+1])}.get(xm)

In the first line, the first number (0) is unique, so I can extract the sixth number (2) easily. After that, though, many lines share the same first number. I want to collect all the lines with the same first number, compare their second numbers, and extract the sixth number from the line with the lowest second number. Can someone suggest a Python solution? I tried readline() and next() but couldn't work out how to do it.

You can read the file with pandas.read_csv instead, which makes this much easier:

import pandas as pd
df = pd.read_csv(file_path, header = None)

This reads the file as a table:

    0      1      2     3      4  5     6
0   0   23.0   78.0  75.0  105.0  2  0.97
1   1  371.0  305.0  38.0   48.0  0  0.85
2   1   24.0   78.0  75.0  116.0  2  0.98
3   1  372.0  306.0  37.0   48.0  0  0.84
4   2   28.0   87.0  74.0  101.0  2  0.97
5   2  372.0  307.0  35.0   47.0  0  0.80
6   3   32.0   86.0  73.0   98.0  2  0.98
7   3  363.0  310.0  34.0   46.0  0  0.83
8   4   40.0   77.0  71.0   98.0  2  0.94
9   4  370.0  307.0  38.0   47.0  0  0.84
10  4   46.0   78.0  74.0  116.0  2  0.97
11  5  372.0  308.0  34.0   46.0  0  0.57
12  5   43.0   66.0  67.0  110.0  2  0.96

Then you can group the rows into sub-tables based on one of the columns (in your case, column 0):

for group, sub_df in df.groupby(0):
    row = sub_df[1].idxmin()  # index of the row with the minimum value in column 1
    label = df.loc[row, 5]    # this is the number you are looking for
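
Putting the steps above together, a minimal self-contained sketch might look like the following. It assumes the sample rows from the question, fed through io.StringIO in place of a real file. The name labels_by_frame is hypothetical and not part of the original answer.

```python
import io
import pandas as pd

# Sample rows from the question; io.StringIO stands in for the real file.
data = """\
   0, 23.00, 78.00, 75.00, 105.00,  2,0.97
   1, 371.00, 305.00, 38.00, 48.00,  0,0.85
   1, 24.00, 78.00, 75.00, 116.00,  2,0.98
   1, 372.00, 306.00, 37.00, 48.00,  0,0.84
   2, 28.00, 87.00, 74.00, 101.00,  2,0.97
   2, 372.00, 307.00, 35.00, 47.00,  0,0.80
   3, 32.00, 86.00, 73.00, 98.00,  2,0.98
   3, 363.00, 310.00, 34.00, 46.00,  0,0.83
   4, 40.00, 77.00, 71.00, 98.00,  2,0.94
   4, 370.00, 307.00, 38.00, 47.00,  0,0.84
   4, 46.00, 78.00, 74.00, 116.00,  2,0.97
   5, 372.00, 308.00, 34.00, 46.00,  0,0.57
   5, 43.00, 66.00, 67.00, 110.00,  2,0.96
"""

# skipinitialspace drops the padding before each field.
df = pd.read_csv(io.StringIO(data), header=None, skipinitialspace=True)

# For each frame (column 0), find the index of the row with the smallest
# value in column 1, then read the label (column 5) from that row.
rows = df.groupby(0)[1].idxmin()
labels_by_frame = dict(zip(rows.index.tolist(), df.loc[rows.to_numpy(), 5].tolist()))

print(labels_by_frame)
```

With this sample, every frame's lowest-x row carries label 2, so the mapping sends each of the frames 0 through 5 to 2.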

I think this is what you need, using pandas:

import pandas as pd

df = pd.read_table('./test.txt', sep=',', names = ('1','2','3','4','5','6','7'))
print(df)
#     1      2      3     4      5  6     7
# 0   0   23.0   78.0  75.0  105.0  2  0.97
# 1   1  371.0  305.0  38.0   48.0  0  0.85
# 2   1   24.0   78.0  75.0  116.0  2  0.98
# 3   1  372.0  306.0  37.0   48.0  0  0.84
# 4   2   28.0   87.0  74.0  101.0  2  0.97
# 5   2  372.0  307.0  35.0   47.0  0  0.80
# 6   3   32.0   86.0  73.0   98.0  2  0.98
# 7   3  363.0  310.0  34.0   46.0  0  0.83
# 8   4   40.0   77.0  71.0   98.0  2  0.94
# 9   4  370.0  307.0  38.0   47.0  0  0.84
# 10  4   46.0   78.0  74.0  116.0  2  0.97
# 11  5  372.0  308.0  34.0   46.0  0  0.57
# 12  5   43.0   66.0  67.0  110.0  2  0.96

# Group by the first column and keep, for each group, the row with the
# lowest value in the second column; the label is then in column "6".
df_new = df.loc[df.groupby("1")["2"].idxmin()]
print(df_new)
#     1     2     3     4      5  6     7
# 0   0  23.0  78.0  75.0  105.0  2  0.97
# 2   1  24.0  78.0  75.0  116.0  2  0.98
# 4   2  28.0  87.0  74.0  101.0  2  0.97
# 6   3  32.0  86.0  73.0   98.0  2  0.98
# 8   4  40.0  77.0  71.0   98.0  2  0.94
# 12  5  43.0  66.0  67.0  110.0  2  0.96
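
For comparison, here is a rough sketch of the same idea without pandas, closer to the loop in the question. It relies on the lines that share a first number being consecutive, as they are in the sample, so itertools.groupby can collect them. The helper parse and the name labels_by_frame are hypothetical, and the sample rows are inlined rather than read from a file.

```python
from itertools import groupby

# Sample rows from the question; in practice these would be read from the file.
lines = """\
   0, 23.00, 78.00, 75.00, 105.00,  2,0.97
   1, 371.00, 305.00, 38.00, 48.00,  0,0.85
   1, 24.00, 78.00, 75.00, 116.00,  2,0.98
   1, 372.00, 306.00, 37.00, 48.00,  0,0.84
   2, 28.00, 87.00, 74.00, 101.00,  2,0.97
   2, 372.00, 307.00, 35.00, 47.00,  0,0.80
   3, 32.00, 86.00, 73.00, 98.00,  2,0.98
   3, 363.00, 310.00, 34.00, 46.00,  0,0.83
   4, 40.00, 77.00, 71.00, 98.00,  2,0.94
   4, 370.00, 307.00, 38.00, 47.00,  0,0.84
   4, 46.00, 78.00, 74.00, 116.00,  2,0.97
   5, 372.00, 308.00, 34.00, 46.00,  0,0.57
   5, 43.00, 66.00, 67.00, 110.00,  2,0.96
""".splitlines()


def parse(line):
    """Split a line on commas and convert every field to float."""
    return [float(field) for field in line.split(",")]


rows = [parse(line) for line in lines if line.strip()]

labels_by_frame = {}
# groupby yields runs of consecutive rows with the same first number.
for frame, group in groupby(rows, key=lambda r: r[0]):
    best = min(group, key=lambda r: r[1])  # row with the lowest second number
    labels_by_frame[int(frame)] = int(best[5])  # sixth number of that row

print(labels_by_frame)
```

Converting the fields to float before comparing matters here: the code in the question compares the values as strings, which orders them lexicographically rather than numerically.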
