
python pandas analyze dataframe

I have this dataset (a 105233 rows x 32 columns matrix), obtained after deleting the first column with .drop. What I need to do now is analyze each row (an array of 32 components) and find the rows whose first 16 terms are equal to the last 16.

{
import pandas as pd
import numpy as np

data = pd.read_csv('enummixed.txt', header = None, low_memory=False)
data = data.drop(data.columns[[0]], axis=1)
print(data)
             1        2  3  4  5  6  7  8  9  10        ...         \
0             1        0  0  0  0  0  0  0  0  0        ...          
1             1        0  0  0  0  0  0  0  0  0        ...          
2             1        0  0  0  0  0  0  0  0  0        ...          
3             1        0  0  0  0  0  0  0  0  0        ...          
4             1        0  0  0  0  0  0  0  0  0        ...          
5             1        0  0  0  0  0  0  0  0  0        ...          
6             1        0  0  0  0  0  0  0  0  0        ...          
7             1        0  0  0  0  0  0  0  0  0        ...          
8       106/243  137/243  0  0  0  0  0  0  0  0        ...          
9       106/243  137/243  0  0  0  0  0  0  0  0        ...          
10      106/243  137/243  0  0  0  0  0  0  0  0        ...          
11      106/243  137/243  0  0  0  0  0  0  0  0        ...          
12      106/243  137/243  0  0  0  0  0  0  0  0        ...          
13      106/243  137/243  0  0  0  0  0  0  0  0        ...          
14      106/243  137/243  0  0  0  0  0  0  0  0        ...          
15      106/243  137/243  0  0  0  0  0  0  0  0        ...          
16      106/243  137/243  0  0  0  0  0  0  0  0        ...          
17      106/243  137/243  0  0  0  0  0  0  0  0        ...          
18      106/243  137/243  0  0  0  0  0  0  0  0        ...          
19      106/243  137/243  0  0  0  0  0  0  0  0        ...          
20      106/243  137/243  0  0  0  0  0  0  0  0        ...          
21      106/243  137/243  0  0  0  0  0  0  0  0        ...          
22      106/243  137/243  0  0  0  0  0  0  0  0        ...          
23      106/243  137/243  0  0  0  0  0  0  0  0        ...          
24      106/243  137/243  0  0  0  0  0  0  0  0        ...          
25      106/243  137/243  0  0  0  0  0  0  0  0        ...          
26      106/243  137/243  0  0  0  0  0  0  0  0        ...          
27      106/243  137/243  0  0  0  0  0  0  0  0        ...          
28      106/243  137/243  0  0  0  0  0  0  0  0        ...          
29      106/243  137/243  0  0  0  0  0  0  0  0        ...          
...         ...      ... .. .. .. .. .. .. .. ..        ...          
105203        0        0  0  0  0  0  0  0  0  0        ...          
105204        0        0  0  0  0  0  0  0  0  0        ...          
105205        0        0  0  0  0  0  0  0  0  0        ...          
105206        0        0  0  0  0  0  0  0  0  0        ...          
105207        0        0  0  0  0  0  0  0  0  0        ...          
105208        0        0  0  0  0  0  0  0  0  0        ...          
105209        0        0  0  0  0  0  0  0  0  0        ...          
105210        0        0  0  0  0  0  0  0  0  0        ...          
105211        0        0  0  0  0  0  0  0  0  0        ...          
105212        0        0  0  0  0  0  0  0  0  0        ...          
105213        0        0  0  0  0  0  0  0  0  0        ...          
105214        0        0  0  0  0  0  0  0  0  0        ...          
105215        0        0  0  0  0  0  0  0  0  0        ...          
105216        0        0  0  0  0  0  0  0  0  0        ...          
105217        0        0  0  0  0  0  0  0  0  0        ...          
105218        0        0  0  0  0  0  0  0  0  0        ...          
105219        0        0  0  0  0  0  0  0  0  0        ...          
105220        0        0  0  0  0  0  0  0  0  0        ...          
105221        0        0  0  0  0  0  0  0  0  0        ...          
105222        0        0  0  0  0  0  0  0  0  0        ...          
105223        0        0  0  0  0  0  0  0  0  0        ...          
105224        0        0  0  0  0  0  0  0  0  0        ...          
105225        0        0  0  0  0  0  0  0  0  0        ...          
105226        0        0  0  0  0  0  0  0  0  0        ...          
105227        0        0  0  0  0  0  0  0  0  0        ...          
105228        0        0  0  0  0  0  0  0  0  0        ...          
105229        0        0  0  0  0  0  0  0  0  0        ...          
105230        0        0  0  0  0  0  0  0  0  0        ...          
105231        0        0  0  0  0  0  0  0  0  0        ...          
105232        0        0  0  0  0  0  0  0  0  0        ...          

                      23                24                25 26  \
0                      0                 0                 0  0   
1                395/543                 0                 0  0   
2                      0                 0                 0  0   
3           29449/110942                 0                 0  0   
4                      0                 0                 0  0   
5            41459/81005                 0                 0  0   
6                      0                 0                 0  0   
7       4133206/15626431                 0                 0  0   
8                      0                 0                 0  0   
9                      0                 0                 0  0   
10           41459/81005                 0                 0  0   
11      6359221/17955721                 0                 0  0   
12                     0                 0       41459/81005  0   
13                     0                 0  6359221/17955721  0   
14                     0                 0                 0  0   
15      4133206/15626431                 0                 0  0   
16                     0                 0  4133206/15626431  0   
17                     0                 0                 0  0   
18                     0                 0                 0  0   
19           41459/81005                 0                 0  0   
20      6359221/17955721                 0                 0  0   
21                     0                 0       41459/81005  0   
22                     0                 0  6359221/17955721  0   
23                     0                 0                 0  0   
24      4133206/15626431                 0                 0  0   
25                     0                 0  4133206/15626431  0   
26                     0                 0                 0  0   
27                     0                 0                 0  0   
28           41459/81005                 0                 0  0   
29      6359221/17955721                 0                 0  0   
...                  ...               ...               ... ..   
105203                 0       41459/81005                 0  0   
105204                 0  6359221/17955721                 0  0   
105205                 0  6359221/17955721                 0  0   
105206                 0                 0       41459/81005  0   
105207                 0                 0  6359221/17955721  0   
105208                 0                 0  6359221/17955721  0   
105209                 0           395/543                 0  0   
105210                 0       23702/64201                 0  0   
105211                 0       23702/64201                 0  0   
105212                 0                 0           395/543  0   
105213                 0                 0       23702/64201  0   
105214                 0                 0       23702/64201  0   
105215                 0       41459/81005                 0  0   
105216                 0  6359221/17955721                 0  0   
105217                 0  6359221/17955721                 0  0   
105218                 0                 0       41459/81005  0   
105219                 0                 0  6359221/17955721  0   
105220                 0                 0  6359221/17955721  0   
105221                 0       41459/81005                 0  0   
105222                 0  6359221/17955721                 0  0   
105223                 0  6359221/17955721                 0  0   
105224                 0                 0       41459/81005  0   
105225                 0                 0  6359221/17955721  0   
105226                 0                 0  6359221/17955721  0   
105227                 0           395/543                 0  0   
105228                 0       23702/64201                 0  0   
105229                 0       23702/64201                 0  0   
105230                 0                 0           395/543  0   
105231                 0                 0       23702/64201  0   
105232                 0                 0       23702/64201  0   

                      27                28 29 30                31  \
0                      0                 0  0  0                 0   
1                      0                 0  0  0                 0   
2                      0             57/74  0  0                 0   
3                      0      63397/110942  0  0                 0   
4                      0                 0  0  0                 0   
5                      0                 0  0  0                 0   
6                      0       49467/72995  0  0                 0   
7                      0  7658739/15626431  0  0                 0   
8                      0                 0  0  0                 0   
9                      0                 0  0  0                 0   
10                     0                 0  0  0                 0   
11                     0                 0  0  0                 0   
12                     0                 0  0  0                 0   
13                     0                 0  0  0                 0   
14                     0       49467/72995  0  0                 0   
15                     0  7658739/15626431  0  0                 0   
16                     0  7658739/15626431  0  0                 0   
17                     0                 0  0  0                 0   
18                     0                 0  0  0                 0   
19                     0                 0  0  0                 0   
20                     0                 0  0  0                 0   
21                     0                 0  0  0                 0   
22                     0                 0  0  0                 0   
23                     0       49467/72995  0  0                 0   
24                     0  7658739/15626431  0  0                 0   
25                     0  7658739/15626431  0  0                 0   
26                     0                 0  0  0           106/243   
27                     0                 0  0  0       16031/72995   
28                     0                 0  0  0        3143/16201   
29                     0                 0  0  0  2375174/17955721   
...                  ...               ... .. ..               ...   
105203        3143/16201                 0  0  0                 0   
105204  2375174/17955721                 0  0  0                 0   
105205  2375174/17955721                 0  0  0                 0   
105206        3143/16201                 0  0  0                 0   
105207  2375174/17955721                 0  0  0                 0   
105208  2375174/17955721                 0  0  0                 0   
105209           148/543                 0  0  0                 0   
105210      17601/128402                 0  0  0                 0   
105211      17601/128402                 0  0  0                 0   
105212           148/543                 0  0  0                 0   
105213      17601/128402                 0  0  0                 0   
105214      17601/128402                 0  0  0                 0   
105215                 0                 0  0  0        3143/16201   
105216                 0                 0  0  0  2375174/17955721   
105217                 0                 0  0  0  2375174/17955721   
105218                 0                 0  0  0        3143/16201   
105219                 0                 0  0  0  2375174/17955721   
105220                 0                 0  0  0  2375174/17955721   
105221                 0                 0  0  0        3143/16201   
105222                 0                 0  0  0  2375174/17955721   
105223                 0                 0  0  0  2375174/17955721   
105224                 0                 0  0  0        3143/16201   
105225                 0                 0  0  0  2375174/17955721   
105226                 0                 0  0  0  2375174/17955721   
105227                 0                 0  0  0           148/543   
105228                 0                 0  0  0      17601/128402   
105229                 0                 0  0  0      17601/128402   
105230                 0                 0  0  0           148/543   
105231                 0                 0  0  0      17601/128402   
105232                 0                 0  0  0      17601/128402   

                      32  
0                      0  
1                      0  
2                      0  
3                      0  
4                137/243  
5            23831/81005  
6             7497/72995  
7       1421917/15626431  
8                      0  
9                      0  
10                     0  
11                     0  
12                     0  
13                     0  
14                     0  
15                     0  
16                     0  
17               137/243  
18            7497/72995  
19           23831/81005  
20      1562587/17955721  
21           23831/81005  
22      1562587/17955721  
23            7497/72995  
24      1421917/15626431  
25      1421917/15626431  
26                     0  
27                     0  
28                     0  
29                     0  
...                  ...  
105203                 0  
105204                 0  
105205                 0  
105206                 0  
105207                 0  
105208                 0  
105209                 0  
105210                 0  
105211                 0  
105212                 0  
105213                 0  
105214                 0  
105215                 0  
105216                 0  
105217                 0  
105218                 0  
105219                 0  
105220                 0  
105221                 0  
105222                 0  
105223                 0  
105224                 0  
105225                 0  
105226                 0  
105227                 0  
105228                 0  
105229                 0  
105230                 0  
105231                 0  
105232                 0  

[105233 rows x 32 columns]
} 

Unfortunately I am not very experienced with this, so I am asking for help. Best, Nicolò

I am sure there is a simpler, shorter, more elegant and more Pythonic way to solve this, but in the meantime here is a solution. It returns the DataFrame restricted to the rows in which the first 16 terms are the same as the last 16. Here is an example with a few rows and columns:

df = pd.DataFrame({'a':[4,2,4,5,5,4],
                'b':[4,3,1,2,2,4],
                'c':[1,2,4,5,5,3],
                'd': [4, 3, 2, 2, 2, 4],})
print(df)
   a  b  c  d
0  4  4  1  4
1  2  3  2  3
2  4  1  4  2
3  5  2  5  2
4  5  2  5  2
5  4  4  3  4

df_a = df.iloc[:, :2]            # first half of the columns
df_b = df.iloc[:, 2:].copy()     # second half of the columns
df_b.columns = df_a.columns      # same labels, so the halves compare column by column
c = df_b != df_a                 # True where the two halves differ
df = df[~c.any(axis=1)]          # drop every row with at least one mismatch

Output:

   a  b  c  d
1  2  3  2  3
3  5  2  5  2
4  5  2  5  2

In your case:

df_a = df.iloc[:,:16]
df_b = df.iloc[:,16:]
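
The same selection can also be written as a direct equality test, which matters for this data because cells such as 106/243 are read as strings and cannot be subtracted. What follows is only a minimal sketch, assuming the two halves are compared by position; the helper name rows_with_equal_halves is hypothetical and not part of pandas:

```python
import pandas as pd


def rows_with_equal_halves(df, half):
    """Return the rows whose first `half` columns equal the next `half` columns.

    Hypothetical helper for illustration. The comparison is by position and
    uses ==, so it works for numbers and for strings such as '106/243'.
    """
    left = df.iloc[:, :half].to_numpy()
    right = df.iloc[:, half:2 * half].to_numpy()
    keep = (left == right).all(axis=1)  # True where every pair of cells matches
    return df[keep]


df = pd.DataFrame({'a': [4, 2, 4, 5, 5, 4],
                   'b': [4, 3, 1, 2, 2, 4],
                   'c': [1, 2, 4, 5, 5, 3],
                   'd': [4, 3, 2, 2, 2, 4]})
result = rows_with_equal_halves(df, 2)
print(result)
```

For the 32-column data the call would presumably be rows_with_equal_halves(data, 16).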

Thanks for the answers. For one reason or another neither of them worked for me, but they were useful. I found a solution that is old-fashioned and not very elegant, but it works:

import pandas as pd

data = pd.read_csv('enummixed.txt', header=None, low_memory=False)
data = data.drop(data.columns[[0]], axis=1)

# Open the output file once instead of reopening it for every matching row
with open("symmetric_ne.txt", "a") as out:
    for i in range(len(data)):
        # Compare all 16 pairs: column j against column j + 16
        if all(data.iloc[i, j] == data.iloc[i, j + 16] for j in range(16)):
            print(data.iloc[i], file=out)
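
One possible reason a comparison like this can miss rows is mixed column types: a column that contains only 0 is parsed as integers, while its partner column holding fractions is parsed as strings, so 0 and '0' compare as unequal. Here is a sketch of one way around that, assuming it is acceptable to read every cell as text. The in-memory sample is made up and stands in for enummixed.txt, and its 2 + 2 value columns stand in for the real 16 + 16:

```python
import io
import pandas as pd

# Made-up sample standing in for enummixed.txt: a leading label column,
# then 2 + 2 value columns (the real file would have 16 + 16).
sample = io.StringIO(
    "r0,0,1/3,0,1/3\n"
    "r1,0,1/3,0,2/3\n"
    "r2,0,0,0,0\n"
)

# dtype=str keeps every cell as text, so '0' is compared with '0'
data = pd.read_csv(sample, header=None, dtype=str)
data = data.drop(data.columns[[0]], axis=1)

half = data.shape[1] // 2
left = data.iloc[:, :half].to_numpy()
right = data.iloc[:, half:].to_numpy()
symmetric = data[(left == right).all(axis=1)]  # rows whose halves match

# Write all matching rows in one call; a real path such as
# "symmetric_ne.txt" could be passed instead of the buffer.
out = io.StringIO()
symmetric.to_csv(out, header=False, index=False)
print(out.getvalue())
```

Text comparison treats equivalent fractions that are written differently (1/2 and 2/4, say) as different values, so this only fits data in which equal values are always written the same way.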
