
python pandas analyze dataframe

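I have this dataset (a matrix of 105233 rows x 32 columns), from which I removed the first column with .drop. What I need to do now is analyze each row (an array of 32 components) and find the rows whose first 16 entries are equal to their last 16 entries.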

{
import pandas as pd
import numpy as np

data = pd.read_csv('enummixed.txt', header = None, low_memory=False)
data = data.drop(data.columns[[0]], axis=1)
print(data)
             1        2  3  4  5  6  7  8  9  10        ...         \
0             1        0  0  0  0  0  0  0  0  0        ...          
1             1        0  0  0  0  0  0  0  0  0        ...          
2             1        0  0  0  0  0  0  0  0  0        ...          
3             1        0  0  0  0  0  0  0  0  0        ...          
4             1        0  0  0  0  0  0  0  0  0        ...          
5             1        0  0  0  0  0  0  0  0  0        ...          
6             1        0  0  0  0  0  0  0  0  0        ...          
7             1        0  0  0  0  0  0  0  0  0        ...          
8       106/243  137/243  0  0  0  0  0  0  0  0        ...          
9       106/243  137/243  0  0  0  0  0  0  0  0        ...          
10      106/243  137/243  0  0  0  0  0  0  0  0        ...          
11      106/243  137/243  0  0  0  0  0  0  0  0        ...          
12      106/243  137/243  0  0  0  0  0  0  0  0        ...          
13      106/243  137/243  0  0  0  0  0  0  0  0        ...          
14      106/243  137/243  0  0  0  0  0  0  0  0        ...          
15      106/243  137/243  0  0  0  0  0  0  0  0        ...          
16      106/243  137/243  0  0  0  0  0  0  0  0        ...          
17      106/243  137/243  0  0  0  0  0  0  0  0        ...          
18      106/243  137/243  0  0  0  0  0  0  0  0        ...          
19      106/243  137/243  0  0  0  0  0  0  0  0        ...          
20      106/243  137/243  0  0  0  0  0  0  0  0        ...          
21      106/243  137/243  0  0  0  0  0  0  0  0        ...          
22      106/243  137/243  0  0  0  0  0  0  0  0        ...          
23      106/243  137/243  0  0  0  0  0  0  0  0        ...          
24      106/243  137/243  0  0  0  0  0  0  0  0        ...          
25      106/243  137/243  0  0  0  0  0  0  0  0        ...          
26      106/243  137/243  0  0  0  0  0  0  0  0        ...          
27      106/243  137/243  0  0  0  0  0  0  0  0        ...          
28      106/243  137/243  0  0  0  0  0  0  0  0        ...          
29      106/243  137/243  0  0  0  0  0  0  0  0        ...          
...         ...      ... .. .. .. .. .. .. .. ..        ...          
105203        0        0  0  0  0  0  0  0  0  0        ...          
105204        0        0  0  0  0  0  0  0  0  0        ...          
105205        0        0  0  0  0  0  0  0  0  0        ...          
105206        0        0  0  0  0  0  0  0  0  0        ...          
105207        0        0  0  0  0  0  0  0  0  0        ...          
105208        0        0  0  0  0  0  0  0  0  0        ...          
105209        0        0  0  0  0  0  0  0  0  0        ...          
105210        0        0  0  0  0  0  0  0  0  0        ...          
105211        0        0  0  0  0  0  0  0  0  0        ...          
105212        0        0  0  0  0  0  0  0  0  0        ...          
105213        0        0  0  0  0  0  0  0  0  0        ...          
105214        0        0  0  0  0  0  0  0  0  0        ...          
105215        0        0  0  0  0  0  0  0  0  0        ...          
105216        0        0  0  0  0  0  0  0  0  0        ...          
105217        0        0  0  0  0  0  0  0  0  0        ...          
105218        0        0  0  0  0  0  0  0  0  0        ...          
105219        0        0  0  0  0  0  0  0  0  0        ...          
105220        0        0  0  0  0  0  0  0  0  0        ...          
105221        0        0  0  0  0  0  0  0  0  0        ...          
105222        0        0  0  0  0  0  0  0  0  0        ...          
105223        0        0  0  0  0  0  0  0  0  0        ...          
105224        0        0  0  0  0  0  0  0  0  0        ...          
105225        0        0  0  0  0  0  0  0  0  0        ...          
105226        0        0  0  0  0  0  0  0  0  0        ...          
105227        0        0  0  0  0  0  0  0  0  0        ...          
105228        0        0  0  0  0  0  0  0  0  0        ...          
105229        0        0  0  0  0  0  0  0  0  0        ...          
105230        0        0  0  0  0  0  0  0  0  0        ...          
105231        0        0  0  0  0  0  0  0  0  0        ...          
105232        0        0  0  0  0  0  0  0  0  0        ...          

                      23                24                25 26  \
0                      0                 0                 0  0   
1                395/543                 0                 0  0   
2                      0                 0                 0  0   
3           29449/110942                 0                 0  0   
4                      0                 0                 0  0   
5            41459/81005                 0                 0  0   
6                      0                 0                 0  0   
7       4133206/15626431                 0                 0  0   
8                      0                 0                 0  0   
9                      0                 0                 0  0   
10           41459/81005                 0                 0  0   
11      6359221/17955721                 0                 0  0   
12                     0                 0       41459/81005  0   
13                     0                 0  6359221/17955721  0   
14                     0                 0                 0  0   
15      4133206/15626431                 0                 0  0   
16                     0                 0  4133206/15626431  0   
17                     0                 0                 0  0   
18                     0                 0                 0  0   
19           41459/81005                 0                 0  0   
20      6359221/17955721                 0                 0  0   
21                     0                 0       41459/81005  0   
22                     0                 0  6359221/17955721  0   
23                     0                 0                 0  0   
24      4133206/15626431                 0                 0  0   
25                     0                 0  4133206/15626431  0   
26                     0                 0                 0  0   
27                     0                 0                 0  0   
28           41459/81005                 0                 0  0   
29      6359221/17955721                 0                 0  0   
...                  ...               ...               ... ..   
105203                 0       41459/81005                 0  0   
105204                 0  6359221/17955721                 0  0   
105205                 0  6359221/17955721                 0  0   
105206                 0                 0       41459/81005  0   
105207                 0                 0  6359221/17955721  0   
105208                 0                 0  6359221/17955721  0   
105209                 0           395/543                 0  0   
105210                 0       23702/64201                 0  0   
105211                 0       23702/64201                 0  0   
105212                 0                 0           395/543  0   
105213                 0                 0       23702/64201  0   
105214                 0                 0       23702/64201  0   
105215                 0       41459/81005                 0  0   
105216                 0  6359221/17955721                 0  0   
105217                 0  6359221/17955721                 0  0   
105218                 0                 0       41459/81005  0   
105219                 0                 0  6359221/17955721  0   
105220                 0                 0  6359221/17955721  0   
105221                 0       41459/81005                 0  0   
105222                 0  6359221/17955721                 0  0   
105223                 0  6359221/17955721                 0  0   
105224                 0                 0       41459/81005  0   
105225                 0                 0  6359221/17955721  0   
105226                 0                 0  6359221/17955721  0   
105227                 0           395/543                 0  0   
105228                 0       23702/64201                 0  0   
105229                 0       23702/64201                 0  0   
105230                 0                 0           395/543  0   
105231                 0                 0       23702/64201  0   
105232                 0                 0       23702/64201  0   

                      27                28 29 30                31  \
0                      0                 0  0  0                 0   
1                      0                 0  0  0                 0   
2                      0             57/74  0  0                 0   
3                      0      63397/110942  0  0                 0   
4                      0                 0  0  0                 0   
5                      0                 0  0  0                 0   
6                      0       49467/72995  0  0                 0   
7                      0  7658739/15626431  0  0                 0   
8                      0                 0  0  0                 0   
9                      0                 0  0  0                 0   
10                     0                 0  0  0                 0   
11                     0                 0  0  0                 0   
12                     0                 0  0  0                 0   
13                     0                 0  0  0                 0   
14                     0       49467/72995  0  0                 0   
15                     0  7658739/15626431  0  0                 0   
16                     0  7658739/15626431  0  0                 0   
17                     0                 0  0  0                 0   
18                     0                 0  0  0                 0   
19                     0                 0  0  0                 0   
20                     0                 0  0  0                 0   
21                     0                 0  0  0                 0   
22                     0                 0  0  0                 0   
23                     0       49467/72995  0  0                 0   
24                     0  7658739/15626431  0  0                 0   
25                     0  7658739/15626431  0  0                 0   
26                     0                 0  0  0           106/243   
27                     0                 0  0  0       16031/72995   
28                     0                 0  0  0        3143/16201   
29                     0                 0  0  0  2375174/17955721   
...                  ...               ... .. ..               ...   
105203        3143/16201                 0  0  0                 0   
105204  2375174/17955721                 0  0  0                 0   
105205  2375174/17955721                 0  0  0                 0   
105206        3143/16201                 0  0  0                 0   
105207  2375174/17955721                 0  0  0                 0   
105208  2375174/17955721                 0  0  0                 0   
105209           148/543                 0  0  0                 0   
105210      17601/128402                 0  0  0                 0   
105211      17601/128402                 0  0  0                 0   
105212           148/543                 0  0  0                 0   
105213      17601/128402                 0  0  0                 0   
105214      17601/128402                 0  0  0                 0   
105215                 0                 0  0  0        3143/16201   
105216                 0                 0  0  0  2375174/17955721   
105217                 0                 0  0  0  2375174/17955721   
105218                 0                 0  0  0        3143/16201   
105219                 0                 0  0  0  2375174/17955721   
105220                 0                 0  0  0  2375174/17955721   
105221                 0                 0  0  0        3143/16201   
105222                 0                 0  0  0  2375174/17955721   
105223                 0                 0  0  0  2375174/17955721   
105224                 0                 0  0  0        3143/16201   
105225                 0                 0  0  0  2375174/17955721   
105226                 0                 0  0  0  2375174/17955721   
105227                 0                 0  0  0           148/543   
105228                 0                 0  0  0      17601/128402   
105229                 0                 0  0  0      17601/128402   
105230                 0                 0  0  0           148/543   
105231                 0                 0  0  0      17601/128402   
105232                 0                 0  0  0      17601/128402   

                      32  
0                      0  
1                      0  
2                      0  
3                      0  
4                137/243  
5            23831/81005  
6             7497/72995  
7       1421917/15626431  
8                      0  
9                      0  
10                     0  
11                     0  
12                     0  
13                     0  
14                     0  
15                     0  
16                     0  
17               137/243  
18            7497/72995  
19           23831/81005  
20      1562587/17955721  
21           23831/81005  
22      1562587/17955721  
23            7497/72995  
24      1421917/15626431  
25      1421917/15626431  
26                     0  
27                     0  
28                     0  
29                     0  
...                  ...  
105203                 0  
105204                 0  
105205                 0  
105206                 0  
105207                 0  
105208                 0  
105209                 0  
105210                 0  
105211                 0  
105212                 0  
105213                 0  
105214                 0  
105215                 0  
105216                 0  
105217                 0  
105218                 0  
105219                 0  
105220                 0  
105221                 0  
105222                 0  
105223                 0  
105224                 0  
105225                 0  
105226                 0  
105227                 0  
105228                 0  
105229                 0  
105230                 0  
105231                 0  
105232                 0  

[105233 rows x 32 columns]
} 

Unfortunately I am not very experienced with this, so I am asking for help. Best, Nicolo

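I'm sure there is a simpler, shorter, more elegant, more pythonic way to solve this, but in the meantime here is one solution. It returns a df containing the rows whose first 16 terms are the same as their last 16 terms. Here is an example with only a few rows and columns: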

df = pd.DataFrame({'a':[4,2,4,5,5,4],
                'b':[4,3,1,2,2,4],
                'c':[1,2,4,5,5,3],
                'd': [4, 3, 2, 2, 2, 4],})
print(df)
   a  b  c  d
0  4  4  1  4
1  2  3  2  3
2  4  1  4  2
3  5  2  5  2
4  5  2  5  2
5  4  4  3  4

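df_a = df.iloc[:, :2]          # first half of the columns
df_b = df.iloc[:, 2:].copy()   # second half of the columns
df_b.columns = df_a.columns    # align labels so the halves compare column by column
c = df_b != df_a               # True wherever the two halves differ
df_a = df_a.mask(c)            # differing entries become NaN
a = pd.isnull(df_a).any(axis=1).to_numpy().nonzero()[0]  # positions of rows with a mismatch
df = df.drop(df.index[a])      # keep only the rows whose halves match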

Output:

   a  b  c  d
1  2  3  2  3
3  5  2  5  2
4  5  2  5  2

In your case:

df_a = df.iloc[:,:16]
df_b = df.iloc[:,16:]
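
The same comparison can probably be written without the intermediate mask and drop steps. Below is a minimal sketch, assuming the frame has an even number of columns; the helper name `rows_with_equal_halves` is purely illustrative and not part of the answer above. It compares the two halves as NumPy arrays, so the column labels do not need to be realigned, and keeps the rows where every position matches.

```python
import pandas as pd


def rows_with_equal_halves(df):
    """Return the rows of df whose first half of columns equals the second half.

    Hypothetical helper for illustration; assumes an even number of columns.
    """
    half = df.shape[1] // 2
    left = df.iloc[:, :half].to_numpy()
    right = df.iloc[:, half:].to_numpy()
    # Element-wise comparison by position, then require a match in every column.
    keep = (left == right).all(axis=1)
    return df[keep]


df = pd.DataFrame({'a': [4, 2, 4, 5, 5, 4],
                   'b': [4, 3, 1, 2, 2, 4],
                   'c': [1, 2, 4, 5, 5, 3],
                   'd': [4, 3, 2, 2, 2, 4]})
result = rows_with_equal_halves(df)
print(result)
```

For the 32-column data, `half` works out to 16. Note that NaN never compares equal to NaN, so a row with missing values in matching positions would be excluded by this sketch.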

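Thanks for your answers. For one reason or another none of them worked, but they were useful. I found an old-fashioned solution which, although not very elegant, works fine: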

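import pandas as pd

data = pd.read_csv('enummixed.txt', header=None, low_memory=False)
data = data.drop(data.columns[[0]], axis=1)

# Open the output file once instead of reopening it for every matching row.
with open("symmetric_ne.txt", "a") as out:
    for i in range(len(data)):
        # Count the positions j (0..15) whose entry equals the one 16 columns later.
        k = 0
        for j in range(16):
            if data.iloc[i, j] == data.iloc[i, j + 16]:
                k += 1
        # All 16 pairs match: the first half of the row equals the second half.
        if k == 16:
            print(data.iloc[i], file=out)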
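
One possible reason the comparisons misbehave on this data is that `read_csv` may infer an integer type for columns containing only `0` and a string type for columns containing fractions such as `106/243`, so an integer `0` never equals the string `'0'`. The sketch below works under that assumption: it reads every field as text and converts each entry to `fractions.Fraction` before comparing. The inline sample is made up, stands in for `enummixed.txt`, and uses 4 value columns instead of 32.

```python
import io
from fractions import Fraction

import pandas as pd

# Made-up sample standing in for 'enummixed.txt': an id column followed by 4 values.
sample = io.StringIO(
    "r0,1,0,1,0\n"
    "r1,106/243,137/243,106/243,137/243\n"
    "r2,106/243,0,0,106/243\n"
    "r3,1/2,0,2/4,0\n"
)

# dtype=str keeps every field as text, so no column is silently parsed as int.
data = pd.read_csv(sample, header=None, dtype=str)
data = data.drop(data.columns[[0]], axis=1)

# Convert each entry to an exact rational number; Fraction accepts "0" and "106/243".
values = data.apply(lambda col: col.map(Fraction))

# Compare the first half of the columns with the second half, position by position.
half = values.shape[1] // 2
left = values.iloc[:, :half].to_numpy()
right = values.iloc[:, half:].to_numpy()
symmetric = data[(left == right).all(axis=1)]
print(symmetric)
```

Because `Fraction` normalizes its value, `1/2` and `2/4` count as equal here. If only identical text should match, the conversion step can be skipped and the string columns compared directly.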

