简体   繁体   中英

How to drop rows from a dataframe based on condition in python?

So I have a CSV file that has data in the following manner:

|Variable |Time |Value|

|A1       |Jan  | 33  |

|         |Feb  | 21  |   

|         |Mar  | 08  |   

|         |Apr  | 17  |   

|         |May  | 04  |   

|         |Jun  | 43  |   

|         |Jul  | 40  |   

|         |Aug  | 37  |   

|         |Sep  | 30  |   

|         |Oct  | 46  |   

|         |Nov  | 10  | 

|         |Dec  | 13  | 

| B1      |Jan  | 20  |       

|         |Feb  | 11  |   

|         |Mar  | 02  |   

|         |Apr  | 18  |   

|         |May  | 10  |   

|         |Jun  | 35  |   

|         |Jul  | 45  |   

|         |Aug  | 32  |   

|         |Sep  | 39  |   

|         |Oct  | 42  |   

|         |Nov  | 15  | 

|         |Dec  | 18  | 

Like this it goes on until A10 and B10.

I need only A with time from Jan to Dec along with the values and drop values corresponding to B. How to do it? What will be the condition?

Two different methods:

If the column widths are fixed:

df = pd.read_fwf('file.csv', colspecs=[(1,9), (11,16), (17, 22)])
df = df[df.replace('', np.nan).ffill()['Variable'].str.startswith('A')]
print(df)

Output:

   Variable Time  Value
0        A1  Jan     33
1            Feb     21
2            Mar      8
3            Apr     17
4            May      4
5            Jun     43
6            Jul     40
7            Aug     37
8            Sep     30
9            Oct     46
10           Nov     10
11           Dec     13

If things are more dirty:

with open('file.csv', 'r') as f:
    df = pd.DataFrame([[y.strip() for y in x.split('|')[1:4]] for x in f.readlines() if x.strip()])
df.columns = df.iloc[0].values
df = df.drop(0).reset_index(drop=True)
df['Value'] = pd.to_numeric(df['Value'])
print(df)

Output:

   Variable Time  Value
0        A1  Jan     33
1            Feb     21
2            Mar      8
3            Apr     17
4            May      4
5            Jun     43
6            Jul     40
7            Aug     37
8            Sep     30
9            Oct     46
10           Nov     10
11           Dec     13
12       B1  Jan     20
13           Feb     11
14           Mar      2
15           Apr     18
16           May     10
17           Jun     35
18           Jul     45
19           Aug     32
20           Sep     39
21           Oct     42
22           Nov     15
23           Dec     18

Assuming your data is arranged as you described, with some extrapolation as below在此处输入图像描述

Use pandas' ffill() to impute the variable column to facilitate the desired selection as below.

sample = pd.read_csv('sample.csv')
sample['Variable'].ffill(axis=0,inplace=True)
sample = sample.loc[sample['Variable'].str.startswith('A')]
n_months = 12
indexes_to_impute_as_empty = list(range(0,len(sample),n_months))
sample.loc[indexes_to_impute_as_empty,'temp_Variable'] = sample.loc[indexes_to_impute_as_empty,'Variable']
sample['Variable'] = sample['temp_Variable']
sample.drop(columns=['temp_Variable'],inplace=True)
sample.replace(np.nan,"",inplace=True)
sample

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM