How to drop rows from a dataframe based on condition in python?

Question

So I have a CSV file that has data in the following manner:

|Variable |Time |Value|

|A1       |Jan  | 33  |

|         |Feb  | 21  |   

|         |Mar  | 08  |   

|         |Apr  | 17  |   

|         |May  | 04  |   

|         |Jun  | 43  |   

|         |Jul  | 40  |   

|         |Aug  | 37  |   

|         |Sep  | 30  |   

|         |Oct  | 46  |   

|         |Nov  | 10  | 

|         |Dec  | 13  | 

| B1      |Jan  | 20  |       

|         |Feb  | 11  |   

|         |Mar  | 02  |   

|         |Apr  | 18  |   

|         |May  | 10  |   

|         |Jun  | 35  |   

|         |Jul  | 45  |   

|         |Aug  | 32  |   

|         |Sep  | 39  |   

|         |Oct  | 42  |   

|         |Nov  | 15  | 

|         |Dec  | 18  |

Like this it goes on until A10 and B10.

I need only A with time from Jan to Dec along with the values and drop values corresponding to B. How to do it? What will be the condition?

Answer 1

Two different methods:

If the column widths are fixed:

df = pd.read_fwf('file.csv', colspecs=[(1,9), (11,16), (17, 22)])
df = df[df.replace('', np.nan).ffill()['Variable'].str.startswith('A')]
print(df)

Output:

   Variable Time  Value
0        A1  Jan     33
1            Feb     21
2            Mar      8
3            Apr     17
4            May      4
5            Jun     43
6            Jul     40
7            Aug     37
8            Sep     30
9            Oct     46
10           Nov     10
11           Dec     13

If things are more dirty:

with open('file.csv', 'r') as f:
    df = pd.DataFrame([[y.strip() for y in x.split('|')[1:4]] for x in f.readlines() if x.strip()])
df.columns = df.iloc[0].values
df = df.drop(0).reset_index(drop=True)
df['Value'] = pd.to_numeric(df['Value'])
print(df)

Output:

   Variable Time  Value
0        A1  Jan     33
1            Feb     21
2            Mar      8
3            Apr     17
4            May      4
5            Jun     43
6            Jul     40
7            Aug     37
8            Sep     30
9            Oct     46
10           Nov     10
11           Dec     13
12       B1  Jan     20
13           Feb     11
14           Mar      2
15           Apr     18
16           May     10
17           Jun     35
18           Jul     45
19           Aug     32
20           Sep     39
21           Oct     42
22           Nov     15
23           Dec     18

Answer 2

Assuming your data is arranged as you described, with some extrapolation as below

Use pandas' ffill() to impute the variable column to facilitate the desired selection as below.

sample = pd.read_csv('sample.csv')
sample['Variable'].ffill(axis=0,inplace=True)
sample = sample.loc[sample['Variable'].str.startswith('A')]
n_months = 12
indexes_to_impute_as_empty = list(range(0,len(sample),n_months))
sample.loc[indexes_to_impute_as_empty,'temp_Variable'] = sample.loc[indexes_to_impute_as_empty,'Variable']
sample['Variable'] = sample['temp_Variable']
sample.drop(columns=['temp_Variable'],inplace=True)
sample.replace(np.nan,"",inplace=True)
sample

How to drop rows from a dataframe based on condition in python?

Question

2 answers

solution1
0 2022-05-26 06:15:30

solution2
0 ACCPTED 2022-05-26 06:46:19

How to drop rows from a dataframe based on condition in python?

Question

2 answers

solution1 0 2022-05-26 06:15:30

solution2 0 ACCPTED 2022-05-26 06:46:19

solution1
0 2022-05-26 06:15:30

solution2
0 ACCPTED 2022-05-26 06:46:19