简体   繁体   English

使用 python 修改 csv 文件中的一些行

[英]Modify some rows in a csv file using python

I have a csv file as follows:我有一个 csv 文件如下:

name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80

What I am trying to do is whenever the column field is 0, add the field length to column and row should be decremented by 1. Also if there is a subsequent line where column is again 0, then add the first field length to the column, decrease row by 1 for the first occurrence and add the field lengths together.我想做的是,每当列字段为 0 时,将字段长度添加到列中,行应递减 1。此外,如果后续行中列再次为 0,则将第一个字段长度添加到列中, 将第一次出现的行减少 1,并将字段长度加在一起。 And then delete the older lines.然后删除旧行。

So in the above csv - "AB000M,10,0,80" would become "AB000M,9,80,80" and "AB000M,21,0,80所以在上面的 csv - “AB000M,10,0,80”将变成“AB000M,9,80,80”和“AB000M,21,0,80”
AB000M,22,0,80" --- these two lines should be replaced by "AB000M,20,80,160". AB000M,22,0,80" --- 这两行应替换为"AB000M,20,80,160"。

I am trying to achieve this with this snippet but its not working:我试图用这个片段来实现这个但它不起作用:

df = pd.read_csv("file.csv")
for ind in df.index:
    if ind >= len(df)-1:
        break
    if df['column'][ind] == 0 and df['column'][ind + 1] != 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
    elif df['column'][ind] == 0 and df['column'][ind + 1] == 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
        df['length_of_field'][ind] += df['length_of_field'][ind + 1]
        df.drop([df.index[ind + 1]], axis=0)

This is an example of something that may work for you.这是可能对您有用的示例。

import pandas as pd

df = pd.read_csv('test.csv')
newRows = []
last_val_is_zero = False
tempRow = None
for row in df.iterrows():
    vals = row[1]
    if vals['column'] == 0:
        if not last_val_is_zero:
            vals['row'] = vals['row'] - 1
            vals['column'] = vals['length_of_field']
            tempRow = vals
            last_val_is_zero = True
        else:
            tempRow['length_of_field'] = tempRow['length_of_field'] + vals['length_of_field']
    else:
        if tempRow is not None:
            newRows.append(tempRow)
        newRows.append(vals)
        tempRow = None
        last_val_is_zero = False

if tempRow is not None:
    newRows.append(tempRow)
    
newData = [[val for val in row] for row in newRows]
newDf = pd.DataFrame(newData, columns=[x for x in newRows[0].keys()])

sample data:样本数据:

df_str = '''
name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
AB000M,23,11,1
AB000M,24,11,1
AB000M,25,0,80
AB000M,26,0,80
AB000M,27,0,80
AB000M,28,11,1
AB000M,29,0,80
'''
df = pd.read_csv(io.StringIO(df_str.strip()), sep=',', index_col=False)

solution:解决方案:

# split the row which to update or left
cond = df['column'] == 0
df_to_update = df[cond].copy()
df_left = df[~cond].copy()

# modify the update rows
df_to_update['column'] = df_to_update['length_of_field']
df_to_update['row'] -= 1

# create tag for which is diff 1 with the previous row
cond = df_to_update['row'].diff() != 1
df_to_update['tag'] = np.where(cond, 1, 0)

# cumsum tag to creat group
df_to_update['label'] = df_to_update['tag'].cumsum()

print(df_to_update)

#       name  row  column  length_of_field  tag  label
# 2   AB000M    9      80               80    1      1
# 5   AB000M   20      80               80    1      2
# 6   AB000M   21      80               80    0      2
# 9   AB000M   24      80               80    1      3
# 10  AB000M   25      80               80    0      3
# 11  AB000M   26      80               80    0      3
# 13  AB000M   28      80               80    1      4


# agg groupy left first row, and sum(length_of_field)
obj_list = []
for tag, group in df_to_update.groupby('label'):
    obj = group.iloc[0].copy()
    obj['length_of_field'] = group['length_of_field'].sum()
    obj_list.append(obj)
dfn_to_update = pd.concat(obj_list,axis=1).T[df.columns]    

# merge final result
dfn = df_left.append(dfn_to_update).sort_index()

result:结果:

print(dfn)

      name row column length_of_field
0   AB000M   8     12               1
1   AB000M   9     12               1
2   AB000M   9     80              80
3   AB000M  10     12               1
4   AB000M  11      1               1
5   AB000M  20     80             160
7   AB000M  23     11               1
8   AB000M  24     11               1
9   AB000M  24     80             240
12  AB000M  28     11               1
13  AB000M  28     80              80

print(df)

      name  row  column  length_of_field
0   AB000M    8      12                1
1   AB000M    9      12                1
2   AB000M   10       0               80
3   AB000M   10      12                1
4   AB000M   11       1                1
5   AB000M   21       0               80
6   AB000M   22       0               80
7   AB000M   23      11                1
8   AB000M   24      11                1
9   AB000M   25       0               80
10  AB000M   26       0               80
11  AB000M   27       0               80
12  AB000M   28      11                1
13  AB000M   29       0               80

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM