Replace dataframe value in string column getting the value to replace from another column
I'm trying to process three CSV files and create a single file that merges the useful data.
Now I'm stuck on this problem:
I have two columns (SUFFIX and COD_METEL) with 1.5 million rows that I need to process to create another column containing the results.
SUFFIX COD_METEL
0 CBR CBR8901027
1 CBR CBR8901028
2 CBR CBR8904001
3 CBR CBR8904002
4 CBR CBR8904008
5 CBR CBR8904027
6 CBR CBR8904039
7 THO THO96666290
8 THO THO96666294
9 THO THO96666298
10 THO THO96666302
11 THO THO96666322
12 THO THO96666326
13 ZV ZV111900NI
14 ZV ZV111910NI
15 ZX ZX2021.AC
16 ZX ZX2021.AC
17 ZX ZX6066.AC
18 ZX ZX6111.AC
19 ZX ZX6111.AC
20 ZX ZX6380.AC
21 ZX ZX9030
22 ZX ZX9030
23 ZX ZX9030
24 ZZ ZZ00012565
Here I need to "subtract" the SUFFIX value from COD_METEL, like this:
df["RESULT"] = df["COD_METEL"] - df["SUFFIX"]
SUFFIX COD_METEL RESULT
0 CBR CBR8901027 8901027
1 CBR CBR8901028 8901028
2 CBR CBR8904001 8904001
I know it is not possible to use the "-" operator, so I'm asking for some tips on how to solve this problem and replace all the values quickly.
I have already tried some tests:
replaceList = list(set(df["SUFFIX"]))
for to_replace in replaceList:
    df["RESULT"] = df["COD_METEL"].str.replace(to_replace,"")
You can try a list comprehension if there are no missing values:
df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
print (df)
SUFFIX COD_METEL new
0 CBR CBR8901027 8901027
1 CBR CBR8901028 8901028
2 CBR CBR8904001 8904001
3 CBR CBR8904002 8904002
4 CBR CBR8904008 8904008
5 CBR CBR8904027 8904027
6 CBR CBR8904039 8904039
7 THO THO96666290 96666290
8 THO THO96666294 96666294
9 THO THO96666298 96666298
10 THO THO96666302 96666302
11 THO THO96666322 96666322
12 THO THO96666326 96666326
13 ZV ZV111900NI 111900NI
14 ZV ZV111910NI 111910NI
15 ZX ZX2021.AC 2021.AC
16 ZX ZX2021.AC 2021.AC
17 ZX ZX6066.AC 6066.AC
18 ZX ZX6111.AC 6111.AC
19 ZX ZX6111.AC 6111.AC
20 ZX ZX6380.AC 6380.AC
21 ZX ZX9030 9030
22 ZX ZX9030 9030
23 ZX ZX9030 9030
24 ZZ ZZ00012565 00012565
Performance:
#[250000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)
#print (df)
In [289]: %timeit df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
5.05 s ± 347 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [290]: %timeit df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
98.7 ms ± 8.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
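The list comprehension above assumes every cell is a string, and `str.replace` removes every occurrence of the suffix, not only the leading one. If the data may contain missing values, or the prefix text can reappear later in the code, a small helper might be safer. This is only a sketch: `strip_prefix` is a hypothetical helper name, and the sample rows (including the `ZX60ZX.AC` value and the missing cells) are made up for illustration.

```python
import pandas as pd

def strip_prefix(prefix, value):
    # Leave the value untouched when either cell is missing (NaN/None are not str).
    if not isinstance(prefix, str) or not isinstance(value, str):
        return value
    # Remove the prefix only when it occurs at the start of the value.
    return value[len(prefix):] if value.startswith(prefix) else value

# Illustrative data, not taken from the question.
df = pd.DataFrame({
    "SUFFIX": ["CBR", "ZX", None, "ZZ"],
    "COD_METEL": ["CBR8901027", "ZX60ZX.AC", "THO96666290", None],
})

df["new"] = [strip_prefix(i, j) for i, j in zip(df["SUFFIX"], df["COD_METEL"])]
print(df)
```

The extra `isinstance` checks add some per-row cost, so it may be worth re-running the timing on the real data before choosing this variant.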
Another approach would be:
df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
df
SUFFIX COD_METEL RESULT
0 CBR CBR8901027 8901027
1 CBR CBR8901028 8901028
2 CBR CBR8904001 8904001
3 CBR CBR8904002 8904002
4 CBR CBR8904008 8904008
5 CBR CBR8904027 8904027
6 CBR CBR8904039 8904039
7 THO THO96666290 96666290
8 THO THO96666294 96666294
9 THO THO96666298 96666298
10 THO THO96666302 96666302
11 THO THO96666322 96666322
12 THO THO96666326 96666326
13 ZV ZV111900NI 111900NI
14 ZV ZV111910NI 111910NI
15 ZX ZX2021.AC 2021.AC
16 ZX ZX2021.AC 2021.AC
17 ZX ZX6066.AC 6066.AC
18 ZX ZX6111.AC 6111.AC
19 ZX ZX6111.AC 6111.AC
20 ZX ZX6380.AC 6380.AC
21 ZX ZX9030 9030
22 ZX ZX9030 9030
23 ZX ZX9030 9030
24 ZZ ZZ00012565 00012565
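For comparison, the loop from the question overwrites the whole RESULT column on every pass, so only the last suffix ends up removed. A possible repair, sketched here on a few sample rows, is to update only the rows that belong to the current suffix. The boolean `mask` variable is an illustrative name, and `regex=False` is used on the assumption that suffixes should be matched literally.

```python
import pandas as pd

# A small sample in the same shape as the question's data.
df = pd.DataFrame({
    "SUFFIX": ["CBR", "CBR", "THO", "ZX"],
    "COD_METEL": ["CBR8901027", "CBR8901028", "THO96666290", "ZX2021.AC"],
})

# Start from a copy of the codes, then fix up one suffix group at a time.
df["RESULT"] = df["COD_METEL"]
for suffix in df["SUFFIX"].unique():
    mask = df["SUFFIX"] == suffix  # rows that carry this suffix
    # regex=False treats the suffix as plain text, not as a pattern.
    df.loc[mask, "RESULT"] = df.loc[mask, "COD_METEL"].str.replace(suffix, "", regex=False)

print(df)
```

Whether this is faster than the row-wise `apply` depends on how many distinct suffixes there are, so it should be timed on the real data rather than assumed.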