
Replace dataframe value in a string column, getting the value to replace from another column

I'm trying to process three CSV files and create a single file that merges the useful data.

Now, I'm stuck on this problem:

I have two columns (SUFFIX and COD_METEL), with 1.5 million rows, that I need to process in order to create another column containing the results.

        SUFFIX    COD_METEL
0          CBR   CBR8901027
1          CBR   CBR8901028
2          CBR   CBR8904001
3          CBR   CBR8904002
4          CBR   CBR8904008
5          CBR   CBR8904027
6          CBR   CBR8904039
7          THO  THO96666290
8          THO  THO96666294
9          THO  THO96666298
10         THO  THO96666302
11         THO  THO96666322
12         THO  THO96666326
13          ZV   ZV111900NI
14          ZV   ZV111910NI
15          ZX    ZX2021.AC
16          ZX    ZX2021.AC
17          ZX    ZX6066.AC
18          ZX    ZX6111.AC
19          ZX    ZX6111.AC
20          ZX    ZX6380.AC
21          ZX       ZX9030
22          ZX       ZX9030
23          ZX       ZX9030
24          ZZ   ZZ00012565

Here I need to "subtract" the SUFFIX value from COD_METEL, like this:

df["RESULT"] = df["COD_METEL"] - df["SUFFIX"]

        SUFFIX    COD_METEL     RESULT
0          CBR   CBR8901027    8901027
1          CBR   CBR8901028    8901028
2          CBR   CBR8904001    8904001

I know that it is not possible to use the "-" operator on strings, so I'm asking for some tips to solve this problem and replace all the values quickly.

I have already tried some tests:

replaceList = list(set(df["SUFFIX"]))
for to_replace in replaceList:
    df["RESULT"] = df["COD_METEL"].str.replace(to_replace,"")
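
A note on the loop above: every iteration assigns the whole RESULT column from COD_METEL again, so after the loop only the suffix from the final iteration has been removed. A minimal sketch of a loop that keeps the earlier results, assuming each suffix should be removed only on the rows where it appears in SUFFIX (the small frame below is a sample built from rows shown in the question, and `mask` is a name introduced here for illustration):

```python
import pandas as pd

# Small sample built from rows shown in the question.
df = pd.DataFrame({
    "SUFFIX": ["CBR", "THO", "ZV", "ZX"],
    "COD_METEL": ["CBR8901027", "THO96666290", "ZV111900NI", "ZX2021.AC"],
})

# Start from a copy of the codes so every row has a value in RESULT.
df["RESULT"] = df["COD_METEL"]

for to_replace in df["SUFFIX"].unique():
    # Only touch the rows that carry this suffix.
    mask = df["SUFFIX"] == to_replace
    # regex=False treats the suffix as literal text, not as a pattern.
    df.loc[mask, "RESULT"] = df.loc[mask, "COD_METEL"].str.replace(
        to_replace, "", regex=False
    )

print(df)
```

This still loops once per distinct suffix, so it is likely to be slow when there are many different suffixes.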

You can try a list comprehension if there are no missing values:

df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
print (df)
   SUFFIX    COD_METEL       new
0     CBR   CBR8901027   8901027
1     CBR   CBR8901028   8901028
2     CBR   CBR8904001   8904001
3     CBR   CBR8904002   8904002
4     CBR   CBR8904008   8904008
5     CBR   CBR8904027   8904027
6     CBR   CBR8904039   8904039
7     THO  THO96666290  96666290
8     THO  THO96666294  96666294
9     THO  THO96666298  96666298
10    THO  THO96666302  96666302
11    THO  THO96666322  96666322
12    THO  THO96666326  96666326
13     ZV   ZV111900NI  111900NI
14     ZV   ZV111910NI  111910NI
15     ZX    ZX2021.AC   2021.AC
16     ZX    ZX2021.AC   2021.AC
17     ZX    ZX6066.AC   6066.AC
18     ZX    ZX6111.AC   6111.AC
19     ZX    ZX6111.AC   6111.AC
20     ZX    ZX6380.AC   6380.AC
21     ZX       ZX9030      9030
22     ZX       ZX9030      9030
23     ZX       ZX9030      9030
24     ZZ   ZZ00012565  00012565
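
One caveat: `j.replace(i, '')` removes every occurrence of the suffix, not only the leading one, and it raises an error on missing values. The following is a rough sketch of a variant that strips only a leading prefix and passes missing values through unchanged. The helper name `strip_prefix` and the code "ZXA9ZX" are made up for illustration, and plain lists stand in for the two columns:

```python
def strip_prefix(code, prefix):
    """Return code without a leading prefix; leave it unchanged otherwise."""
    # Missing values (None / NaN) are not strings, so pass them through as-is.
    if not isinstance(code, str) or not isinstance(prefix, str):
        return code
    return code[len(prefix):] if code.startswith(prefix) else code


# Plain lists stand in for df['SUFFIX'] and df['COD_METEL'].
suffixes = ["CBR", "ZX", "ZX", None]
codes = ["CBR8901027", "ZX2021.AC", "ZXA9ZX", "ZZ00012565"]  # "ZXA9ZX" is hypothetical

result = [strip_prefix(j, i) for i, j in zip(suffixes, codes)]
print(result)  # ['8901027', '2021.AC', 'A9ZX', 'ZZ00012565']
```

With the DataFrame from the question, the same helper could be used as `df['new'] = [strip_prefix(j, i) for i, j in zip(df['SUFFIX'], df['COD_METEL'])]`.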

Performance:

#[250000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)
#print (df)

In [289]: %timeit df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
5.05 s ± 347 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [290]: %timeit df['new'] = [j.replace(i, '') for i, j in zip(df['SUFFIX'], df['COD_METEL'])]
98.7 ms ± 8.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
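
The timings above come from IPython's `%timeit`. To repeat the comparison in a plain script, something like the following sketch could be used. It assumes a much smaller frame (5,000 rows built from five sample rows) so that it runs quickly, the function names are made up here, and the numbers it prints will differ from the ones above:

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "SUFFIX": ["CBR", "THO", "ZV", "ZX", "ZZ"],
    "COD_METEL": ["CBR8901027", "THO96666290", "ZV111900NI", "ZX2021.AC", "ZZ00012565"],
})
# Repeat the 5 sample rows to get 5,000 rows.
df = pd.concat([base] * 1000, ignore_index=True)


def with_apply():
    # Row-wise apply, as in the first timing.
    return df.apply(lambda x: x["COD_METEL"].replace(x["SUFFIX"], ""), axis=1)


def with_list_comprehension():
    # List comprehension over the two columns, as in the second timing.
    return [j.replace(i, "") for i, j in zip(df["SUFFIX"], df["COD_METEL"])]


# Check that both approaches give the same values before comparing speed.
same = with_apply().tolist() == with_list_comprehension()
print("same result:", same)

for func in (with_apply, with_list_comprehension):
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{func.__name__}: {seconds:.4f} s per run")
```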

Another approach would be:

df['RESULT'] = df.apply(lambda x: x['COD_METEL'].replace(x['SUFFIX'], ''), axis=1)
df

   SUFFIX    COD_METEL    RESULT
0     CBR   CBR8901027   8901027
1     CBR   CBR8901028   8901028
2     CBR   CBR8904001   8904001
3     CBR   CBR8904002   8904002
4     CBR   CBR8904008   8904008
5     CBR   CBR8904027   8904027
6     CBR   CBR8904039   8904039
7     THO  THO96666290  96666290
8     THO  THO96666294  96666294
9     THO  THO96666298  96666298
10    THO  THO96666302  96666302
11    THO  THO96666322  96666322
12    THO  THO96666326  96666326
13     ZV   ZV111900NI  111900NI
14     ZV   ZV111910NI  111910NI
15     ZX    ZX2021.AC   2021.AC
16     ZX    ZX2021.AC   2021.AC
17     ZX    ZX6066.AC   6066.AC
18     ZX    ZX6111.AC   6111.AC
19     ZX    ZX6111.AC   6111.AC
20     ZX    ZX6380.AC   6380.AC
21     ZX       ZX9030      9030
22     ZX       ZX9030      9030
23     ZX       ZX9030      9030
24     ZZ   ZZ00012565  00012565
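
A further variant, not taken from the answers above: if every COD_METEL is known to start with its SUFFIX (true for the sample rows, but only an assumption for the full data), the prefix can be cut off by length, one group of rows per suffix. A minimal sketch on a small sample frame:

```python
import pandas as pd

# Small sample built from rows shown in the question.
df = pd.DataFrame({
    "SUFFIX": ["CBR", "THO", "ZX", "ZZ"],
    "COD_METEL": ["CBR8901027", "THO96666290", "ZX9030", "ZZ00012565"],
})

df["RESULT"] = df["COD_METEL"]
for suffix, group in df.groupby("SUFFIX"):
    # Assumes each code in the group starts with `suffix`, so dropping
    # len(suffix) leading characters removes exactly the prefix.
    df.loc[group.index, "RESULT"] = group["COD_METEL"].str[len(suffix):]

print(df)
```

The string slicing runs once per distinct suffix rather than once per row, which may help when there are few distinct suffixes. It gives wrong results for any row whose code does not start with its suffix.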
